pdfwrite - ToUnicode revamp

Bug 695461 "Why does the search results changes after pdf optimzation in ghostscript?"

The existing ToUnicode functions were written against the original ToUnicode specification, and assume that no more than 2 unsigned shorts will be returned as the Unicode code point for any given glyph.

Sadly Adobe have revised the ToUnicode CMap, making it the same as a regular CMap and then extending it still further. It is now possible for a single glyph to map to a string of up to 512 bytes.

This commit revises the existing C 'decode_glyph' procedures so that instead of returning a gs_char for the Unicode code point, they return a string of bytes. If the caller initially says that the string it is passing is zero bytes long, then we do not copy the bytes, we just return the required size of the string (in bytes). A return value of 0 from a decode_glyph function means that the glyph was not in the map and so could not be 'decoded'.

As a consequence of this change, and to further permit more than 2 unsigned shorts for ToUnicode CMaps, the CMap lookup enumerator now needs to be able to allocate memory, so the 'next_lookup' methods all now take a gs_memory_t pointer to make this possible.

The ToUnicode cmap table also has to change. Formerly it was a simple array of 4 bytes per code, holding either 255 or 65535 codes. For simplicity I've chosen to keep it as a large continuous array, but each entry is now the number of bytes required to store the longest defined Unicode value for that font, plus 2 bytes. The 2 bytes give the length of the reserved space actually used by each Unicode code point. The bytes are stored immediately following the length (so a 2 byte length Pascal string if you like).

This may possibly cause ToUnicode maps to use a lot of memory; in the short term we'll live with it because these only exist with pdfwrite, and that is only really expected to run on decent-sized platforms. May need to do something better in future.
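
The table layout described above can be sketched roughly as follows. This is an illustration only, not the Ghostscript implementation: the struct, the function names and the big-endian ordering of the 2 byte length are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-stride table; names are illustrative, not Ghostscript's. */
typedef struct {
    unsigned char *data;   /* num_codes * (value_size + 2) bytes */
    size_t num_codes;      /* number of character codes covered */
    size_t value_size;     /* longest Unicode value for the font, in bytes */
} demo_tounicode_table;

/* Each entry is a 2 byte length followed by value_size bytes of storage. */
static unsigned char *
demo_entry(const demo_tounicode_table *t, size_t code)
{
    assert(code < t->num_codes);
    return t->data + code * (t->value_size + 2);
}

/* Store a value for one code. Returns 0 on success, -1 if it does not fit. */
static int
demo_table_set(demo_tounicode_table *t, size_t code,
               const unsigned char *value, size_t len)
{
    unsigned char *e;

    if (code >= t->num_codes || len > t->value_size || len > 0xFFFF)
        return -1;
    e = demo_entry(t, code);
    e[0] = (unsigned char)(len >> 8);    /* length, high byte (assumed order) */
    e[1] = (unsigned char)(len & 0xFF);  /* length, low byte */
    if (len != 0)
        memcpy(e + 2, value, len);       /* bytes follow the length directly */
    return 0;
}

/* Fetch a value. Returns the used length in bytes, 0 if the code is unset. */
static size_t
demo_table_get(const demo_tounicode_table *t, size_t code,
               const unsigned char **value)
{
    const unsigned char *e;

    if (code >= t->num_codes)
        return 0;
    e = demo_entry(t, code);
    *value = e + 2;
    return ((size_t)e[0] << 8) | e[1];
}
```

Because every entry has the same stride, a lookup is a single multiplication, at the cost of reserving the longest value's size for every code; that is the memory trade-off the message mentions.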
author: Ken Sharp <ken.sharp@artifex.com> 2016-05-25 16:51:20 +0100
committer: Ken Sharp <ken.sharp@artifex.com> 2016-05-26 09:23:34 +0100
commit: 9dba57f0f9a53c130ec2771c0ed1d7bd6bbef6ab (patch)
tree: 76ed1bc0ba4a053f93bfaae981d8883d130c036a /base/gsfont.c
parent: a60087bafbaabf7052e65eb7f08548ae596d91d4 (diff)
download: ghostpdl-9dba57f0f9a53c130ec2771c0ed1d7bd6bbef6ab.tar.gz
1 files changed, 2 insertions, 2 deletions
diff --git a/base/gsfont.c b/base/gsfont.c
index d9e0958b1..cf674684c 100644
--- a/base/gsfont.c
+++ b/base/gsfont.c
@@ -981,8 +981,8 @@ gs_no_encode_char(gs_font *pfont, gs_char chr, gs_glyph_space_t glyph_space)
 }
 
 /* Dummy glyph decoding procedure */
-gs_char
-gs_no_decode_glyph(gs_font *pfont, gs_glyph glyph, int ch)
+int
+gs_no_decode_glyph(gs_font *pfont, gs_glyph glyph, int ch, ushort *unicode_return, unsigned int length)
 {
     return GS_NO_CHAR;
 }
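
For illustration, the size-query convention described in the commit message might be used by a caller along these lines. This is a minimal sketch under stated assumptions: demo_decode_glyph is a hypothetical stand-in (it drops the gs_font and ch parameters and uses a hard-coded two-glyph map), demo_glyph_to_unicode is likewise invented for the example, and the behaviour when a non-zero but too-small length is passed is an assumption, since the message does not specify it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a font's decode_glyph procedure.
 * Returns the size in bytes of the Unicode value, or 0 if the glyph
 * is not in the map. With length == 0 nothing is copied.
 */
static int
demo_decode_glyph(unsigned long glyph, unsigned short *unicode_return,
                  unsigned int length)
{
    static const unsigned short lig_fi[] = { 0x0066, 0x0069 }; /* "fi" */
    static const unsigned short cap_a[] = { 0x0041 };          /* "A" */
    const unsigned short *src;
    unsigned int needed;

    switch (glyph) {
    case 1: src = lig_fi; needed = (unsigned int)sizeof(lig_fi); break;
    case 2: src = cap_a;  needed = (unsigned int)sizeof(cap_a);  break;
    default: return 0;                   /* glyph could not be decoded */
    }
    /* Assumption: copy only when the caller's buffer is large enough. */
    if (unicode_return != NULL && length >= needed)
        memcpy(unicode_return, src, needed);
    return (int)needed;
}

/*
 * Two-call pattern: ask for the size first, then allocate and fetch.
 * Returns the size in bytes, 0 if the glyph is unmapped, -1 on error.
 * On success the caller owns *out and must free() it.
 */
static int
demo_glyph_to_unicode(unsigned long glyph, unsigned short **out)
{
    int needed;
    unsigned short *buf;

    assert(out != NULL);
    *out = NULL;
    needed = demo_decode_glyph(glyph, NULL, 0);   /* size query only */
    if (needed == 0)
        return 0;
    buf = malloc((size_t)needed);
    if (buf == NULL)
        return -1;
    if (demo_decode_glyph(glyph, buf, (unsigned int)needed) != needed) {
        free(buf);
        return -1;
    }
    *out = buf;
    return needed;
}
```

The first call costs a second lookup, but it lets the caller size the buffer exactly rather than reserving the 512 byte maximum for every glyph.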