author | Ken Sharp <ken.sharp@artifex.com> | 2016-05-25 16:51:20 +0100 |
---|---|---|
committer | Ken Sharp <ken.sharp@artifex.com> | 2016-05-26 09:23:34 +0100 |
commit | 9dba57f0f9a53c130ec2771c0ed1d7bd6bbef6ab | |
tree | 76ed1bc0ba4a053f93bfaae981d8883d130c036a /base/gsfont.c | |
parent | a60087bafbaabf7052e65eb7f08548ae596d91d4 | |
pdfwrite - ToUnicode revamp
Bug 695461 "Why does the search results changes after pdf optimzation in ghostscript?"
The existing ToUnicode functions were written against the original
ToUnicode specification, and assume that there will be no more than
2 unsigned shorts returned as the Unicode code point for any given
glyph. Sadly, Adobe have revised the ToUnicode CMap specification, making
it the same as a regular CMap and then extending it still further.
It is now possible for a single glyph to map to a string of up to
512 bytes.
This commit revises the existing C decode_glyph functions so that,
instead of returning a gs_char for the Unicode code point, they return
a string of bytes. If the caller says that the buffer it is passing is
zero bytes long, we do not copy the bytes; we just return the required
size of the string (in bytes).
A return value of 0 from a decode_glyph function means that the glyph
was not in the map and so could not be 'decoded'.
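The two-call protocol described above (ask for the size with a zero length, then call again with a buffer of that size, with 0 meaning "not in the map") can be sketched as follows. This is a minimal illustration, not the Ghostscript code: `example_decode_glyph`, `decode_to_new_buffer` and the tiny lookup table are hypothetical, the real functions take a `ushort *unicode_return` rather than a byte buffer, and returning -1 when the caller's buffer is too small is an assumption, since the commit message does not say what happens in that case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ToUnicode entry: a glyph id and its UTF-16BE bytes. */
typedef struct {
    unsigned int glyph;
    const unsigned char *bytes;
    unsigned int size;
} uni_entry;

static const unsigned char a_char[] = {0x00, 0x41};             /* "A"  */
static const unsigned char fi_lig[] = {0x00, 0x66, 0x00, 0x69}; /* "fi" */
static const uni_entry table[] = {
    {1, a_char, sizeof a_char},
    {2, fi_lig, sizeof fi_lig},
};

/* Hypothetical decode_glyph-style function.
 * Returns 0 if the glyph is not in the map, otherwise the byte length
 * of its Unicode string. With length == 0 nothing is copied (size query). */
static int example_decode_glyph(unsigned int glyph, unsigned char *buf,
                                unsigned int length)
{
    assert(buf != NULL || length == 0);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].glyph != glyph)
            continue;
        if (length == 0)
            return (int)table[i].size;   /* caller only wants the size */
        if (length < table[i].size)
            return -1;                   /* assumption: too-small buffer is an error */
        memcpy(buf, table[i].bytes, table[i].size);
        return (int)table[i].size;
    }
    return 0;                            /* glyph not in the map */
}

/* Caller side: query the size, allocate exactly that much, then decode.
 * Returns NULL (and *out_len <= 0) when there is nothing to decode. */
static unsigned char *decode_to_new_buffer(unsigned int glyph, int *out_len)
{
    int need = example_decode_glyph(glyph, NULL, 0);
    if (need <= 0) {
        *out_len = need;
        return NULL;
    }
    unsigned char *buf = malloc((size_t)need);
    if (buf == NULL) {
        *out_len = -1;
        return NULL;
    }
    *out_len = example_decode_glyph(glyph, buf, (unsigned int)need);
    return buf;
}
```

For example, `decode_to_new_buffer(2, &len)` yields a 4-byte buffer holding the UTF-16BE encoding of "fi", while an unmapped glyph yields `NULL` with `len == 0`. The size query keeps callers from having to guess a maximum, which matters once a single glyph can map to as many as 512 bytes.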
As a consequence of this change, and to further permit more than
2 unsigned shorts for ToUnicode CMaps, the CMap lookup enumerator
now needs to be able to allocate memory, so the 'next_lookup'
methods all now take a gs_memory_t pointer to make this possible.
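The commit message only says that the next_lookup methods now take a gs_memory_t pointer. As a hedged illustration of why an enumerator needs an allocator once a mapped value can be hundreds of bytes long, here is a minimal C sketch. `mem_t`, `heap_alloc`, `heap_free`, `lookup_enum` and this `next_lookup` are hypothetical stand-ins, not Ghostscript's `gs_memory_t` or CMap enumerator types, and the return convention (0 = entry produced, 1 = end of enumeration, negative = allocation failure) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for gs_memory_t: just an allocator pair. */
typedef struct mem_s {
    void *(*alloc)(struct mem_s *mem, size_t size);
    void (*free)(struct mem_s *mem, void *ptr);
} mem_t;

static void *heap_alloc(mem_t *mem, size_t size) { (void)mem; return malloc(size); }
static void heap_free(mem_t *mem, void *ptr) { (void)mem; free(ptr); }

/* Hypothetical enumerator over variable-length Unicode values. */
typedef struct {
    const unsigned char *const *values; /* source values */
    const size_t *sizes;                /* byte length of each value */
    size_t count;
    size_t index;
    unsigned char *value;               /* current value, owned by the enumerator */
    size_t value_size;
} lookup_enum;

/* Advance to the next entry. The value buffer is allocated per entry from
 * the caller's allocator, because no fixed inline field (e.g. 2 ushorts)
 * can hold an arbitrarily long value. */
static int next_lookup(mem_t *mem, lookup_enum *penum)
{
    assert(mem != NULL && penum != NULL);
    if (penum->value != NULL) {         /* release the previous entry's buffer */
        mem->free(mem, penum->value);
        penum->value = NULL;
        penum->value_size = 0;
    }
    if (penum->index >= penum->count)
        return 1;                       /* end of enumeration */
    size_t n = penum->sizes[penum->index];
    penum->value = mem->alloc(mem, n);
    if (penum->value == NULL)
        return -1;                      /* allocation failure */
    memcpy(penum->value, penum->values[penum->index], n);
    penum->value_size = n;
    penum->index++;
    return 0;
}
```

Passing the allocator in, rather than hard-coding `malloc`, lets the enumerator use whatever memory manager owns the CMap, which is presumably why the real methods gained a gs_memory_t parameter.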
The ToUnicode CMap table also has to change. Formerly it was a simple
array of 4 bytes per code, with either 255 or 65535 codes. For simplicity
I've chosen to keep it as one large contiguous array, but each entry
is now sized to the number of bytes required to store the longest
Unicode value defined for that font, plus 2 bytes. The 2 bytes give the
length of the reserved space actually used by that code's Unicode value,
and the bytes are stored immediately after the length (a Pascal string
with a 2-byte length, if you like). This may cause ToUnicode maps to use
a lot of memory. In the short term we'll live with that, because these
maps only exist with pdfwrite, which is only really expected to run on
decently sized platforms. We may need to do something better in the
future.
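The table layout described above (fixed-stride entries of `max_len + 2` bytes, each a 2-byte length followed by the Unicode bytes) can be sketched in C as follows. `tounicode_table`, `table_init`, `table_set` and `table_get` are hypothetical names, not Ghostscript's, and storing the length big-endian is an assumption; the commit message only says it is a 2-byte length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ToUnicode table: one contiguous array of fixed-size entries. */
typedef struct {
    unsigned int num_codes;  /* number of codes covered by the table */
    unsigned int max_len;    /* longest Unicode value (bytes) for this font */
    unsigned char *data;     /* num_codes * (max_len + 2) bytes */
} tounicode_table;

/* Each entry: 2-byte length, then up to max_len bytes of Unicode data. */
static size_t entry_stride(const tounicode_table *t)
{
    return (size_t)t->max_len + 2;
}

static int table_init(tounicode_table *t, unsigned int num_codes, unsigned int max_len)
{
    t->num_codes = num_codes;
    t->max_len = max_len;
    t->data = calloc(num_codes, (size_t)max_len + 2); /* length 0 = unmapped */
    return t->data != NULL ? 0 : -1;
}

static int table_set(tounicode_table *t, unsigned int code,
                     const unsigned char *bytes, unsigned int len)
{
    if (code >= t->num_codes || len > t->max_len)
        return -1;
    unsigned char *entry = t->data + code * entry_stride(t);
    entry[0] = (unsigned char)(len >> 8);   /* assumption: big-endian length */
    entry[1] = (unsigned char)(len & 0xff);
    memcpy(entry + 2, bytes, len);
    return 0;
}

/* Returns the used length (0 = unmapped) and points *bytes at the data. */
static unsigned int table_get(const tounicode_table *t, unsigned int code,
                              const unsigned char **bytes)
{
    assert(code < t->num_codes);
    const unsigned char *entry = t->data + code * entry_stride(t);
    *bytes = entry + 2;
    return ((unsigned int)entry[0] << 8) | entry[1];
}
```

The fixed stride keeps lookup a single multiplication, at the cost of memory: every code reserves room for the font's longest value. With 65536 codes and a 512-byte longest value, the array would need 65536 × 514 = 33,685,504 bytes (about 32 MiB), which is the memory concern the commit message accepts for pdfwrite.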
Diffstat (limited to 'base/gsfont.c')
-rw-r--r-- | base/gsfont.c | 4 |
1 files changed, 2 insertions, 2 deletions
```diff
diff --git a/base/gsfont.c b/base/gsfont.c
index d9e0958b1..cf674684c 100644
--- a/base/gsfont.c
+++ b/base/gsfont.c
@@ -981,8 +981,8 @@ gs_no_encode_char(gs_font *pfont, gs_char chr, gs_glyph_space_t glyph_space)
 }
 
 /* Dummy glyph decoding procedure */
-gs_char
-gs_no_decode_glyph(gs_font *pfont, gs_glyph glyph, int ch)
+int
+gs_no_decode_glyph(gs_font *pfont, gs_glyph glyph, int ch, ushort *unicode_return, unsigned int length)
 {
     return GS_NO_CHAR;
 }
```