Old PDF interpreter - improved CMap handling

The file supplied for this came from a customer who asked that we delete the file after working on it, so there is no bug report and no reproducer for this.

The problem is that the PDF file has a font using a CMap, and that CMap uses /UseCMap (and the usecmap operator) to read a different (Horizontal) CMap and then modify it with Vertical glyph positions. The CMap does not have a begincodespacerange; it simply inherits the ranges from the CMap it references.

This causes the code added to work around Bug 690737, which discards everything up to the begincodespacerange, to read off the end of the CMap in read_CMap_stream. Since there is nothing left to process, this causes errors in the CMap processing.

Following a suggestion from Chris, this commit first attempts the same 'discard up to the begincodespacerange' hackery, then checks to see if the stream has any bytes left. If it does, we proceed as before. If there are no bytes left, then we have discarded all of the content, so we rewind the stream to the point we were at before we tried to discard the header, impose a different SubFileDecode, looking this time for a 'begincmap', and then attempt to process the CMap as normal.
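
The fallback described above can be modelled outside PostScript. The following is only a rough Python sketch of the control flow, not the Ghostscript implementation: `io.BytesIO` stands in for the seekable stream produced by ReusableStreamDecode, and the names `discard_through`, `bytes_left` and `position_at_cmap_body` are hypothetical helpers invented for this illustration.

```python
import io


def discard_through(stream, marker):
    """Consume bytes up to and including `marker` (hypothetical helper).

    If the marker never appears, the stream is left at EOF, which is the
    situation the commit detects.
    """
    rest = stream.read()
    hit = rest.find(marker)
    if hit >= 0:
        # Step back so the stream sits just after the marker.
        stream.seek(hit + len(marker) - len(rest), io.SEEK_CUR)


def bytes_left(stream):
    """Return how many bytes remain, without moving the read position."""
    here = stream.tell()
    end = stream.seek(0, io.SEEK_END)
    stream.seek(here)
    return end - here


def position_at_cmap_body(data):
    """Return (remaining_bytes, emulate_begincodespacerange)."""
    stream = io.BytesIO(data)  # assumption: models the seekable stream
    start = stream.tell()
    discard_through(stream, b"begincodespacerange")
    if bytes_left(stream) == 0:
        # Nothing left: there was no begincodespacerange, so rewind and
        # discard only as far as 'begincmap'.
        stream.seek(start)
        discard_through(stream, b"begincmap")
        emulate = False
    else:
        # The begincodespacerange token was consumed; the caller must
        # emulate it.
        emulate = True
    return stream.read(), emulate
```

As in the commit, the sketch decides by checking whether any bytes remain rather than by recording whether the marker was seen, so a CMap that ends exactly at begincodespacerange would also take the fallback path.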
author: Ken Sharp <ken.sharp@artifex.com> 2022-03-02 09:59:47 +0000
committer: Ken Sharp <ken.sharp@artifex.com> 2022-03-02 14:50:31 +0000
commit: 6702b327c89ba10e1fce475f372c2fbbcd3f3210 (patch)
tree: 6e9c99cd31cd4fba1bb6573ed524cf266168b052 /Resource
parent: 670c12a7b65ade7f8d60388b143083f5b273c6de (diff)
download: ghostpdl-6702b327c89ba10e1fce475f372c2fbbcd3f3210.tar.gz
1 files changed, 24 insertions, 3 deletions
diff --git a/Resource/Init/pdf_font.ps b/Resource/Init/pdf_font.ps
index b42cd4cfb..1be2871b1 100644
--- a/Resource/Init/pdf_font.ps
+++ b/Resource/Init/pdf_font.ps
@@ -1,4 +1,4 @@
-% Copyright (C) 2001-2021 Artifex Software, Inc.
+% Copyright (C) 2001-2022 Artifex Software, Inc.
 % All Rights Reserved.
 %
 % This software is provided AS-IS with no warranty, either express or
@@ -1622,7 +1622,6 @@ currentdict end readonly def
 % Following Acrobat we ignore everything outside
 % begincodespacerange .. endcmap.
 /read_CMap_stream {  % <info> <wmode> <name> <stream> read_CMap <CMap>
-  dup 0 (begincodespacerange) /SubFileDecode filter flushfile
   //CMap_read_dict begin
   /CIDInit /ProcSet findresource begin
   12 dict begin
@@ -1641,7 +1640,29 @@ currentdict end readonly def
   dup //null eq { pop } { /CIDSystemInfo exch def } ifelse
   /CMapType 1 def
   /.last_CMap_def currentdict def % establish binding
-  mark exch % emulate 'begincodespacerange'
+
+  % The stream may not be seekable, push a ReusableStream to make it seekable.
+  /ReusableStreamDecode filter
+
+  % Try to skip past the 'header' portion of the PDF file to the body
+  % If the CMap doesn't have a begincodespacerange (eg it uses /UseCMap)
+  % then this will leave us at EOF.
+  dup 0 (begincodespacerange) /SubFileDecode filter flushfile
+  % See if we have anything left in the file
+  dup bytesavailable 0 eq
+  {
+    % We discarded all the file contents, so there was no begincodespacerange
+    % Rewind the file to the beginning
+    dup 0 setfileposition
+    % And try again, this time only discarding to the 'begincmap'
+    dup 0 (begincmap) /SubFileDecode filter flushfile
+  }
+  {
+    % Everything worked as expected, because we have consumed the
+    % begincodespacerange from the file, emulate it
+    mark exch
+  } ifelse
+
   0 (endcmap) /SubFileDecode filter cvx /begincmap cvx exch 2 .execn
   currentdict /o.endmapvalue undef
   endcmap