path: root/pdf/pdf_check.c
Commit message | Author | Age | Files | Lines
* GhostPDF - fix reported embedding status of fonts | Ken Sharp | 2023-05-03 | 1 | -1/+1
  When using -dPDFINFO to get information about PDF files, one of the pieces of information is the embedding status of any fonts used on each page. Unfortunately I messed up when coding, and any non-error status for a font reported it as being embedded. Fix that here.
* Update postal address in file headers | Chris Liddell | 2023-04-04 | 1 | -2/+2
* Fix compiler warnings in pdf/ | Chris Liddell | 2023-01-04 | 1 | -2/+2
* Coverity ID 302418 - initialise a variable. | Ken Sharp | 2022-12-17 | 1 | -1/+1
* GhostPDF - Be prepared for fonts to be fonts, not dictionaries | Ken Sharp | 2022-12-16 | 1 | -17/+20
  tests_private/pdf/sumatra/x_-_maybe_crashes_under_XP.pdf

  When we turn a PDF font dictionary into a PDF font we replace the cached object in the cache. This means that code retrieving the font after that can get back a PDF font object, not a PDF dictionary. The code for checking fonts for transparency wasn't expecting that.

  In addition, a logic flaw in the loop meant that any non-dictionary objects we encountered were left on the 'loop detection' stack.

  Taken together these were causing spurious circular reference errors, while at the same time not properly checking fonts for transparency (if they had previously been dereferenced and cached).

  I've reworked the code so that it handles PDF font objects if they are retrieved from the Resource dictionary, properly checks them, correctly removes objects (including objects of the wrong type) from the loop detection stack, and flags a warning if a Font Resource is of the wrong type (neither dictionary nor font).

  A Font of the wrong type in the Resources dictionary is not necessarily an error; the Page may never actually use the named font resource.
* GhostPDF - detect circular references when checking Resources for transparency | Ken Sharp | 2022-12-15 | 1 | -0/+5
  My test file "Pages from Saddle_reduced.pdf"

  The file has a circular reference in objects 1663 and 1664:

    1663 0 obj<</XObject<</I0 1664 0 R>>/ProcSet[/PDF/ImageB/ImageC/ImageI/Text]>>endobj
    1664 0 obj<</Length 0/Type/XObject/BBox[0 0 684 864]/Resources 1663 0 R/Subtype/Form/FormType 1/Matrix[1 0 0 1 0 0]>>

  When retrieving objects from a dictionary we mark/clear the loop detection around any required dereference, which means we don't detect circular references. This seems wrong to me, but when I tried to change it lots of files started generating errors.....

  The whole circular reference mechanism is somewhat flaky and probably needs a rethink. In the meantime we can detect the error by manually adding the Resources dictionary to the loop detection code before we recursively check the dictionary.
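The idea of marking an object on a loop detection stack before descending into it, and always unmarking it afterwards, can be sketched as follows. This is only an illustration of the technique, not the pdfi code: `node`, `loop_stack`, `check_node`, `MAX_DEPTH` and `MAX_KIDS` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 32   /* hypothetical nesting limit for this sketch */
#define MAX_KIDS  4

/* Toy object: an object number plus references to other objects. */
typedef struct node {
    int id;
    struct node *kids[MAX_KIDS];
    size_t nkids;
} node;

/* Object numbers currently being checked (the "loop detection" stack). */
typedef struct {
    int ids[MAX_DEPTH];
    size_t depth;
} loop_stack;

static int on_stack(const loop_stack *s, int id)
{
    for (size_t i = 0; i < s->depth; i++)
        if (s->ids[i] == id)
            return 1;
    return 0;
}

/* Returns 0 if everything reachable from n is acyclic,
 * -1 on a circular reference or if nesting exceeds MAX_DEPTH. */
int check_node(loop_stack *s, const node *n)
{
    int code = 0;

    if (on_stack(s, n->id) || s->depth == MAX_DEPTH)
        return -1;

    s->ids[s->depth++] = n->id;      /* mark before descending */
    for (size_t i = 0; i < n->nkids && code == 0; i++)
        code = check_node(s, n->kids[i]);
    assert(s->depth > 0);
    s->depth--;                      /* always unmark, even on error */
    return code;
}
```

With two toy objects numbered 1663 and 1664 that refer to each other, `check_node` reports the cycle and leaves the stack empty afterwards, which is the property the previous commit's "left on the stack" flaw violated.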
* GhostPDF - report bad object types when checking for transparency/spots | Ken Sharp | 2022-12-12 | 1 | -0/+12
  Specimen file Bug690364.pdf

  The old PS-based interpreter used to raise an error when XObjects had the wrong type. We now do this too.
* GhostPDF - Fix crash with -dPDFINFO | Ken Sharp | 2022-09-22 | 1 | -1/+1
  Stems from bug #705897 (which is not a GS bug report).

  The file for that report has a Resource with a Font which points to a non-Font dictionary. The font resource checking was incorrectly using a return value of >= 0 from pdfi_dict_knownget_type() to indicate that a Subtype key is present in the dictionary; the function actually returns > 0 if it is present and 0 if it is not. This caused the code to try to use the non-existent key.
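The three-way return convention described here (negative for an error, 0 for "key absent", positive for "key present") is easy to misuse with a `>= 0` test. A minimal sketch of the pitfall follows; `entry`, `dict_knownget` and `has_subtype` are hypothetical stand-ins written for this example and are not the real pdfi_dict_knownget_type() signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy dictionary entry: string key, string value. */
typedef struct {
    const char *key;
    const char *value;
} entry;

/* Hypothetical lookup using the convention from the commit message:
 *   < 0  error
 *     0  key absent, *out is left untouched
 *   > 0  key present, *out is set */
int dict_knownget(const entry *d, size_t n, const char *key, const char **out)
{
    if (d == NULL || key == NULL || out == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        assert(d[i].key != NULL);
        if (strcmp(d[i].key, key) == 0) {
            *out = d[i].value;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if Subtype is present, 0 if absent, < 0 on error. */
int has_subtype(const entry *d, size_t n, const char **subtype)
{
    int code = dict_knownget(d, n, "Subtype", subtype);

    if (code < 0)
        return code;   /* propagate errors */
    /* Testing 'code >= 0' here would treat "absent" as "present"
     * and let the caller read a value that was never set. */
    return code > 0;
}
```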
* GhostPDF - revise PageLabel handling | Ken Sharp | 2022-08-25 | 1 | -6/+0
  This arises from bug #705783 although it does not address the bug directly.

  We should not be processing PageLabels for devices that do not support writing PageLabel output (basically only pdfwrite). But we had not picked up the 'WantsPageLabels' parameter from the old PostScript implementation. So add that to the device_state structure, and add code to get it from the device.

  After doing that, it transpired that we did not call pdfi_device_set_flags() until after we had processed the trailer dict. So add a call nice and early on, in ghostpdf.c, pdfi_init_file().

  It then turns out there were several other places where the function was called, which almost certainly shouldn't be done. It seems likely that we've moved the call to the function earlier and earlier in the order of execution as we've discovered we need it, but we really only (I think) should call it once. So remove all the other calls.
* OSS_fuzz #48564 | Ken Sharp | 2022-07-01 | 1 | -1/+1
  Yet more fallout from the change to objects not always being pointers to structures. Using object->type doesn't work if object isn't a pointer to a struct; we need to use pdfi_type_of instead.

  Since this keeps happening, audit all occurrences of ->type and .type in the pdfi code and make sure we use the pdfi_type_of() function where needed.

  A couple of these are in code not compiled in due to #if 0 pre-processor directives, but fix them anyway.
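Why a direct `->type` read breaks once some objects are no longer pointers to structures can be shown with a small sketch. It assumes, purely for illustration, that the constant objects are encoded as small sentinel values stored in the pointer itself; the names `obj`, `FAST_NULL`, `FAST_TRUE`, `FAST_FALSE`, `FAST_LAST` and `type_of` are hypothetical and do not reproduce the real pdfi definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { T_NULL, T_BOOL, T_INT } obj_type;

/* Ordinary heap or stack objects carry their type in a field. */
typedef struct obj {
    obj_type type;
    long value;
} obj;

/* Hypothetical "fast" objects: sentinel values held in the pointer.
 * They are not valid addresses and must never be dereferenced. */
#define FAST_NULL  ((obj *)(uintptr_t)1)
#define FAST_TRUE  ((obj *)(uintptr_t)2)
#define FAST_FALSE ((obj *)(uintptr_t)3)
#define FAST_LAST  ((uintptr_t)3)

/* Accessor that is safe for both kinds of object. */
obj_type type_of(const obj *o)
{
    uintptr_t bits = (uintptr_t)o;

    if (bits > FAST_LAST)
        return o->type;          /* a real structure: read the field */
    assert(bits != 0);           /* a plain null pointer is a caller bug here */
    return o == FAST_NULL ? T_NULL : T_BOOL;
}
```

Any caller that wrote `o->type` directly would dereference address 1, 2 or 3 for the constant objects, which is the class of crash the audit in this commit is meant to eliminate.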
* oss-fuzz 47490: Check type before trying to enumerate dictionary | Chris Liddell | 2022-05-19 | 1 | -0/+43
  when pre-scanning ExtGstate objects

  Also, add the same type checking to all the pdfi_check_* functions.
* GhostPDF - Fix Coverity warnings | Ken Sharp | 2022-05-11 | 1 | -0/+4
  A couple of cases forgetting to check return codes

  pdf_page.c - CID 378406
  pdf_check.c - CID 378407
* GhostPDF - revamp PDF information extraction | Ken Sharp | 2022-05-10 | 1 | -18/+231
  A customer requested that we make pdf_info.ps work with the new PDF interpreter, and generate the same information.

  This commit modifies the way we extract information on a page-by-page basis to potentially include the names of spot inks and information about fonts used on the page. This is now returned to the PostScript environment using a PDF dictionary instead of a C structure.

  The pdf_info.ps program has been updated so that it uses the new information in broadly the same way as the information from the old PDF interpreter. There are differences; pdf_info.ps extracts font information itself, rather than having the interpreter do it. This is not possible with the new interpreter, which is why we have the PDF interpreter do it for us. In addition the pdf_info.ps program only descended to the page level whereas the new PDF interpreter evaluates all objects on the page, potentially meaning that more fonts (and technically spot inks) might be detected.

  We now have an additional PostScript operator '.PDFPageInfoExt' which returns 'extended' information about a page. This is the same as .PDFPageInfo but includes the font and spot ink information.

  Running with -dPDFINFO using either Ghostscript or GhostPDF will print more information than before, including the spot inks and considerably more information about fonts than the pdf_info.ps program emits, including embedding status, descendant fonts (and their embedding status) and the presence of ToUnicode CMaps.

  Updated documentation for all of the above.
* PDFI: Use TRUE/FALSE/NULL objects as 'fast' objects. | Robin Watts | 2022-05-05 | 1 | -3/+5
  No allocation/deallocation required.
* pdfi: Introduce pdfi_type_of. | Robin Watts | 2022-05-05 | 1 | -19/+23
  All code now reads pdf obj types using pdfi_type_of rather than directly accessing obj->type. This will help us in the next optimisation step.
* oss-fuzz 46672: Avoid PS stack extensions from pdfi error | Chris Liddell | 2022-04-14 | 1 | -1/+1
  pdfi was using the standard gs_error_stackoverflow error code when the pdfi operand stack overflowed. Returning that to the PostScript interpreter caused the interpreter to attempt to extend the PostScript op stack with a new block with zero requested new elements.

  This, in turn, caused the garbage collector to traverse the previous op stack block and find no longer valid objects, leading it to try to mark objects freed by a restore.

  The solution is to add a specific gs_error_pdf_stackoverflow, so we can still signal the appropriate error, but avoid confusing the PostScript interpreter.
* Eliminate pdf_overprint_control_t in favour of gs_overprint_control_t | Chris Liddell | 2022-03-03 | 1 | -1/+1
  Fixes compiler warning comparing different enum types.
* pdfi: High level devices and overprint | Michael Vrhel | 2022-02-15 | 1 | -0/+4
  High level devices should not render overprinting. This is the way the old PDF interpreter behaves.
* pdfi overprint: -dOverprint setting is a device param | Michael Vrhel | 2022-02-11 | 1 | -1/+2
  The setting of -dOverprint=/simulate /enable /disable is handled as a device parameter and not a command line option. Make the changes so that gpdf and gs with -dNEWPDF=true provide the proper rendering. Tested with RGB, CMYK, and separation devices for each of the three settings.
* OSS Fuzz 42916.pdf: Fix cleanup SEGV found when memory squeezing. | Robin Watts | 2022-01-25 | 1 | -0/+2
  Failing to check a return code.
* oss-fuzz 43615: Keep a reference to the current pdfi font | Chris Liddell | 2022-01-18 | 1 | -1/+2
  in the pdfi graphics state.

  Previously, we relied on pdfi_gsave/pdfi_grestore to keep the reference count correct for the pdfi font from which the current gs_font in the graphics state is derived.

  This was, at best, a compromised approach, since it meant the lifespan of the font object was not directly tied to the graphics state which referenced it. We opted for this because, at the time, we wanted to avoid the upheaval of implementing a pdfi specific graphics state.

  That approach also couldn't account for graphics state copies created and destroyed by means other than gs_gsave/gs_grestore, such as saving the graphics state for subsequent use when evaluating an SMask group.

  Subsequently, other requirements made it clear a pdfi specific graphics state was absolutely required.

  As such, it makes sense to store a reference to the current (pdfi) font in the pdfi graphics state. Since the pdfi graphics state lifespan is tied to the gs_gstate lifespan, we can now connect the font objects' reference count to the graphics state(s) that reference them.
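Tying a font's reference count to each graphics state that points at it, rather than to matched save/restore calls, can be sketched like this. All names (`font`, `gstate`, `font_new`, `font_release`, `gstate_set_font`, `gstate_copy`, `gstate_release`, `live_fonts`) are hypothetical and chosen only for this illustration; the real pdfi structures are considerably richer.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int refcnt; } font;
typedef struct { font *current_font; } gstate;

static int live_fonts = 0;   /* number of fonts not yet freed, for checking */

font *font_new(void)
{
    font *f = malloc(sizeof *f);
    if (f != NULL) {
        f->refcnt = 1;       /* the creator holds the first reference */
        live_fonts++;
    }
    return f;
}

void font_release(font *f)
{
    if (f == NULL)
        return;
    assert(f->refcnt > 0);
    if (--f->refcnt == 0) {
        free(f);
        live_fonts--;
    }
}

/* The state takes its own reference and drops the one it held before. */
void gstate_set_font(gstate *gs, font *f)
{
    if (f != NULL)
        f->refcnt++;
    font_release(gs->current_font);
    gs->current_font = f;
}

/* Any copy of a state, however it is made, holds its own reference.
 * dst must already be initialised (current_font NULL or valid). */
void gstate_copy(gstate *dst, const gstate *src)
{
    gstate_set_font(dst, src->current_font);
}

/* Destroying a state releases whatever font it references. */
void gstate_release(gstate *gs)
{
    gstate_set_font(gs, NULL);
}
```

Because every copy owns a reference, a state saved for later use (for example while an SMask group is evaluated) keeps its font alive regardless of how many save/restore pairs run in between.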
* GhostPDF - Implement -dNOTRANSPARENCY | Ken Sharp | 2022-01-11 | 1 | -1/+2
  This is slightly different (I think) to the old scheme. It turns out that the transparency implementation is contingent on the result of the checking done on each page. The code checks ctx->page.has_transparency to see if it should implement transparency.

  Rather than add an additional test in all the places where this is checked, it is simpler to test the setting of NOTRANSPARENCY in the per-page spot colour/transparency checking code and not set the page 'has_transparency' flag if -dNOTRANSPARENCY is set.
* GhostPDF - Fix psdcmyk device when page has too many spots | Ken Sharp | 2021-11-11 | 1 | -7/+32
  This is related to the commit which sends SeparationColorNames to the device using put_params().

  The reason we pass SpotColorNames to the device is to cope with Zoltan's file which sets a halftone with a spot colour, then draws content in that spot colour. Because we have not added the spot colour at the time the halftone is set, we discard the component. Later when we add the spot colour due to actually using it, we use the default halftone, not the correct one. By setting the spot colour up front we avoid this problem.

  However, perf-testing/pdf/J12_Acrobat.pdf has too many spots (it has 16) and when we try to set that spot array we get an error. Ghostscript's zputparams() operator simply ignores the errors (!!) while pdfi passes them back, which causes it to essentially abort the page content.

  But, if we send multiple parameters to the device, and one causes an error, then it is possible that the remaining parameters are not processed. So ignoring the error is not, by itself, sufficient.

  This commit breaks out the SpotColorNames processing separately and sends it separately too. If that fails we just ignore it. Otherwise we properly handle the device being closed and reopen it in all cases.

  This solves the problem with J12_Acrobat for me. I have a suspicion that pdfi is still not properly handling psdcmyk with spot files though.
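Sending an optional parameter in its own call, so that its failure can be ignored without affecting the required parameters, might look like the sketch below. The `device` structure, `put_spots`, `put_width`, `configure` and the `MAX_SPOTS` limit are all hypothetical stand-ins for illustration; they are not Ghostscript's put_params interface.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SPOTS 8   /* hypothetical device limit on spot colours */

typedef struct {
    size_t nspots;
    int width;
} device;

/* Hypothetical parameter setters: 0 on success, < 0 on error. */
static int put_spots(device *d, size_t n)
{
    if (n > MAX_SPOTS)
        return -1;
    d->nspots = n;
    return 0;
}

static int put_width(device *d, int w)
{
    if (w <= 0)
        return -1;
    d->width = w;
    return 0;
}

int configure(device *d, size_t nspots, int width)
{
    assert(d != NULL);

    /* Optional hint, sent on its own: a failure is deliberately ignored
     * and cannot prevent the required parameter below from being set. */
    (void)put_spots(d, nspots);

    /* Required parameter: errors are passed back to the caller. */
    return put_width(d, width);
}
```

Had both values been sent in one batch, a rejected spot list could have left the width unset as well, which is the hazard the commit message describes.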
* GhostPDF - fix compiler warning | Ken Sharp | 2021-10-27 | 1 | -0/+1
  The compiler was warning that gs_setdevice_no_erase() had no prototype. The warning was in pdf_device.c but it seems to be missing in pdf_check.c as well so fix it there too.
* GhostPDF - generate SeparationColorNames and send to device | Ken Sharp | 2021-10-21 | 1 | -3/+46
  Bug #704546 "Bug 703898 requires changes to the NEWPDF (pdfi) interpreter for SeparationColorNames"

  The old PostScript-based PDF interpreter was modified to create an array of names containing each Separation colour found when scanning the PDF file Resources and send that array to the device. This commit adds the same functionality to the new C-based PDF interpreter.

  Unfortunately there are no instructions in bug #704546, or the bug for the PostScript-based interpreter, bug #703898, on how to actually test this.

  This causes a few differences with the psdcmyk device, mostly due to creating extra spot colours I believe. I'm going to assume for now that this works (I tried a file with 6 spot colours and it rendered correctly to tiffsep) and if there is a problem, it'll show up later.
* GhostPDF - fix a typo in a comment | Ken Sharp | 2021-08-23 | 1 | -1/+1
* Add header include to fix missing prototype error/warning | Chris Liddell | 2021-08-19 | 1 | -0/+1
* Fix pdfi+GS with spot capable devices | Ken Sharp | 2021-08-16 | 1 | -0/+6
  This is a chain of problems. Firstly we were unconditionally setting the 'init_graphics' parameter to pdfi_check_page() which caused it to close and reopen the device. This throws away the Media Size, CTM and any cropping/rotation etc which we set up in the PostScript world.

  Resolving that then showed that we were not resetting the number of spots on each page.

  Resolving that then showed up a failure to determine that a device was spot colour capable, because we were doing the pdfi_check_page() call to retrieve the per-page transparency and spot information when the PDF graphics state wasn't pointing to the PostScript graphics state, which meant we were using the nulldevice.

  This commit fixes all of the above problems. The various spot-capable devices now render their content to the correct size/orientation/clip and get the correct number of spots.
* Coverity ID 372312 | Ken Sharp | 2021-08-14 | 1 | -5/+1
  Coverity noticed that 'uses_transparency' was constant in this function; this meant that the gs_abort_pdf14trans_device() call was unreachable. Since it was never being called I've chosen to just remove the code and the boolean.
* GhostPDF - Tidy up headers in ghostpdf.h | Ken Sharp | 2021-08-13 | 1 | -0/+1
  This file was originally copied from the XPS interpreter verbatim, and included a load of .h files which were then included needlessly in all of the C files.

  This removes almost all the .h files from ghostpdf.h; we keep gserrors.h, since so many C files do actually use it to report errors, and gxgstate.h, which is used in ghostpdf.h.

  The various C and H files have been updated to pull in the include files they actually need.
* Commit pdfi to master. | Robin Watts | 2021-08-12 | 1 | -0/+1186
  This is a commit of the pdfi branch to master, eliminating the traditional merge step. The full history of the pdfi branch can be seen in the repo, and that branch is effectively frozen from this point onwards.

  This commit actually differs from pdfi in a small number of whitespace changes (trailing spaces etc).