path: root/pdf/pdf_errors.h
Commit message | Author | Age | Files | Lines
* Update postal address in file headers | Chris Liddell | 2023-04-04 | 1 | -3/+3
* GhostPDF - reword some more errors | Ken Sharp | 2022-12-21 | 1 | -2/+2
  Make it clearer that 'repair' means 'rebuilt the xref'.
* GhostPDF - warn if BBox missing in a Form XObject | Ken Sharp | 2022-12-20 | 1 | -0/+1
  "ccs digital-1-unc.pdf"
  The file has a number of Form XObjects which are missing the required /BBox. The old interpreter used to warn about that, so we do too now.
* GhostPDF log error when ICCBased /N value does not match profile | Ken Sharp | 2022-12-16 | 1 | -0/+1
  Bug691941.pdf
  When this happens we do the same as the old interpreter and use the number of components from the ICC profile, but we were doing it silently. Log an error for reporting.
* GhostPDF - Log an error for fonts with bad Subtype values | Ken Sharp | 2022-12-16 | 1 | -0/+1
  Bug691872.pdf
  The fonts have a /Subtype of '/' which is of course illegal. This doesn't matter for pdfi, which doesn't trust the subtype anyway, but the old interpreter raised a warning.
* GhostPDF - do not abort on ExtGState errors (unless PDFSTOPONERROR) | Ken Sharp | 2022-12-15 | 1 | -0/+3
  My test file PATTYP1-red.pdf but also tests_private/pdf/PDFIA1.7_SUBSET/IA3Z0970.pdf
  If an ExtGState contains a BG or UCR key and the value is a name rather than a function, then instead of exiting the ExtGState with an error, log the error as normal and carry on. Do the same if a TR has a value which is a name other than /Identity.
* GhostPDF - report bad object types when checking for transparency/spots | Ken Sharp | 2022-12-12 | 1 | -0/+2
  Specimen file Bug690364.pdf
  The old PS-based interpreter used to raise an error when XObjects had the wrong type. We now do this too.
* GhostPDF - improve warnings when dealing with XObjects | Ken Sharp | 2022-12-08 | 1 | -0/+2
  Test file Bug688485.pdf
  If an SMask is missing the Group (/G), raise an error rather than a dmprintf. Same for a missing or unrecognised Subtype (/S), and for XObjects with the /PS (PostScript) subtype.
  Errors and warnings are preferred to printf because we only report them once at the end rather than on each occurrence, which can be annoyingly verbose.
* GhostPDF - Detect errors in XRefStm value in Hybrid files | Ken Sharp | 2022-12-08 | 1 | -0/+1
  If the XRefStm value in the trailer dictionary of a hybrid file (one with an xref and an XRefStm) did not point to a valid cross reference stream dictionary, we were silently ignoring the error. Report the error at end of job.
* GhostPDF - improve error reporting with annotations | Ken Sharp | 2022-12-08 | 1 | -1/+2
  A typo in pdfi_annot_draw_Link meant we were swallowing an error. Fix that.
  In addition, report errors when handling annotations in a debug build and store them for end of job reporting. Add a new error for the end of job report.
* GhostPDF - Apply optional content dictionaries in XObjects. | Ken Sharp | 2022-12-05 | 1 | -1/+3
  This started off from the pdfa.org 'safedocs' file utf16LE-test.pdf
  The file has two XObjects, both of which have optional content dictionaries which are initially 'OFF'. We were not applying the OC dictionaries for form or image XObjects, which meant that we were drawing the XObjects when we should not.
  After that, two test files showed differences:
  /tests_private/comparefiles/BFAX35P5.pdf
  /tests_private/pdf/sumatra/1348_-_support_Additional_Actions.pdf
  I had noticed BFAX35P5 during the development of pdfi but because Acrobat draws the gray watermark I had assumed this was a progression when in fact it was a regression. We should not draw the watermark when we are being a printer (Printed=true). The same is true for the Sumatra file.
  Added a couple of new error types, and fixed a spelling error in an existing one.
* GhostPDF - permit the Page dictionary to actually be a stream | Ken Sharp | 2022-12-02 | 1 | -0/+1
  We differentiate between streams and dictionaries, which some consumers apparently do not do. When looking at the pdfa.org 'safedocs' collection the file Dialect-DictIsStream.pdf threw an error.
  I don't plan to change all the places that we differentiate between streams and dictionaries (there are more than 200) unless it should become apparent that Acrobat does so, but I'll do this one. Should we later choose to do more cases then this makes a useful example of how we could do so.
  We now process the dictionary and log an error on exit.
* GhostPDF - warn when a stream length is incorrect. | Ken Sharp | 2022-12-02 | 1 | -1/+2
  The old PDF interpreter specifically mentioned this; our code predated the better error handling and so only logged a warning in debug builds. Add a specific error so we can report on it at the end.
* GhostPDF - improve an error message | Ken Sharp | 2022-11-30 | 1 | -1/+1
  We could get silly sequences where we would report 'PDF file was repaired' followed by 'couldn't repair PDF file', which doesn't make sense. Change the wording of the message to indicate that it's because we attempted a second repair of the file (i.e. something was still wrong after our repair attempt).
* GhostPDF fix the spelling in an error message.... | Ken Sharp | 2022-11-25 | 1 | -1/+1
* GhostPDF - change wording of an error | Ken Sharp | 2022-11-25 | 1 | -1/+1
  I found the wording of this message confusing; we aren't counting down a reference at all, we are trying to dereference an indirect reference and the xref table has the object marked as free. Changed the wording to make this more obvious.
* GhostPDF - minor change with broken named destinations | Ken Sharp | 2022-11-18 | 1 | -0/+1
  Follow-on from commit 1042469a5225ff063c465e61dbd3ebb50c770006; the customer didn't like the fact that we were dropping the Link annotations from the document.
  This happened because the value associated with the /Dest key was neither a dictionary nor an array (as it is specified to be), but was a null object. That caused an error, so we dropped the annotation.
  Previously (old interpreter and prior versions of pdfi) we retained the Link annotation and dropped the Dest, which still makes it invalid of course....
  This commit simply restores that behaviour, while noting the error for reporting at the end.
* GhostPDF - check for self-referencing named ColorSpaces | Ken Sharp | 2022-11-16 | 1 | -0/+1
  OSS-fuzz 53393
  The fuzzed file corrupts a named ColorSpace in the Resources dictionary of the page. The name should be /DefaultRGB with an indirect reference to the space, what we actually get is:
  /ColorSpace<</De////////faultRGB 3 0 R>>
  The odd characters are 0x0F bytes. So that defines a named ColorSpace /De as being the name <0x2f0x0F> and then defines the named space /<0x2F0x0F> as being /<0x2F0x0F>, i.e. it is self-referencing.
  We detect that for indirect objects, when dereferencing them, but there's no convenient central place to do that for self-referencing names; we just have to watch out for it when trying to get the object.
  Added a new error to cater for this case.
* GhostPDF - Treat an invalid /Type for Catalog as if it were missing | Ken Sharp | 2022-10-29 | 1 | -0/+2
  Bug #706038 "Can't read some PDF files with the new PDF interpreter: "Couldn't initialise file" / object lacks an endobj"
  The PDF file here is invalid; the Catalog dictionary has a /Type of '/Calalog' instead of '/Catalog'. The Type is a required entry and must be /Catalog to be valid.
  However we already treat a missing /Type as an error but continue if the purported Catalog dictionary contains a /Pages key (the only other required key). So do the same here; if the trailer dictionary points us to a dictionary which has a /Type which is not /Catalog, but it does have a /Pages key whose value is a dictionary, then treat it as a valid Catalog.
  We do, of course, issue an error message at the end of processing the file.
* GhostPDF - detect integer overflow when parsing numbers | Ken Sharp | 2022-10-19 | 1 | -0/+1
  This stems from OSS-fuzz 52535
  Part of the problem is that an image in the PDF file has a /Height which is -2103029590437764, which breaks even a 64-bit value. We did not detect that, but carried on, which resulted in the number becoming negative, before we had applied the negation from the '-' sign. Overall this results in a very large negative number becoming a very large positive number.
  The xpswrite device uses libtiff to write images into XPS files as TIFF format images. The libtiff library tries to allocate enough memory to hold the entire image in memory which, with the broken Height value, results in trying to allocate many terabytes of memory, which causes the error.
  We've tried getting the libtiff maintainers to provide us with a hook so that we can use our memory manager instead of the system malloc, on two occasions providing patches to do so, but have been rebuffed. Obviously if we cannot use our memory manager then we cannot apply our limits to memory use.
  This commit does not address that; it detects the integer overflow and clamps the value to the last digit before it overflows. Arguably the value could be reset to 0 but we have no guidance in the spec on this so clamping it is as good as zeroing it.
  For this specific case it solves the problem, because the Height remains negative and we don't try to render images with a Height less than 1. While it doesn't solve the general case it's worth having the overflow detection anyway.
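The overflow detection described in the commit above can be illustrated with a small stand-alone sketch. This is not the pdfi tokenizer; `parse_int_clamped` and its `overflowed` flag are hypothetical names, and the sketch assumes a 64-bit accumulator that stops at the last digit that still fits, then applies the sign.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not the pdfi code): parse an optionally signed
 * decimal integer, stopping at the last digit that still fits in an
 * int64_t. *overflowed is set when further digits had to be dropped. */
static int64_t parse_int_clamped(const char *s, bool *overflowed)
{
    bool negative = false;
    int64_t value = 0;

    *overflowed = false;
    if (*s == '-' || *s == '+') {
        negative = (*s == '-');
        s++;
    }
    for (; *s >= '0' && *s <= '9'; s++) {
        int digit = *s - '0';

        /* value * 10 + digit would exceed INT64_MAX: keep what we have. */
        if (value > (INT64_MAX - digit) / 10) {
            *overflowed = true;
            break;
        }
        value = value * 10 + digit;
    }
    /* The sign is applied only after the magnitude has been clamped,
     * so a large negative number stays negative. */
    return negative ? -value : value;
}
```

A caller would record an error for end-of-job reporting whenever `overflowed` comes back true and carry on with the clamped value.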
* GhostPDF - improve unmatchedmark handling | Ken Sharp | 2022-09-14 | 1 | -1/+1
  OSS-fuzz #51318
  Commit 78e237d60bf7ba01af6258bc75e250fbcad554e0 to fix OSS-fuzz bug #51038 caused us to ignore unmatched array or dictionary marks and simply return, without an error. That caused a problem with read_dict, which expected either an object on the stack or an error (not unreasonably) and tried to use a non-existent object.
  This commit amends the previous one. We still ignore unmatched marks, so we still get the progression noted in the original commit, and the original OSS-fuzz bug is still fixed, but we now try and find another token. If we succeed we'll return that, and if we fail we'll return an error, which satisfies this bug.
  Also fix the error message which didn't make grammatical sense.
* GhostPDF - Alter the COS object depth counting to catch unmatched marks | Ken Sharp | 2022-09-10 | 1 | -0/+1
  OSS-fuzz #50138
  The original fix for this was again incomplete. This commit corrects that and adds the ability to detect unmatched closing marks.
  This creates a progression with test suite file /tests_private/pdf/sumatra/1127_-_prints_wrong_characters.pdf
* GhostPDF - early sanity check on Pages tree root node /Count | Ken Sharp | 2022-09-08 | 1 | -1/+1
  A number of OSS-fuzz files time out due to having an excessively large /Count (e.g. 213804087) in the root node of the Pages tree. This doesn't cause us any real problems, except that it takes a long time to fail to render that many pages.
  On the other hand, validating the entire pages tree for a large file with many nodes could take a reasonable amount of time. Not huge but it would mean a performance hit on sensible files just to avoid this penalty on broken files.
  As a compromise, this commit checks the root node /Kids array; if it has sufficient entries to match the /Count then we assume it's a flat tree and the Count is correct. If it has fewer then we assume it's a tree and we check each of the entries in the /Kids array. If it's a leaf node then we add 1 to the running count. If it is an intermediate node then we add the /Count of the node to the running count.
  If the one-level check matches the root node /Count then we assume the Count is correct. If it does not, then we assume that the count of the first level is correct.
  We could, if desired, validate the entire tree instead at this point but I don't think it's worth it. If the tree is really broken then the file is going to fail. This commit prevents the timeout and if the corruption is limited to the Count of the root node then it will recover from that.
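The one-level check described above might look roughly like the following. It is a sketch under assumptions: `kid_t`, `checked_page_count` and the `corrected` flag are hypothetical, and a real implementation would read the /Kids entries from PDF objects rather than from a plain array.

```c
#include <stddef.h>

/* Hypothetical summary of one /Kids entry of the root Pages node. */
typedef struct {
    int  is_leaf; /* non-zero for a Page, zero for an intermediate Pages node */
    long count;   /* /Count of an intermediate node; ignored for leaves */
} kid_t;

/* Return the page count to trust. *corrected is set to 1 when the root
 * /Count disagreed with the one-level sum and the sum was used instead. */
static long checked_page_count(long root_count, const kid_t *kids,
                               size_t n_kids, int *corrected)
{
    long running = 0;
    size_t i;

    *corrected = 0;

    /* Enough /Kids entries to match /Count: assume a flat tree. */
    if (root_count >= 0 && (unsigned long)root_count <= n_kids)
        return root_count;

    /* Otherwise sum one level: 1 per leaf, /Count per intermediate node. */
    for (i = 0; i < n_kids; i++)
        running += kids[i].is_leaf ? 1 : kids[i].count;

    if (running != root_count)
        *corrected = 1; /* trust the first level, flag for reporting */
    return running;
}
```

The check only looks one level down, which keeps the cost on well-formed files small while still catching an absurd root /Count.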
* GhostPDF - limit the nesting of arrays and dictionaries | Ken Sharp | 2022-09-06 | 1 | -0/+1
  OSS-fuzz 50138
  If a file was sufficiently insanely broken to nest arrays or dictionaries to a sufficient depth, we could exhaust the C execution stack when trying to free the top level one.
  This limits the nesting to a compile-time level (currently 100). If we exceed that we simply stop nesting the objects but store an error for reporting on exit.
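A compile-time nesting limit of this kind can be sketched as a simple depth counter. The names below (`MAX_NESTING_DEPTH`, `nest_ctx_t`, `enter_container`, `leave_container`) are hypothetical and only illustrate the idea; they are not taken from the pdfi source.

```c
/* Hypothetical compile-time limit, mirroring the value quoted above. */
#define MAX_NESTING_DEPTH 100

typedef struct {
    int depth;         /* current array/dictionary nesting depth */
    int error_flagged; /* set once the limit has been hit */
} nest_ctx_t;

/* Called when an array or dictionary start mark is seen. Returns 1 if the
 * new container may be nested, 0 if the limit was reached; in that case
 * the depth is left unchanged and an error is stored for later reporting. */
static int enter_container(nest_ctx_t *ctx)
{
    if (ctx->depth >= MAX_NESTING_DEPTH) {
        ctx->error_flagged = 1;
        return 0;
    }
    ctx->depth++;
    return 1;
}

/* Called when the matching end mark is seen. */
static void leave_container(nest_ctx_t *ctx)
{
    if (ctx->depth > 0)
        ctx->depth--;
}
```

Because the depth never exceeds the limit, any recursive free of the outermost object is bounded as well.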
* GhostPDF - cater for Encrypt entry being the null object | Ken Sharp | 2022-07-26 | 1 | -0/+2
  bugzilla_pdf\0687117_00346_pdf031031.pdf has an Encrypt entry in the Root dictionary, but it is a null object, which was causing us to error trying to interpret it.
  Just ignore null objects, but throw an error if the Encrypt entry is neither a dictionary nor null.
  Add an error entry, and one for the following commit to use too.
* GhostPDF - add more warnings for invalid PDF files | Ken Sharp | 2022-05-26 | 1 | -0/+1
  Starting with: Bug #704948 "Warn about xref stream pointing to xref table"
  I've chosen for now at least to make this an error, but continue and read the xref table anyway unless PDFSTOPONERROR is set. In any event flag the error.
  Also check 'stream' keywords to see that they are terminated by a linefeed as per the specification. We've seen files where a carriage return on its own is used, and if we ignore the problem then the file runs to completion, so this is just a warning currently.
  And finally, if we encounter standard xref table entries which are not exactly 20 bytes long, flag a warning. Again we've seen plenty of examples where the entry is terminated with a linefeed instead of a carriage return and linefeed or padding space and linefeed, and if we ignore the error then the file runs properly. So again this is a warning not an error.
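The 20-byte xref entry rule mentioned above can be checked with a few lines of C. This is an illustrative sketch, not the pdfi parser: `xref_entry_is_well_formed` is a hypothetical name, and the layout assumed is the usual one of a 10-digit offset, a space, a 5-digit generation, a space, `n` or `f`, and a two-byte end-of-line.

```c
#include <ctype.h>
#include <stddef.h>

#define XREF_ENTRY_LEN 20

/* Hypothetical check: returns 1 if the buffer holds exactly one
 * well-formed 20-byte cross-reference table entry, 0 otherwise. */
static int xref_entry_is_well_formed(const char *e, size_t len)
{
    size_t i;

    if (len != XREF_ENTRY_LEN)
        return 0;
    for (i = 0; i < 10; i++)            /* 10-digit byte offset */
        if (!isdigit((unsigned char)e[i]))
            return 0;
    if (e[10] != ' ')
        return 0;
    for (i = 11; i < 16; i++)           /* 5-digit generation number */
        if (!isdigit((unsigned char)e[i]))
            return 0;
    if (e[16] != ' ')
        return 0;
    if (e[17] != 'n' && e[17] != 'f')   /* in-use or free */
        return 0;
    /* Two-byte end-of-line: SP LF, SP CR, or CR LF. */
    return (e[18] == ' '  && e[19] == '\n') ||
           (e[18] == ' '  && e[19] == '\r') ||
           (e[18] == '\r' && e[19] == '\n');
}
```

An entry terminated by a bare linefeed is only 19 bytes long, so it fails the length test; per the commit above that case would be flagged as a warning rather than treated as fatal.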
* GhostPDF - ignore errors in setting halftones | Ken Sharp | 2022-05-03 | 1 | -0/+1
  Originally reported against the old PDF interpreter in bug #702077, the file has a ridiculous halftone which has a Frequency of 1; at higher resolutions we cannot create a halftone cell with such a low frequency and generate an error.
  Originally we altered the old PostScript-based interpreter to note the error and carry on. GhostPDF does the same, but aborts the entire GState sequence, leading to missing content.
  This commit instead swallows the graphics library error and substitutes a PDF-specific error, so that the error is not completely ignored. If PDFSTOPONERROR is set then we do return the underlying error.
* GhostPDF - Don't abort the page with errors on /Group | Ken Sharp | 2022-04-07 | 1 | -0/+1
  Bug #705026 "Error when reading latex-pdf"
  The PDF file has a page dictionary which has a /Group key where the value is an indirect reference to a free object. Previously we would abort the page processing at that point; this commit instead flags the error but continues to process the page Content stream.
* pdfwrite - fix group attributes ColorSpace | Ken Sharp | 2022-04-01 | 1 | -0/+1
  Bug #705079 "Blending issue with the drop-shadow image"
  This turns out to be because the new PDF interpreter was not setting the current colour space to be the /CS from the XObject group attributes dictionary.
  Attempting to do this turned out to be well nigh impossible. We can't (not sure why but it certainly doesn't work) use gsave/grestore to preserve the colour space. Attempting to replicate the behaviour of the old PostScript-based PDF interpreter and copying the current colour and spaces led to seg faults.
  On closer inspection, however, it turns out that the only reason for setting the current colour space is because the pdf14 compositor discards the ColorSpace passed in from the interpreter. The interpreter passes a gs_transparency_group_params_t structure, which contains an entry for the ColorSpace; the graphics library however copies the entries into a gs_pdf14trans_params_t which does *not* contain a ColorSpace. I've no idea what the point of this was, but it seems mad.
  This commit adds a new member 'ColorSpace' to the gs_pdf14trans_params_t structure, and initialises it from the gs_transparency_group_params_t structure. We then modify pdfwrite to use that instead of the current colour space.
  The PostScript and C based PDF interpreters both already initialised that member so nothing further needed to be done there. The XPS interpreter did not initialise that member, and so we also update it to do so.
  This fixes the bug and shows no diffs on the cluster.
  Finally update the PDF interpreter; it seems that the old code accepted a /ColorSpace in place of a /CS in the group attributes dictionary even though this is technically invalid. Fall back to the current colour space if no /CS or /ColorSpace is present in the group attributes dictionary (this is illegal but it might 'work').
  Also add a check, as per the old code, to see that the number of components for the colour space matches the number of components in the /BC array if one is supplied. Unlike Acrobat we choose not to ignore the SMask if they don't match.
  We do emit warnings/errors for all the above conditions.
* OSS-fuzz #44983 - Apply limits to decryption /Length | Ken Sharp | 2022-02-24 | 1 | -0/+1
  We were using the /Length value supplied in the Encrypt dictionary even for encryption types where that is not valid.
  This commit fixes the KeyLen value for those encryption types for which it is not a variable (most of them), checks the minimum/maximum and multiple of 8 for the one really variable type, and flags warnings if the /Length is supplied for an inappropriate filter.
  It's possible we will see a load of files encrypted with V 3-6 which supply a /Length where they technically shouldn't, and will raise a warning. We might want to not do that in future if it proves irksome.
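The shape of such a /Length check could be sketched as below. Everything here is an assumption made for illustration: the function name, the fixed sizes (40 bits for V 1, 128 for V 4, 256 for V 5), the 40 to 128 bit range for V 2, and the fallback to 40 bits on an invalid value are based on a reading of the PDF specification, not on the pdfi source.

```c
/* Hypothetical helper: return the key length in bits to use for a given
 * /V value. *warn is set when /Length is invalid or was supplied for a
 * /V whose key length is not variable. Returns 0 for /V values this
 * sketch does not cover. */
static int effective_key_bits(int v, int has_length, int length, int *warn)
{
    *warn = 0;
    switch (v) {
    case 1: /* fixed-size types: ignore any supplied /Length but flag it */
    case 4:
    case 5:
        if (has_length)
            *warn = 1;
        return v == 1 ? 40 : (v == 4 ? 128 : 256);
    case 2: /* the variable type: 40..128 bits in multiples of 8 */
        if (!has_length)
            return 40;
        if (length < 40 || length > 128 || length % 8 != 0) {
            *warn = 1;
            return 40; /* assumed fallback, chosen for this sketch only */
        }
        return length;
    default:
        return 0;
    }
}
```

The point of the design is that a value from the file is only ever used after it has passed the range and multiple-of-8 checks.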
* Move pdfi warning and error definitions into their own file. | Robin Watts | 2022-02-16 | 1 | -0/+55
  This avoids having 2 tables of errors (enum and error strings) defined in 2 separate files that need to be kept in sync. Same for warnings.
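One common way to keep an enum and its message strings in a single place is the X-macro technique, sketched below. The identifiers and message texts (`DEMO_ERRORS`, `E_DEMO_*`, `demo_error_string`) are made up for the example and are not the contents of pdf_errors.h; the sketch only shows how one list can generate both tables.

```c
/* Single source of truth: each entry pairs an enum name with its text.
 * All names and strings here are hypothetical. */
#define DEMO_ERRORS(X) \
    X(E_DEMO_NONE,              "no error") \
    X(E_DEMO_BAD_STREAM_LENGTH, "stream length is incorrect") \
    X(E_DEMO_MISSING_BBOX,      "Form XObject is missing /BBox")

/* First expansion: the enum. */
typedef enum {
#define X(name, text) name,
    DEMO_ERRORS(X)
#undef X
    E_DEMO_MAX
} demo_error_t;

/* Second expansion: the strings, guaranteed to be in the same order. */
static const char *const demo_error_strings[] = {
#define X(name, text) text,
    DEMO_ERRORS(X)
#undef X
};

/* Look up the message for an error code, with a bounds check. */
static const char *demo_error_string(demo_error_t e)
{
    return (unsigned)e < (unsigned)E_DEMO_MAX ? demo_error_strings[e]
                                              : "unknown error";
}
```

Adding a new error then means adding one line to the list, and the enum value and its message cannot drift apart.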