| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
Make it clearer that 'repair' means 'rebuilt the xref'
|
|
|
|
|
|
|
| |
"ccs digital-1-unc.pdf"
The file has a number of Form XObjects which are missing the required
/BBox. The old interpreter used to warn about that, so we do too now.
|
|
|
|
|
|
|
|
| |
Bug691941.pdf
When this happens we do the same as the old interpreter and use the
number of components from the ICC profile, but we were doing it
silently. Log an error for reporting.
|
|
|
|
|
|
|
|
| |
Bug691872.pdf
The fonts have a /Subtype of '/' which is of course illegal. Doesn't
matter for pdfi which doesn't trust the subtype anyway but the old
interpreter raised a warning.
|
|
|
|
|
|
|
|
|
|
| |
My test file PATTYP1-red.pdf but also tests_private/pdf/PDFIA1.7_SUBSET/IA3Z0970.pdf
If an ExtGState contains a BG or UCR key and the value is a name
rather than a function, then instead of exiting the ExtGState with an
error log the error as normal and carry on.
Similarly if a TR has a value which is a name other than /Identity.
|
|
|
|
|
|
|
| |
Specimen file Bug690364.pdf
The old PS-based interpreter used to raise an error when XObjects had
the wrong type. We now do this too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test file Bug688485.pdf
If a SMask is missing the Group (/G) raise an error rather than
a dmprintf.
Same for a missing or unrecognised Subtype (/S)
And for XObjects with the /PS (PostScript) subtype.
Errors and warnings are preferred to printf because we only report them
once at the end rather than on each occurrence which can be annoyingly
verbose.
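As a rough illustration of why flagged errors are quieter than printf, here is a minimal sketch of a "record now, report once at the end" log. The class, the error names and the report format are all hypothetical and are not taken from the pdfi source.

```python
from enum import Enum, auto


class PdfError(Enum):
    # Hypothetical error names, invented for this sketch.
    SMASK_MISSING_G = auto()
    SMASK_BAD_SUBTYPE = auto()
    PS_XOBJECT = auto()


class ErrorLog:
    """Remember which errors occurred, not how many times."""

    def __init__(self):
        self._seen = set()

    def record(self, err):
        # Recording the same error repeatedly has no further effect.
        self._seen.add(err)

    def report(self):
        # One line per distinct error, in declaration order.
        return [e.name for e in PdfError if e in self._seen]


log = ErrorLog()
for _ in range(1000):          # the same fault on every page, say
    log.record(PdfError.SMASK_MISSING_G)
log.record(PdfError.PS_XOBJECT)
end_of_job = log.report()
```

However often a fault recurs during the job, the end of job report mentions it once.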
|
|
|
|
|
|
|
|
| |
In a hybrid file (one with both an xref and an XRefStm), if the XRefStm
value in the trailer dictionary did not point to a valid cross reference
stream dictionary we were silently ignoring the error.
Report the error at end of job.
|
|
|
|
|
|
|
|
|
| |
A typo in pdfi_annot_draw_Link meant we were swallowing an error. Fix
that.
In addition, report errors when handling annotations in a debug build
and store them for end of job reporting. Add a new error for the end of
job report.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This started off from the pdfa.org 'safedocs' file utf16LE-test.pdf
The file has two XObjects, both of which have optional content
dictionaries which are initially 'OFF'. We were not applying the OC
dictionaries for form or image XObjects which meant that we were
drawing the XObjects when we should not.
After that, two test files showed differences;
/tests_private/comparefiles/BFAX35P5.pdf
/tests_private/pdf/sumatra/1348_-_support_Additional_Actions.pdf
I had noticed BFAX35P5 during the development of pdfi but because
Acrobat draws the gray watermark I had assumed this was a progression
when in fact it was a regression. We should not draw the watermark
when we are being a printer (Printed=true). The same is true for
the Sumatra file.
Added a couple of new error types, and fixed a spelling error in an
existing one.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We differentiate between streams and dictionaries which some consumers
apparently do not do. When looking at the pdfa.org 'safedocs' collection
the file Dialect-DictIsStream.pdf threw an error.
I don't plan to change all the places that we differentiate between
streams and dictionaries (there are more than 200) unless it should
become apparent that Acrobat does so, but I'll do this one. Should we
later choose to do more cases then this makes a useful example of how
we could do so.
We now process the dictionary and log an error on exit.
|
|
|
|
|
|
|
| |
The old PDF interpreter specifically mentioned this, our code predated
the better error handling and so only logged a warning in debug builds.
Add a specific error so we can report on it at the end.
|
|
|
|
|
|
|
|
|
|
| |
We could get silly sequences where we would report 'PDF file was
repaired' followed by 'couldn't repair PDF file' which doesn't make
sense.
Change the wording of the message to indicate that it's because we
attempted a second repair of the file (i.e. something was still wrong
after our repair attempt).
|
| |
|
|
|
|
|
|
|
|
| |
I found the wording of this message confusing, we aren't counting down
a reference at all, we are trying to dereference an indirect reference
and the xref table has the object marked as free.
Changed the wording to make this more obvious.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Follow-on from commit 1042469a5225ff063c465e61dbd3ebb50c770006, the
customer didn't like the fact that we were dropping the Link annotations
from the document.
This happened because the value associated with the /Dest key was neither
a dictionary nor an array (as it is specified to be), but was a null
object. Which caused an error, so we dropped the annotation.
Previously (old interpreter and prior versions of pdfi) we retained the
Link annotation and dropped the Dest, which still makes it invalid of
course....
This commit simply restores that behaviour, while noting the error for
reporting at the end.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OSS-fuzz 53393
The fuzzed file corrupts a named ColorSpace in the Resources dictionary
of the page. The name should be /DefaultRGB with an indirect reference
to the space, what we actually get is :
/ColorSpace<</De////////faultRGB 3 0 R>>
The odd characters are 0x0F bytes. So that defines a named ColorSpace
/De as being the name <0x2f0x0F> and then defines the named space
/<0x2F0x0F> as being /<0x2F0x0F>, ie it is self-referencing.
We detect that for indirect objects, when dereferencing them, but
there's no convenient central place to do that for self-referencing
names, we just have to watch out for it when trying to get the object.
Added a new error to cater for this case.
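The check at the point of lookup might look something like the sketch below. It assumes names are modelled as strings and the ColorSpace resources as a plain dict; the function and exception names are hypothetical and do not come from pdfi.

```python
class CircularNameError(Exception):
    """Raised when a named resource is defined as itself (hypothetical)."""


def lookup_named_colorspace(colorspaces, name):
    # colorspaces stands in for the /ColorSpace sub-dictionary of the
    # page Resources; names are plain strings such as "/CS0".
    value = colorspaces[name]
    if isinstance(value, str) and value == name:
        # Following this definition would loop forever, so refuse here.
        raise CircularNameError(name)
    return value


ok = lookup_named_colorspace({"/CS0": ["/ICCBased", 3]}, "/CS0")
```

This only catches a name defined directly as itself, which is the case described above. Longer cycles would need a set of already visited names.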
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bug #706038 " Can't read some PDF files with the new PDF interpreter: "Couldn't initialise file" / object lacks an endobj"
The PDF file here is invalid; the Catalog dictionary has a /Type of
'/Calalog' instead of '/Catalog'. The Type is a required entry and must
be /Catalog to be valid.
However we already treat a missing /Type as an error but continue if the
purported Catalog dictionary contains a /Pages key (the only other
required key). So do the same here; if the trailer dictionary points us
to a dictionary which has a /Type which is not /Catalog, but it does
have a /Pages key whose value is a dictionary, then treat it as a valid
Catalog.
We do, of course, issue an error message at the end of processing the
file.
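The acceptance rule can be sketched as follows, assuming PDF dictionaries are modelled as Python dicts with string keys. The function name and the error text are made up for illustration.

```python
def looks_like_catalog(obj, errors):
    """Decide whether obj can be used as the document Catalog.

    errors is a list collecting messages for the end of job report.
    """
    if not isinstance(obj, dict):
        return False
    if obj.get("/Type") == "/Catalog":
        return True
    # Wrong or missing /Type: tolerate it only if the other required
    # key, /Pages, is present and its value is a dictionary.
    if isinstance(obj.get("/Pages"), dict):
        errors.append("Catalog has a missing or incorrect /Type")
        return True
    return False


errors = []
accepted = looks_like_catalog({"/Type": "/Calalog", "/Pages": {}}, errors)
```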
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This stems from OSS-fuzz 52535
Part of the problem is that an image in the PDF file has a /Height which
is -2103029590437764, which breaks even a 64-bit value. We did not
detect that, but carried on, which resulted in the number becoming
negative, before we had applied the negation from the '-' sign.
Overall this results in a very large negative number becoming a very
large positive number.
The xpswrite device uses libtiff to write images into XPS files as TIFF
format images. The libtiff library tries to allocate enough memory to
hold the entire image in memory which, with the broken Height value,
results in trying to allocate many terabytes of memory, which causes
the error. We've tried getting the libtiff maintainers to provide us
with a hook so that we can use our memory manager instead of the
system malloc, on two occasions providing patches to do so, but have
been rebuffed. Obviously if we cannot use our memory manager then we
cannot apply our limits to memory use.
This commit does not address that, it detects the integer overflow
and clamps the value to the last digit before it overflows. Arguably the
value could be reset to 0 but we have no guidance in the spec on this so
clamping it is as good as zeroing it.
For this specific case it solves the problem, because the Height remains
negative and we don't try to render images with a Height less than 1.
While it doesn't solve the general case it's worth having the overflow
detection anyway.
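A minimal sketch of that kind of overflow detection is shown below. Python integers never overflow, so the 64-bit limit is simulated with a constant. The function name is hypothetical and the input is assumed to be an optional sign followed only by decimal digits.

```python
INT64_MAX = 2**63 - 1


def parse_int_clamped(text):
    """Return (value, overflowed) for a signed decimal string."""
    negative = text.startswith("-")
    digits = text[1:] if text[:1] in ("+", "-") else text
    value = 0
    overflowed = False
    for ch in digits:
        d = ord(ch) - ord("0")
        # value * 10 + d would exceed INT64_MAX: stop at the last
        # digit that still fits, and remember that we did so.
        if value > (INT64_MAX - d) // 10:
            overflowed = True
            break
        value = value * 10 + d
    # The sign is applied only after the magnitude is known to fit.
    return (-value if negative else value), overflowed


height, flagged = parse_int_clamped("-" + "9" * 30)
```

With a 30-digit negative input the result stays negative, so a caller that rejects a Height below 1 still rejects it.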
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OSS-fuzz #51318
Commit 78e237d60bf7ba01af6258bc75e250fbcad554e0 to fix OSS-fuzz bug
#51038 caused us to ignore unmatched array or dictionary marks and
simply return, without an error.
That caused a problem with read_dict, which expected either an object
on the stack or an error (not unreasonably) and tried to use a
non-existent object.
This commit amends the previous one. We still ignore unmatched marks,
so we still get the progression noted in the original commit, and the
original OSS-fuzz bug is still fixed, but we now try and find another
token. If we succeed we'll return that, and if we fail we'll return an
error, which satisfies this bug.
Also fix the error message which didn't make grammatical sense.
|
|
|
|
|
|
|
|
|
|
| |
OSS-fuzz #50138
The original fix for this was again incomplete. This commit corrects that
and adds the ability to detect unmatched closing marks.
This creates a progression with test suite file
/tests_private/pdf/sumatra/1127_-_prints_wrong_characters.pdf
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A number of OSS-fuzz files time out due to having an excessively large
/Count (eg 213804087) in the root node of the Pages tree.
This doesn't cause us any real problems, except that it takes a long
time to fail to render that many pages. On the other hand, validating
the entire pages tree for a large file with many nodes could take a
reasonable amount of time. Not huge but it would mean a performance hit
on sensible files just to avoid this penalty on broken files.
As a compromise, this commit checks the root node /Kids array. If it has
sufficient entries to match the /Count then we assume it's a flat tree
and the Count is correct. If it has fewer then we assume it's a tree and
we check each of the entries in the /Kids array. If it's a leaf node then
we add 1 to the running count. If it is an intermediate node then we add
the /Count of the node to the running count.
If the one-level check matches the root node /Count then we assume the
Count is correct. If it does not, then we assume that the count of
the first level is correct.
We could, if desired, validate the entire tree instead at this point
but I don't think it's worth it. If the tree is really broken then the
file is going to fail. This commit prevents the timeout and if the
corruption is limited to the Count of the root node then it will
recover from that.
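The one-level check described above can be sketched like this, with page tree nodes modelled as Python dicts. Reading "sufficient entries" as "at least /Count entries" is an assumption, as are the function name and error text.

```python
def checked_page_count(root, errors):
    """Return a page count we are prepared to trust for the root node."""
    declared = root.get("/Count", 0)
    kids = root.get("/Kids", [])
    if len(kids) >= declared:
        # Looks like a flat tree: accept the declared Count.
        return declared
    total = 0
    for kid in kids:
        if kid.get("/Type") == "/Pages":
            total += kid.get("/Count", 0)   # intermediate node
        else:
            total += 1                      # leaf (page) node
    if total != declared:
        errors.append("root /Count does not match first level of tree")
    return total


errors = []
broken_root = {
    "/Type": "/Pages",
    "/Count": 213804087,
    "/Kids": [{"/Type": "/Page"}, {"/Type": "/Pages", "/Count": 3}],
}
pages = checked_page_count(broken_root, errors)
```

Only the first level is visited, so the cost is proportional to the length of the root /Kids array rather than the size of the whole tree.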
|
|
|
|
|
|
|
|
|
|
|
|
| |
OSS-fuzz 50138
If a file was sufficiently insanely broken to nest arrays or
dictionaries to a sufficient depth, we could exhaust the C execution
stack when trying to free the top level one.
This limits the nesting to a compile-time level (currently 100). If we
exceed that we simply stop nesting the objects but store an error for
reporting on exit.
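One way to picture the limit is the sketch below, which builds nested arrays from a token list. Interpreting "stop nesting" as "put deeper content into the deepest permitted array" is my assumption, and the names are hypothetical. The value 100 is the limit mentioned above.

```python
MAX_NESTING = 100


def build_arrays(tokens, errors):
    """Build nested lists from '[' and ']' tokens, capping the depth."""
    root = []
    stack = [root]
    skipped = 0      # opening marks we declined to nest
    for tok in tokens:
        if tok == "[":
            if len(stack) - 1 >= MAX_NESTING:
                skipped += 1
                errors.add("nesting too deep")
            else:
                new = []
                stack[-1].append(new)
                stack.append(new)
        elif tok == "]":
            if skipped:
                skipped -= 1          # closes a mark we skipped
            elif len(stack) > 1:
                stack.pop()
        else:
            stack[-1].append(tok)
    return root


errors = set()
result = build_arrays(["["] * 500 + ["x"] + ["]"] * 500, errors)

# Measure the depth iteratively, without recursion.
depth, node = 0, result
while node and isinstance(node[0], list):
    node = node[0]
    depth += 1
```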
|
|
|
|
|
|
|
|
|
|
|
| |
bugzilla_pdf\0687117_00346_pdf031031.pdf has an Encrypt entry in the
Root dictionary, but it is a null object which was causing us to error
trying to interpret it.
Just ignore null objects, but throw an error if the Encrypt entry is
neither a dictionary nor null.
Add an error entry, and one for the following commit to use too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Starting with:
Bug #704948 "Warn about xref stream pointing to xref table"
I've chosen for now at least to make this an error, but continue and
read the xref table anyway unless PDFSTOPONERROR is set. In any event
flag the error.
Also check 'stream' keywords to see that they are terminated by a
linefeed as per the specification. We've seen files where a carriage
return on its own is used, and if we ignore the problem then the
file runs to completion so this is just a warning currently.
And finally; if we encounter standard xref table entries which are not
exactly 20 bytes long, flag a warning. Again we've seen plenty of
examples where the entry is terminated with a linefeed instead of a
carriage return and linefeed or padding space and linefeed, and if we
ignore the error then the file runs properly. So again this is a warning
not an error.
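For the xref entry check, a minimal sketch (hypothetical function name and warning text) might be the following. It assumes the caller hands over the raw bytes of one entry including its end-of-line bytes.

```python
XREF_ENTRY_LEN = 20   # 10-digit offset, SP, 5-digit generation, SP, type, 2-byte EOL


def parse_xref_entry(raw, warnings):
    """Parse one classic xref table entry, warning on a bad length."""
    if len(raw) != XREF_ENTRY_LEN:
        # Tolerated: the fields can still be read, so only warn.
        warnings.append("xref entry is %d bytes, expected 20" % len(raw))
    fields = raw.decode("ascii").split()
    offset, generation, kind = int(fields[0]), int(fields[1]), fields[2]
    return offset, generation, kind


warnings = []
good = parse_xref_entry(b"0000000017 00000 n \n", warnings)
short = parse_xref_entry(b"0000000017 00000 n\n", warnings)
```

Both entries yield the same fields. Only the second, which ends in a bare linefeed, adds a warning.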
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally reported against the old PDF interpreter in bug #702077, the
file has a ridiculous halftone which has a Frequency of 1, at higher
resolutions we cannot create a halftone cell with such a low frequency
and generate an error.
Originally we altered the old PostScript-based interpreter to note the error and
carry on. GhostPDF does the same, but aborts the entire GState sequence
leading to missing content.
This commit instead swallows the graphics library error and substitutes
a PDF-specific error, so that the error is not completely ignored.
If PDFSTOPONERROR is set then we do return the underlying error.
|
|
|
|
|
|
|
|
|
| |
Bug #705026 "Error when reading latex-pdf"
The PDF file has a page dictionary which has a /Group key where the
value is an indirect reference to a free object. Previously we would
abort the page processing at that point, this commit instead flags the
error but continues to process the page Content stream.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bug #705079 "Blending issue with the drop-shadow image"
This turns out to be because the new PDF interpreter was not setting the
current colour space to be the /CS from the XObject group attributes
dictionary.
Attempting to do this turned out to be well nigh impossible. We can't
(not sure why but it certainly doesn't work) use gsave/grestore to
preserve the colour space. Attempting to replicate the behaviour of the old
PostScript-based PDF interpreter and copying the current colour and spaces led
to seg faults.
On closer inspection, however, it turns out that the only reason for
setting the current colour space is because the pdf14 compositor
discards the ColorSpace passed in from the interpreter. The interpreter
passes a gs_transparency_group_params_t structure, which contains an
entry for the ColorSpace, the graphics library however copies the entries
into a gs_pdf14trans_params_t which does *not* contain a ColorSpace.
I've no idea what the point of this was, but it seems mad. This commit
adds a new member 'ColorSpace' to the gs_pdf14trans_params_t
structure, and initialises it from the gs_transparency_group_params_t
structure. We then modify pdfwrite to use that instead of the current
colour space.
The PostScript and C based PDF interpreters both already initialised
that member so nothing further needed to be done there. The XPS
interpreter did not initialise that member, and so we also update it
to do so.
This fixes the bug and shows no diffs on the cluster.
Finally update the PDF interpreter, it seems that the old code
accepted a /ColorSpace in place of a /CS in the group attributes
dictionary even though this is technically invalid.
Fall back to the current colour space if no /CS or /ColorSpace is
present in the group attributes dictionary (this is illegal but
it might 'work').
Finally add a check, as per the old code, to see that the number
of components for the colour space matches the number of components
in the /BC array if one is supplied. Unlike Acrobat we choose not
to ignore the SMask if they don't match.
We do emit warnings/errors for all the above conditions.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were using the /Length value supplied in the Encrypt dictionary
even for encryption types where that is not valid. This commit
fixes the KeyLen value for those encryption types for which it is not
a variable (most of them), checks the minimum/maximum and multiple of 8
for the one really variable type, and flags warnings if the /Length is
supplied for an inappropriate filter.
It's possible we will see a load of files encrypted with V 3-6 which
supply a /Length where they technically shouldn't, and will raise a
warning. We might want to not do that in future if it proves irksome.
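The shape of that check can be sketched as below. The table of fixed key sizes, the choice of V 2 as the variable-length type, the 40 to 128 bit range, the 40-bit default and the handling of an invalid value are all assumptions made for illustration. They reflect my reading of the PDF specification and are not taken from this commit.

```python
# Illustrative values only (assumptions, see the note above).
FIXED_KEY_BITS = {1: 40, 4: 128, 5: 256}
VARIABLE_V = 2
DEFAULT_VARIABLE_BITS = 40


def effective_key_bits(v, length=None):
    """Return (key length in bits or None, list of messages)."""
    notes = []
    if v in FIXED_KEY_BITS:
        if length is not None:
            notes.append("warning: /Length ignored for V %d" % v)
        return FIXED_KEY_BITS[v], notes
    if v == VARIABLE_V:
        if length is None:
            return DEFAULT_VARIABLE_BITS, notes
        if 40 <= length <= 128 and length % 8 == 0:
            return length, notes
        notes.append("error: invalid /Length %d" % length)
        return None, notes
    notes.append("error: unsupported V %d" % v)
    return None, notes


bits, notes = effective_key_bits(5, 128)
```

In this sketch a /Length supplied where the size is fixed never changes the key length. It only produces a warning.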
|
|
This avoids having 2 tables of errors (enum and error strings) defined
in 2 separate files that need to be kept in sync. Same for warnings.
|