path: root/result
Commit message | Author | Date | Files | Lines (-deleted/+added)
...
* Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk
  David Kilzer | 2022-01-16 | 28 files | -0/+292
  This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8().
  * encoding.c:
  (UTF16LEToUTF8):
  - Fix the comment to describe what the code does.
  (UTF16BEToUTF8):
  - Fix undefined behavior; the same fix was applied to UTF16LEToUTF8() in 2f9382033e.
  - Add a bounds check to the while() loop; the same check was applied to UTF16LEToUTF8() in be803967db.
  - Do not return -2 when (in >= inend), which fixes the bug. The same change was applied to UTF16LEToUTF8() in 496a1cf592.
  - Inline the (<< 8) statements to match UTF16LEToUTF8().
  Add the following tests and results:
  test/text-4-byte-UTF-16-BE-offset.xml
  test/text-4-byte-UTF-16-BE.xml
  test/text-4-byte-UTF-16-LE-offset.xml
  test/text-4-byte-UTF-16-LE.xml
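The following is a minimal C sketch of the fixed behavior. It is not libxml2's encoding.c. The hypothetical `utf16be_to_utf8()` decodes only complete 16-bit units and surrogate pairs, and it reports how many input bytes it consumed. When the second half of a surrogate pair has not arrived yet, it leaves the pair for the next chunk instead of returning -2. The function name, its signature and the handling of a full output buffer are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not libxml2's UTF16BEToUTF8(). On entry *inlen and
 * *outlen are the buffer sizes; on return they hold the number of input
 * bytes consumed and UTF-8 bytes written. Returns 0, or -2 on a malformed
 * surrogate sequence. A surrogate pair cut off at the end of the chunk is
 * left unconsumed rather than treated as an error. */
static int utf16be_to_utf8(unsigned char *out, size_t *outlen,
                           const unsigned char *in, size_t *inlen)
{
    const size_t inend = *inlen, outend = *outlen;
    size_t i = 0, o = 0;
    int ret = 0;

    assert(out != NULL && in != NULL);
    while (i + 1 < inend) {                    /* a whole 16-bit unit */
        unsigned long c = ((unsigned long)in[i] << 8) | in[i + 1];
        size_t used = 2, need;

        if (c >= 0xD800 && c <= 0xDBFF) {      /* high surrogate */
            unsigned long d;

            if (i + 3 >= inend)                /* low half not here yet: */
                break;                         /* stop, do not fail */
            d = ((unsigned long)in[i + 2] << 8) | in[i + 3];
            if (d < 0xDC00 || d > 0xDFFF) {
                ret = -2;
                break;
            }
            c = 0x10000 + ((c - 0xD800) << 10) + (d - 0xDC00);
            used = 4;
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            ret = -2;                          /* stray low surrogate */
            break;
        }
        need = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
        if (o + need > outend)
            break;                             /* output full: resume later */
        switch (need) {
        case 1:
            out[o] = (unsigned char)c;
            break;
        case 2:
            out[o] = (unsigned char)(0xC0 | (c >> 6));
            out[o + 1] = (unsigned char)(0x80 | (c & 0x3F));
            break;
        case 3:
            out[o] = (unsigned char)(0xE0 | (c >> 12));
            out[o + 1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o + 2] = (unsigned char)(0x80 | (c & 0x3F));
            break;
        default:
            out[o] = (unsigned char)(0xF0 | (c >> 18));
            out[o + 1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[o + 2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o + 3] = (unsigned char)(0x80 | (c & 0x3F));
            break;
        }
        o += need;
        i += used;
    }
    *inlen = i;
    *outlen = o;
    return ret;
}
```

A caller can keep the unconsumed tail and prepend it to the next chunk. The split 4-byte character then decodes once both halves are present.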
* Fix regression parsing public ID literals in HTML
  Nick Wellnhofer | 2022-01-10 | 3 files | -0/+28
  Fix a regression introduced when reworking htmlParsePubidLiteral in commit 93ce33c2.
  Fixes #318.
* Fix handling of unexpected EOF in xmlParseContent
  Nick Wellnhofer | 2021-05-08 | 2 files | -2/+2
  Re-add the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF, which was removed in commit 62150ed2.
  Commit 62150ed2 also introduced a regression for direct users of xmlParseContent: unclosed tags weren't checked.
* Fix line numbers in error messages for mismatched tags
  Nick Wellnhofer | 2021-05-07 | 2 files | -4/+4
  Commit 62150ed2 introduced a small regression in the error messages for mismatched tags. This typically only affected messages after the first mismatch, but with custom SAX handlers all line numbers would be off.
  This also fixes line numbers in the SAX push parser, which were never handled correctly.
* Check for invalid redeclarations of predefined entities
  Nick Wellnhofer | 2021-02-08 | 9 files | -2/+75
  Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions.
  Note that some test cases declared
    <!ENTITY lt "<">
  but the XML spec clearly states that this is illegal:
  > If the entities lt or amp are declared, they MUST be declared as
  > internal entities whose replacement text is a character reference to
  > the respective character (less-than sign or ampersand) being escaped;
  > the double escaping is REQUIRED for these entities so that references
  > to them produce a well-formed result.
  This also fixes #217, but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference.
  As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.
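The rule quoted above applies to an entity's replacement text, which is what remains after character references in the literal value are expanded. For example, `<!ENTITY lt "&#38;#60;">` has the replacement text `&#60;`. The sketch below shows one way to check that rule. It is hypothetical and is not the check libxml2 added. The helper names and the deliberately simple character-reference parser are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: if `text` is exactly one character reference such
 * as "&#60;" or "&#x3C;", return its code point; otherwise return -1. */
static long parse_char_ref(const char *text)
{
    long value = 0;
    int base = 10, digits = 0;

    if (text[0] != '&' || text[1] != '#')
        return -1;
    text += 2;
    if (*text == 'x') {
        base = 16;
        text++;
    }
    for (; *text != ';'; text++, digits++) {
        int d;

        if (*text >= '0' && *text <= '9')
            d = *text - '0';
        else if (base == 16 && *text >= 'a' && *text <= 'f')
            d = *text - 'a' + 10;
        else if (base == 16 && *text >= 'A' && *text <= 'F')
            d = *text - 'A' + 10;
        else
            return -1;               /* bad digit or end of string */
        if (value > 0x10FFFF)
            return -1;               /* far beyond Unicode: stop early */
        value = value * base + d;
    }
    if (digits == 0 || text[1] != '\0')
        return -1;                   /* "&#;" or trailing text */
    return value;
}

/* XML 1.0, section 4.6: a redeclared lt or amp must expand to a character
 * reference to '<' or '&'; gt, apos and quot may also expand to the bare
 * character. Returns 1 if the declaration is acceptable, 0 otherwise. */
static int predefined_redecl_ok(const char *name, const char *replacement)
{
    /* lt and amp come first: they are the two that need a char reference. */
    static const struct { const char *name; char ch; } predef[] = {
        {"lt", '<'}, {"amp", '&'}, {"gt", '>'}, {"apos", '\''}, {"quot", '"'}
    };
    size_t i;

    assert(name != NULL && replacement != NULL);
    for (i = 0; i < sizeof predef / sizeof predef[0]; i++) {
        if (strcmp(name, predef[i].name) != 0)
            continue;
        if (parse_char_ref(replacement) == predef[i].ch)
            return 1;
        return i >= 2 && replacement[0] == predef[i].ch && replacement[1] == '\0';
    }
    return 1;                        /* not a predefined entity */
}
```

With this check, the `<!ENTITY lt "<">` declaration from the old test cases is rejected. `gt` with the bare `>` is still accepted.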
* Fix timeout when handling recursive entities
  Nick Wellnhofer | 2020-12-18 | 1 file | -149/+29
  Abort parsing early to avoid an almost infinite loop in certain error cases involving recursive entities.
  Found with libFuzzer.
* Use new htmlParseLookupCommentEnd to find comment ends
  Mike Dalessio | 2020-12-16 | 2 files | -2/+2
  Note that the caret in error messages generated during comment parsing may have moved by one byte.
  See guidance provided on incorrectly-closed comments here:
  https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
* htmlParseComment: treat `--!>` as if it closed the comment
  Mike Dalessio | 2020-12-16 | 6 files | -11/+30
  See guidance provided on incorrectly-closed comments here:
  https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
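Here is a minimal sketch of a comment-end lookup that accepts both `-->` and `--!>`. It assumes a plain byte buffer. `find_comment_end()` is a hypothetical helper and not libxml2's htmlParseLookupCommentEnd. Returning -1 when the terminator is not there yet (possibly because it is split across chunks) lets a push parser wait for more input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: find the end of an HTML comment body of `len`
 * bytes. Accepts "-->" and, per the WHATWG incorrectly-closed-comment
 * rule, also "--!>". Returns the offset of the terminator and stores its
 * length (3 or 4) in *term_len, or returns -1 if none is present yet. */
static long find_comment_end(const char *body, size_t len, size_t *term_len)
{
    size_t i;

    assert(body != NULL && term_len != NULL);
    for (i = 0; i + 2 < len; i++) {
        if (body[i] != '-' || body[i + 1] != '-')
            continue;
        if (body[i + 2] == '>') {
            *term_len = 3;
            return (long)i;
        }
        if (i + 3 < len && body[i + 2] == '!' && body[i + 3] == '>') {
            *term_len = 4;
            return (long)i;
        }
    }
    return -1;
}
```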
* Add test coverage for incorrectly-closed comments
  Mike Dalessio | 2020-12-16 | 6 files | -0/+108
  This establishes the baseline behavior so that subsequent commits which modify this behavior are clear about what's being changed.
* Fix regression introduced with commit 74dcc10b
  Nick Wellnhofer | 2020-08-19 | 2 files | -0/+33
  The code wasn't dead after all, but I can see no reason for delaying the XPointer evaluation. This could lead to nodes included earlier appearing in XPointer results.
* Fix corner case with empty xi:fallback
  Nick Wellnhofer | 2020-08-17 | 2 files | -0/+1
  xi:fallback could become empty after recursive expansion. Use a flag to track whether nodes should be skipped.
* Fix exponential runtime and memory in xi:fallback processing
  Nick Wellnhofer | 2020-08-07 | 2 files | -0/+167
  When creating XML_XINCLUDE_START nodes, the children of the original xi:include node must be freed; otherwise, fallback content is copied twice, doubling runtime and memory consumption for each nested xi:fallback/xi:include pair.
  Found with libFuzzer.
* Don't recurse into xi:include children in xmlXIncludeDoProcess
  Nick Wellnhofer | 2020-08-06 | 6 files | -0/+72
  Otherwise, nested xi:include nodes might result in a use-after-free if XML_PARSE_NOXINCNODE is specified.
  Found with libFuzzer and ASan.
* Fix several quadratic runtime issues in HTML push parser
  Nick Wellnhofer | 2020-07-23 | 12 files | -55/+26
  Fix a few remaining cases where the HTML push parser would scan more content during lookahead than it would later parse.
  Make sure that htmlParseDocTypeDecl consumes all content up to the final '>' in case of errors. The old comment said "We shouldn't try to resynchronize", but ignoring invalid content is also what the HTML5 spec mandates.
  Likewise, make htmlParseEndTag skip to the final '>' in invalid end tags even if not in recovery mode. This is probably the most visible change in practice and leads to different output for some tests, but it is also more in line with HTML5.
  Make sure that htmlParsePI and htmlParseComment don't abort if invalid characters are encountered, but log an error and ignore the character.
  Change some other end-of-buffer checks to test for a zero byte instead of relying on IS_CHAR.
  Fix usage of the IS_CHAR macro in htmlParseScript.
* Add regexp regression tests
  David Kilzer | 2020-07-06 | 4 files | -0/+9
  - Bug 757711: heap-buffer-overflow in xmlFAParsePosCharGroup
    <https://bugzilla.gnome.org/show_bug.cgi?id=757711>
  - Bug 783015: integer overflow in xmlFAParseQuantExact
    <https://bugzilla.gnome.org/show_bug.cgi?id=783015>
  (Regexptests): Add support for checking stderr output when running regexp tests. This makes it possible to check in test cases that fail and not see false-positive error output when running the tests. Unlike other libxml2 test suites, if there is no stderr output, no *.err file needs to be created.
* Fix quadratic runtime in HTML parser
  Nick Wellnhofer | 2020-07-06 | 3 files | -0/+87
  Commit eeb99329 removed an important optimization avoiding quadratic runtime when repeatedly scanning the input buffer for terminating characters in the HTML push parser. The related bug is
  https://bugzilla.gnome.org/show_bug.cgi?id=444994
  Make sure that ctxt->checkIndex is always written, and store additional parser state in ctxt->inSubset, which is unused in the HTML parser.
  Found by OSS-Fuzz.
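The idea behind a saved scan index such as `ctxt->checkIndex` can be sketched in a few lines. When a push parser repeatedly looks for a terminator in a growing buffer, it remembers where the last scan stopped, so each byte is examined once instead of once per chunk. The struct, the function and the `compared` counter below are hypothetical and only illustrate the technique. They are not libxml2's parser state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lookahead state for a push parser. */
struct lookup_state {
    size_t check_index;   /* first offset not yet scanned */
    size_t compared;      /* bytes examined so far (for illustration) */
};

/* Return the offset of `term` in buf[0..len), or -1 if it has not arrived
 * yet. Without the saved index, feeding n chunks would rescan the same
 * prefix each time, costing O(n^2) comparisons in total. */
static long lookup_char(struct lookup_state *st, const char *buf,
                        size_t len, char term)
{
    size_t i;

    assert(st != NULL && st->check_index <= len);
    for (i = st->check_index; i < len; i++) {
        st->compared++;
        if (buf[i] == term) {
            st->check_index = 0;   /* token complete: reset for the next one */
            return (long)i;
        }
    }
    st->check_index = len;         /* resume here when more data arrives */
    return -1;
}
```

In the usage check below, the buffer grows one byte at a time. Each byte is compared exactly once.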
* Add test case for recursive external parsed entities
  Nick Wellnhofer | 2020-02-11 | 4 files | -0/+248
* Enable error tests with entity substitution
  Nick Wellnhofer | 2020-02-11 | 19 files | -0/+202
* Don't load external entity from xmlSAX2GetEntity
  Nick Wellnhofer | 2020-02-11 | 1 file | -7/+0
  Despite the comment, I can't see a reason why external entities must be loaded in the SAX handler. For external entities, the handler is typically first invoked via xmlParseReference, which will later load the entity on its own if it wasn't loaded yet.
  The old code also led to duplicated SAX events, which makes it basically impossible to reuse xmlSAX2GetEntity for a custom SAX parser. See the change to the expected test output.
  Note that xmlSAX2GetEntity was loading the entity via xmlParseCtxtExternalEntity, while xmlParseReference uses xmlParseExternalEntityPrivate. In the previous commit, the two functions were merged, trying to compensate for some slight differences between the two mostly identical implementations.
  But the more urgent reason for this change is that xmlParseReference has the facility to abort early when recursive entities are detected, avoiding what could practically amount to an infinite loop.
  If you want to backport this change, note that the previous three commits are required as well:
    f9ea1a24 Fix copying of entities in xmlParseReference
    5c7e0a9a Copy some XMLReader option flags to parser context
    1a3e584a Merge code paths loading external entities
  Found by OSS-Fuzz.
* Fix copying of entities in xmlParseReference
  Nick Wellnhofer | 2020-02-11 | 2 files | -6/+3
  Before, reader mode would end up in a branch that didn't handle entities with multiple children and failed to update ent->last, so the hack copying the "extra" reader data wouldn't trigger. Consequently, some empty nodes in entities are now correctly detected in the test suite. (The detection of empty nodes in entities is still buggy, though.)
* Large batch of typo fixes
  Jared Yanovich | 2019-09-30 | 4 files | -4/+4
  Closes #109.
* Disallow conditional sections in internal subset
  Nick Wellnhofer | 2019-09-30 | 2 files | -26/+24
  Conditional sections are only allowed in *external* parameter entities referenced from the internal subset.
* Make xmlParseConditionalSections non-recursive
  Nick Wellnhofer | 2019-09-30 | 8 files | -20/+27
  Avoid call stack overflow in deeply nested conditional sections.
  Found by OSS-Fuzz.
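The sketch below shows one way to handle nested conditional sections without recursion: a loop with a depth counter and the depth of the outermost open IGNORE section. The nesting depth then no longer consumes call stack. It is a simplified, hypothetical illustration and not libxml2's xmlParseConditionalSections. It recognizes only the exact markers `<![INCLUDE[`, `<![IGNORE[` and `]]>`, with no whitespace and no parameter entities. Inside an ignored section it counts only these markers, whereas the spec counts any `<![`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy the text of included sections from `in` to
 * `out` (at least strlen(in) + 1 bytes). Returns 0, or -1 if unbalanced. */
static int expand_cond_sections(const char *in, char *out)
{
    static const char inc[] = "<![INCLUDE[", ign[] = "<![IGNORE[", end[] = "]]>";
    size_t depth = 0;         /* currently open sections */
    size_t ignore_depth = 0;  /* depth of outermost open IGNORE, 0 if none */

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (strncmp(in, inc, sizeof inc - 1) == 0) {
            depth++;
            in += sizeof inc - 1;
        } else if (strncmp(in, ign, sizeof ign - 1) == 0) {
            depth++;
            if (ignore_depth == 0)
                ignore_depth = depth;
            in += sizeof ign - 1;
        } else if (strncmp(in, end, sizeof end - 1) == 0) {
            if (depth == 0)
                return -1;            /* "]]>" without an open section */
            if (depth == ignore_depth)
                ignore_depth = 0;     /* leaving the outermost IGNORE */
            depth--;
            in += sizeof end - 1;
        } else {
            if (ignore_depth == 0)
                *out++ = *in;         /* content of an included section */
            in++;
        }
    }
    *out = '\0';
    return depth == 0 ? 0 : -1;
}
```

Tens of thousands of nesting levels need only two counters here. A recursive version would need one stack frame per level.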
* Fix Regextests
  Nick Wellnhofer | 2019-09-25 | 1 file | -1/+1
  - One of the bug316338 test cases is expected to succeed.
  - Memory leak in testRegexp.c.
  - Refcount handling in xmlExpHashGetEntry.
* Fix empty branch in regex
  Nick Wellnhofer | 2019-09-25 | 1 file | -0/+9
  Fixes bug 649244:
  https://bugzilla.gnome.org/show_bug.cgi?id=649244
  Closes #57.
* Make xmlParseContent and xmlParseElement non-recursive
  Nick Wellnhofer | 2019-09-23 | 2 files | -3/+3
  Split xmlParseElement into subfunctions. Use nameNsPush to store prefix, URI and nsNr on the heap, similar to the push parser.
  Closes #84.
* Remove executable bit from non-executable files
  Nick Wellnhofer | 2019-09-16 | 8 files | -0/+0
* Fix expected output of test/schemas/any4
  Nick Wellnhofer | 2019-09-16 | 2 files | -1/+1
  libxml2 correctly rejects any4_0.xsd as an invalid schema. I can't figure out what the intent behind this test case was. Simply adjust the expected output to match the current behavior.
  Closes #92.
* Fix Schema determinism check of ##other namespaces
  Nick Wellnhofer | 2019-09-16 | 2 files | -0/+1
  Non-compound (##local) and compound string atoms are always disjoint, regardless of whether the compound atom is negated (##other).
  Closes #40.
* Misleading error message with xs:{min|max}Inclusive
  bettermanzzy | 2019-08-25 | 5 files | -11/+0
  Closes #53.
* Fix inability to RelaxNG-validate grammar with choice-based name class
  Jan Pokorný | 2019-08-25 | 2 files | -0/+1
  Previously, test/relaxng/ambig_name-class2.xml would fail to validate against test/relaxng/ambig_name-class2.rng:
  > test/relaxng/ambig_name-class2.rng:4:
  > element attribute: Relax-NG parser error :
  > Found anyName attribute without oneOrMore ancestor
  > Relax-NG schema test/relaxng/ambig_name-class2.rng failed to compile
  Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
* Fix inability to validate ambiguously constructed interleave for RelaxNG
  Jan Pokorný | 2019-08-25 | 2 files | -0/+1
  Previously, test/relaxng/ambig_name-class.xml would fail to validate for a simple reason: interleave within an "open-name-class" context is supposed to be fine with whatever else is pending consumption, since, from a higher-level parsing perspective, it is effectively unrelated.
  Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
* Fix unsigned integer overflow
  Nick Wellnhofer | 2019-05-20 | 2 files | -2/+2
  It's defined behavior, but -fsanitize=unsigned-integer-overflow is useful to discover bugs.
* Improve XPath predicate and filter evaluation
  Nick Wellnhofer | 2019-04-22 | 1 file | -0/+28
  Consolidate code paths evaluating XPath predicates and filters.
  Don't push the context node on the stack when evaluating predicates. I have no idea why this was done. It seems completely useless, and trying to pop the context node from a corrupted stack has already caused security issues.
  Filter node sets in place and don't create node sets with NULL gaps, which makes it possible to simplify merging a great deal. Simply move matched nodes backward and create a compact node set.
  Merge xmlXPathCompOpEvalPositionalPredicate into xmlXPathCompOpEvalPredicate.
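Here is a minimal sketch of in-place, gap-free filtering. Each kept entry moves backward over the rejected ones, so the result stays compact and in order without a second allocation. For illustration, nodes are plain ints and predicates are C callbacks that receive the 1-based context position and size. These are hypothetical stand-ins, not libxml2's xmlNodeSet or XPath evaluator.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*node_pred)(int node, size_t pos, size_t size);

/* Keep the entries for which `pred` holds; returns the new size. */
static size_t filter_in_place(int *nodes, size_t n, node_pred pred)
{
    size_t i, kept = 0;

    assert(nodes != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (pred(nodes[i], i + 1, n))
            nodes[kept++] = nodes[i];   /* kept <= i: only moves backward */
    }
    return kept;                        /* relative order is preserved */
}

/* Example predicate: keep even values. */
static int is_even(int node, size_t pos, size_t size)
{
    (void)pos;
    (void)size;
    return node % 2 == 0;
}

/* Example positional predicate, like [last()]. */
static int is_last(int node, size_t pos, size_t size)
{
    (void)node;
    return pos == size;
}
```

Applying the predicates one after another mirrors a chain such as `[even][last()]`. Each step sees the compacted result of the previous one.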
* Fix float casts in xmlXPathSubstringFunction
  Nick Wellnhofer | 2019-03-08 | 1 file | -0/+8
  Rewrite the conversion of double to int in xmlXPathSubstringFunction, adding range checks to avoid undefined behavior. Make sure to add start and length as floating-point numbers before converting to int.
  Fix a bug when rounding negative start indices.
  Remove unneeded calls to xmlXPathIs{Inf,NaN} and rely on IEEE math instead.
  Avoid computing the string length. xmlUTF8Strsub works as expected if the length of the requested substring exceeds the input.
  Found with libFuzzer and UBSan.
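The following is a hypothetical sketch of XPath 1.0 `substring(s, start, len)` that follows the approach described above. Both bounds stay doubles, they are added before any conversion, and IEEE comparisons reject NaN. Values are clamped to the string before they are cast to an integer type. It works on bytes, whereas libxml2 counts UTF-8 characters. The helper names and the libm-free rounding are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* XPath round(): nearest integer, ties toward +infinity. Implemented
 * without libm; |x| >= 2^52 is already integral, NaN passes through. */
static double xpath_round(double x)
{
    double t, frac;

    if (!(x > -4503599627370496.0 && x < 4503599627370496.0))
        return x;
    t = (double)(long long)x;          /* truncate toward zero (in range) */
    frac = x - t;
    if (x >= 0.0)
        return frac >= 0.5 ? t + 1.0 : t;
    return frac < -0.5 ? t - 1.0 : t;  /* -1.5 rounds to -1 */
}

/* Hypothetical byte-string substring(); returns a malloc'd string. */
static char *xpath_substring(const char *s, double start, double len)
{
    size_t n, from = 0, to = 0;
    double first, last;
    char *res;

    assert(s != NULL);
    n = strlen(s);
    first = xpath_round(start);
    last = first + xpath_round(len);   /* add as doubles, before any cast */
    /* Keep 1-based positions p with first <= p < last. The comparison is
     * false if either bound is NaN (e.g. -inf + inf), giving "". */
    if (first < last) {
        if (first < 1.0)
            first = 1.0;
        if (last > (double)n + 1.0)
            last = (double)n + 1.0;
        if (first < last) {            /* now 1 <= first < last <= n + 1 */
            from = (size_t)first - 1;
            to = (size_t)last - 1;
        }
    }
    res = malloc(to - from + 1);
    if (res == NULL)
        return NULL;
    memcpy(res, s + from, to - from);
    res[to - from] = '\0';
    return res;
}
```

The checks below use the examples from the XPath 1.0 specification's description of `substring()`, plus a negative start.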
* Remove redefined starts and defines inside include elements
  Nikolai Weibull | 2018-11-29 | 4 files | -0/+2
  When including a grammar from another grammar, we need to make sure that any redefinitions of starts and defines that this grammar makes inside any of its include elements are also removed.
* Allow choice within choice in nameClass in RELAX NG
  Nikolai Weibull | 2018-11-29 | 4 files | -0/+2
  The pattern nameClass allows for nested choice elements, for example
    <name>
      <choice>
        <choice>
          <name>a</name>
          <name>b</name>
        </choice>
        <name>c</name>
      </choice>
    </name>
  which is semantically equivalent to
    <name>
      <choice>
        <name>a</name>
        <name>b</name>
        <name>c</name>
      </choice>
    </name>
  The old code didn't handle this correctly, as it never expected a choice inside another choice. This patch fixes this by flattening any nested choices.
  This pattern of nested choice elements comes up in RELAX NG simplification, where all choice elements are rewritten in this nested manner; see section 4.12 of the RELAX NG specification.
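Flattening can be sketched as a walk that splices the children of every nested choice into one list of leaf names, in document order. The sketch is hypothetical and does not reflect the relaxng.c data structures. The two-kind `name_class` struct, the fixed child capacity and the output-array interface are assumptions. It also ignores the anyName and nsName classes.

```c
#include <assert.h>
#include <stddef.h>

enum nc_kind { NC_NAME, NC_CHOICE };

/* Hypothetical name class: a single name or a choice of name classes. */
struct name_class {
    enum nc_kind kind;
    const char *name;                 /* NC_NAME only */
    struct name_class *children[8];   /* NC_CHOICE only */
    size_t nchildren;
};

/* Append the leaf names under `nc` to out[count..cap), splicing nested
 * choices into the parent. Returns the new count, or (size_t)-1 if `out`
 * is too small. */
static size_t flatten_choice(const struct name_class *nc,
                             const struct name_class **out,
                             size_t count, size_t cap)
{
    size_t i;

    assert(nc != NULL);
    if (nc->kind == NC_NAME) {
        if (count >= cap)
            return (size_t)-1;
        out[count] = nc;
        return count + 1;
    }
    for (i = 0; i < nc->nchildren && count != (size_t)-1; i++)
        count = flatten_choice(nc->children[i], out, count, cap);
    return count;
}
```

Applied to the nested example above, it yields the same three names as the flat form: a, b, c.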
* Look inside divs for starts and defines inside include
  Nikolai Weibull | 2018-11-29 | 4 files | -0/+2
  RELAX NG allows for div elements inside of include elements. We need to look inside those div elements for start and define elements that may be redefining start and define elements in the included grammar.
* Free input buffer in xmlHaltParser
  Nick Wellnhofer | 2018-09-11 | 1 file | -10/+7
  This avoids miscalculation of available bytes.
  Thanks to Yunho Kim for the report.
  Closes: #26
* Add test for ICU flush and pivot buffer
  Nick Wellnhofer | 2017-11-04 | 7 files | -0/+81
* Fix comparison of nodesets to strings
  Nick Wellnhofer | 2017-10-07 | 1 file | -0/+13
  Fix two bugs in xmlXPathNodeValHash which could lead to errors when comparing node sets to strings:
  - Only use contents of text nodes to compute the hash for element nodes. Comments, PIs, and other node types don't affect the string-value and must be ignored.
  - Reset `string` to NULL for node types other than text.
  Reported by Aleksei on the mailing list:
  https://mail.gnome.org/archives/xml/2017-September/msg00016.html
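The first bullet relies on the XPath rule that an element's string-value is the concatenation of its descendant text nodes in document order. Comments and processing instructions inside it contribute nothing. The sketch below illustrates that rule on a toy tree. The `node` struct and `string_value()` are hypothetical and are not libxml2's xmlNode or xmlXPathNodeValHash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum node_type { N_ELEMENT, N_TEXT, N_COMMENT, N_PI };

/* Hypothetical tree node. */
struct node {
    enum node_type type;
    const char *content;          /* text, comment or PI data */
    const struct node *children;  /* first child */
    const struct node *next;      /* next sibling */
};

/* Append the text under `n` to buf, which already holds a NUL-terminated
 * prefix of length `len` and has room for `cap` bytes. Returns the new
 * length, or (size_t)-1 if buf is too small. */
static size_t string_value(const struct node *n, char *buf, size_t len,
                           size_t cap)
{
    const struct node *child;

    assert(n != NULL && buf != NULL);
    if (n->type == N_TEXT) {
        size_t add = strlen(n->content);

        if (len + add >= cap)
            return (size_t)-1;
        memcpy(buf + len, n->content, add + 1);
        return len + add;
    }
    if (n->type != N_ELEMENT)
        return len;               /* comments and PIs add nothing */
    for (child = n->children; child != NULL && len != (size_t)-1;
         child = child->next)
        len = string_value(child, buf, len, cap);
    return len;
}
```

For `<e>a<!--x--><?pi y?><f>b</f></e>` this produces "ab", so comparing the element with the string "ab" should succeed.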
* Revert "Print error messages for truncated UTF-8 sequences" (tag: v2.9.5-rc2)
  Nick Wellnhofer | 2017-08-30 | 9 files | -35/+0
  This reverts commit 79c8a6b, which caused a serious regression in streaming mode.
  Also reverts part of commit 52ceced "Fix infinite loops with push parser in recovery mode".
  Fixes bug 786554.
* Detect infinite recursion in parameter entities
  Nick Wellnhofer | 2017-07-25 | 3 files | -0/+13
  When expanding a parameter entity in a DTD, infinite recursion could lead to an infinite loop or memory exhaustion.
  Thanks to Wei Lei for the first of many reports.
  Fixes bug 759579.
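One common way to detect such recursion is to mark each entity while it is being expanded. Meeting a marked entity again means its definition refers to itself, directly or indirectly, and expansion stops with an error. The sketch below is hypothetical and not libxml2's implementation. The `%name;` syntax handling is simplified, and the table, function names and error convention are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct pentity {
    const char *name;
    const char *value;
    int expanding;   /* nonzero while on the current expansion path */
};

static struct pentity *lookup(struct pentity *tab, size_t n,
                              const char *name, size_t len)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strlen(tab[i].name) == len && strncmp(tab[i].name, name, len) == 0)
            return &tab[i];
    return NULL;
}

/* Expand "%name;" references in `text` into out[len..cap). Returns the new
 * length, or -1 on recursion, an undefined entity, or lack of space.
 * Recursion depth is bounded by the number of entities because a loop is
 * rejected before the nested call. */
static long expand(struct pentity *tab, size_t n, const char *text,
                   char *out, long len, long cap)
{
    assert(cap > 0 && len >= 0 && len < cap);
    while (*text != '\0') {
        const char *semi;
        struct pentity *ent;

        if (*text != '%' || (semi = strchr(text, ';')) == NULL) {
            if (len + 1 >= cap)
                return -1;
            out[len++] = *text++;
            continue;
        }
        ent = lookup(tab, n, text + 1, (size_t)(semi - text - 1));
        if (ent == NULL || ent->expanding)
            return -1;           /* undefined, or entity loop detected */
        ent->expanding = 1;
        len = expand(tab, n, ent->value, out, len, cap);
        ent->expanding = 0;
        if (len < 0)
            return -1;
        text = semi + 1;
    }
    out[len] = '\0';
    return len;
}
```

With `a` defined as `1%b;` and `b` as `2%a;`, the second visit to `a` is caught after two levels instead of looping forever.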
* Get rid of "blanks wrapper" for parameter entities
  Nick Wellnhofer | 2017-06-20 | 10 files | -75/+75
  Now that replacement of parameter entities goes exclusively through xmlSkipBlankChars, we can account for the surrounding space characters there and remove the "blanks wrapper" hack.
* Fix xmlHaltParser
  Nick Wellnhofer | 2017-06-20 | 2 files | -13/+16
  Pop all extra input streams before resetting the input. Otherwise, a call to xmlPopInput could make input available again.
  Also set input->end to input->cur.
  This changes the test output for some error tests. Unfortunately, some fuzzed test cases were added to the test suite without manual cleanup. This makes it almost impossible to review the impact of later changes on the test output.
* Spelling and grammar fixes
  Nick Wellnhofer | 2017-06-17 | 2 files | -2/+2
  Fixes bug 743172, bug 743489, bug 769632, bug 782400 and a few other misspellings.
* Rework entity boundary checks
  Nick Wellnhofer | 2017-06-17 | 7 files | -45/+62
  Make sure to finish all entities in the internal subset. Nevertheless, re-add a sanity check in xmlParseStartTag2 that was lost in my previous commit. Also add a sanity check in xmlPopInput. Popping an input unexpectedly was the source of many recent memory bugs. The check doesn't mitigate such issues but helps with diagnosis.
  Always base entity boundary checks on the input ID, not the input pointer. The pointer could have been reallocated to the old address.
  Always throw a well-formedness error if a boundary check fails. In a few places, a validity error was thrown.
  Fix a few error codes and improve indentation.
* Test SAX2 callbacks with entity substitution
  Nick Wellnhofer | 2017-06-16 | 117 files | -0/+46042
  This detects regressions like bug 760367.
* Misc fixes for 'make tests'
  Nick Wellnhofer | 2017-06-12 | 3 files | -1/+2
  - Silence test output.
  - Clean up after doc/examples tests.
  - Adjust expected output for script tests.
  - Add missing results for relaxng/pattern3.
  There are still two test failures I can't comment on:
  - regexp/bug316338
  - schemas/any4_0
* Initialize keepBlanks in HTML parser
  Nick Wellnhofer | 2017-06-12 | 28 files | -136/+136
  This caused failures in the HTML push tests, but the fix required changing the expected output of the HTML SAX tests.