Check for invalid redeclarations of predefined entities

Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions. Note that some test cases declared <!ENTITY lt "<"> But the XML spec clearly states that this is illegal: > If the entities lt or amp are declared, they MUST be declared as > internal entities whose replacement text is a character reference to > the respective character (less-than sign or ampersand) being escaped; > the double escaping is REQUIRED for these entities so that references > to them produce a well-formed result. Also fixes #217 but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference. As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.
author: Nick Wellnhofer <wellnhofer@aevum.de> 2021-02-08 20:58:32 +0100
committer: Nick Wellnhofer <wellnhofer@aevum.de> 2021-02-08 21:51:26 +0100
commit: 01411e7c5ea0fff181271e092f46a2138c3720ec (patch)
tree: 48ec7a433fabdc5293c70149e976afc16e0f6a9f /tree.c
parent: 07920b4381873187c02df53fa9b5d44aff3a7041 (diff)
download: libxml2-01411e7c5ea0fff181271e092f46a2138c3720ec.tar.gz
1 files changed, 13 insertions, 0 deletions
diff --git a/tree.c b/tree.c
index d2347dfd..d6ea7049 100644
--- a/tree.c
+++ b/tree.c
@@ -1310,6 +1310,16 @@ xmlStringLenGetNodeList(const xmlDoc *doc, const xmlChar *value, int len) {
 		else
 		    tmp = 0;
 		while (tmp != ';') { /* Non input consuming loop */
+                    /*
+                     * If you find an integer overflow here when fuzzing,
+                     * the bug is probably elsewhere. This function should
+                     * only receive entities that were already validated by
+                     * the parser, typically by xmlParseAttValueComplex
+                     * calling xmlStringDecodeEntities.
+                     *
+                     * So it's better *not* to check for overflow to
+                     * potentially discover new bugs.
+                     */
 		    if ((tmp >= '0') && (tmp <= '9'))
 			charval = charval * 16 + (tmp - '0');
 		    else if ((tmp >= 'a') && (tmp <= 'f'))
@@ -1338,6 +1348,7 @@ xmlStringLenGetNodeList(const xmlDoc *doc, const xmlChar *value, int len) {
 		else
 		    tmp = 0;
 		while (tmp != ';') { /* Non input consuming loops */
+                    /* Don't check for integer overflow, see above. */
 		    if ((tmp >= '0') && (tmp <= '9'))
 			charval = charval * 10 + (tmp - '0');
 		    else {
@@ -1517,6 +1528,7 @@ xmlStringGetNodeList(const xmlDoc *doc, const xmlChar *value) {
 		cur += 3;
 		tmp = *cur;
 		while (tmp != ';') { /* Non input consuming loop */
+                    /* Don't check for integer overflow, see above. */
 		    if ((tmp >= '0') && (tmp <= '9'))
 			charval = charval * 16 + (tmp - '0');
 		    else if ((tmp >= 'a') && (tmp <= 'f'))
@@ -1539,6 +1551,7 @@ xmlStringGetNodeList(const xmlDoc *doc, const xmlChar *value) {
 		cur += 2;
 		tmp = *cur;
 		while (tmp != ';') { /* Non input consuming loops */
+                    /* Don't check for integer overflow, see above. */
 		    if ((tmp >= '0') && (tmp <= '9'))
 			charval = charval * 10 + (tmp - '0');
 		    else {
author	Nick Wellnhofer <wellnhofer@aevum.de>	2021-02-08 20:58:32 +0100
committer	Nick Wellnhofer <wellnhofer@aevum.de>	2021-02-08 21:51:26 +0100
commit	01411e7c5ea0fff181271e092f46a2138c3720ec (patch)
tree	48ec7a433fabdc5293c70149e976afc16e0f6a9f /tree.c
parent	07920b4381873187c02df53fa9b5d44aff3a7041 (diff)
download	libxml2-01411e7c5ea0fff181271e092f46a2138c3720ec.tar.gz