delta/libxml2.git - gitlab.gnome.org: GNOME/libxml2.git

	Commit message	Author	Age	Files	Lines
*	Fix xmlGetNodePath with invalid node types	Nick Wellnhofer	2021-03-13	1	-1/+3
	Make xmlGetNodePath return NULL instead of invalid XPath when hitting unsupported node types like DTD content.
	Reported here: https://mail.gnome.org/archives/xml/2021-January/msg00012.html
	Original report: https://bugs.php.net/bug.php?id=80680
*	Clarify xmlNewDocProp documentation	Nick Wellnhofer	2021-03-02	1	-0/+5
*	Stop checking attributes for UTF-8 validity	Nick Wellnhofer	2021-03-02	1	-12/+0
	I can't see a reason to check attribute content for UTF-8 validity. Other parts of the API like xmlNewText have always assumed valid UTF-8 as extra checks only slow down processing.
	Besides, setting doc->encoding to "ISO-8859-1" seems pointless, and not freeing the old encoding would cause a memory leak.
	Note that this was last changed in 2008 with commit 6f8611fd which removed unnecessary encoding/decoding steps. Setting attributes should be even faster now.
	Found by OSS-Fuzz.
*	Fix quadratic behavior when looking up xml:* attributes	Nick Wellnhofer	2021-03-01	1	-0/+10
	Add a special case for the predefined XML namespace when looking up DTD attribute defaults in xmlGetPropNodeInternal to avoid calling xmlGetNsList.
	This fixes quadratic behavior in
	- xmlNodeGetBase
	- xmlNodeGetLang
	- xmlNodeGetSpacePreserve
	Found by OSS-Fuzz.
*	Check for invalid redeclarations of predefined entities	Nick Wellnhofer	2021-02-08	1	-0/+13
	Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions.
	Note that some test cases declared
	    <!ENTITY lt "<">
	But the XML spec clearly states that this is illegal:
	> If the entities lt or amp are declared, they MUST be declared as
	> internal entities whose replacement text is a character reference to
	> the respective character (less-than sign or ampersand) being escaped;
	> the double escaping is REQUIRED for these entities so that references
	> to them produce a well-formed result.
	Also fixes #217 but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference.
	As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.
*	Add the copy of type from original xmlDoc in xmlCopyDoc()	SVGAnimate	2021-02-08	1	-0/+1
	A bug related to PHP DOMDocument: https://bugs.php.net/bug.php?id=80665
	When copying/cloning an HTML document, the xmlDoc->type goes from XML_HTML_DOCUMENT_NODE to XML_DOCUMENT_NODE.
*	Fix null deref in xmlStringGetNodeList	Nick Wellnhofer	2020-12-18	1	-0/+4
	Check for malloc failure to avoid null deref.
	Found with libFuzzer.
*	Fix typos	Nick Wellnhofer	2020-03-08	1	-7/+7
	Resolves #133.
*	Fix integer overflow in xmlBufferResize	Nick Wellnhofer	2020-01-10	1	-2/+7
	Found by OSS-Fuzz.
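The log does not show the patch itself, so the following is only a minimal sketch of the general technique for growing a buffer without integer overflow. The function name `grow_capacity` and the starting size of 16 are assumptions made for illustration; this is not the code from xmlBufferResize.

```c
#include <limits.h>

/*
 * Hypothetical helper, not libxml2 code: compute a capacity that can hold
 * `needed` bytes plus a terminating byte by doubling `current`, without
 * letting the unsigned arithmetic wrap around.
 * Returns 0 when the request cannot be represented.
 */
unsigned int grow_capacity(unsigned int current, unsigned int needed)
{
    unsigned int size = current ? current : 16;

    /* needed + 1 would wrap around to 0: reject the request. */
    if (needed > UINT_MAX - 1)
        return 0;

    while (size < needed + 1) {
        /* Doubling would overflow: fall back to the exact size. */
        if (size > UINT_MAX / 2)
            return needed + 1;
        size *= 2;
    }
    return size;
}
```

The key point is that every comparison happens before the arithmetic it guards, so no intermediate value ever exceeds UINT_MAX.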
*	Fix freeing of nested documents	Nick Wellnhofer	2019-12-06	1	-0/+5
	Apparently, some libxslt RVTs can contain nested document nodes, see issue #132. I'm not sure how this happens exactly but it can cause a segfault in xmlFreeNodeList after the changes in commit 0762c9b6.
	Make sure not to touch the (nonexistent) `content` member of xmlDocs.
*	Enable more undefined behavior sanitizers	Nick Wellnhofer	2019-11-02	1	-1/+3
	Minor fix to xmlStringLenGetNodeList to avoid a pointer overflow during API test.
	Enable pointer-overflow and unsigned-integer-overflow sanitizers in CI tests. Technically, unsigned integer overflows aren't undefined behavior, but they typically indicate programming errors. Some hash functions that really require unsigned integer overflows have already been annotated.
*	Large batch of typo fixes	Jared Yanovich	2019-09-30	1	-28/+28
	Closes #109.
*	Make xmlFreeNodeList non-recursive	Nick Wellnhofer	2019-09-23	1	-5/+21
	Avoid call stack overflow when freeing deeply nested documents.
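As an illustration of the idea, here is a minimal sketch of freeing a tree iteratively by walking parent pointers instead of recursing. The `Node` type and both function names are hypothetical; libxml2's xmlNode has many more fields and the real xmlFreeNodeList handles several node kinds specially.

```c
#include <stdlib.h>

/* Hypothetical minimal node type, for illustration only. */
typedef struct Node {
    struct Node *parent;
    struct Node *children;
    struct Node *next;
} Node;

/* Allocate a node and link it as the first child of `parent` (may be NULL). */
Node *new_node(Node *parent)
{
    Node *n = calloc(1, sizeof(*n));
    if (n == NULL)
        return NULL;
    n->parent = parent;
    if (parent != NULL) {
        n->next = parent->children;
        parent->children = n;
    }
    return n;
}

/*
 * Free `cur`, its following siblings and all their descendants without
 * recursion, so the call stack stays flat however deep the tree is.
 * Returns the number of nodes freed.
 */
size_t free_node_list(Node *cur)
{
    size_t freed = 0;
    size_t depth = 0;

    while (cur != NULL) {
        /* Descend to the deepest first child. */
        while (cur->children != NULL) {
            cur = cur->children;
            depth++;
        }

        Node *next = cur->next;
        Node *parent = cur->parent;
        free(cur);
        freed++;

        if (next != NULL) {
            cur = next;              /* continue with the next sibling */
        } else if (depth == 0 || parent == NULL) {
            break;                   /* back at the top level: done */
        } else {
            depth--;
            cur = parent;            /* all children are gone ... */
            cur->children = NULL;    /* ... so the parent is now a leaf */
        }
    }
    return freed;
}
```

A chain of 100,000 nested nodes is freed with constant stack usage, where a recursive version would need one stack frame per nesting level.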
*	Fix typos: tree: move{ -> s}, reconcil{i -> }ed, h{o -> e}ld by...	Jan Pokorný	2019-08-25	1	-4/+4
	...seems to { -> be to} add.
	Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
*	Fix libz and liblzma detection	Nick Wellnhofer	2017-11-27	1	-1/+1
	If libz or liblzma are detected with pkg-config, AC_CHECK_HEADERS must not be run because the correct CPPFLAGS aren't set. It is actually not required to have separate checks for LIBXML_ZLIB_ENABLED and HAVE_ZLIB_H. Only check for LIBXML_ZLIB_ENABLED and remove the HAVE_ZLIB_H macro.
	Fixes bug 764657, bug 787041.
*	Fix -Wimplicit-fallthrough warnings	J. Peter Mugaas	2017-10-21	1	-3/+3
	Add "falls through" comments to quench implicit-fallthrough warnings which are enabled by -Wextra under GCC 7.
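For readers unfamiliar with the warning, a small sketch of the pattern follows. The function `count_from` is made up for illustration and is unrelated to the switch statements touched by the commit.

```c
/*
 * Hypothetical example: count how many steps run when starting at `start`.
 * Each case deliberately continues into the next one; the comment tells
 * GCC (and human readers) that the missing `break` is intentional.
 */
int count_from(int start)
{
    int n = 0;

    switch (start) {
        case 1:
            n++;
            /* falls through */
        case 2:
            n++;
            /* falls through */
        case 3:
            n++;
            break;
        default:
            break;
    }
    return n;
}
```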
*	Fix pointer/int cast warnings on 64-bit Windows	Nick Wellnhofer	2017-10-09	1	-1/+2
	On 64-bit Windows, `long` is 32 bits wide and can't hold a pointer. Switch to ptrdiff_t instead which should be the same size as a pointer on every somewhat sane platform without requiring C99 types like intptr_t.
	Fixes bug 788312.
	Thanks to J. Peter Mugaas for the report and initial patch.
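A minimal sketch of the kind of cast this is about, assuming an integer is being stashed in a pointer-sized slot. The helper names are hypothetical, and the round trip relies on implementation-defined pointer/integer conversion, as the commit message itself hints with "somewhat sane platform".

```c
#include <stddef.h>

/* Hypothetical helpers: keep a small integer in a `void *` field. */
void *int_to_slot(ptrdiff_t value)
{
    /* ptrdiff_t matches the pointer width on common platforms,
       including 64-bit Windows, where `long` is only 32 bits. */
    return (void *) value;
}

ptrdiff_t slot_to_int(void *slot)
{
    return (ptrdiff_t) slot;
}
```

Casting through `long` instead would truncate the value on LLP64 targets and trigger the pointer/int cast warnings the commit removes.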
*	Fix a couple of misleading indentation errors	Daniel Veillard	2017-08-28	1	-2/+2
	Raised by gcc as potential error, no semantic change needed but fixed the indentation
*	Porting libxml2 on zOS encoding of code	Stéphane Michaut	2017-08-28	1	-0/+5
	First set of patches for zOS
	- entities.c parser.c tree.c xmlschemas.c xmlschemastypes.c xpath.c xpointer.c: ask conversion of code to ISO Latin 1 to avoid having the compiler assume EBCDIC codepoint for characters.
	- xmlmodule.c: make sure we have support for modules
	- xmlIO.c: zOS path names are special, avoid some of the expectations from Unix/Windows
*	Documentation fixes	Nick Wellnhofer	2017-06-18	1	-3/+3
	Fixes bug 347465, bug 599433, bug 624550, bug 698253.
*	Fix memory leak in xmlStringLenGetNodeList	Nick Wellnhofer	2017-06-07	1	-0/+4
	Avoid expanding the entity recursively. Use the same prevention mechanism as in xmlStringGetNodeList.
	xmlStringGetNodeList on the other hand wasn't fixing up the 'last' pointer.
	I think the memory leak can only be triggered in recovery mode.
	Found with libFuzzer and ASan.
*	Avoid building recursive entities CVE-2016-3627	Daniel Veillard	2016-05-23	1	-0/+1
	For https://bugzilla.gnome.org/show_bug.cgi?id=762100
	When we detect a recursive entity we should really not build the associated data. Moreover, if someone bypasses libxml2 fatal errors and still tries to serialize a broken entity, make sure we don't risk getting into a recursion.
	* parser.c: xmlParserEntityCheck() don't build if an entity loop was found and remove the associated text content
	* tree.c: xmlStringGetNodeList() avoid a potential recursion
*	Fix typos: dictio{ nn -> n }ar{y,ies}	Jan Pokorný	2016-04-15	1	-7/+7
	Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
*	Don't add IDs in xmlSetTreeDoc	Nick Wellnhofer	2014-12-23	1	-0/+8
	This partially reverts my previous commit fixing bug #741919.
*	Account for ID attributes in xmlSetTreeDoc	Nick Wellnhofer	2014-12-19	1	-0/+11
*	Remove various unused value assignments	Philip Withnall	2014-10-27	1	-4/+4
	As detected by Coverity (CIDs 60467–60472).
	https://bugzilla.gnome.org/show_bug.cgi?id=739220
*	Fix and add const qualifiers	Kurt Roeckx	2014-10-13	1	-47/+47
	For https://bugzilla.gnome.org/show_bug.cgi?id=689483
	It seems there are functions that do use the const qualifier for some of the arguments, but it seems that there are a lot of functions that don't use it and probably should.
	So I created a patch against 2.9.0 that makes as much as possible const in tree.h, and changed other files as needed.
	There were a lot of cases like "const xmlNodePtr node". This doesn't actually do anything: there the pointer is constant, not the object it points to. So I changed those to "const xmlNode *node".
	I also removed some consts, mostly in the Copy functions, because those functions can actually modify the doc or node they copy from.
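The distinction between "const xmlNodePtr node" and "const xmlNode *node" can be shown with a small self-contained sketch. The `Node` and `NodePtr` types below are stand-ins invented for illustration, mirroring only the typedef pattern of xmlNode and xmlNodePtr.

```c
/* Hypothetical stand-ins for xmlNode and xmlNodePtr. */
typedef struct Node {
    int type;
} Node;
typedef Node *NodePtr;

/*
 * `const NodePtr` means `Node *const`: the pointer cannot be reassigned,
 * but the node it points to is still writable.
 */
void set_type_via_const_ptr(const NodePtr node, int type)
{
    node->type = type;   /* compiles: only the pointer itself is const */
}

/*
 * `const Node *` protects the pointed-to object: the function can read
 * the node but not modify it.
 */
int get_type(const Node *node)
{
    /* node->type = 0;   would be rejected by the compiler here */
    return node->type;
}
```

So a caller gains no guarantee from a `const NodePtr` parameter, while `const Node *` documents and enforces read-only access.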
*	Unreachable code in tree.c	Gaurav Gupta	2014-10-06	1	-2/+1
	For https://bugzilla.gnome.org/show_bug.cgi?id=705392
	Cut out an unused block
*	Support element node traversal in document fragments.	Kyle VanderBeek	2014-08-05	1	-0/+3
	https://bugzilla.gnome.org/show_bug.cgi?id=733900
*	Add couple of missing Null checks	Daniel Veillard	2014-07-26	1	-0/+4
	For https://bugzilla.gnome.org/show_bug.cgi?id=733710
	Reported by Gaurav but with slightly different fixes
*	xmlNodeSetName: Allow setting the name to a substring of the currently set name	Tristan Van Berkom	2014-04-23	1	-2/+7
	Avoid freeing the currently set name until after having assigned the new name. This allows one to call xmlNodeSetName (node, node->name + 1) to set the new name of the node to a substring of the current name without introducing any crash and without requiring an extra strdup().
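The ordering problem can be sketched in isolation as follows. The `Item` type and `item_set_name` are hypothetical; the real xmlNodeSetName also deals with dictionaries and node types, which this sketch ignores.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object owning a heap-allocated name. */
typedef struct {
    char *name;
} Item;

/*
 * Replace item->name with a copy of `new_name`.
 * The copy is made before the old name is freed, so `new_name` may point
 * into the current name (for example item->name + 1).
 * Returns 0 on success, -1 on allocation failure (name left unchanged).
 */
int item_set_name(Item *item, const char *new_name)
{
    size_t len = strlen(new_name);
    char *copy = malloc(len + 1);
    char *old;

    if (copy == NULL)
        return -1;
    memcpy(copy, new_name, len + 1);

    old = item->name;
    item->name = copy;   /* assign first ... */
    free(old);           /* ... free the old storage last */
    return 0;
}
```

Freeing the old name first would leave `new_name` dangling in the substring case and the copy would read freed memory.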
*	Fix a doc typo	Daniel Veillard	2014-03-28	1	-1/+1
	Raised by Blasius Bieselbert on IRC
*	Fix compilation with minimum and xinclude.	Nicolas Le Cam	2014-02-10	1	-1/+1
	xinclude needs xmlAddNextSibling().
	Compile out use of xmlLocationSetPtr when xptr is disabled.
	Include xpath header.
*	Legacy needs xmlSAX2StartElement() and xmlSAX2EndElement().	Nicolas Le Cam	2014-02-10	1	-1/+1
	Fix compilation with minimum and legacy.
*	Fix typos in {tree,xpath}.c (errror)	Jan Pokorný	2014-02-06	1	-1/+1
	Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
*	Fix a couple of missing NULL checks	Gaurav	2013-11-29	1	-0/+2
	For https://bugzilla.gnome.org/show_bug.cgi?id=708681
*	Fix a potential NULL dereference in tree code	Daniel Veillard	2013-09-11	1	-1/+2
	https://bugzilla.gnome.org/show_bug.cgi?id=707750
	Also reported by Gaurav, simple fix to check the pointer before dereference
*	Two small namespace tweaks	Daniel Veillard	2013-07-22	1	-1/+6
	An improvement of the documentation, and an extra safety check for xmlSetNs()
*	Fix spelling of "length".	Michael Wood	2012-10-30	1	-1/+1
*	Improve HTML escaping of attribute on output	Daniel Veillard	2012-09-05	1	-1/+10
	Handle special cases of &{...} constructs as hinted in the spec http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1 and special values as comment <!-- ... --> used for server side includes.
	This is limited to attribute values in HTML content.
*	Add support for big line numbers in error reporting	Daniel Veillard	2012-08-13	1	-10/+42
	Fix the lack of line number as reported by Johan Corveleyn <jcorvel@gmail.com>
	* parser.c include/libxml/parser.h: add an XML_PARSE_BIG_LINES parser option, not switched on by default, it's an opt-in
	* SAX2.c: if XML_PARSE_BIG_LINES is set, store the long line numbers in the psvi field of text nodes
	* tree.c: expand xmlGetLineNo to extract that information, also make sure we can't fail on recursive behaviour
	* error.c: in __xmlRaiseError, if a node is provided, call xmlGetLineNo() if we can't get a valid line number.
	* xmllint.c: switch on XML_PARSE_BIG_LINES in xmllint
*	Regenerating docs and API files	Daniel Veillard	2012-08-10	1	-1/+1
	Various cleanups
	* configure.in: force regeneration of APIs in my environment
	* buf.c buf.h enc.h encoding.c include/libxml/tree.h include/libxml/xmlerror.h save.h tree.c: various comment cleanups pointed by apibuild
	* doc/apibuild.py: added the 3 new internal headers in the excludes
	* doc/libxml2-api.xml doc/libxml2-refs.xml: regenerated the API
	* doc/symbols.xml: listing new entry points for 2.9.0
	* doc/devhelp/*: regenerated
*	Adding various checks on node type through the API	Daniel Veillard	2012-08-09	1	-24/+56
	Specifically checking against namespace nodes before accessing node pointers
*	Namespace nodes can't be unlinked with xmlUnlinkNode	Daniel Veillard	2012-08-08	1	-0/+4
*	Avoid using xmlBuffer for serialization	Daniel Veillard	2012-08-07	1	-53/+50
	Mostly an optimization to avoid xmlBuffer->xmlBuf conversions and use the new code.
*	Provide new xmlBuf based saving functions	Daniel Veillard	2012-07-23	1	-9/+40
	* include/libxml/tree.h: adds xmlBufGetNodeContent and xmlBufNodeDump as xmlBuf based equivalents of xmlNodeGetContent and xmlNodeDump
	* tree.c: implements one new routine and converts xmlNodeBufGetContent to use the xmlBuf equivalent. It should behave better as a result in case of data larger than 2GB.
*	Fix various bugs in new code raised by the API checking	Daniel Veillard	2012-05-15	1	-2/+6
	* testapi.c: regenerated and covering new APIs
	* tree.c: xmlBufferDetach can't work on immutable buffers
	* xzlib.c: fix a deallocation error
*	Fix various problems with "make dist"	Daniel Veillard	2012-05-15	1	-2/+4
	* tree.c: missing documentation for xmlBufferDetach
	* doc/symbols.xml: add two new symbols xmlTextReaderRelaxNGValidateCtxt and xmlBufferDetach
	* doc/apibuild.py: ignore internal header xzlib.h
*	Use a hybrid allocation scheme in xmlNodeSetContent	Conrad Irwin	2012-05-14	1	-1/+23
	On Fri, May 11, 2012 at 9:10 AM, Daniel Veillard <veillard@redhat.com> wrote:
	> Hi Conrad,
	>
	> that's interesting ! I was initially afraid of a sudden explosion of
	> memory allocations for building a tree since by default buffers tend to
	> "waste" memory by using doubling allocations, but that's not the case.
	> xmllint --noout doc/libxml2-api.xml
	> when compiled with memory debug produce
	>
	> paphio:~/XML -> cat .memdump
	> MEMORY ALLOCATED : 0, MAX was 12756699
	>
	> and without your patch 12755657, i.e. the increase is minimal.
	Heh, I thought that too. Actually you're looking at the result with XML_ALLOC_EXACT! This is because EXACT adds 10 bytes "spare" on each alloc, and that interestingly wastes about the same amount of space as XML_ALLOC_DOUBLEIT on this example (see below).
	So it turns out that the default realloc() on my system actually handles this case really well, and I guess that all the time in xmlRealloc() was actually in xmlStrlen, not the underlying realloc() after all (sorry for misleading you). If you replace the realloc() with a bad one (like valgrind's), then the performance degrades severely.
	This patch implements a HYBRID allocator which has the behaviour you describe (it's like EXACT to start with, though without the spare 10 bytes; and switches to DOUBLEIT after 4kb); that gets the memory back down to 12755657, with no noticeable impact on the performance of the synthetic pathological example under valgrind.
	In summary: max_memory on ./xmllint --noout doc/libxml2-api.xml, valgrind time on https://gist.github.com/2656940
	         | max_memory | valgrind time
	before   | 12755657   | 29:18.2
	EXACT    | 12756699   |  2:58.6  <-- this is the state after the first patch.
	DOUBLEIT | 12756727   |  0:02.7
	HYBRID   | 12755754   |  0:02.7  <-- this is the state with both patches.
	>
	> There is also the cost of creating the buffers all the time.
	> I need to read the code and check but I may be interested in an hybrid
	> approach where we switch to buffer only when the text node starts to
	> become too big (4k would remove nearly all usual types of "document"
	> usage, i.e. not blocks of data)
	I tried to avoid too much buffer creation by introducing the xmlBufferDetach function, which allows re-using one buffer to construct many strings. It's maybe a bit of a "hack" in API terms though I thought the gains would be worth it.
	Conrad
	------8<------
	To keep memory usage tight in normal conditions it's desirable to only allocate as much space as is needed. Unfortunately this can lead to problems when constructing a long string out of small chunks, because every chunk you add will need to resize the buffer.
	To fix this XML_ALLOC_HYBRID will switch (when the buffer is 4kb big) from using exact allocations to doubling buffer size every time it is full. This limits the number of buffer resizes to O(log n) (down from O(n)), and thus greatly increases the performance of constructing very large strings in this manner.
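The policy described above can be sketched as a small capacity function. This is an illustration under stated assumptions rather than the libxml2 implementation: `hybrid_capacity`, `count_resizes` and `HYBRID_THRESHOLD` are names invented here, and only the "exact below 4kb, doubling above" rule is taken from the message.

```c
#include <stddef.h>

/* The 4kb switch point mentioned in the commit message. */
#define HYBRID_THRESHOLD 4096

/*
 * Hypothetical policy function: return the new capacity for a buffer that
 * currently holds `size` bytes of capacity and must hold `needed` bytes.
 */
size_t hybrid_capacity(size_t size, size_t needed)
{
    size_t cap = size;

    if (needed <= size)
        return size;                 /* already large enough */
    if (size < HYBRID_THRESHOLD)
        return needed;               /* small buffer: exact allocation */

    while (cap < needed) {
        if (cap > (size_t) -1 / 2)
            return needed;           /* doubling would overflow */
        cap *= 2;                    /* large buffer: double it */
    }
    return cap;
}

/* Count how many resizes appending `chunks` chunks of `chunk_len` bytes costs. */
size_t count_resizes(size_t chunk_len, size_t chunks)
{
    size_t cap = 0, used = 0, resizes = 0;
    size_t i;

    for (i = 0; i < chunks; i++) {
        size_t new_cap;

        used += chunk_len;
        new_cap = hybrid_capacity(cap, used);
        if (new_cap != cap) {
            cap = new_cap;
            resizes++;
        }
    }
    return resizes;
}
```

With 200,000 chunks of 80 bytes, as in the workload described in the related commit, this policy resizes fewer than a hundred times, whereas a purely exact policy would resize on every append.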
*	Use buffers when constructing string node lists.	Conrad Irwin	2012-05-14	1	-81/+115
	Hi Veillard and all,
	Firstly, thanks for libxml: it's awesome! I noticed recently that libxml was taking a surprisingly long time to perform some operations (many minutes instead of milliseconds), and so I did some digging.
	It turns out that the problem was caused by the realloc()ing done in xmlNodeAddContentLen() which can be called many (many) times when assigning some content into a node.
	For background, I'm dealing with XML that contains emails, these can have large attachments (~6MB) which are base-64 encoded, line-wrapped at 78 chars, and each line ends with &#13;. This means that xmlNodeAddContentLen() is being called about 200,000 times, and so there are 200,000 reallocs of a 6MB string, which takes a while... (I put a synthetic example of this at https://gist.github.com/2656940)
	The attached patch works around that problem by using the existing buffer API to merge the strings together before even creating the text node, this keeps the number of realloc()s at a manageable level.
	I'd love feedback on the patch, and am happy to fix problems with it, or explore other solutions if you think that this is barking up the wrong tree :).
	Thanks,
	Conrad
	P.S. Should I create a bug for this too?
	------8<------
	Before this change xmlStringGetNodeList would perform a realloc() of the entire new content for every XML entity in the assigned text in order to merge together adjacent text nodes. This had the effect of making xmlNodeSetContent O(n^2), which led to unexpectedly bad performance on inputs that contained a large number of XML entities.
	After this change the memory management is done by the buffer API, avoiding the need to continually re-measure and realloc() the string.
	For my test data (6MB of 80 character lines, each ending with &#13;) this takes the time to xmlNodeSetContent from about 500 seconds to around 50ms.
	I have not profiled smaller cases, though I tried to minimize the performance impact of my change by avoiding unnecessary string copying.
	Signed-off-by: Conrad Irwin <conrad.irwin@gmail.com>