author: Stefan Behnel <stefan_ml@behnel.de> 2019-08-13 19:49:54 +0200
committer: Stefan Behnel <stefan_ml@behnel.de> 2019-08-13 19:49:54 +0200
commit: 2f64a0c52ff57c6116be436ddf7953895c344399
tree: 02b6e7f658527d4cf078ccdffb70501ec644c679
parent: 1781e48f8e51bb3eba8e31c3d7fbc47b4acfae26
Clarify the usage of "element.clear(keep_tail=True)" in some examples.
-rw-r--r-- CHANGES.txt 6
-rw-r--r-- doc/parsing.txt 6
-rw-r--r-- doc/tutorial.txt 9
3 files changed, 12 insertions, 9 deletions
diff --git a/CHANGES.txt b/CHANGES.txt
index dc9f33ad..f157b6ea 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -20,9 +20,9 @@ Bugs fixed
Features added
--------------
-* ``Element.clear()`` accepts a new keyword argument ``keep_tail=True`` to
- clear everything but the tail text. This is helpful in some document-style
- use cases.
+* ``Element.clear()`` accepts a new keyword argument ``keep_tail=True`` to clear
+ everything but the tail text. This is helpful in some document-style use cases
+ and for clearing the current element in ``iterparse()`` and pull parsing.
* When creating attributes or namespaces from a dict in Python 3.6+, lxml now
preserves the original insertion order of that dict, instead of always sorting
diff --git a/doc/parsing.txt b/doc/parsing.txt
index a9664d67..a271dc03 100644
--- a/doc/parsing.txt
+++ b/doc/parsing.txt
@@ -654,14 +654,14 @@ that are no longer needed:
>>> parser.feed('<element><child /></element>')
>>> for action, elem in events:
... print('%s: %d' % (elem.tag, len(elem))) # processing
- ... elem.clear() # delete children
+ ... elem.clear(keep_tail=True) # delete children
element: 0
child: 0
element: 1
>>> parser.feed('<empty-element xmlns="http://testns/" /></root>')
>>> for action, elem in events:
... print('%s: %d' % (elem.tag, len(elem))) # processing
- ... elem.clear() # delete children
+ ... elem.clear(keep_tail=True) # delete children
{http://testns/}empty-element: 0
root: 3
@@ -688,7 +688,7 @@ of the current element:
>>> for event, element in parser.read_events():
... # ... do something with the element
- ... element.clear() # clean up children
+ ... element.clear(keep_tail=True) # clean up children
... while element.getprevious() is not None:
... del element.getparent()[0] # clean up preceding siblings
diff --git a/doc/tutorial.txt b/doc/tutorial.txt
index 18c4e97c..b98d3b4f 100644
--- a/doc/tutorial.txt
+++ b/doc/tutorial.txt
@@ -1004,7 +1004,10 @@ that the Element has been parsed completely.
It also allows you to ``.clear()`` or modify the content of an Element to
save memory. So if you parse a large tree and you want to keep memory
usage small, you should clean up parts of the tree that you no longer
-need:
+need. The ``keep_tail=True`` argument to ``.clear()`` makes sure that
+(tail) text content that follows the current element will not be touched.
+It is highly discouraged to modify any content that the parser may not
+have completely read through yet.
.. sourcecode:: pycon
@@ -1016,7 +1019,7 @@ need:
... print(element.text)
... elif element.tag == 'a':
... print("** cleaning up the subtree")
- ... element.clear()
+ ... element.clear(keep_tail=True)
data
** cleaning up the subtree
None
@@ -1041,7 +1044,7 @@ for data extraction.
>>> for _, element in etree.iterparse(xml_file, tag='a'):
... print('%s -- %s' % (element.findtext('b'), element[1].text))
- ... element.clear()
+ ... element.clear(keep_tail=True)
ABC -- abc
MORE DATA -- more data
XYZ -- xyz
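
Note on the change: the effect of ``keep_tail=True`` can be illustrated without lxml. The following is a minimal sketch that uses the standard library's ``xml.etree.ElementTree``, whose ``Element.clear()`` has no ``keep_tail`` argument and also resets the tail text. The helper ``clear_keep_tail()`` is hypothetical and only emulates what the changelog entry above describes ("clear everything but the tail text"); it is not lxml's implementation.

```python
import xml.etree.ElementTree as ET

DOC = "<p>Hello <b>bold</b> world</p>"


def clear_keep_tail(elem):
    """Hypothetical helper: emulate lxml's elem.clear(keep_tail=True)."""
    tail = elem.tail   # text that follows the element's end tag
    elem.clear()       # drops children, attributes, text and tail
    elem.tail = tail   # put the following text back


# Plain clear(): the text after </b> is lost along with the children.
plain_root = ET.fromstring(DOC)
plain_root[0].clear()

# Tail-preserving clear: " world" still follows the emptied <b> element.
kept_root = ET.fromstring(DOC)
clear_keep_tail(kept_root[0])

print(repr(plain_root[0].tail))
print(repr(kept_root[0].tail))
```

In lxml itself the helper should be unnecessary, since ``element.clear(keep_tail=True)`` is the call the examples in this commit switch to.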