author Leonard Richardson <leonardr@segfault.org> 2015-06-28 15:39:36 -0400
committer Leonard Richardson <leonardr@segfault.org> 2015-06-28 15:39:36 -0400
commit d72ccb86dd2ff3dac56beb8fedcfaab7f804ce3a
tree 367d6bad8b8690490d546e0581817c73bb7569fd /doc
parent c6592463794bf0f03f28f4d52ebcbaff5ecd9741
Changed the way soup objects work under copy.copy(). Copying a
NavigableString or a Tag will give you a new object of the same type that's equal to the old one but not connected to the parse tree. Patch by Martijn Peters. [bug=1307490]
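The behavior described in the commit message (a copy that compares equal to the original but has no parent) can be illustrated with Python's `__copy__` and `__eq__` hooks. The sketch below is not Beautiful Soup's code: the `Node` class and its `append()` and `markup()` methods are hypothetical stand-ins, assumed only for illustration and using the standard library alone.

```python
import copy


class Node:
    """Hypothetical stand-in for a parse-tree element (not Beautiful Soup's Tag)."""

    def __init__(self, name, text=""):
        self.name = name
        self.text = text
        self.parent = None
        self.children = []

    def append(self, child):
        # Attach the child to this node and record the back-reference.
        child.parent = self
        self.children.append(child)

    def markup(self):
        # Render this node and its descendants as a markup string.
        inner = self.text + "".join(c.markup() for c in self.children)
        return "<{0}>{1}</{0}>".format(self.name, inner)

    def __eq__(self, other):
        # Two nodes are equal when they render to the same markup.
        return isinstance(other, Node) and self.markup() == other.markup()

    def __copy__(self):
        # Rebuild the subtree but leave the copy's parent unset (detached).
        clone = Node(self.name, self.text)
        for child in self.children:
            clone.append(copy.copy(child))
        return clone


root = Node("p", "I want ")
bold = Node("b", "pizza")
root.append(bold)

bold_copy = copy.copy(bold)
print(bold_copy == bold)    # True: same markup
print(bold_copy is bold)    # False: a distinct object
print(bold_copy.parent)     # None: detached from the tree
print(bold.parent is root)  # True: the original is still attached
```

Because `copy.copy()` defers to `__copy__` when a class defines it, the class itself decides which state carries over; here the parent link is deliberately left out so the copy starts life outside any tree.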
Diffstat (limited to 'doc')
-rw-r--r-- doc/source/index.rst 62
1 files changed, 59 insertions, 3 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index f6d3e38..81659ed 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -1787,7 +1787,6 @@ attributes, and delete attributes::
tag
# <blockquote>Extremely bold</blockquote>
-
Modifying ``.string``
---------------------
@@ -2419,8 +2418,9 @@ as ``exclude_encodings``::
soup.original_encoding
'WINDOWS-1255'
-(This isn't 100% correct, but Windows-1255 is a compatible superset of
-ISO-8859-8, so it's close enough.)
+Windows-1255 isn't 100% correct, but that encoding is a compatible
+superset of ISO-8859-8, so it's close enough. (``exclude_encodings``
+is a new feature in Beautiful Soup 4.4.0.)
In rare cases (usually when a UTF-8 document contains text written in
a completely different encoding), the only way to get Unicode may be
@@ -2609,6 +2609,62 @@ document is Windows-1252, and the document will come out looking like
``UnicodeDammit.detwingle()`` is new in Beautiful Soup 4.1.0.
+
+Comparing objects for equality
+==============================
+
+Beautiful Soup says that two ``NavigableString`` or ``Tag`` objects
+are equal when they represent the same HTML or XML markup. In this
+example, the two ``<b>`` tags are treated as equal, even though they live
+in different parts of the object tree, because they both look like
+"<b>pizza</b>"::
+
+ markup = "<p>I want <b>pizza</b> and more <b>pizza</b>!</p>"
+ soup = BeautifulSoup(markup, 'html.parser')
+ first_b, second_b = soup.find_all('b')
+ print(first_b == second_b)
+ # True
+
+ print(first_b.previous_element == second_b.previous_element)
+ # False
+
+If you want to see whether two variables refer to exactly the same
+object, use ``is``::
+
+ print(first_b is second_b)
+ # False
+
+Copying Beautiful Soup objects
+==============================
+
+You can use ``copy.copy()`` to create a copy of any ``Tag`` or
+``NavigableString``::
+
+ import copy
+ p_copy = copy.copy(soup.p)
+ print(p_copy)
+ # <p>I want <b>pizza</b> and more <b>pizza</b>!</p>
+
+The copy is considered equal to the original, since it represents the
+same markup as the original, but it's not the same object::
+
+ print(soup.p == p_copy)
+ # True
+
+ print(soup.p is p_copy)
+ # False
+
+The only real difference is that the copy is completely detached from
+the original Beautiful Soup object tree, just as if ``extract()`` had
+been called on it::
+
+ print(p_copy.parent)
+ # None
+
+This is because two different ``Tag`` objects can't occupy the same
+space at the same time.
+
+
Parsing only part of a document
===============================