Diffstat (limited to 'doc/source/index.rst')
-rw-r--r-- doc/source/index.rst | 62
1 files changed, 59 insertions, 3 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index f6d3e38..81659ed 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -1787,7 +1787,6 @@ attributes, and delete attributes::
tag
# <blockquote>Extremely bold</blockquote>
-
Modifying ``.string``
---------------------
@@ -2419,8 +2418,9 @@ as ``exclude_encodings``::
soup.original_encoding
'WINDOWS-1255'
-(This isn't 100% correct, but Windows-1255 is a compatible superset of
-ISO-8859-8, so it's close enough.)
+Windows-1255 isn't 100% correct, but that encoding is a compatible
+superset of ISO-8859-8, so it's close enough. (``exclude_encodings``
+is a new feature in Beautiful Soup 4.4.0.)
In rare cases (usually when a UTF-8 document contains text written in
a completely different encoding), the only way to get Unicode may be
@@ -2609,6 +2609,62 @@ document is Windows-1252, and the document will come out looking like
``UnicodeDammit.detwingle()`` is new in Beautiful Soup 4.1.0.
+
+Comparing objects for equality
+==============================
+
+Beautiful Soup considers two ``NavigableString`` or ``Tag`` objects
+equal when they represent the same HTML or XML markup. In this
+example, the two ``<b>`` tags are treated as equal, even though they
+live in different parts of the object tree, because both render as
+"<b>pizza</b>"::
+
+ from bs4 import BeautifulSoup
+ markup = "<p>I want <b>pizza</b> and more <b>pizza</b>!</p>"
+ soup = BeautifulSoup(markup, 'html.parser')
+ first_b, second_b = soup.find_all('b')
+ print(first_b == second_b)
+ # True
+
+ print(first_b.previous_element == second_b.previous_element)
+ # False
+
+If you want to see whether two variables refer to exactly the same
+object, use ``is``::
+
+ print(first_b is second_b)
+ # False
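
The difference between ``==`` and ``is`` described above can be sketched in plain Python. This is only an illustration: ``Markup`` is a hypothetical stand-in class invented for this example, not a Beautiful Soup class, and it is not how the library implements comparison internally.

```python
class Markup:
    """Hypothetical stand-in for a parsed node; not a Beautiful Soup class."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Equal when both objects render to the same markup string.
        return isinstance(other, Markup) and self.text == other.text

    def __hash__(self):
        # Keep hashing consistent with the value-based __eq__ above.
        return hash(self.text)


first = Markup("<b>pizza</b>")
second = Markup("<b>pizza</b>")

same_markup = first == second   # value comparison via __eq__
same_object = first is second   # identity comparison
print(same_markup, same_object)
```

Two separately constructed objects with the same markup compare equal, while ``is`` still tells them apart.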
+
+Copying Beautiful Soup objects
+==============================
+
+You can use ``copy.copy()`` to create a copy of any ``Tag`` or
+``NavigableString``::
+
+ import copy
+ p_copy = copy.copy(soup.p)
+ print(p_copy)
+ # <p>I want <b>pizza</b> and more <b>pizza</b>!</p>
+
+The copy is considered equal to the original, since it represents the
+same markup, but it's not the same object::
+
+ print(soup.p == p_copy)
+ # True
+
+ print(soup.p is p_copy)
+ # False
+
+The only real difference is that the copy is completely detached from
+the original Beautiful Soup object tree, just as if ``extract()`` had
+been called on it::
+
+ print(p_copy.parent)
+ # None
+
+This is because two different ``Tag`` objects can't occupy the same
+position in a tree at the same time.
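
As a rough illustration of why a copy ends up detached, here is a minimal sketch of a tree node whose ``__copy__`` method returns a clone with no parent. The ``Node`` class and its attributes are assumptions made up for this example; Beautiful Soup's own ``Tag`` implementation may differ in every detail.

```python
import copy


class Node:
    """Hypothetical tree node; not Beautiful Soup's Tag implementation."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def __copy__(self):
        # Build a detached clone: no parent, and freshly copied children.
        clone = Node(self.name)
        for child in self.children:
            child_clone = copy.copy(child)
            child_clone.parent = clone
            clone.children.append(child_clone)
        return clone


root = Node("body")
p = Node("p", parent=root)
b = Node("b", parent=p)

p_copy = copy.copy(p)
print(p_copy.parent)                        # the clone has no parent
print(p_copy.children[0].parent is p_copy)  # its children point at the clone
print(p.parent is root)                     # the original is left in place
```

In this sketch the original tree is never modified: the clone gets its own children and simply starts life outside any tree, so no position is ever shared by two nodes.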
+
+
Parsing only part of a document
===============================