author | Leonard Richardson <leonardr@segfault.org> | 2014-12-08 22:02:34 -0500 |
---|---|---|
committer | Leonard Richardson <leonardr@segfault.org> | 2014-12-08 22:02:34 -0500 |
commit | e2b76538144efabd5e1c7958716bb1fb3451d399 (patch) | |
tree | 261474092526ad4a0ab39af6941f7a01a65ebebe /doc | |
parent | 2233b0f9590642e054f4c1f2da556e1f5e4d1635 (diff) | |
download | beautifulsoup4-e2b76538144efabd5e1c7958716bb1fb3451d399.tar.gz |
Rephrased the 'you need a parser' section to cover today's more common BS3 porting environments. [bug=1370364]
Diffstat (limited to 'doc')
-rw-r--r-- | doc/source/index.rst | 12 |
1 files changed, 6 insertions, 6 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 775c3e1..5d067ea 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -2899,12 +2899,12 @@ deprecated and removed in Python 3.0. Beautiful Soup 4 uses
 ``html.parser`` by default, but you can plug in lxml or html5lib and
 use that instead. See `Installing a parser`_ for a comparison.
 
-Since ``html.parser`` is not the same parser as ``SGMLParser``, it
-will treat invalid markup differently. Usually the "difference" is
-that ``html.parser`` crashes. In that case, you'll need to install
-another parser. But sometimes ``html.parser`` just creates a different
-parse tree than ``SGMLParser`` would. If this happens, you may need to
-update your BS3 scraping code to deal with the new tree.
+Since ``html.parser`` is not the same parser as ``SGMLParser``, you
+may find that Beautiful Soup 4 gives you a different parse tree than
+Beautiful Soup 3 for the same markup. If you swap out ``html.parser``
+for lxml or html5lib, you may find that the parse tree changes yet
+again. If this happens, you'll need to update your scraping code to
+deal with the new tree.
 
 Method names
 ^^^^^^^^^^^^
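The reworded paragraph says the parse tree can change when the parser changes. A minimal sketch of how one might check this for a given piece of markup follows. It is not part of this commit: the helper name `parse_trees` and the sample markup are illustrative assumptions, and it assumes `bs4` is installed. Parsers that are not installed (lxml, html5lib) are reported as `None` rather than treated as errors.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parse_trees(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Hypothetical helper: parse the same markup with each named parser.

    Returns a dict mapping parser name to the serialized tree, or to
    None when Beautiful Soup cannot find that parser.
    """
    trees = {}
    for name in parsers:
        try:
            trees[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            # The requested parser library is not installed.
            trees[name] = None
    return trees


if __name__ == "__main__":
    # Invalid markup: an unclosed <a> followed by a stray </p>.
    for name, tree in parse_trees("<a></p>").items():
        print(f"{name:12} {tree if tree is not None else '(not installed)'}")
```

Running the same comparison against markup from a real scraping target is one way to see whether ported BS3 code needs to be updated for the new tree.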