Merged in big encoding-detection refactoring branch.

author: Leonard Richardson <leonard.richardson@canonical.com> 2013-06-02 22:19:37 -0400
committer: Leonard Richardson <leonard.richardson@canonical.com> 2013-06-02 22:19:37 -0400
commit: b42a4ece63de739ad7a37973a4e10af23346ffd1 (patch)
tree: a65794b5422a1e12a8ddf943c9afd0e0f798f6c4 /doc
parent: b8b0711b903509e4b88e878fb6ca3731738ca99e (diff)
parent: 847a8e08e21de9036783feeecd8de93b112f3868 (diff)
download: beautifulsoup4-b42a4ece63de739ad7a37973a4e10af23346ffd1.tar.gz
1 file changed, 5 insertions, 13 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index a91854c..1b38df7 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -2478,9 +2478,11 @@ become Unicode::
  dammit.original_encoding
  # 'utf-8'
 
-The more data you give Unicode, Dammit, the more accurately it will
-guess. If you have your own suspicions as to what the encoding might
-be, you can pass them in as a list::
+Unicode, Dammit's guesses will get a lot more accurate if you install
+the ``chardet`` or ``cchardet`` Python libraries. The more data you
+give Unicode, Dammit, the more accurately it will guess. If you have
+your own suspicions as to what the encoding might be, you can pass
+them in as a list::
 
  dammit = UnicodeDammit("Sacr\xe9 bleu!", ["latin-1", "iso-8859-1"])
  print(dammit.unicode_markup)
@@ -2823,16 +2825,6 @@ significantly faster using lxml than using html.parser or html5lib.
 You can speed up encoding detection significantly by installing the
 `cchardet <http://pypi.python.org/pypi/cchardet/>`_ library.
 
-Sometimes `Unicode, Dammit`_ can only detect the encoding of a file by
-doing a byte-by-byte examination of the file. This slows Beautiful
-Soup to a crawl. My tests indicate that this only happened on 2.x
-versions of Python, and that it happened most often with documents
-using Russian or Chinese encodings. If this is happening to you, you
-can fix it by installing cchardet, or by using Python 3 for your
-script. If you happen to know a document's encoding, you can pass
-it into the ``BeautifulSoup`` constructor as ``from_encoding``, and
-bypass encoding detection altogether.
-
 `Parsing only part of a document`_ won't save you much time parsing
 the document, but it can save a lot of memory, and it'll make
 `searching` the document much faster.