Added an exclude_encodings argument to UnicodeDammit and to the Beautiful Soup constructor, which lets you prohibit the detection of an encoding that you know is wrong. [bug=1469408]
author: Leonard Richardson <leonardr@segfault.org> 2015-06-27 09:55:40 -0400
committer: Leonard Richardson <leonardr@segfault.org> 2015-06-27 09:55:40 -0400
commit: 017e4526af39ab75286ebfd2d64db25da116f27b (patch)
tree: 92441998f88babb05bb4a3d86949eeb9c4fd4985 /doc
parent: 800d1971dcbdc6316a013a4c6ce86e8c18d48dca (diff)
download: beautifulsoup4-017e4526af39ab75286ebfd2d64db25da116f27b.tar.gz
1 files changed, 13 insertions, 0 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 1b7b1e6..821dad4 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -2397,6 +2397,19 @@ We can fix this by passing in the correct ``from_encoding``::
  soup.original_encoding
  'iso8859-8'
 
+If you don't know what the correct encoding is, but you know that
+Unicode, Dammit is guessing wrong, you can pass the wrong guesses in
+as ``exclude_encodings``::
+
+ soup = BeautifulSoup(markup, exclude_encodings=["ISO-8859-7"])
+ soup.h1
+ <h1>םולש</h1>
+ soup.original_encoding
+ 'WINDOWS-1255'
+
+(This isn't 100% correct, but Windows-1255 is a compatible superset of
+ISO-8859-8, so it's close enough.)
+
 In rare cases (usually when a UTF-8 document contains text written in
 a completely different encoding), the only way to get Unicode may be
 to replace some characters with the special Unicode character
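
To illustrate the idea behind ``exclude_encodings``, here is a minimal standard-library sketch of encoding detection with an exclusion list. It is not Beautiful Soup's or Unicode, Dammit's implementation: the helper names ``guess_encoding`` and ``_canonical``, the candidate list, and the sample ``markup`` bytes are all assumptions made for this example. The detector simply tries candidates in order and skips any the caller has ruled out.

```python
import codecs


def _canonical(name):
    # Normalize aliases so "ISO-8859-7" and "iso-8859-7" compare equal.
    return codecs.lookup(name).name


def guess_encoding(data, candidates, exclude_encodings=()):
    """Hypothetical helper: return (encoding, text) for the first candidate
    that decodes `data` without error and is not excluded by the caller."""
    excluded = {_canonical(name) for name in exclude_encodings}
    for name in candidates:
        if _canonical(name) in excluded:
            continue  # the caller knows this guess is wrong
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding; try the next
    return None, None


# Assumed sample input: Hebrew text stored as single-byte characters.
markup = b"<h1>\xed\xe5\xec\xf9</h1>"
candidates = ["utf-8", "iso-8859-7", "windows-1255"]

# Without exclusions the bytes decode "successfully" as Greek.
wrong = guess_encoding(markup, candidates)

# Excluding the bad guess lets the detector fall through to Windows-1255.
fixed = guess_encoding(markup, candidates, exclude_encodings=["ISO-8859-7"])
```

With the real library, the equivalent call is the one shown in the diff: ``BeautifulSoup(markup, exclude_encodings=["ISO-8859-7"])``.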