author | Georg Brandl <georg@python.org> | 2014-11-06 13:18:32 +0100 |
---|---|---|
committer | Georg Brandl <georg@python.org> | 2014-11-06 13:18:32 +0100 |
commit | 8c0814068d229cfbf67f9e3a070bcdaa089c7ffa (patch) | |
tree | 3297ab209f67532ff71c9e8b82b6edd1f8b984a6 /doc/docs/unicode.rst | |
parent | 69e83eb0856666d2594c96b1e8fae42dbeb92318 (diff) | |
download | pygments-8c0814068d229cfbf67f9e3a070bcdaa089c7ffa.tar.gz |
Update docs w.r.t. encodings.
Diffstat (limited to 'doc/docs/unicode.rst')
-rw-r--r-- | doc/docs/unicode.rst | 20 |
1 files changed, 14 insertions, 6 deletions
diff --git a/doc/docs/unicode.rst b/doc/docs/unicode.rst
index e79b4bec..7291a3b2 100644
--- a/doc/docs/unicode.rst
+++ b/doc/docs/unicode.rst
@@ -6,12 +6,20 @@ Since Pygments 0.6, all lexers use unicode strings internally. Because of that
 you might encounter the occasional :exc:`UnicodeDecodeError` if you pass strings
 with the wrong encoding.

-Per default all lexers have their input encoding set to `latin1`.
-If you pass a lexer a string object (not unicode), it tries to decode the data
-using this encoding.
-You can override the encoding using the `encoding` lexer option. If you have the
-`chardet`_ library installed and set the encoding to ``chardet`` if will ananlyse
-the text and use the encoding it thinks is the right one automatically:
+Per default all lexers have their input encoding set to `guess`. This means
+that the following encodings are tried:
+
+* UTF-8 (including BOM handling)
+* The locale encoding (i.e. the result of `locale.getpreferredencoding()`)
+* As a last resort, `latin1`
+
+If you pass a lexer a byte string object (not unicode), it tries to decode the
+data using this encoding.
+
+You can override the encoding using the `encoding` or `inencoding` lexer
+options. If you have the `chardet`_ library installed and set the encoding to
+``chardet`` if will ananlyse the text and use the encoding it thinks is the
+right one automatically:

 .. sourcecode:: python
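The fallback order that the new text describes for `guess` (UTF-8 with BOM handling, then the locale encoding, then `latin1`) can be sketched with the standard library alone. This is an illustration only, not the code Pygments runs: the function name `guess_decode_sketch` and its return shape are assumptions made for this example.

```python
import locale
from typing import Tuple


def guess_decode_sketch(data: bytes) -> Tuple[str, str]:
    """Hypothetical helper: decode bytes using the documented fallback order.

    Returns the decoded text and the name of the encoding that worked.
    """
    # 1. Try UTF-8 first, dropping a leading byte order mark if present.
    try:
        text = data.decode("utf-8")
        if text.startswith("\ufeff"):
            text = text[1:]
        return text, "utf-8"
    except UnicodeDecodeError:
        pass

    # 2. Fall back to the locale's preferred encoding.
    try:
        encoding = locale.getpreferredencoding()
        return data.decode(encoding), encoding
    except (UnicodeDecodeError, LookupError):
        pass

    # 3. Last resort: latin1 maps every byte to a code point, so it cannot fail.
    return data.decode("latin1"), "latin1"


if __name__ == "__main__":
    print(guess_decode_sketch(b"\xef\xbb\xbfprint('hi')"))
```

Because `latin1` accepts any byte sequence, the last step always succeeds, although the resulting text may be wrong when the data was written in some other encoding. That is the case where setting the `encoding` or `inencoding` lexer option explicitly is the safer choice.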