author | gbrandl <devnull@localhost> | 2007-05-18 15:12:44 +0200 |
---|---|---|
committer | gbrandl <devnull@localhost> | 2007-05-18 15:12:44 +0200 |
commit | fd856078ccab574327f688f9b63fff8428fa2752 (patch) | |
tree | 87fb2c1d8b2a642fd750975ac26d43c4b5273f8f /docs/src | |
parent | 01eb32d9a8332a9f5699c1d6c83006ca098e6f25 (diff) | |
[svn] Elaborate the Unicode docs a bit.
Diffstat (limited to 'docs/src')
-rw-r--r-- | docs/src/unicode.txt | 33 |
1 files changed, 24 insertions, 9 deletions
diff --git a/docs/src/unicode.txt b/docs/src/unicode.txt
index 07d32b03..53ff8494 100644
--- a/docs/src/unicode.txt
+++ b/docs/src/unicode.txt
@@ -1,13 +1,14 @@
-===============
-Unicode Support
-===============
+=====================
+Unicode and Encodings
+=====================
 
-Since Pygments 0.6, the lexers use unicode strings internally. Because of that
-you might discover the occasional `UnicodeDecodeError` if you pass strings with the
+Since Pygments 0.6, all lexers use unicode strings internally. Because of that
+you might encounter the occasional `UnicodeDecodeError` if you pass strings with the
 wrong encoding.
 
-Per default all lexers have `encoding` set to `latin1`. If you pass a lexer a
-string object (not unicode) it tries to decode the data using this encoding.
+Per default all lexers have their input encoding set to `latin1`.
+If you pass a lexer a string object (not unicode), it tries to decode the data
+using this encoding.
 You can override the encoding using the `encoding` lexer option. If you have the
 `chardet`_ library installed and set the encoding to ``chardet`` if will ananlyse
 the text and fetch the best encoding automatically:
@@ -20,12 +21,26 @@ the text and fetch the best encoding automatically:
 The best way is to pass Pygments unicode objects. In that case you can't get
 unexpected output.
 
-The formatters now send unicode objects to the stream if you don't set the
-encoding. You can do so by passing the formatters an `encoding` option:
+The formatters now send Unicode objects to the stream if you don't set the
+output encoding. You can do so by passing the formatters an `encoding` option:
 
 .. sourcecode:: python
 
     from pygments.formatters import HtmlFormatter
     f = HtmlFormatter(encoding='utf-8')
 
+**You will have to set this option if you have non-ASCII characters in the
+source and the output stream does not accept Unicode written to it!**
+This is the case for all regular files and for terminals.
+
+Note: The Terminal formatter tries to be smart: if its output stream has an
+`encoding` attribute, it will encode any Unicode string with this encoding
+before writing it. This is the case for `sys.stdout`, for example. The other
+formatters don't have that behavior.
+
+*New in Pygments 0.7*: the formatters now also accept an `outencoding` option
+which will override the `encoding` option if given. This makes it possible to
+use a single options dict with lexers and formatters, and still have different
+input and output encodings.
+
 .. _chardet: http://chardet.feedparser.org/
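
The shared options dict mentioned in the last added paragraph of the diff can be tried out roughly as follows. This is a minimal sketch, not part of the commit. It assumes a current Pygments release on Python 3, where byte strings take the place of the "string object (not unicode)" in the text. The sample source line and the variable names are made up for illustration.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import PythonLexer

# Illustrative input: a latin1-encoded byte string with a non-ASCII character.
source = "s = 'café'\n".encode("latin1")

# One options dict shared by lexer and formatter. The lexer reads `encoding`
# as its input encoding; the formatter lets `outencoding` override `encoding`.
options = {"encoding": "latin1", "outencoding": "utf-8"}

lexer = PythonLexer(**options)
formatter = HtmlFormatter(**options)

# With an output encoding set on the formatter, highlight() returns bytes.
html = highlight(source, lexer, formatter)
```

Because `html` is already encoded here, it can be written to a file opened in binary mode without a further encoding step.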