Diffstat (limited to 'doc/docs/unicode.rst')
-rw-r--r-- doc/docs/unicode.rst 58
1 file changed, 0 insertions, 58 deletions
diff --git a/doc/docs/unicode.rst b/doc/docs/unicode.rst
deleted file mode 100644
index dca91116..00000000
--- a/doc/docs/unicode.rst
+++ /dev/null
@@ -1,58 +0,0 @@
-=====================
-Unicode and Encodings
-=====================
-
-Since Pygments 0.6, all lexers use Unicode strings internally. As a result,
-you may encounter the occasional :exc:`UnicodeDecodeError` if you pass byte
-strings whose encoding does not match the lexer's input encoding.
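-
-As a minimal illustration in plain Python (no Pygments involved), decoding
-bytes with an encoding other than the one that produced them can raise this
-error. Here ``"café"`` is encoded as Latin-1 and then wrongly decoded as UTF-8:
-
-.. sourcecode:: python
-
- # Latin-1 encodes "é" as the single byte 0xE9, which is not a complete
- # UTF-8 sequence on its own.
- data = "café".encode("latin1")
-
- try:
-     data.decode("utf-8")
-     error = None
- except UnicodeDecodeError as exc:
-     error = exc  # exc.start is the offset of the offending byte (3)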
-
-By default, all lexers have their input encoding set to ``guess``. This
-means that the following encodings are tried, in order:
-
-* UTF-8 (including BOM handling)
-* The locale encoding (i.e. the result of ``locale.getpreferredencoding()``)
-* As a last resort, ``latin1``
-
-If you pass a lexer a byte string (``bytes``) rather than a Unicode string,
-it decodes the data using its input encoding before lexing.
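-
-The ``guess`` order above can be sketched with the standard library alone.
-This is an illustrative approximation, not Pygments' actual code: the
-function name ``decode_like_guess`` is hypothetical, and the real
-implementation may differ in details such as where the BOM is stripped.
-
-.. sourcecode:: python
-
- import locale
-
-
- def decode_like_guess(data):
-     """Hypothetical helper: return (text, encoding) for a bytes object."""
-     # 1. UTF-8; the "utf-8-sig" codec also drops a leading byte order mark.
-     try:
-         return data.decode("utf-8-sig"), "utf-8"
-     except UnicodeDecodeError:
-         pass
-     # 2. The locale's preferred encoding.
-     preferred = locale.getpreferredencoding()
-     try:
-         return data.decode(preferred), preferred
-     except (UnicodeDecodeError, LookupError):
-         pass
-     # 3. Latin-1 maps every byte to a character, so it cannot fail.
-     return data.decode("latin1"), "latin1"
-
-For example, ``decode_like_guess(b"\xef\xbb\xbfx = 1")`` returns
-``("x = 1", "utf-8")``: the input is valid UTF-8 and the BOM is removed.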
-
-You can override the encoding with the ``encoding`` or ``inencoding`` lexer
-options. If you have the `chardet`_ library installed and set the encoding to
-``chardet``, the lexer analyzes the text and automatically uses the encoding
-that chardet considers most likely:
-
-.. sourcecode:: python
-
- from pygments.lexers import PythonLexer
-
- # encoding='chardet' requires the third-party chardet package to be installed.
- lexer = PythonLexer(encoding='chardet')
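-
-If you already know the input encoding, you can name it explicitly instead of
-relying on guessing. A short sketch using ``inencoding`` with Latin-1 input
-(the sample bytes are made up for illustration):
-
-.. sourcecode:: python
-
- from pygments.lexers import PythonLexer
-
- # 0xE9 is "é" in Latin-1 but is not valid UTF-8 on its own.
- source = b"name = 'caf\xe9'\n"
-
- # inencoding (like encoding) tells the lexer how to decode bytes input.
- lexer = PythonLexer(inencoding="latin1")
- text = "".join(value for _, value in lexer.get_tokens(source))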
-
-The most reliable approach is to pass Pygments Unicode strings (``str``). In
-that case the lexer does no decoding, so a wrong encoding guess cannot corrupt
-the result.
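-
-A sketch of that approach: decode the bytes yourself with the encoding you
-know to be correct (UTF-8 is assumed here), then pass the resulting ``str``
-to ``highlight()``:
-
-.. sourcecode:: python
-
- from pygments import highlight
- from pygments.formatters import HtmlFormatter
- from pygments.lexers import PythonLexer
-
- raw = "greeting = 'héllo'\n".encode("utf-8")  # e.g. bytes read from a file
-
- # Decode explicitly; the lexer then receives str and does no guessing.
- code = raw.decode("utf-8")
- html = highlight(code, PythonLexer(), HtmlFormatter())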
-
-If you don't set an output encoding, the formatters write Unicode strings to
-the output stream. To set one, pass the formatter an ``encoding`` option:
-
-.. sourcecode:: python
-
- from pygments.formatters import HtmlFormatter
- f = HtmlFormatter(encoding='utf-8')
-
-**You will have to set this option if you have non-ASCII characters in the
-source and the output stream does not accept Unicode written to it!**
-This is the case for all regular files and for terminals.
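-
-A short sketch of writing highlighted output to a file opened in binary mode,
-which accepts only bytes (the sample code and the temporary file path are
-illustrative). With an output encoding set, ``highlight()`` returns bytes when
-no output file is given, and writes encoded bytes when one is:
-
-.. sourcecode:: python
-
- import os
- import tempfile
-
- from pygments import highlight
- from pygments.formatters import HtmlFormatter
- from pygments.lexers import PythonLexer
-
- code = "word = 'naïve'\n"  # contains a non-ASCII character
- formatter = HtmlFormatter(encoding="utf-8")
-
- # No outfile: the encoded result is returned as bytes.
- html_bytes = highlight(code, PythonLexer(), formatter)
-
- # Binary-mode file: highlight() writes the UTF-8 encoded output directly.
- path = os.path.join(tempfile.mkdtemp(), "example.html")
- with open(path, "wb") as outfile:
-     highlight(code, PythonLexer(), formatter, outfile)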
-
-Note: The Terminal formatter tries to be smart: if its output stream has an
-`encoding` attribute, and you haven't set the option, it will encode any
-Unicode string with this encoding before writing it. This is the case for
-`sys.stdout`, for example. The other formatters don't have that behavior.
-
-Another note: if you call Pygments from the command line (``pygmentize``),
-encoding is handled differently; see :doc:`the command line docs <cmdline>`.
-
-.. versionadded:: 0.7
- The formatters now also accept an `outencoding` option which will override
- the `encoding` option if given. This makes it possible to use a single
- options dict with lexers and formatters, and still have different input and
- output encodings.
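-
-For example, a single options dict could be shared as follows (the encoding
-values are placeholders). The lexer reads ``encoding`` as its input encoding,
-while the formatter lets ``outencoding`` take precedence for its output:
-
-.. sourcecode:: python
-
- from pygments.formatters import HtmlFormatter
- from pygments.lexers import PythonLexer
-
- # One dict for both objects; each ignores options it does not use.
- options = {"encoding": "latin1", "outencoding": "utf-8"}
- lexer = PythonLexer(**options)          # decodes bytes input as Latin-1
- formatter = HtmlFormatter(**options)    # encodes output as UTF-8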
-
-.. _chardet: https://chardet.github.io/