author     gbrandl <devnull@localhost>    2006-10-19 20:27:28 +0200
committer  gbrandl <devnull@localhost>    2006-10-19 20:27:28 +0200
commit     f4d019954468db777760d21f9243eca8b852c184 (patch)
tree       328b8f8fac25338306b0e7b827686dcc7597df23 /docs/src
download   pygments-f4d019954468db777760d21f9243eca8b852c184.tar.gz
[svn] Name change, round 4 (rename SVN root folder).
Diffstat (limited to 'docs/src')
-rw-r--r--  docs/src/api.txt               183
-rw-r--r--  docs/src/cmdline.txt            58
-rw-r--r--  docs/src/formatterdev.txt      169
-rw-r--r--  docs/src/formatters.txt        249
-rw-r--r--  docs/src/index.txt              51
-rw-r--r--  docs/src/installation.txt       47
-rw-r--r--  docs/src/lexerdevelopment.txt  482
-rw-r--r--  docs/src/lexers.txt            521
-rw-r--r--  docs/src/quickstart.txt        121
-rw-r--r--  docs/src/rstdirective.txt       42
-rw-r--r--  docs/src/styles.txt            119
-rw-r--r--  docs/src/tokens.txt            284
12 files changed, 2326 insertions, 0 deletions
diff --git a/docs/src/api.txt b/docs/src/api.txt new file mode 100644 index 00000000..880687f1 --- /dev/null +++ b/docs/src/api.txt @@ -0,0 +1,183 @@ +.. -*- mode: rst -*- + +==================== +The full Pygments API +==================== + +This page describes the Pygments API. + +High-level API +============== + +Functions from the `pygments` module: + +def `lex(code, lexer):` + Lex `code` with the `lexer` (must be a `Lexer` instance) + and return an iterable of tokens. Currently, this only calls + `lexer.get_tokens()`. + +def `format(tokens, formatter, outfile=None):` + Format a token stream (iterable of tokens) `tokens` with the + `formatter` (must be a `Formatter` instance). The result is + written to `outfile`, or if that is ``None``, returned as a + string. + +def `highlight(code, lexer, formatter, outfile=None):` + This is the most high-level highlighting function. + It combines `lex` and `format` in one function. + + +Functions from `pygments.lexers`: + +def `get_lexer_by_name(alias, **options):` + Return an instance of a `Lexer` subclass that has `alias` in its + aliases list. The lexer is given the `options` at its + instantiation. + + Will raise `ValueError` if no lexer with that alias is found. + +def `get_lexer_for_filename(fn, **options):` + Return a `Lexer` subclass instance that has a filename pattern + matching `fn`. The lexer is given the `options` at its + instantiation. + + Will raise `ValueError` if no lexer for that filename is found. + + +Functions from `pygments.formatters`: + +def `get_formatter_by_name(alias, **options):` + Return an instance of a `Formatter` subclass that has `alias` in its + aliases list. The formatter is given the `options` at its + instantiation. + + Will raise `ValueError` if no formatter with that alias is found. + +def `get_formatter_for_filename(fn, **options):` + Return a `Formatter` subclass instance that has a filename pattern + matching `fn`. The formatter is given the `options` at its + instantiation. + + Will raise `ValueError` if no formatter for that filename is found. + + +Lexers +====== + +A lexer (derived from `pygments.lexer.Lexer`) has the following functions: + +def `__init__(self, **options):` + The constructor. Takes a \*\*keywords dictionary of options. + Every subclass must first process its own options and then call + the `Lexer` constructor, since it processes the `stripnl`, + `stripall` and `tabsize` options. + + An example looks like this: + + .. sourcecode:: python + + def __init__(self, **options): + self.compress = options.get('compress', '') + Lexer.__init__(self, **options) + + As these options must all be specifiable as strings (due to the + command line usage), there are various utility functions + available to help with that, see `Option processing`_. + +def `get_tokens(self, text):` + This method is the basic interface of a lexer. It is called by + the `highlight()` function. It must process the text and return an + iterable of ``(tokentype, value)`` pairs from `text`. + + Normally, you don't need to override this method. The default + implementation processes the `stripnl`, `stripall` and `tabsize` + options and then yields all tokens from `get_tokens_unprocessed()`, + with the ``index`` dropped. + +def `get_tokens_unprocessed(self, text):` + This method should process the text and return an iterable of + ``(index, tokentype, value)`` tuples where ``index`` is the starting + position of the token within the input text. + + This method must be overridden by subclasses. 
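To make this concrete, here is a minimal sketch of a lexer that implements
only `get_tokens_unprocessed()`. The class is hypothetical and not part of
Pygments; real lexers almost always subclass `RegexLexer` instead, which
implements this method for you:

.. sourcecode:: python

    from pygments.lexer import Lexer
    from pygments.token import Text

    class WholeTextLexer(Lexer):
        """Hypothetical lexer that marks all input as plain text."""

        def get_tokens_unprocessed(self, text):
            # a single (index, tokentype, value) tuple covering
            # the complete input string
            yield 0, Text, text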
+ +For a list of known tokens have a look at the `Tokens`_ page. + +The lexer also recognizes the following attributes that are used by the +builtin lookup mechanism. + +`name` + Full name for the lexer, in human-readable form. + +`aliases` + A list of short, unique identifiers that can be used to lookup + the lexer from a list. + +`filenames` + A list of `fnmatch` patterns that can be used to find a lexer for + a given filename. + + +.. _Tokens: tokens.txt + + +Formatters +========== + +A formatter (derived from `pygments.formatter.Formatter`) has the following +functions: + +def `__init__(self, **options):` + As with lexers, this constructor processes options and then must call + the base class `__init__`. + + The `Formatter` class recognizes the options `style`, `full` and + `title`. It is up to the formatter class whether it uses them. + +def `get_style_defs(self, arg=''):` + This method must return statements or declarations suitable to define + the current style for subsequent highlighted text (e.g. CSS classes + in the `HTMLFormatter`). + + The optional argument `arg` can be used to modify the generation and + is formatter dependent (it is standardized because it can be given on + the command line). + + This method is called by the ``-S`` `command-line option`_, the `arg` + is then given by the ``-a`` option. + +def `format(self, tokensource, outfile):` + This method must format the tokens from the `tokensource` iterable and + write the formatted version to the file object `outfile`. + + Formatter options can control how exactly the tokens are converted. + +.. _command-line option: cmdline.txt + + +Option processing +================= + +The `pygments.util` module has some utility functions usable for option +processing: + +class `OptionError` + This exception will be raised by all option processing functions if + the type of the argument is not correct. + +def `get_bool_opt(options, optname, default=None):` + Interpret the key `optname` from the dictionary `options` + as a boolean and return it. Return `default` if `optname` + is not in `options`. + + The valid string values for ``True`` are ``1``, ``yes``, + ``true`` and ``on``, the ones for ``False`` are ``0``, + ``no``, ``false`` and ``off`` (matched case-insensitively). + +def `get_int_opt(options, optname, default=None):` + As `get_bool_opt`, but interpret the value as an integer. + +def `get_list_opt(options, optname, default=None):` + If the key `optname` from the dictionary `options` is a string, + split it at whitespace and return it. If it is already a list + or a tuple, it is returned as a list. diff --git a/docs/src/cmdline.txt b/docs/src/cmdline.txt new file mode 100644 index 00000000..461ecb32 --- /dev/null +++ b/docs/src/cmdline.txt @@ -0,0 +1,58 @@ +.. -*- mode: rst -*- + +====================== +Command Line Interface +====================== + +You can use Pygments from the shell, provided you installed the `pygmentize` script:: + + $ pygmentize test.py + print "Hello World" + +will print the file test.py to standard output, using the Python lexer +(inferred from the file name extension) and the terminal formatter (because +you didn't give an explicit formatter name). + +If you want HTML output:: + + $ pygmentize -f html -l python -o test.html test.py + +As you can see, the -l option explicitly selects a lexer. As seen above, if you +give an input file name and it has an extension that Pygments recognizes, you can +omit this option. + +The ``-o`` option gives an output file name. 
If it is not given, output is
written to stdout.

The ``-f`` option selects a formatter (as with ``-l``, it can also be omitted
if an output file name is given and has a supported extension).
If no output file name is given and ``-f`` is omitted, the
`TerminalFormatter` is used.

The above command could therefore also be given as::

    $ pygmentize -o test.html test.py

Lexer and formatter options can be given using the ``-O`` option::

    $ pygmentize -f html -O style=colorful,linenos=1 -l python test.py

Be sure to enclose the option string in quotes if it contains any special
shell characters, such as spaces or expansion wildcards like ``*``.

There's a special ``-S`` option for generating style definitions. Usage is
as follows::

    $ pygmentize -f html -S colorful -a .syntax

generates a CSS style sheet (because you selected the HTML formatter) for
the "colorful" style, prepending a ``.syntax`` selector to all style rules.

For an explanation of what ``-a`` means for `a particular formatter`_, look at
the `arg` argument of the formatter's `get_style_defs()` method.

The ``-L`` option lists all lexers and formatters, along with their short
names and supported file name extensions.


.. _a particular formatter: formatters.txt

diff --git a/docs/src/formatterdev.txt b/docs/src/formatterdev.txt new file mode 100644 index 00000000..82208aa0 --- /dev/null +++ b/docs/src/formatterdev.txt @@ -0,0 +1,169 @@

.. -*- mode: rst -*-

========================
Write your own formatter
========================

As well as creating `your own lexer <lexerdevelopment.txt>`_, writing a new
formatter for Pygments is easy and straightforward.

A formatter is a class that is initialized with some keyword arguments (the
formatter options) and that must provide a `format()` method.
Additionally, a formatter should provide a `get_style_defs()` method that
returns the style definitions from the style in a form usable for the
formatter's output format.


Quickstart
==========

The most basic formatter shipped with Pygments is the `NullFormatter`. It just
sends the value of a token to the output stream:

.. sourcecode:: python

    from pygments.formatter import Formatter

    class NullFormatter(Formatter):
        def format(self, tokensource, outfile):
            for ttype, value in tokensource:
                outfile.write(value)

As you can see, the `format()` method is passed two parameters: `tokensource`
and `outfile`. The former is an iterable of ``(token_type, value)`` tuples,
the latter a file-like object with a `write()` method.

Because this formatter is so basic, it doesn't override the `get_style_defs()`
method.


Styles
======

Styles aren't instantiated, but their metaclass provides some class functions
so that you can access the style definitions easily.

Styles are iterable and yield tuples in the form ``(ttype, d)`` where `ttype`
is a token and `d` is a dict with the following keys:

``'color'``
    Hexadecimal color value (e.g. ``'ff0000'`` for red) or `None` if not
    defined.

``'bold'``
    `True` if the value should be bold.

``'italic'``
    `True` if the value should be italic.

``'underline'``
    `True` if the value should be underlined.

``'bgcolor'``
    Hexadecimal color value for the background (e.g. ``'eeeeee'`` for light
    gray) or `None` if not defined.

``'border'``
    Hexadecimal color value for the border (e.g. ``'0000aa'`` for a dark
    blue) or `None` for no border.
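As an illustration, here is a short interactive sketch of this iteration
protocol (Python 2 syntax, as used throughout these docs; the actual output
depends on the chosen style and is therefore omitted):

.. sourcecode:: pycon

    >>> from pygments.styles import get_style_by_name
    >>> style = get_style_by_name('default')
    >>> for ttype, d in style:
    ...     # d is a dict with the keys described above
    ...     if d['color'] is not None:
    ...         print ttype, d['color']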
Additional keys might appear in the future; formatters should ignore all keys
they don't support.


HTML 3.2 Formatter
==================

For a more complex example, let's implement an HTML 3.2 formatter. We don't
use CSS but inline markup (``<u>``, ``<font>``, etc). Because this isn't good
style, this formatter isn't in the standard library ;-)

.. sourcecode:: python

    from pygments.formatter import Formatter

    class OldHtmlFormatter(Formatter):

        def __init__(self, **options):
            Formatter.__init__(self, **options)

            # create a dict of (start, end) tuples that wrap the
            # value of a token so that we can use it in the format
            # method later
            self.styles = {}

            # we iterate over the style, which yields the
            # (token, style-dict) pairs described above
            for token, style in self.style:
                start = end = ''
                # colors are readily specified in hex: 'RRGGBB'
                if style['color']:
                    start += '<font color="#%s">' % style['color']
                    end += '</font>'
                if style['bold']:
                    start += '<b>'
                    end += '</b>'
                if style['italic']:
                    start += '<i>'
                    end += '</i>'
                if style['underline']:
                    start += '<u>'
                    end += '</u>'
                self.styles[token] = (start, end)

        def format(self, tokensource, outfile):
            # lastval is a string we use for caching
            # because it's possible that a lexer yields a number
            # of consecutive tokens with the same token type.
            # to minimize the size of the generated HTML markup we
            # try to join the values of same-type tokens here
            lastval = ''
            lasttype = None

            # wrap the whole output with <pre>
            outfile.write('<pre>')

            for ttype, value in tokensource:
                # if the token type doesn't exist in the stylemap
                # we try it with the parent of the token type
                # e.g.: parent of Token.Literal.String.Double is
                # Token.Literal.String
                while ttype not in self.styles:
                    ttype = ttype.parent
                if ttype == lasttype:
                    # the current token type is the same as in the last
                    # iteration. cache it
                    lastval += value
                else:
                    # not the same token as last iteration, but we
                    # have some data in the buffer. wrap it with the
                    # defined style and write it to the output file
                    if lastval:
                        stylebegin, styleend = self.styles[lasttype]
                        outfile.write(stylebegin + lastval + styleend)
                    # set lastval/lasttype to current values
                    lastval = value
                    lasttype = ttype

            # if something is left in the buffer, write it to the
            # output file, then close the opened <pre> tag
            if lastval:
                stylebegin, styleend = self.styles[lasttype]
                outfile.write(stylebegin + lastval + styleend)
            outfile.write('</pre>\n')

The comments should explain it. Again, this formatter doesn't override the
`get_style_defs()` method. If we had used CSS classes instead of inline HTML
markup, we would need to generate the CSS first. For that purpose the
`get_style_defs()` method exists:


Generating Style Definitions
============================

Some formatters like the `LatexFormatter` and the `HtmlFormatter` don't
output inline markup but reference either macros or CSS classes. Because
the definitions of those are not part of the output, the `get_style_defs()`
method exists. It is passed one parameter (whether and how it's used is up
to the formatter) and has to return a string or ``None``.

diff --git a/docs/src/formatters.txt b/docs/src/formatters.txt new file mode 100644 index 00000000..08b52cc7 --- /dev/null +++ b/docs/src/formatters.txt @@ -0,0 +1,249 @@

..
-*- mode: rst -*- + +==================== +Available formatters +==================== + +This page lists all builtin formatters. + +Common options +============== + +The `HtmlFormatter` and `LatexFormatter` classes support these options: + +`style` + The style to use, can be a string or a Style subclass (default: + ``'default'``). + +`full` + Tells the formatter to output a "full" document, i.e. a complete + self-contained document (default: ``False``). + +`title` + If `full` is true, the title that should be used to caption the + document (default: ``''``). + +`linenos` + If set to ``True``, output line numbers (default: ``False``). + +`linenostart` + The line number for the first line (default: ``1``). + +`linenostep` + If set to a number n > 1, only every nth line number is printed. + + +Formatter classes +================= + +All these classes are importable from `pygments.formatters`. + + +`HtmlFormatter` +--------------- + + Formats tokens as HTML 4 ``<span>`` tags within a ``<pre>`` tag, wrapped + in a ``<div>`` tag. The ``<div>``'s CSS class can be set by the `cssclass` + option. + + If the `linenos` option is given and true, the ``<pre>`` is additionally + wrapped inside a ``<table>`` which has one row and two cells: one + containing the line numbers and one containing the code. Example: + + .. sourcecode:: html + + <div class="highlight" > + <table><tr> + <td class="linenos" title="click to toggle" + onclick="with (this.firstChild.style) + { display = (display == '') ? 'none' : '' }"> + <pre>1 + 2</pre> + </td> + <td class="code"> + <pre><span class="Ke">def </span><span class="NaFu">foo</span>(bar): + <span class="Ke">pass</span> + </pre> + </td> + </tr></table></div> + + (whitespace added to improve clarity). Wrapping can be disabled using the + `nowrap` option. + + With the `full` option, a complete HTML 4 document is output, including + the style definitions inside a ``<style>`` tag. + + The `get_style_defs(arg='')` method of a `HtmlFormatter` returns a string + containing CSS rules for the CSS classes used by the formatter. The + argument `arg` can be used to specify additional CSS selectors that + are prepended to the classes. A call `fmter.get_style_defs('td .code')` + would result in the following CSS classes: + + .. sourcecode:: css + + td .code .kw { font-weight: bold; color: #00FF00 } + td .code .cm { color: #999999 } + ... + + Additional options accepted by the `HtmlFormatter`: + + `nowrap` + If set to ``True``, don't wrap the tokens at all, not even in a ``<pre>`` + tag. This disables all other options (default: ``False``). + + `noclasses` + If set to true, token ``<span>`` tags will not use CSS classes, but + inline styles. This is not recommended for larger pieces of code since + it increases output size by quite a bit (default: ``False``). + + `classprefix` + Since the token types use relatively short class names, they may clash + with some of your own class names. In this case you can use the + `classprefix` option to give a string to prepend to all Pygments-generated + CSS class names for token types. + Note that this option also affects the output of `get_style_defs()`. + + `cssclass` + CSS class for the wrapping ``<div>`` tag (default: ``'highlight'``). + + `cssstyles` + Inline CSS styles for the wrapping ``<div>`` tag (default: ``''``). + + `linenospecial` + If set to a number n > 0, every nth line number is given the CSS + class ``"special"`` (default: ``0``). 
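As a rough, non-authoritative sketch of how the options above can be combined
(all option and method names as documented in this section):

.. sourcecode:: python

    from pygments.formatters import HtmlFormatter

    fmter = HtmlFormatter(linenos=True, cssclass='syntax',
                          classprefix='pyg-', linenospecial=5)
    # the emitted CSS rules use 'pyg-'-prefixed class names and are
    # additionally scoped by the given 'td .syntax' selector
    print fmter.get_style_defs('td .syntax')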
:Aliases: ``html``
:Filename patterns: ``*.html``, ``*.htm``


`LatexFormatter`
----------------

Formats tokens as LaTeX code. This needs the `fancyvrb` and `color`
standard packages.

Without the `full` option, code is formatted as one ``Verbatim``
environment, like this:

.. sourcecode:: latex

    \begin{Verbatim}[commandchars=@\[\]]
    @Can[def ]@Cax[foo](bar):
        @Can[pass]
    \end{Verbatim}

The command sequences used here (``@Can`` etc.) are generated from the given
`style` and can be retrieved using the `get_style_defs` method.

With the `full` option, a complete LaTeX document is output, including
the command definitions in the preamble.

The `get_style_defs(arg='')` method of a `LatexFormatter` returns a string
containing ``\newcommand`` commands defining the commands used inside the
``Verbatim`` environments. If the argument `arg` is true,
``\renewcommand`` is used instead.

Additional options accepted by the `LatexFormatter`:

`docclass`
    If the `full` option is enabled, this is the document class to use
    (default: ``'article'``).

`preamble`
    If the `full` option is enabled, this can be further preamble commands,
    e.g. ``\usepackage`` (default: ``''``).

`verboptions`
    Additional options given to the Verbatim environment (see the *fancyvrb*
    docs for possible values) (default: ``''``).

:Aliases: ``latex``, ``tex``
:Filename pattern: ``*.tex``


`BBCodeFormatter`
-----------------

Formats tokens with BBcodes. These formatting codes are used by many
bulletin boards, so you can highlight your source code with Pygments before
posting it there.

This formatter has no support for background colors and borders, as there
are no common BBcode tags for that.

Some board systems (e.g. phpBB) don't support colors in their [code] tag,
so you can't use the highlighting together with that tag.
Text in a [code] tag is usually shown with a monospace font (which this
formatter can request with the ``monofont`` option), and spaces (which you
need for indentation) are not removed.

The `BBCodeFormatter` accepts two additional options:

`codetag`
    If set to true, put the output into ``[code]`` tags (default:
    ``false``).

`monofont`
    If set to true, add a tag to show the code with a monospace font
    (default: ``false``).

:Aliases: ``bbcode``, ``bb``
:Filename pattern: None


`TerminalFormatter`
-------------------

Formats tokens with ANSI color sequences, for output in a text console.
Color sequences are terminated at newlines, so that paging the output
works correctly.

The `get_style_defs()` method doesn't do anything special since there is
no support for common styles.

The `TerminalFormatter` class supports only these options:

`bg`
    Set to ``"light"`` or ``"dark"`` depending on the terminal's background
    (default: ``"light"``).

`colorscheme`
    A dictionary mapping token types to (lightbg, darkbg) color names or
    ``None`` (default: ``None`` = use builtin colorscheme).

`debug`
    If this option is true, output the string "<<ERROR>>" after each error
    token. This is meant as a help for debugging Pygments (default:
    ``False``).

:Aliases: ``terminal``, ``console``
:Filename pattern: None


`RawTokenFormatter`
-------------------

Formats tokens as a raw representation for storing token streams.

The format is ``tokentype<TAB>repr(tokenstring)\n``.
The output can later
be converted to a token stream with the `RawTokenLexer`, described in the
`lexer list <lexers.txt>`_.

One option is accepted:

`compress`
    If set to ``'gz'`` or ``'bz2'``, compress the output with the given
    compression algorithm after encoding (default: ``''``).

:Aliases: ``raw``, ``tokens``
:Filename pattern: ``*.raw``


`NullFormatter`
---------------

Just output all tokens, don't format in any way.

:Aliases: ``text``, ``null``
:Filename pattern: ``*.txt``

diff --git a/docs/src/index.txt b/docs/src/index.txt new file mode 100644 index 00000000..33874d1c --- /dev/null +++ b/docs/src/index.txt @@ -0,0 +1,51 @@

.. -*- mode: rst -*-

========
Overview
========

Welcome to the Pygments documentation.

- Starting with Pygments

  - `Installation <installation.txt>`_

  - `Quickstart <quickstart.txt>`_

  - `Command line interface <cmdline.txt>`_

- Essential to know

  - `Builtin lexers <lexers.txt>`_

  - `Builtin formatters <formatters.txt>`_

  - `Styles <styles.txt>`_

- API and more

  - `API documentation <api.txt>`_

  - `Builtin Tokens <tokens.txt>`_

- Hacking for Pygments

  - `Write your own lexer <lexerdevelopment.txt>`_

  - `Write your own formatter <formatterdev.txt>`_

- Hints and Tricks

  - `Using Pygments in ReST documents <rstdirective.txt>`_


--------------

If you find bugs or have suggestions for the documentation, please
look `here`_ for info on how to contact the team.

You can download an offline version of this documentation from the
`download page`_.

.. _here: http://pygments.pocoo.org/contribute
.. _download page: http://pygments.pocoo.org/download

diff --git a/docs/src/installation.txt b/docs/src/installation.txt new file mode 100644 index 00000000..708592a8 --- /dev/null +++ b/docs/src/installation.txt @@ -0,0 +1,47 @@

.. -*- mode: rst -*-

============
Installation
============

Pygments requires at least Python 2.3 to work correctly. Just to clarify:
there *won't* ever be support for Python versions below 2.3.


Install the Release Version
===========================

1. download the most recent tarball from the `download page`_
2. unpack the tarball
3. ``sudo python setup.py install``

Note that the last command will automatically download and install
`setuptools`_ if you don't already have it installed. This requires a working
internet connection.

This will install Pygments into your Python installation's site-packages
directory.


Install via easy_install
========================

You can also install the most recent Pygments version using `easy_install`_::

    sudo easy_install Pygments

This will install a Pygments egg in your Python installation's site-packages
directory.


Installing the Development Version
==================================

1. Install `subversion`_
2. ``svn co http://trac.pocoo.org/repos/pygments/trunk pygments``
3. ``ln -s `pwd`/pygments/pygments /usr/lib/python2.X/site-packages``


.. _download page: http://pygments.pocoo.org/download/
.. _setuptools: http://peak.telecommunity.com/DevCenter/setuptools
.. _easy_install: http://peak.telecommunity.com/DevCenter/EasyInstall
.. _subversion: http://subversion.tigris.org/

diff --git a/docs/src/lexerdevelopment.txt b/docs/src/lexerdevelopment.txt new file mode 100644 index 00000000..619c5c39 --- /dev/null +++ b/docs/src/lexerdevelopment.txt @@ -0,0 +1,482 @@

.. -*- mode: rst -*-
====================
Write your own lexer
====================

If a lexer for your favorite language is missing in the Pygments package, you
can easily write your own and extend Pygments.

All you need can be found inside the `pygments.lexer` module. As you can read
in the `API documentation <api.txt>`_, a lexer is a class that is initialized
with some keyword arguments (the lexer options) and that provides a
`get_tokens_unprocessed()` method which is given a string or unicode object
with the data to parse.

The `get_tokens_unprocessed()` method must return an iterator or iterable
containing tuples in the form ``(index, token, value)``. Normally you don't
need to do this since there are numerous base lexers you can subclass.


RegexLexer
==========

A very powerful (but quite easy to use) lexer is the `RegexLexer`. This lexer
base class allows you to define lexing rules in terms of *regular expressions*
for different *states*.

States are groups of regular expressions that are matched against the input
string at the *current position*. If one of these expressions matches, a
corresponding action is performed (normally yielding a token with a specific
type), the current position is set to where the last match ended and the
matching process continues with the first regex of the current state.

Lexer states are kept in a state stack: each time a new state is entered, the
new state is pushed onto the stack. The most basic lexers (like the
`DiffLexer`) just need one state.

Each state is defined as a list of tuples in the form (`regex`, `action`,
`new_state`) where the last item is optional. In the most basic form, `action`
is a token type (like `Name.Builtin`). That means: when `regex` matches, emit
a token with the match text and that token type, and push `new_state` on the
state stack. If the new state is ``'#pop'``, the topmost state is popped from
the stack instead. (To pop more than one state, use ``'#pop:2'`` and so on.)
``'#push'`` is a synonym for pushing the current state on the
stack.

The following example shows the `DiffLexer` from the builtin lexers. Note that
it contains some additional attributes `name`, `aliases` and `filenames` which
aren't required for a lexer. They are used by the builtin lexer lookup
functions.

.. sourcecode:: python

    from pygments.lexer import RegexLexer
    from pygments.token import \
         Text, Comment, Keyword, Name, String, Generic

    class DiffLexer(RegexLexer):
        name = 'Diff'
        aliases = ['diff']
        filenames = ['*.diff']

        tokens = {
            'root': [
                (r' .*\n', Text),
                (r'\+.*\n', Generic.Inserted),
                (r'-.*\n', Generic.Deleted),
                (r'@.*\n', Generic.Subheading),
                (r'Index.*\n', Generic.Heading),
                (r'=.*\n', Generic.Heading),
                (r'.*\n', Text),
            ]
        }

As you can see, this lexer only uses one state. When the lexer starts scanning
the text, it first checks if the current character is a space. If this is
true, it scans everything until newline and returns the parsed data as a
`Text` token.

If this rule doesn't match, it checks if the current char is a plus sign. And
so on.

If no rule matches at the current position, the current char is emitted as an
`Error` token that indicates a parsing error, and the position is increased
by 1.


Regex Flags
===========

You can either define regex flags in the regex (``r'(?x)foo bar'``) or by
adding a `flags` attribute to your lexer class. If no attribute is defined,
it defaults to `re.MULTILINE`.
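For example, a hypothetical case-insensitive lexer could combine the default
flag with `re.IGNORECASE` like this:

.. sourcecode:: python

    import re
    from pygments.lexer import RegexLexer
    from pygments.token import Keyword, Text

    class MyLexer(RegexLexer):
        # keep the default MULTILINE behavior, but also ignore case
        flags = re.MULTILINE | re.IGNORECASE

        tokens = {
            'root': [
                # matches 'select', 'SELECT', 'Select', ...
                (r'select\b', Keyword),
                (r'\s+', Text),
                (r'\S+', Text),
            ]
        }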
For more information about regular expression flags see the
`regular expressions`_ help page in the Python documentation.

.. _regular expressions: http://docs.python.org/lib/re-syntax.html


Scanning multiple tokens at once
================================

Here is a more complex lexer that highlights INI files. INI files consist of
sections, comments and key = value pairs:

.. sourcecode:: python

    from pygments.lexer import RegexLexer, bygroups
    from pygments.token import Text, Comment, Keyword, Name, Operator, String

    class IniLexer(RegexLexer):
        name = 'INI'
        aliases = ['ini', 'cfg']
        filenames = ['*.ini', '*.cfg']

        tokens = {
            'root': [
                (r'\s+', Text),
                (r';.*?$', Comment),
                (r'\[.*?\]$', Keyword),
                (r'(.*?)(\s*)(=)(\s*)(.*?)$',
                 bygroups(Name.Attribute, Text, Operator, Text, String))
            ]
        }

The lexer first looks for whitespace, comments and section names. Then it
looks for a line that looks like a key/value pair, separated by an ``'='``
sign, and optional whitespace.

The `bygroups` helper makes sure that each group is yielded with a different
token type. First the `Name.Attribute` token, then a `Text` token for the
optional whitespace, after that an `Operator` token for the equals sign. Then
a `Text` token for the whitespace again. The rest of the line is returned as
`String`.

Note that for this to work, every part of the match must be inside a capturing
group (a ``(...)``), and there must not be any nested capturing groups. If you
nevertheless need a group, use a non-capturing group defined using this
syntax: ``r'(?:some|words|here)'`` (note the ``?:`` after the beginning
parenthesis).


Changing states
===============

Many lexers need multiple states to work as expected. For example, some
languages allow multiline comments to be nested. Since this is a recursive
pattern it's impossible to lex just using regular expressions.

Here is the solution:

.. sourcecode:: python

    from pygments.lexer import RegexLexer
    from pygments.token import Text, Comment

    class ExampleLexer(RegexLexer):
        name = 'Example Lexer with states'

        tokens = {
            'root': [
                (r'[^/]+', Text),
                (r'/\*', Comment.Multiline, 'comment'),
                (r'//.*?$', Comment.Singleline),
                (r'/', Text)
            ],
            'comment': [
                (r'[^*/]', Comment.Multiline),
                (r'/\*', Comment.Multiline, '#push'),
                (r'\*/', Comment.Multiline, '#pop'),
                (r'[*/]', Comment.Multiline)
            ]
        }

This lexer starts lexing in the ``'root'`` state. It tries to match as much
as possible until it finds a slash (``'/'``). If the next character after the
slash is a star (``'*'``) the `RegexLexer` sends those two characters to the
output stream marked as `Comment.Multiline` and continues parsing with the
rules defined in the ``'comment'`` state.

If there wasn't a star after the slash, the `RegexLexer` checks if it's a
single-line comment (e.g. followed by a second slash). If this also wasn't
the case it must be a single slash (the separate regex for a single slash
must also be given, else the slash would be marked as an error token).

Inside the ``'comment'`` state, we do the same thing again. Scan until the
lexer finds a star or slash. If it's the opening of a multiline comment, push
the ``'comment'`` state on the stack and continue scanning, again in the
``'comment'`` state. Else, check if it's the end of the multiline comment. If
yes, pop one state from the stack.

Note: If you pop from an empty stack you'll get an `IndexError`. (There is an
easy way to prevent this from happening: don't ``'#pop'`` in the root state).
If the `RegexLexer` encounters a newline that is flagged as an error token,
the stack is emptied and the lexer continues scanning in the ``'root'``
state. This helps producing error-tolerant highlighting for erroneous input,
e.g. when a single-line string is not closed.


Advanced state tricks
=====================

There are a few more things you can do with states:

- You can push multiple states onto the stack if you give a tuple instead of
  a simple string as the third item in a rule tuple. For example, if you want
  to match a comment containing a directive, something like::

      /* <processing directive>    rest of comment */

  you can use this rule:

  .. sourcecode:: python

      tokens = {
          'root': [
              (r'/\* <', Comment, ('comment', 'directive')),
              ...
          ],
          'directive': [
              (r'[^>]*', Comment.Directive),
              (r'>', Comment, '#pop'),
          ],
          'comment': [
              (r'[^*]+', Comment),
              (r'\*/', Comment, '#pop'),
              (r'\*', Comment),
          ]
      }

  When this encounters the above sample, first ``'comment'`` and
  ``'directive'`` are pushed onto the stack, then the lexer continues in the
  directive state until it finds the closing ``>``, then it continues in the
  comment state until the closing ``*/``. Then, both states are popped from
  the stack again and lexing continues in the root state.

- You can include the rules of a state in the definition of another. This is
  done by using `include` from `pygments.lexer`:

  .. sourcecode:: python

      from pygments.lexer import RegexLexer, bygroups, include
      from pygments.token import Text, Comment, Keyword, Name

      class ExampleLexer(RegexLexer):
          tokens = {
              'comments': [
                  (r'/\*.*?\*/', Comment),
                  (r'//.*?\n', Comment),
              ],
              'root': [
                  include('comments'),
                  (r'(function )(\w+)( {)',
                   bygroups(Keyword, Name, Keyword), 'function'),
                  (r'.', Text),
              ],
              'function': [
                  (r'[^}/]+', Text),
                  include('comments'),
                  (r'/', Text),
                  (r'}', Keyword, '#pop'),
              ]
          }

  This is a hypothetical lexer for a language that consists of functions and
  comments. Because comments can occur at toplevel and in functions, we need
  rules for comments in both states. As you can see, the `include` helper
  saves repeating rules that occur more than once (in this example, the state
  ``'comments'`` will never be entered by the lexer, as it's only there to be
  included in ``'root'`` and ``'function'``).

- Sometimes, you may want to "combine" a state from existing ones. This is
  possible with the `combined` helper from `pygments.lexer` (see the sketch
  after this list).

  If you, instead of a new state, write ``combined('state1', 'state2')`` as
  the third item of a rule tuple, a new anonymous state will be formed from
  state1 and state2 and if the rule matches, the lexer will enter this state.

  This is not used very often, but can be helpful in some cases, such as the
  `PythonLexer`'s string literal processing.

- If you want your lexer to start lexing in a different state you can modify
  the stack by overloading the `get_tokens_unprocessed()` method:

  .. sourcecode:: python

      class MyLexer(RegexLexer):
          tokens = {...}

          def get_tokens_unprocessed(self, text):
              stack = ['root', 'otherstate']
              for item in RegexLexer.get_tokens_unprocessed(self, text, stack):
                  yield item

  Some lexers like the `PhpLexer` use this to make the leading ``<?php``
  preprocessor comments optional. Note that you can crash the lexer easily
  by putting values into the stack that don't exist in the token map. Also
  removing ``'root'`` from the stack can result in strange errors!
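Here is the sketch promised above for `combined()`; the language and state
names are invented for illustration, the point is only the anonymous state
formed from ``'escapes'`` and ``'string'``:

.. sourcecode:: python

    from pygments.lexer import RegexLexer, combined
    from pygments.token import String, Text

    class CombinedExampleLexer(RegexLexer):
        name = 'Combined example (hypothetical)'

        tokens = {
            'root': [
                # on an opening quote, enter an anonymous state that
                # merges the rules of 'escapes' and 'string'
                (r'"', String, combined('escapes', 'string')),
                (r'[^"]+', Text),
            ],
            'escapes': [
                (r'\\.', String.Escape),
            ],
            'string': [
                (r'"', String, '#pop'),
                (r'[^\\"]+', String),
            ],
        }

Because the escape rule comes first in the combined state, escape sequences
inside the string take precedence over the plain string-content rule.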
Using multiple lexers
=====================

Using multiple lexers for the same input can be tricky. One of the easiest
combination techniques is shown here: You can replace the token type entry in
a rule tuple (the second item) with a lexer class. The matched text will then
be lexed with that lexer, and the resulting tokens will be yielded.

For example, look at this stripped-down HTML lexer:

.. sourcecode:: python

    import re

    from pygments.lexer import RegexLexer, bygroups, using
    from pygments.lexers.web import JavascriptLexer
    from pygments.token import Text, Name

    class HtmlLexer(RegexLexer):
        name = 'HTML'
        aliases = ['html']
        filenames = ['*.html', '*.htm']

        flags = re.IGNORECASE | re.DOTALL
        tokens = {
            'root': [
                ('[^<&]+', Text),
                ('&.*?;', Name.Entity),
                (r'<\s*script\s*', Name.Tag, ('script-content', 'tag')),
                (r'<\s*[a-zA-Z0-9:]+', Name.Tag, 'tag'),
                (r'<\s*/\s*[a-zA-Z0-9:]+\s*>', Name.Tag),
            ],
            'script-content': [
                (r'(.+?)(<\s*/\s*script\s*>)',
                 bygroups(using(JavascriptLexer), Name.Tag),
                 '#pop'),
            ]
        }

Here the content of a ``<script>`` tag is passed to a newly created instance
of a `JavascriptLexer` and not processed by the `HtmlLexer`. This is done
using the `using` helper that takes the other lexer class as its parameter.

Note the combination of `bygroups` and `using`. This makes sure that the
content up to the ``</script>`` end tag is processed by the
`JavascriptLexer`, while the end tag is yielded as a normal token with the
`Name.Tag` type.

As an additional goodie, if the lexer class is replaced by `this` (imported
from `pygments.lexer`), the "other" lexer will be the current one (because
you cannot refer to the current class within the code that runs at class
definition time).

Also note the ``(r'<\s*script\s*', Name.Tag, ('script-content', 'tag'))``
rule. Here, two states are pushed onto the state stack, ``'script-content'``
and ``'tag'``. That means that first ``'tag'`` is processed, which will parse
attributes and the closing ``>``, then the ``'tag'`` state is popped and the
next state on top of the stack will be ``'script-content'``.

Any keyword arguments passed to ``using()`` are added to the keyword
arguments used to create the lexer.


Delegating Lexer
================

Another approach for nested lexers is the `DelegatingLexer` which is, for
example, used for the template engine lexers. It takes two lexers as
arguments on initialisation: a `root_lexer` and a `language_lexer`.

The input is processed as follows: First, the whole text is lexed with the
`language_lexer`. All tokens yielded with a type of ``Other`` are then
concatenated and given to the `root_lexer`. The language tokens of the
`language_lexer` are then inserted into the `root_lexer`'s token stream
at the appropriate positions.

.. sourcecode:: python

    from pygments.lexer import DelegatingLexer
    from pygments.lexers.web import HtmlLexer, PhpLexer

    class HtmlPhpLexer(DelegatingLexer):
        def __init__(self, **options):
            super(HtmlPhpLexer, self).__init__(HtmlLexer, PhpLexer, **options)

This procedure ensures that e.g. HTML with template tags in it is highlighted
correctly even if the template tags are put into HTML tags or attributes.

If you want to change the needle token ``Other`` to something else, you can
give the lexer another token type as the third parameter:
.. sourcecode:: python

    DelegatingLexer.__init__(MyLexer, OtherLexer, Text, **options)


Callbacks
=========

Sometimes the grammar of a language is so complex that a lexer would be
unable to parse it just by using regular expressions and stacks.

For this, the `RegexLexer` allows callbacks to be given in rule tuples,
instead of token types (`bygroups` and `using` are nothing else but
preimplemented callbacks). The callback must be a function taking two
arguments:

* the lexer itself
* the match object for the last matched rule

The callback must then return an iterable of (or simply yield) ``(index,
tokentype, value)`` tuples, which are then just passed through by
`get_tokens_unprocessed()`. The ``index`` here is the position of the token
in the input string, ``tokentype`` is the normal token type (like
`Name.Builtin`), and ``value`` the associated part of the input string.

You can see an example here:

.. sourcecode:: python

    from pygments.lexer import RegexLexer
    from pygments.token import Generic

    class HypotheticLexer(RegexLexer):

        def headline_callback(lexer, match):
            equal_signs = match.group(1)
            text = match.group(2)
            yield match.start(), Generic.Headline, equal_signs + text + equal_signs

        tokens = {
            'root': [
                (r'(=+)(.*?)(\1)', headline_callback)
            ]
        }

If the regex for the `headline_callback` matches, the function is called with
the match object. Note that after the callback is done, processing continues
normally, that is, after the end of the previous match. The callback has no
possibility to influence the position.

There are not really any simple examples for lexer callbacks, but you can see
them in action e.g. in the `compiled.py`_ source code in the `CLexer` and
`JavaLexer` classes.

.. _compiled.py: http://trac.pocoo.org/repos/pygments/lexers/compiled.py


The ExtendedRegexLexer class
============================

The `RegexLexer`, even with callbacks, unfortunately isn't powerful enough
for the funky syntax rules of some languages that will go unnamed, such as
Ruby.

But fear not; even then you don't have to abandon the regular expression
approach. For Pygments has a subclass of `RegexLexer`, the
`ExtendedRegexLexer`. All features known from RegexLexers are available here
too, and the tokens are specified in exactly the same way, *except* for one
detail:

The `get_tokens_unprocessed()` method holds its internal state data not as
local variables, but in an instance of the `pygments.lexer.LexerContext`
class, and that instance is passed to callbacks as a third argument. This
means that you can modify the lexer state in callbacks.

The `LexerContext` class has the following members:

* `text` -- the input text
* `pos` -- the current starting position that is used for matching regexes
* `stack` -- a list containing the state stack
* `end` -- the maximum position to which regexes are matched, this defaults
  to the length of `text`

Additionally, the `get_tokens_unprocessed()` method can be given a
`LexerContext` instead of a string and will then process this context
instead of creating a new one for the string argument.

Note that because you can set the current position to anything in the
callback, it won't automatically be set by the caller after the callback is
finished. For example, this is how the hypothetical lexer above would be
written with the `ExtendedRegexLexer`:
.. sourcecode:: python

    from pygments.lexer import ExtendedRegexLexer
    from pygments.token import Generic

    class ExHypotheticLexer(ExtendedRegexLexer):

        def headline_callback(lexer, match, ctx):
            equal_signs = match.group(1)
            text = match.group(2)
            yield match.start(), Generic.Headline, equal_signs + text + equal_signs
            ctx.pos = match.end()

        tokens = {
            'root': [
                (r'(=+)(.*?)(\1)', headline_callback)
            ]
        }

This might sound confusing (and it really can be). But it is needed, and for
an example look at the Ruby lexer in `agile.py`_.

.. _agile.py: http://trac.pocoo.org/repos/pygments/trunk/pygments/lexers/agile.py

diff --git a/docs/src/lexers.txt b/docs/src/lexers.txt new file mode 100644 index 00000000..5fd8b19e --- /dev/null +++ b/docs/src/lexers.txt @@ -0,0 +1,521 @@

.. -*- mode: rst -*-

================
Available lexers
================

This page lists all available builtin lexers and the options they take.

Currently, **all lexers** support these options:

`stripnl`
    Strip leading and trailing newlines from the input (default: ``True``).

`stripall`
    Strip all leading and trailing whitespace from the input (default:
    ``False``).

`tabsize`
    If given and greater than 0, expand tabs in the input (default: ``0``).


These lexers are builtin and can be imported from
`pygments.lexers`:


Special lexers
==============

`TextLexer`

    "Null" lexer, doesn't highlight anything.

    :Aliases: ``text``
    :Filename patterns: ``*.txt``


`RawTokenLexer`

    Recreates a token stream formatted with the `RawTokenFormatter`.

    Additional option:

    `compress`
        If set to ``'gz'`` or ``'bz2'``, decompress the token stream with
        the given compression algorithm before lexing (default: ``''``).

    :Aliases: ``raw``
    :Filename patterns: ``*.raw``


Agile languages
===============

`PythonLexer`

    For `Python <http://www.python.org>`_ source code.

    :Aliases: ``python``, ``py``
    :Filename patterns: ``*.py``, ``*.pyw``


`PythonConsoleLexer`

    For Python console output or doctests, such as:

    .. sourcecode:: pycon

        >>> a = 'foo'
        >>> print a
        'foo'
        >>> 1/0
        Traceback (most recent call last):
          ...

    :Aliases: ``pycon``
    :Filename patterns: None


`RubyLexer`

    For `Ruby <http://www.ruby-lang.org>`_ source code.

    :Aliases: ``ruby``, ``rb``
    :Filename patterns: ``*.rb``


`RubyConsoleLexer`

    For Ruby interactive console (**irb**) output like:

    .. sourcecode:: rbcon

        irb(main):001:0> a = 1
        => 1
        irb(main):002:0> puts a
        1
        => nil

    :Aliases: ``rbcon``, ``irb``
    :Filename patterns: None


`PerlLexer`

    For `Perl <http://www.perl.org>`_ source code.

    :Aliases: ``perl``, ``pl``
    :Filename patterns: ``*.pl``, ``*.pm``


`LuaLexer`

    For `Lua <http://www.lua.org>`_ source code.

    Additional options:

    `func_name_highlighting`
        If given and ``True``, highlight builtin function names
        (default: ``True``).
    `disabled_modules`
        If given, must be a list of module names whose function names
        should not be highlighted. By default all modules are highlighted.

        To get a list of allowed modules have a look into the
        `_luabuiltins` module:

        .. sourcecode:: pycon

            >>> from pygments.lexers._luabuiltins import MODULES
            >>> MODULES.keys()
            ['string', 'coroutine', 'modules', 'io', 'basic', ...]

    :Aliases: ``lua``
    :Filename patterns: ``*.lua``


Compiled languages
==================

`CLexer`

    For C source code with preprocessor directives.

    :Aliases: ``c``
    :Filename patterns: ``*.c``, ``*.h``


`CppLexer`

    For C++ source code with preprocessor directives.
+ + :Aliases: ``cpp``, ``c++`` + :Filename patterns: ``*.cpp``, ``*.hpp``, ``*.c++``, ``*.h++`` + + +`DelphiLexer` + + For `Delphi <http://www.borland.com/delphi/>`_ + (Borland Object Pascal) source code. + + :Aliases: ``delphi``, ``pas``, ``pascal``, ``objectpascal`` + :Filename patterns: ``*.pas`` + + +`JavaLexer` + + For `Java <http://www.sun.com/java/>`_ source code. + + :Aliases: ``java`` + :Filename patterns: ``*.java`` + + +.NET languages +============== + +`CSharpLexer` + + For `C# <http://msdn2.microsoft.com/en-us/vcsharp/default.aspx>`_ + source code. + + :Aliases: ``c#``, ``csharp`` + :Filename patterns: ``*.cs`` + +`BooLexer` + + For `Boo <http://boo.codehaus.org/>`_ source code. + + :Aliases: ``boo`` + :Filename patterns: ``*.boo`` + +`VbNetLexer` + + For + `Visual Basic.NET <http://msdn2.microsoft.com/en-us/vbasic/default.aspx>`_ + source code. + + :Aliases: ``vbnet``, ``vb.net`` + :Filename patterns: ``*.vb``, ``*.bas`` + + +Web-related languages +===================== + +`JavascriptLexer` + + For JavaScript source code. + + :Aliases: ``js``, ``javascript`` + :Filename patterns: ``*.js`` + + +`CssLexer` + + For CSS (Cascading Style Sheets). + + :Aliases: ``css`` + :Filename patterns: ``*.css`` + + +`HtmlLexer` + + For HTML 4 and XHTML 1 markup. Nested JavaScript and CSS is highlighted + by the appropriate lexer. + + :Aliases: ``html`` + :Filename patterns: ``*.html``, ``*.htm``, ``*.xhtml`` + + +`PhpLexer` + + For `PHP <http://www.php.net/>`_ source code. + For PHP embedded in HTML, use the `HtmlPhpLexer`. + + Additional options: + + `startinline` + If given and ``True`` the lexer starts highlighting with + php code. (i.e.: no starting ``<?php`` required) + `funcnamehighlighting` + If given and ``True``, highlight builtin function names + (default: ``True``). + `disabledmodules` + If given, must be a list of module names whose function names + should not be highlighted. By default all modules are highlighted + except the special ``'unknown'`` module that includes functions + that are known to php but are undocumented. + + To get a list of allowed modules have a look into the + `_phpbuiltins` module: + + .. sourcecode:: pycon + + >>> from pygments.lexers._phpbuiltins import MODULES + >>> MODULES.keys() + ['PHP Options/Info', 'Zip', 'dba', ...] + + In fact the names of those modules match the module names from + the php documentation. + + :Aliases: ``php``, ``php3``, ``php4``, ``php5`` + :Filename patterns: ``*.php``, ``*.php[345]`` + + +`XmlLexer` + + Generic lexer for XML (extensible markup language). + + :Aliases: ``xml`` + :Filename patterns: ``*.xml`` + + +Template languages +================== + +`ErbLexer` + + Generic `ERB <http://ruby-doc.org/core/classes/ERB.html>`_ (Ruby Templating) + lexer. + + Just highlights ruby code between the preprocessor directives, other data + is left untouched by the lexer. + + All options are also forwarded to the `RubyLexer`. + + :Aliases: ``erb`` + :Filename patterns: None + + +`RhtmlLexer` + + Subclass of the ERB lexer that highlights the unlexed data with the + html lexer. + + Nested Javascript and CSS is highlighted too. + + :Aliases: ``rhtml``, ``html+erb``, ``html+ruby`` + :Filename patterns: ``*.rhtml`` + + +`XmlErbLexer` + + Subclass of `ErbLexer` which highlights data outside preprocessor + directives with the `XmlLexer`. + + :Aliases: ``xml+erb``, ``xml+ruby`` + :Filename patterns: None + + +`CssErbLexer` + + Subclass of `ErbLexer` which highlights unlexed data with the `CssLexer`. 
:Aliases: ``css+erb``, ``css+ruby``
:Filename patterns: None


`JavascriptErbLexer`

    Subclass of `ErbLexer` which highlights unlexed data with the
    `JavascriptLexer`.

    :Aliases: ``js+erb``, ``javascript+erb``, ``js+ruby``, ``javascript+ruby``
    :Filename patterns: None


`HtmlPhpLexer`

    Subclass of `PhpLexer` that highlights unhandled data with the
    `HtmlLexer`.

    Nested JavaScript and CSS are highlighted too.

    :Aliases: ``html+php``
    :Filename patterns: ``*.phtml``


`XmlPhpLexer`

    Subclass of `PhpLexer` that highlights unhandled data with the
    `XmlLexer`.

    :Aliases: ``xml+php``
    :Filename patterns: None


`CssPhpLexer`

    Subclass of `PhpLexer` which highlights unmatched data with the
    `CssLexer`.

    :Aliases: ``css+php``
    :Filename patterns: None


`JavascriptPhpLexer`

    Subclass of `PhpLexer` which highlights unmatched data with the
    `JavascriptLexer`.

    :Aliases: ``js+php``, ``javascript+php``
    :Filename patterns: None


`DjangoLexer`

    Generic `django <http://www.djangoproject.com/documentation/templates/>`_
    template lexer.

    It just highlights django code between the preprocessor directives;
    other data is left untouched by the lexer.

    :Aliases: ``django``
    :Filename patterns: None


`HtmlDjangoLexer`

    Subclass of the `DjangoLexer` that highlights unlexed data with the
    `HtmlLexer`.

    Nested JavaScript and CSS are highlighted too.

    :Aliases: ``html+django``
    :Filename patterns: None


`XmlDjangoLexer`

    Subclass of the `DjangoLexer` that highlights unlexed data with the
    `XmlLexer`.

    :Aliases: ``xml+django``
    :Filename patterns: None


`CssDjangoLexer`

    Subclass of the `DjangoLexer` that highlights unlexed data with the
    `CssLexer`.

    :Aliases: ``css+django``
    :Filename patterns: None


`JavascriptDjangoLexer`

    Subclass of the `DjangoLexer` that highlights unlexed data with the
    `JavascriptLexer`.

    :Aliases: ``javascript+django``
    :Filename patterns: None


`SmartyLexer`

    Generic `Smarty <http://smarty.php.net/>`_ template lexer.

    Just highlights smarty code between the preprocessor directives;
    other data is left untouched by the lexer.

    :Aliases: ``smarty``
    :Filename patterns: None


`HtmlSmartyLexer`

    Subclass of the `SmartyLexer` that highlights unlexed data with the
    `HtmlLexer`.

    Nested JavaScript and CSS are highlighted too.

    :Aliases: ``html+smarty``
    :Filename patterns: None


`XmlSmartyLexer`

    Subclass of the `SmartyLexer` that highlights unlexed data with the
    `XmlLexer`.

    :Aliases: ``xml+smarty``
    :Filename patterns: None


`CssSmartyLexer`

    Subclass of the `SmartyLexer` that highlights unlexed data with the
    `CssLexer`.

    :Aliases: ``css+smarty``
    :Filename patterns: None


`JavascriptSmartyLexer`

    Subclass of the `SmartyLexer` that highlights unlexed data with the
    `JavascriptLexer`.

    :Aliases: ``javascript+smarty``
    :Filename patterns: None


Other languages
===============

`SqlLexer`

    Lexer for Structured Query Language. Currently, this lexer does
    not recognize any special syntax except ANSI SQL.

    :Aliases: ``sql``
    :Filename patterns: ``*.sql``


`BrainfuckLexer`

    Lexer for the esoteric
    `BrainFuck <http://www.muppetlabs.com/~breadbox/bf/>`_ language.

    :Aliases: ``brainfuck``
    :Filename patterns: ``*.bf``, ``*.b``


Text lexers
===========

`IniLexer`

    Lexer for configuration files in INI style.
+ + :Aliases: ``ini``, ``cfg`` + :Filename patterns: ``*.ini``, ``*.cfg`` + + +`MakefileLexer` + + Lexer for Makefiles. + + :Aliases: ``make``, ``makefile``, ``mf`` + :Filename patterns: ``*.mak``, ``Makefile``, ``makefile`` + + +`DiffLexer` + + Lexer for unified or context-style diffs. + + :Aliases: ``diff`` + :Filename patterns: ``*.diff``, ``*.patch`` + + +`IrcLogsLexer` + + Lexer for IRC logs in **irssi** or **xchat** style. + + :Aliases: ``irc`` + :Filename patterns: None + + +`TexLexer` + + Lexer for the TeX and LaTeX typesetting languages. + + :Aliases: ``tex``, ``latex`` + :Filename patterns: ``*.tex``, ``*.aux``, ``*.toc`` diff --git a/docs/src/quickstart.txt b/docs/src/quickstart.txt new file mode 100644 index 00000000..749889df --- /dev/null +++ b/docs/src/quickstart.txt @@ -0,0 +1,121 @@ +.. -*- mode: rst -*- + +========== +Quickstart +========== + + +Pygments comes with a wide range of lexers for modern languages which are all +accessible through the pygments.lexers package. A lexer enables Pygments to +parse the source code into tokens which are passed to a formatter. Currently +formatters exist for HTML, LaTeX and ANSI sequences. + + +Example +======= + +Here is a small example for highlighting Python code: + +.. sourcecode:: python + + from pygments import highlight + from pygments.lexers import PythonLexer + from pygments.formatters import HtmlFormatter + + code = 'print "Hello World"' + print highlight(code, PythonLexer(), HtmlFormatter()) + +which prints something like this: + +.. sourcecode:: html + + <div class="highlight"> + <pre><span class="k">print</span> <span class="l s">"Hello World"</span></pre> + </div> + + +A CSS stylesheet which contains all CSS classes possibly used in the output can be +produced by: + +.. sourcecode:: python + + print HtmlFormatter().get_style_defs('.highlight') + +The argument is used as an additional CSS selector: the output may look like + +.. sourcecode:: css + + .highlight .k { color: #AA22FF; font-weight: bold } + .highlight .s { color: #BB4444 } + ... + + +Options +======= + +The `highlight()` function supports a fourth argument called `outfile`, it must be +a file object if given. The formatted output will then be written to this file +instead of being returned as a string. + +Lexers and formatters both support options. They are given to them as keyword +arguments either to the class or to the lookup method: + +.. sourcecode:: python + + from pygments import highlight + from pygments.lexers import get_lexer_by_name + from pygments.formatters import HtmlFormatter + + lexer = get_lexer_by_name("python", stripall=True) + formatter = HtmlFormatter(linenos=True, cssclass="source") + result = highlight(code, lexer, formatter) + +This makes the lexer strip all leading and trailing whitespace from the input +(`stripall` option), lets the formatter output line numbers (`linenos` option), +and sets the wrapping ``<div>``'s class to ``source`` (instead of +``highlight``). + +For an overview of builtin lexers and formatters and their options, visit the +`lexer <lexers.txt>`_ and `formatters <formatters.txt>`_ lists. + + +Lexer and formatter lookup +========================== + +If you want to lookup a built-in lexer by its alias or a filename, you can use +one of the following methods: + +.. 
sourcecode:: pycon

    >>> from pygments.lexers import get_lexer_by_name, get_lexer_for_filename
    >>> get_lexer_by_name('python')
    <pygments.lexers.agile.PythonLexer object at 0xb7bd6d0c>
    >>> get_lexer_for_filename('spam.py')
    <pygments.lexers.agile.PythonLexer object at 0xb7bd6b2c>

The same API is available for formatters: use `get_formatter_by_name` and
`get_formatter_for_filename` from the `pygments.formatters` module
for this purpose.


Command line usage
==================

You can use Pygments from the command line, using the `pygmentize` script::

    $ pygmentize test.py

will highlight the Python file test.py using ANSI escape sequences
(a.k.a. terminal colors) and print the result to standard output.

To output HTML, use the ``-f`` option::

    $ pygmentize -f html -o test.html test.py

to write an HTML-highlighted version of test.py to the file test.html.

The stylesheet can be created with::

    $ pygmentize -S default -f html > style.css

More options and tricks can be found in the `command line reference
<cmdline.txt>`_.

diff --git a/docs/src/rstdirective.txt b/docs/src/rstdirective.txt new file mode 100644 index 00000000..60651319 --- /dev/null +++ b/docs/src/rstdirective.txt @@ -0,0 +1,42 @@

================================
Using Pygments in ReST documents
================================

Many Python people use `ReST`_ for documenting their source code, programs
and so on. This also means that documentation often includes source code
samples.

You can easily enable Pygments support for your ReST texts as long as you
use your own build script.

Just add this code to it:

.. sourcecode:: python

    from docutils import nodes
    from docutils.parsers.rst import directives
    from pygments import highlight
    from pygments.lexers import get_lexer_by_name
    from pygments.formatters import HtmlFormatter

    PYGMENTS_FORMATTER = HtmlFormatter()

    def pygments_directive(name, arguments, options, content, lineno,
                           content_offset, block_text, state, state_machine):
        try:
            lexer = get_lexer_by_name(arguments[0])
        except ValueError:
            # no lexer found
            lexer = get_lexer_by_name('text')
        parsed = highlight(u'\n'.join(content), lexer, PYGMENTS_FORMATTER)
        return [nodes.raw('', parsed, format='html')]
    pygments_directive.arguments = (1, 0, 1)
    pygments_directive.content = 1
    directives.register_directive('sourcecode', pygments_directive)

Now you should be able to use Pygments in your ReST files using this syntax::

    .. sourcecode:: language

        your code here

.. _ReST: http://docutils.sf.net/rst.html

diff --git a/docs/src/styles.txt b/docs/src/styles.txt new file mode 100644 index 00000000..4fd6c297 --- /dev/null +++ b/docs/src/styles.txt @@ -0,0 +1,119 @@

.. -*- mode: rst -*-

======
Styles
======

Pygments comes with some builtin styles that work for both the HTML and
LaTeX formatter.

The builtin styles can be looked up with the `get_style_by_name` function:

.. sourcecode:: pycon

    >>> from pygments.styles import get_style_by_name
    >>> get_style_by_name('colorful')
    <class 'pygments.styles.colorful.ColorfulStyle'>

You can pass the name of a `Style` class to a formatter as the `style`
option, in the form of a string:

.. sourcecode:: pycon

    >>> from pygments.formatters import HtmlFormatter
    >>> HtmlFormatter(style='colorful').style
    <class 'pygments.styles.colorful.ColorfulStyle'>

Or you can also import your own style (which must be a subclass of
`pygments.style.Style`) and pass it to the formatter:
+
+.. _ReST: http://docutils.sf.net/rst.html
diff --git a/docs/src/styles.txt b/docs/src/styles.txt
new file mode 100644
index 00000000..4fd6c297
--- /dev/null
+++ b/docs/src/styles.txt
@@ -0,0 +1,119 @@
+.. -*- mode: rst -*-
+
+======
+Styles
+======
+
+Pygments comes with some builtin styles that work for both the HTML and
+LaTeX formatter.
+
+The builtin styles can be looked up with the `get_style_by_name` function:
+
+.. sourcecode:: pycon
+
+    >>> from pygments.styles import get_style_by_name
+    >>> get_style_by_name('colorful')
+    <class 'pygments.styles.colorful.ColorfulStyle'>
+
+You can pass the name of a builtin style to a formatter as the `style`
+option, in form of a string:
+
+.. sourcecode:: pycon
+
+    >>> from pygments.formatters import HtmlFormatter
+    >>> HtmlFormatter(style='colorful').style
+    <class 'pygments.styles.colorful.ColorfulStyle'>
+
+Or you can import your own style (which must be a subclass of
+`pygments.style.Style`) and pass it to the formatter:
+
+.. sourcecode:: pycon
+
+    >>> from yourapp.yourmodule import YourStyle
+    >>> HtmlFormatter(style=YourStyle).style
+    <class 'yourapp.yourmodule.YourStyle'>
+
+
+Creating Your Own Styles
+========================
+
+So how do you create a style? All you have to do is subclass `Style` and
+define styles for some token types:
+
+.. sourcecode:: python
+
+    from pygments.style import Style
+    from pygments.token import Keyword, Name, Comment, String, Error, \
+         Number, Operator, Generic
+
+    class YourStyle(Style):
+        default_style = ""
+        styles = {
+            Comment:        'italic #888',
+            Keyword:        'bold #005',
+            Name:           '#f00',
+            Name.Function:  '#0f0',
+            Name.Class:     'bold #0f0',
+            String:         'bg:#eee #111'
+        }
+
+That's it; there are just a few rules. When you define a style for `Name`,
+it automatically also applies to `Name.Function` and all other subtypes.
+If you defined ``'bold'`` and don't want boldface for a subtoken, use
+``'nobold'``.
+
+(Philosophy: the styles aren't written in CSS syntax since this way
+they can be used for a variety of formatters.)
+
+`default_style` is the style inherited by all token types.
+
+
+Style Rules
+===========
+
+Here is a small overview of all allowed style rules:
+
+``bold``
+    render text as bold
+``nobold``
+    don't render text as bold (prevents subtokens from being rendered bold)
+``italic``
+    render text as italic
+``noitalic``
+    don't render text as italic
+``underline``
+    render text underlined
+``nounderline``
+    don't render text underlined
+``bg:``
+    transparent background
+``bg:#000000``
+    background color (black)
+``border:``
+    no border
+``border:#ffffff``
+    border color (white)
+``#ff0000``
+    text color (red)
+``noinherit``
+    don't inherit styles from supertoken
+
+Note that there may not be a space between ``bg:`` and the color value,
+since the style definition string is split at whitespace.
+Also, using named colors is not allowed, since the supported color names
+vary between formatters.
+
+Furthermore, not every formatter might support every style rule.
+
+
+Builtin Styles
+==============
+
+Pygments ships some builtin styles which are maintained by the Pygments team.
+
+To get a list of known styles you can use this snippet:
+
+.. sourcecode:: pycon
+
+    >>> from pygments.styles import STYLE_MAP
+    >>> STYLE_MAP.keys()
+    ['default', 'emacs', 'friendly', 'colorful']
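+
+Building on that, here is a small sketch (the output file names are only an
+example) that writes one CSS file per builtin style, using the HTML
+formatter's `get_style_defs` method shown in the quickstart:
+
+.. sourcecode:: python
+
+    from pygments.formatters import HtmlFormatter
+    from pygments.styles import STYLE_MAP
+
+    # produces default.css, emacs.css, friendly.css, colorful.css
+    for name in STYLE_MAP:
+        css = HtmlFormatter(style=name).get_style_defs('.highlight')
+        f = open(name + '.css', 'w')
+        f.write(css)
+        f.close()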
diff --git a/docs/src/tokens.txt b/docs/src/tokens.txt
new file mode 100644
index 00000000..47d8feea
--- /dev/null
+++ b/docs/src/tokens.txt
@@ -0,0 +1,284 @@
+.. -*- mode: rst -*-
+
+==============
+Builtin Tokens
+==============
+
+Inside the `pygments.token` module, there is a special object called `Token`
+that is used to create token types.
+
+You can create a new token type by accessing an attribute of `Token`:
+
+.. sourcecode:: pycon
+
+    >>> from pygments.token import Token
+    >>> Token.String
+    Token.String
+    >>> Token.String is Token.String
+    True
+
+Note that token types are singletons, so you can use the ``is`` operator
+to compare them.
+
+In principle, you can create an unlimited number of token types, but nobody
+can guarantee that a style defines rules for every one of them. Because of
+that, Pygments proposes some standard token types, defined in the
+`pygments.token.STANDARD_TYPES` dict.
+
+For some token types, aliases are already defined:
+
+.. sourcecode:: pycon
+
+    >>> from pygments.token import String
+    >>> String
+    Token.Literal.String
+
+Inside the `pygments.token` module the following aliases are defined:
+
+=========== =============================== ==================================
+`Text`      `Token.Text`                    for any type of text data
+`Error`     `Token.Error`                   represents lexer errors
+`Other`     `Token.Other`                   special token for data not
+                                            matched by a parser (e.g. HTML
+                                            markup in PHP code)
+`Keyword`   `Token.Keyword`                 any kind of keywords
+`Name`      `Token.Name`                    variable/function names
+`Literal`   `Token.Literal`                 any literals
+`String`    `Token.Literal.String`          string literals
+`Number`    `Token.Literal.Number`          number literals
+`Operator`  `Token.Operator`                operators (``+``, ``not``, etc.)
+`Comment`   `Token.Comment`                 any kind of comments
+`Generic`   `Token.Generic`                 generic tokens (have a look at
+                                            the explanation below)
+=========== =============================== ==================================
+
+Normally you just create token types using the already defined aliases. For
+each of those token aliases, a number of subtypes exist (excluding the
+special tokens `Token.Text`, `Token.Error` and `Token.Other`).
+
+
+Keyword Tokens
+==============
+
+`Keyword`
+    For any kind of keyword (especially if it doesn't match any of the
+    subtypes, of course).
+
+`Keyword.Constant`
+    For keywords that are constants (e.g. ``None`` in future Python versions).
+
+`Keyword.Declaration`
+    For keywords used for variable declaration (e.g. ``var`` in some
+    programming languages like JavaScript).
+
+`Keyword.Pseudo`
+    For keywords that aren't really keywords (e.g. ``None`` in old Python
+    versions).
+
+`Keyword.Reserved`
+    For reserved keywords.
+
+`Keyword.Type`
+    For builtin types that can't be used as identifiers (e.g. ``int``,
+    ``char`` etc. in C).
+
+
+Name Tokens
+===========
+
+`Name`
+    For any name (variable names, function names, classes).
+
+`Name.Attribute`
+    For all attributes (e.g. in HTML tags).
+
+`Name.Builtin`
+    Builtin names; names that are available in the global namespace.
+
+`Name.Builtin.Pseudo`
+    Builtin names that are implicit (e.g. ``self`` in Ruby, ``this`` in Java).
+
+`Name.Class`
+    Class names. Since no lexer can know whether a name refers to a class,
+    a function or something else, this token is meant for class declarations.
+
+`Name.Constant`
+    Token type for constants. In some languages you can recognise a constant
+    by the way it's defined (e.g. the value after a ``const`` keyword). In
+    other languages, constants are uppercase by definition (Ruby).
+
+`Name.Decorator`
+    Token type for decorators. Decorators are syntactic elements in the
+    Python language. Similar syntax elements exist in C# and Java.
+
+`Name.Entity`
+    Token type for special entities (e.g. ``&nbsp;`` in HTML).
+
+`Name.Exception`
+    Token type for exception names (e.g. ``RuntimeError`` in Python). Some
+    languages define exceptions in the function signature (Java); you can
+    highlight the name of such an exception using this token.
+
+`Name.Function`
+    Token type for function names.
+
+`Name.Label`
+    Token type for label names (e.g. in languages that support ``goto``).
+
+`Name.Namespace`
+    Token type for namespaces (e.g. import paths in Java/Python, or names
+    following the ``module``/``namespace`` keyword in other languages).
+
+`Name.Other`
+    Other names. Normally unused.
+
+`Name.Tag`
+    Tag names (in HTML/XML markup or configuration files).
+
+`Name.Variable`
+    Token type for variables. Some languages have prefixes for variable
+    names (PHP, Ruby, Perl); you can highlight them using this token.
+
+`Name.Variable.Class`
+    Same as `Name.Variable`, but for class variables (also static variables).
+
+`Name.Variable.Global`
+    Same as `Name.Variable`, but for global variables (used in Ruby, for
+    example).
+
+`Name.Variable.Instance`
+    Same as `Name.Variable`, but for instance variables.
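+
+All of these subtypes are ordinary token types: they are created on first
+attribute access and are singletons, just like the aliases shown above.
+A short illustration:
+
+.. sourcecode:: pycon
+
+    >>> from pygments.token import Token, Name
+    >>> Name.Variable.Instance
+    Token.Name.Variable.Instance
+    >>> Name.Variable.Instance is Token.Name.Variable.Instance
+    True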
+
+
+Literals
+========
+
+`Literal`
+    For any literal (if not further defined).
+
+`Literal.Date`
+    For date literals (e.g. ``42d`` in Boo).
+
+
+`String`
+    For any string literal.
+
+`String.Backtick`
+    Token type for strings enclosed in backticks.
+
+`String.Char`
+    Token type for single characters (e.g. Java, C).
+
+`String.Doc`
+    Token type for documentation strings (for example Python docstrings).
+
+`String.Double`
+    Double quoted strings.
+
+`String.Escape`
+    Token type for escape sequences in strings.
+
+`String.Heredoc`
+    Token type for "heredoc" strings (e.g. in Ruby or Perl).
+
+`String.Interpol`
+    Token type for interpolated parts in strings (e.g. ``#{foo}`` in Ruby).
+
+`String.Other`
+    Token type for any other strings (for example ``%q{foo}`` string
+    constructs in Ruby).
+
+`String.Regex`
+    Token type for regular expression literals (e.g. ``/foo/`` in JavaScript).
+
+`String.Single`
+    Token type for single quoted strings.
+
+`String.Symbol`
+    Token type for symbols (e.g. ``:foo`` in LISP or Ruby).
+
+
+`Number`
+    Token type for any number literal.
+
+`Number.Float`
+    Token type for float literals (e.g. ``42.0``).
+
+`Number.Hex`
+    Token type for hexadecimal number literals (e.g. ``0xdeadbeef``).
+
+`Number.Integer`
+    Token type for integer literals (e.g. ``42``).
+
+`Number.Integer.Long`
+    Token type for long integer literals (e.g. ``42L`` in Python).
+
+`Number.Oct`
+    Token type for octal literals.
+
+
+Operators
+=========
+
+`Operator`
+    For any punctuation operator (e.g. ``+``, ``-``).
+
+`Operator.Word`
+    For any operator that is a word (e.g. ``not``).
+
+
+Comments
+========
+
+`Comment`
+    Token type for any comment.
+
+`Comment.Multiline`
+    Token type for multiline comments.
+
+`Comment.Preproc`
+    Token type for preprocessor comments (also ``<?php``/``<%`` constructs).
+
+`Comment.Single`
+    Token type for comments that end at the end of a line (e.g. ``# foo``).
+
+
+Generic Tokens
+==============
+
+Generic tokens are for special lexers like the `DiffLexer`, which doesn't
+really highlight a programming language but rather a text format such as a
+patch file.
+
+
+`Generic`
+    A generic, unstyled token. Normally you don't use this token type.
+
+`Generic.Deleted`
+    Marks the token value as deleted.
+
+`Generic.Emph`
+    Marks the token value as emphasized.
+
+`Generic.Error`
+    Marks the token value as an error message.
+
+`Generic.Heading`
+    Marks the token value as a headline.
+
+`Generic.Inserted`
+    Marks the token value as inserted.
+
+`Generic.Output`
+    Marks the token value as program output (e.g. for the Python console
+    lexer).
+
+`Generic.Prompt`
+    Marks the token value as a command prompt (e.g. for the Bash lexer).
+
+`Generic.Strong`
+    Marks the token value as bold (e.g. for the reST lexer).
+
+`Generic.Subheading`
+    Marks the token value as a subheadline.
+
+`Generic.Traceback`
+    Marks the token value as a part of an error traceback.
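+
+As an illustration (a minimal, hypothetical sketch, not the builtin
+`DiffLexer` implementation), a lexer for a diff-like format could map line
+types to Generic tokens using the `RegexLexer` base class:
+
+.. sourcecode:: python
+
+    from pygments.lexer import RegexLexer
+    from pygments.token import Generic, Text
+
+    class MiniDiffLexer(RegexLexer):
+        # hypothetical example lexer, not the builtin DiffLexer
+        name = 'Mini Diff'
+        aliases = ['minidiff']
+
+        tokens = {
+            'root': [
+                (r'\+.*\n', Generic.Inserted),    # added lines
+                (r'-.*\n', Generic.Deleted),      # removed lines
+                (r'@.*\n', Generic.Subheading),   # hunk headers
+                (r'.*\n', Text),                  # context lines
+            ],
+        }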