author    gbrandl <devnull@localhost>  2006-10-19 20:27:28 +0200
committer gbrandl <devnull@localhost>  2006-10-19 20:27:28 +0200
commit    f4d019954468db777760d21f9243eca8b852c184 (patch)
tree      328b8f8fac25338306b0e7b827686dcc7597df23 /docs/src
[svn] Name change, round 4 (rename SVN root folder).
Diffstat (limited to 'docs/src')
-rw-r--r--  docs/src/api.txt               183
-rw-r--r--  docs/src/cmdline.txt            58
-rw-r--r--  docs/src/formatterdev.txt      169
-rw-r--r--  docs/src/formatters.txt        249
-rw-r--r--  docs/src/index.txt              51
-rw-r--r--  docs/src/installation.txt       47
-rw-r--r--  docs/src/lexerdevelopment.txt  482
-rw-r--r--  docs/src/lexers.txt            521
-rw-r--r--  docs/src/quickstart.txt        121
-rw-r--r--  docs/src/rstdirective.txt       42
-rw-r--r--  docs/src/styles.txt            119
-rw-r--r--  docs/src/tokens.txt            284
12 files changed, 2326 insertions, 0 deletions
diff --git a/docs/src/api.txt b/docs/src/api.txt
new file mode 100644
index 00000000..880687f1
--- /dev/null
+++ b/docs/src/api.txt
@@ -0,0 +1,183 @@
+.. -*- mode: rst -*-
+
+=====================
+The full Pygments API
+=====================
+
+This page describes the Pygments API.
+
+High-level API
+==============
+
+Functions from the `pygments` module:
+
+def `lex(code, lexer):`
+ Lex `code` with the `lexer` (must be a `Lexer` instance)
+ and return an iterable of tokens. Currently, this only calls
+ `lexer.get_tokens()`.
+
+def `format(tokens, formatter, outfile=None):`
+ Format a token stream (iterable of tokens) `tokens` with the
+ `formatter` (must be a `Formatter` instance). The result is
+ written to `outfile`, or if that is ``None``, returned as a
+ string.
+
+def `highlight(code, lexer, formatter, outfile=None):`
+ This is the most high-level highlighting function.
+ It combines `lex` and `format` in one function.
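+
+For example, a minimal sketch tying the three together (using the builtin
+Python lexer and HTML formatter shown in the quickstart):
+
+.. sourcecode:: python
+
+    from pygments import highlight
+    from pygments.lexers import PythonLexer
+    from pygments.formatters import HtmlFormatter
+
+    # lex and format in one step; returns a string because outfile is None
+    html = highlight('print "Hello World"', PythonLexer(), HtmlFormatter())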
+
+
+Functions from `pygments.lexers`:
+
+def `get_lexer_by_name(alias, **options):`
+ Return an instance of a `Lexer` subclass that has `alias` in its
+ aliases list. The lexer is given the `options` at its
+ instantiation.
+
+ Will raise `ValueError` if no lexer with that alias is found.
+
+def `get_lexer_for_filename(fn, **options):`
+ Return a `Lexer` subclass instance that has a filename pattern
+ matching `fn`. The lexer is given the `options` at its
+ instantiation.
+
+ Will raise `ValueError` if no lexer for that filename is found.
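+
+For example (the exact object addresses will of course vary):
+
+.. sourcecode:: pycon
+
+    >>> from pygments.lexers import get_lexer_by_name, get_lexer_for_filename
+    >>> get_lexer_by_name('python', stripall=True)
+    <pygments.lexers.agile.PythonLexer object at 0xb7bd6d0c>
+    >>> get_lexer_for_filename('spam.py')
+    <pygments.lexers.agile.PythonLexer object at 0xb7bd6b2c>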
+
+
+Functions from `pygments.formatters`:
+
+def `get_formatter_by_name(alias, **options):`
+ Return an instance of a `Formatter` subclass that has `alias` in its
+ aliases list. The formatter is given the `options` at its
+ instantiation.
+
+ Will raise `ValueError` if no formatter with that alias is found.
+
+def `get_formatter_for_filename(fn, **options):`
+ Return a `Formatter` subclass instance that has a filename pattern
+ matching `fn`. The formatter is given the `options` at its
+ instantiation.
+
+ Will raise `ValueError` if no formatter for that filename is found.
+
+
+Lexers
+======
+
+A lexer (derived from `pygments.lexer.Lexer`) has the following functions:
+
+def `__init__(self, **options):`
+ The constructor. Takes a \*\*keywords dictionary of options.
+ Every subclass must first process its own options and then call
+ the `Lexer` constructor, since it processes the `stripnl`,
+ `stripall` and `tabsize` options.
+
+ An example looks like this:
+
+ .. sourcecode:: python
+
+ def __init__(self, **options):
+ self.compress = options.get('compress', '')
+ Lexer.__init__(self, **options)
+
+    As these options must all be specifiable as strings (due to the
+    command line usage), there are various utility functions
+    available to help with that; see `Option processing`_.
+
+def `get_tokens(self, text):`
+ This method is the basic interface of a lexer. It is called by
+ the `highlight()` function. It must process the text and return an
+ iterable of ``(tokentype, value)`` pairs from `text`.
+
+ Normally, you don't need to override this method. The default
+ implementation processes the `stripnl`, `stripall` and `tabsize`
+ options and then yields all tokens from `get_tokens_unprocessed()`,
+ with the ``index`` dropped.
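+
+    For example, a small sketch that prints the token stream of a one-line
+    program (Python 2 syntax, matching the rest of these docs):
+
+    .. sourcecode:: python
+
+        from pygments.lexers import PythonLexer
+
+        for tokentype, value in PythonLexer().get_tokens('print "x"\n'):
+            print tokentype, repr(value)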
+
+def `get_tokens_unprocessed(self, text):`
+ This method should process the text and return an iterable of
+ ``(index, tokentype, value)`` tuples where ``index`` is the starting
+ position of the token within the input text.
+
+ This method must be overridden by subclasses.
+
+For a list of known tokens, have a look at the `Tokens`_ page.
+
+The lexer also recognizes the following attributes that are used by the
+builtin lookup mechanism.
+
+`name`
+ Full name for the lexer, in human-readable form.
+
+`aliases`
+    A list of short, unique identifiers that can be used to look up
+    the lexer from a list.
+
+`filenames`
+ A list of `fnmatch` patterns that can be used to find a lexer for
+ a given filename.
+
+
+.. _Tokens: tokens.txt
+
+
+Formatters
+==========
+
+A formatter (derived from `pygments.formatter.Formatter`) has the following
+functions:
+
+def `__init__(self, **options):`
+ As with lexers, this constructor processes options and then must call
+ the base class `__init__`.
+
+ The `Formatter` class recognizes the options `style`, `full` and
+ `title`. It is up to the formatter class whether it uses them.
+
+def `get_style_defs(self, arg=''):`
+ This method must return statements or declarations suitable to define
+ the current style for subsequent highlighted text (e.g. CSS classes
+ in the `HTMLFormatter`).
+
+ The optional argument `arg` can be used to modify the generation and
+ is formatter dependent (it is standardized because it can be given on
+ the command line).
+
+    This method is called by the ``-S`` `command-line option`_; the `arg`
+    is then given by the ``-a`` option.
+
+def `format(self, tokensource, outfile):`
+ This method must format the tokens from the `tokensource` iterable and
+ write the formatted version to the file object `outfile`.
+
+ Formatter options can control how exactly the tokens are converted.
+
+.. _command-line option: cmdline.txt
+
+
+Option processing
+=================
+
+The `pygments.util` module has some utility functions usable for option
+processing:
+
+class `OptionError`
+ This exception will be raised by all option processing functions if
+ the type of the argument is not correct.
+
+def `get_bool_opt(options, optname, default=None):`
+ Interpret the key `optname` from the dictionary `options`
+ as a boolean and return it. Return `default` if `optname`
+ is not in `options`.
+
+ The valid string values for ``True`` are ``1``, ``yes``,
+ ``true`` and ``on``, the ones for ``False`` are ``0``,
+ ``no``, ``false`` and ``off`` (matched case-insensitively).
+
+def `get_int_opt(options, optname, default=None):`
+ As `get_bool_opt`, but interpret the value as an integer.
+
+def `get_list_opt(options, optname, default=None):`
+ If the key `optname` from the dictionary `options` is a string,
+ split it at whitespace and return it. If it is already a list
+ or a tuple, it is returned as a list.
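+
+For example, a sketch of a lexer constructor using these helpers (the
+`MyLexer` class and its options are hypothetical):
+
+.. sourcecode:: python
+
+    from pygments.lexer import Lexer
+    from pygments.util import get_bool_opt, get_int_opt
+
+    class MyLexer(Lexer):
+        def __init__(self, **options):
+            # '1', 'yes', 'true' and 'on' (in any case) all map to True
+            self.compress = get_bool_opt(options, 'compress', False)
+            # the value may come in as a string from the command line
+            self.indent = get_int_opt(options, 'indent', 0)
+            Lexer.__init__(self, **options)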
diff --git a/docs/src/cmdline.txt b/docs/src/cmdline.txt
new file mode 100644
index 00000000..461ecb32
--- /dev/null
+++ b/docs/src/cmdline.txt
@@ -0,0 +1,58 @@
+.. -*- mode: rst -*-
+
+======================
+Command Line Interface
+======================
+
+You can use Pygments from the shell, provided you installed the `pygmentize` script::
+
+ $ pygmentize test.py
+ print "Hello World"
+
+will print the file test.py to standard output, using the Python lexer
+(inferred from the file name extension) and the terminal formatter (because
+you didn't give an explicit formatter name).
+
+If you want HTML output::
+
+ $ pygmentize -f html -l python -o test.html test.py
+
+As you can see, the ``-l`` option explicitly selects a lexer. As seen above,
+if you give an input file name and it has an extension that Pygments
+recognizes, you can omit this option.
+
+The ``-o`` option gives an output file name. If it is not given, output is
+written to stdout.
+
+The ``-f`` option selects a formatter (as with ``-l``, it can also be omitted
+if an output file name is given and has a supported extension).
+If no output file name is given and ``-f`` is omitted, the
+`TerminalFormatter` is used.
+
+The above command could therefore also be given as::
+
+ $ pygmentize -o test.html test.py
+
+Lexer and formatter options can be given using the ``-O`` option::
+
+ $ pygmentize -f html -O style=colorful,linenos=1 -l python test.py
+
+Be sure to enclose the option string in quotes if it contains any special
+shell characters, such as spaces or expansion wildcards like ``*``.
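+
+For example, quoting is needed here because the title contains a space::
+
+    $ pygmentize -f html -O "full=1,title=Hello World" -o test.html test.py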
+
+There's a special ``-S`` option for generating style definitions::
+
+    $ pygmentize -f html -S colorful -a .syntax
+
+This generates a CSS style sheet (because you selected the HTML formatter)
+for the "colorful" style, prepending a ``.syntax`` selector to all style
+rules.
+
+For an explanation of what ``-a`` means for `a particular formatter`_, look
+at the `arg` argument of the formatter's `get_style_defs()` method.
+
+The ``-L`` option lists all lexers and formatters, along with their short
+names and supported file name extensions.
+
+
+.. _a particular formatter: formatters.txt
diff --git a/docs/src/formatterdev.txt b/docs/src/formatterdev.txt
new file mode 100644
index 00000000..82208aa0
--- /dev/null
+++ b/docs/src/formatterdev.txt
@@ -0,0 +1,169 @@
+.. -*- mode: rst -*-
+
+========================
+Write your own formatter
+========================
+
+As with creating `your own lexer <lexerdevelopment.txt>`_, writing a new
+formatter for Pygments is easy and straightforward.
+
+A formatter is a class that is initialized with some keyword arguments (the
+formatter options) and that must provide a `format()` method.
+Additionally, a formatter should provide a `get_style_defs()` method that
+returns the style definitions from the style in a form usable for the
+formatter's output format.
+
+
+Quickstart
+==========
+
+The most basic formatter shipped with Pygments is the `NullFormatter`. It just
+sends the value of a token to the output stream:
+
+.. sourcecode:: python
+
+ from pygments.formatter import Formatter
+
+ class NullFormatter(Formatter):
+ def format(self, tokensource, outfile):
+ for ttype, value in tokensource:
+ outfile.write(value)
+
+As you can see, the `format()` method is passed two parameters: `tokensource`
+and `outfile`. The former is an iterable of ``(token_type, value)`` tuples,
+the latter a file-like object with a `write()` method.
+
+Because this formatter is so basic, it doesn't override the
+`get_style_defs()` method.
+
+
+Styles
+======
+
+Styles aren't instantiated but their metaclass provides some class functions
+so that you can access the style definitions easily.
+
+Styles are iterable and yield tuples in the form ``(ttype, d)`` where `ttype`
+is a token and `d` is a dict with the following keys:
+
+``'color'``
+    Hexadecimal color value (e.g. ``'ff0000'`` for red) or `None` if not
+    defined.
+
+``'bold'``
+    `True` if the value should be bold.
+
+``'italic'``
+    `True` if the value should be italic.
+
+``'underline'``
+    `True` if the value should be underlined.
+
+``'bgcolor'``
+    Hexadecimal color value for the background (e.g. ``'eeeeee'`` for light
+    gray) or `None` if not defined.
+
+``'border'``
+    Hexadecimal color value for the border (e.g. ``'0000aa'`` for a dark
+    blue) or `None` for no border.
+
+Additional keys might appear in the future; formatters should ignore all keys
+they don't support.
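+
+For example, a short sketch that dumps the color information of a builtin
+style (using `get_style_by_name` from `pygments.styles`, described in the
+`Styles <styles.txt>`_ chapter):
+
+.. sourcecode:: python
+
+    from pygments.styles import get_style_by_name
+
+    style = get_style_by_name('default')
+    for ttype, d in style:
+        if d['color']:
+            print ttype, '#' + d['color']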
+
+
+HTML 3.2 Formatter
+==================
+
+For a more complex example, let's implement an HTML 3.2 formatter. We don't
+use CSS but inline markup (``<u>``, ``<font>``, etc.). Because this isn't
+good style, this formatter isn't in the standard library ;-)
+
+.. sourcecode:: python
+
+ from pygments.formatter import Formatter
+
+ class OldHtmlFormatter(Formatter):
+
+        def __init__(self, **options):
+            Formatter.__init__(self, **options)
+
+            # create a dict of (start, end) tuples that wrap the
+            # value of a token so that we can use it in the format
+            # method later
+            self.styles = {}
+
+            # we iterate over the style, which yields token types and
+            # dicts of parsed style values; colors are specified in
+            # hex: 'RRGGBB'
+            for token, style in self.style:
+                start = end = ''
+                if style['color']:
+                    start += '<font color="#%s">' % style['color']
+                    end = '</font>' + end
+                if style['bold']:
+                    start += '<b>'
+                    end = '</b>' + end
+                if style['italic']:
+                    start += '<i>'
+                    end = '</i>' + end
+                if style['underline']:
+                    start += '<u>'
+                    end = '</u>' + end
+                self.styles[token] = (start, end)
+
+        def format(self, tokensource, outfile):
+            # lastval is a string we use for caching
+            # because it's possible that a lexer yields a number
+            # of consecutive tokens with the same token type.
+            # to minimize the size of the generated html markup we
+            # try to join the values of same-type tokens here
+            lastval = ''
+            lasttype = None
+
+            # wrap the whole output with <pre>
+            outfile.write('<pre>')
+
+            for ttype, value in tokensource:
+                # if the token type doesn't exist in the stylemap
+                # we try it with the parent of the token type
+                # eg: parent of Token.Literal.String.Double is
+                # Token.Literal.String
+                while ttype not in self.styles:
+                    ttype = ttype.parent
+                if ttype == lasttype:
+                    # the current token type is the same as in the
+                    # last iteration: cache the value
+                    lastval += value
+                else:
+                    # not the same token as last iteration, but we
+                    # have some data in the buffer. wrap it with the
+                    # defined style and write it to the output file
+                    if lastval:
+                        stylebegin, styleend = self.styles[lasttype]
+                        outfile.write(stylebegin + lastval + styleend)
+                    # set lastval/lasttype to current values
+                    lastval = value
+                    lasttype = ttype
+
+            # if something is left in the buffer, write it to the
+            # output file, then close the opened <pre> tag
+            if lastval:
+                stylebegin, styleend = self.styles[lasttype]
+                outfile.write(stylebegin + lastval + styleend)
+            outfile.write('</pre>\n')
+
+The comments should explain it. Again, this formatter doesn't override the
+`get_style_defs()` method. Had we used CSS classes instead of inline HTML
+markup, we would have needed to generate the CSS first. That is what the
+`get_style_defs()` method exists for.
+
+
+Generating Style Definitions
+============================
+
+Some formatters like the `LatexFormatter` and the `HtmlFormatter` don't
+output inline markup but reference either macros or CSS classes. Because
+the definitions of those are not part of the output, the `get_style_defs()`
+method exists. It is passed one parameter (whether and how it's used
+is up to the formatter) and has to return a string or ``None``.
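+
+For illustration, here is a sketch of what such a method could look like for
+a CSS-based formatter (the `css_class` helper that maps token types to class
+names is hypothetical):
+
+.. sourcecode:: python
+
+    def get_style_defs(self, arg=''):
+        # arg is prepended to every selector (see the -a command line option)
+        lines = []
+        for token, style in self.style:
+            if style['color']:
+                lines.append('%s .%s { color: #%s }' %
+                             (arg, css_class(token), style['color']))
+        return '\n'.join(lines)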
diff --git a/docs/src/formatters.txt b/docs/src/formatters.txt
new file mode 100644
index 00000000..08b52cc7
--- /dev/null
+++ b/docs/src/formatters.txt
@@ -0,0 +1,249 @@
+.. -*- mode: rst -*-
+
+====================
+Available formatters
+====================
+
+This page lists all builtin formatters.
+
+Common options
+==============
+
+The `HtmlFormatter` and `LatexFormatter` classes support these options:
+
+`style`
+ The style to use, can be a string or a Style subclass (default:
+ ``'default'``).
+
+`full`
+ Tells the formatter to output a "full" document, i.e. a complete
+ self-contained document (default: ``False``).
+
+`title`
+ If `full` is true, the title that should be used to caption the
+ document (default: ``''``).
+
+`linenos`
+ If set to ``True``, output line numbers (default: ``False``).
+
+`linenostart`
+ The line number for the first line (default: ``1``).
+
+`linenostep`
+ If set to a number n > 1, only every nth line number is printed.
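+
+For example, combining these options (a sketch):
+
+.. sourcecode:: python
+
+    from pygments.formatters import HtmlFormatter
+
+    # line numbers on, starting at 5, printing only every second number
+    fmter = HtmlFormatter(linenos=True, linenostart=5, linenostep=2)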
+
+
+Formatter classes
+=================
+
+All these classes are importable from `pygments.formatters`.
+
+
+`HtmlFormatter`
+---------------
+
+ Formats tokens as HTML 4 ``<span>`` tags within a ``<pre>`` tag, wrapped
+ in a ``<div>`` tag. The ``<div>``'s CSS class can be set by the `cssclass`
+ option.
+
+ If the `linenos` option is given and true, the ``<pre>`` is additionally
+ wrapped inside a ``<table>`` which has one row and two cells: one
+ containing the line numbers and one containing the code. Example:
+
+ .. sourcecode:: html
+
+ <div class="highlight" >
+ <table><tr>
+ <td class="linenos" title="click to toggle"
+ onclick="with (this.firstChild.style)
+ { display = (display == '') ? 'none' : '' }">
+ <pre>1
+ 2</pre>
+ </td>
+ <td class="code">
+ <pre><span class="Ke">def </span><span class="NaFu">foo</span>(bar):
+ <span class="Ke">pass</span>
+ </pre>
+ </td>
+ </tr></table></div>
+
+ (whitespace added to improve clarity). Wrapping can be disabled using the
+ `nowrap` option.
+
+ With the `full` option, a complete HTML 4 document is output, including
+ the style definitions inside a ``<style>`` tag.
+
+ The `get_style_defs(arg='')` method of a `HtmlFormatter` returns a string
+ containing CSS rules for the CSS classes used by the formatter. The
+ argument `arg` can be used to specify additional CSS selectors that
+ are prepended to the classes. A call `fmter.get_style_defs('td .code')`
+    would result in CSS rules like the following:
+
+ .. sourcecode:: css
+
+ td .code .kw { font-weight: bold; color: #00FF00 }
+ td .code .cm { color: #999999 }
+ ...
+
+ Additional options accepted by the `HtmlFormatter`:
+
+ `nowrap`
+ If set to ``True``, don't wrap the tokens at all, not even in a ``<pre>``
+ tag. This disables all other options (default: ``False``).
+
+ `noclasses`
+ If set to true, token ``<span>`` tags will not use CSS classes, but
+ inline styles. This is not recommended for larger pieces of code since
+ it increases output size by quite a bit (default: ``False``).
+
+ `classprefix`
+ Since the token types use relatively short class names, they may clash
+ with some of your own class names. In this case you can use the
+ `classprefix` option to give a string to prepend to all Pygments-generated
+ CSS class names for token types.
+ Note that this option also affects the output of `get_style_defs()`.
+
+ `cssclass`
+ CSS class for the wrapping ``<div>`` tag (default: ``'highlight'``).
+
+ `cssstyles`
+ Inline CSS styles for the wrapping ``<div>`` tag (default: ``''``).
+
+ `linenospecial`
+ If set to a number n > 0, every nth line number is given the CSS
+ class ``"special"`` (default: ``0``).
+
+ :Aliases: ``html``
+ :Filename patterns: ``*.html``, ``*.htm``
+
+
+`LatexFormatter`
+----------------
+
+ Formats tokens as LaTeX code. This needs the `fancyvrb` and `color`
+ standard packages.
+
+ Without the `full` option, code is formatted as one ``Verbatim``
+ environment, like this:
+
+ .. sourcecode:: latex
+
+ \begin{Verbatim}[commandchars=@\[\]]
+ @Can[def ]@Cax[foo](bar):
+ @Can[pass]
+ \end{Verbatim}
+
+ The command sequences used here (``@Can`` etc.) are generated from the given
+ `style` and can be retrieved using the `get_style_defs` method.
+
+ With the `full` option, a complete LaTeX document is output, including
+ the command definitions in the preamble.
+
+ The `get_style_defs(arg='')` method of a `LatexFormatter` returns a string
+ containing ``\newcommand`` commands defining the commands used inside the
+ ``Verbatim`` environments. If the argument `arg` is true,
+ ``\renewcommand`` is used instead.
+
+ Additional options accepted by the `LatexFormatter`:
+
+ `docclass`
+ If the `full` option is enabled, this is the document class to use
+ (default: ``'article'``).
+
+ `preamble`
+ If the `full` option is enabled, this can be further preamble commands,
+ e.g. ``\usepackage`` (default: ``''``).
+
+ `verboptions`
+ Additional options given to the Verbatim environment (see the *fancyvrb*
+ docs for possible values) (default: ``''``).
+
+ :Aliases: ``latex``, ``tex``
+ :Filename pattern: ``*.tex``
+
+
+`BBCodeFormatter`
+-----------------
+
+ Formats tokens with BBcodes. These formatting codes are used by many
+ bulletin boards, so you can highlight your sourcecode with pygments before
+ posting it there.
+
+ This formatter has no support for background colors and borders, as there
+ are no common BBcode tags for that.
+
+    Some board systems (e.g. phpBB) don't support colors in their [code] tag,
+    so you can't use the highlighting together with that tag.
+    Text in a [code] tag is usually shown with a monospace font (which this
+    formatter can produce with the ``monofont`` option), and spaces (which
+    you need for indentation) are not removed.
+
+    The `BBCodeFormatter` accepts two additional options:
+
+    `codetag`
+        If set to true, put the output into ``[code]`` tags (default:
+        ``False``).
+
+    `monofont`
+        If set to true, add a tag to show the code with a monospace font
+        (default: ``False``).
+
+ :Aliases: ``bbcode``, ``bb``
+ :Filename pattern: None
+
+
+`TerminalFormatter`
+-------------------
+
+ Formats tokens with ANSI color sequences, for output in a text console.
+ Color sequences are terminated at newlines, so that paging the output
+ works correctly.
+
+ The `get_style_defs()` method doesn't do anything special since there is
+ no support for common styles.
+
+ The TerminalFormatter class supports only these options:
+
+ `bg`
+ Set to ``"light"`` or ``"dark"`` depending on the terminal's background
+ (default: ``"light"``).
+
+ `colorscheme`
+ A dictionary mapping token types to (lightbg, darkbg) color names or
+ ``None`` (default: ``None`` = use builtin colorscheme).
+
+ `debug`
+ If this option is true, output the string "<<ERROR>>" after each error
+ token. This is meant as a help for debugging Pygments (default: ``False``).
+
+ :Aliases: ``terminal``, ``console``
+ :Filename pattern: None
+
+
+`RawTokenFormatter`
+-------------------
+
+ Formats tokens as a raw representation for storing token streams.
+
+ The format is ``tokentype<TAB>repr(tokenstring)\n``. The output can later
+ be converted to a token stream with the `RawTokenLexer`, described in the
+ `lexer list <lexers.txt>`_.
+
+ One option is accepted:
+
+ `compress`
+ If set to ``'gz'`` or ``'bz2'``, compress the output with the given
+ compression algorithm after encoding (default: ``''``).
+
+ :Aliases: ``raw``, ``tokens``
+ :Filename pattern: ``*.raw``
+
+
+`NullFormatter`
+---------------
+
+ Just output all tokens, don't format in any way.
+
+ :Aliases: ``text``, ``null``
+ :Filename pattern: ``*.txt``
+
diff --git a/docs/src/index.txt b/docs/src/index.txt
new file mode 100644
index 00000000..33874d1c
--- /dev/null
+++ b/docs/src/index.txt
@@ -0,0 +1,51 @@
+.. -*- mode: rst -*-
+
+========
+Overview
+========
+
+Welcome to the Pygments documentation.
+
+- Starting with Pygments
+
+ - `Installation <installation.txt>`_
+
+ - `Quickstart <quickstart.txt>`_
+
+ - `Command line interface <cmdline.txt>`_
+
+- Essential to know
+
+ - `Builtin lexers <lexers.txt>`_
+
+ - `Builtin formatters <formatters.txt>`_
+
+ - `Styles <styles.txt>`_
+
+- API and more
+
+ - `API documentation <api.txt>`_
+
+ - `Builtin Tokens <tokens.txt>`_
+
+- Hacking for Pygments
+
+ - `Write your own lexer <lexerdevelopment.txt>`_
+
+ - `Write your own formatter <formatterdev.txt>`_
+
+- Hints and Tricks
+
+ - `Using Pygments in ReST documents <rstdirective.txt>`_
+
+
+--------------
+
+If you find bugs or have suggestions for the documentation, please
+look `here`_ for info on how to contact the team.
+
+You can download an offline version of this documentation from the
+`download page`_.
+
+.. _here: http://pygments.pocoo.org/contribute
+.. _download page: http://pygments.pocoo.org/download
diff --git a/docs/src/installation.txt b/docs/src/installation.txt
new file mode 100644
index 00000000..708592a8
--- /dev/null
+++ b/docs/src/installation.txt
@@ -0,0 +1,47 @@
+.. -*- mode: rst -*-
+
+============
+Installation
+============
+
+Pygments requires at least Python 2.3 to work correctly. Just to clarify:
+there *won't* ever be support for Python versions below 2.3.
+
+
+Install the Release Version
+===========================
+
+1. Download the most recent tarball from the `download page`_
+2. Unpack the tarball
+3. ``sudo python setup.py install``
+
+Note that the last command will automatically download and install
+`setuptools`_ if you don't already have it installed. This requires a working
+internet connection.
+
+This will install Pygments into your Python installation's site-packages directory.
+
+
+Install via easy_install
+========================
+
+You can also install the most recent Pygments version using `easy_install`_::
+
+ sudo easy_install Pygments
+
+This will install a Pygments egg in your Python installation's site-packages
+directory.
+
+
+Installing the Development Version
+==================================
+
+1. Install `subversion`_
+2. ``svn co http://trac.pocoo.org/repos/pygments/trunk pygments``
+3. ``ln -s `pwd`/pygments/pygments /usr/lib/python2.X/site-packages``
+
+
+.. _download page: http://pygments.pocoo.org/download/
+.. _setuptools: http://peak.telecommunity.com/DevCenter/setuptools
+.. _easy_install: http://peak.telecommunity.com/DevCenter/EasyInstall
+.. _subversion: http://subversion.tigris.org/
diff --git a/docs/src/lexerdevelopment.txt b/docs/src/lexerdevelopment.txt
new file mode 100644
index 00000000..619c5c39
--- /dev/null
+++ b/docs/src/lexerdevelopment.txt
@@ -0,0 +1,482 @@
+.. -*- mode: rst -*-
+
+====================
+Write your own lexer
+====================
+
+If a lexer for your favorite language is missing in the Pygments package, you can
+easily write your own and extend Pygments.
+
+All you need can be found inside the `pygments.lexer` module. As you can read in
+the `API documentation <api.txt>`_, a lexer is a class that is initialized with
+some keyword arguments (the lexer options) and that provides a
+`get_tokens_unprocessed()` method which is given a string or unicode object with
+the data to parse.
+
+The `get_tokens_unprocessed()` method must return an iterator or iterable
+containing tuples in the form ``(index, token, value)``. Normally you don't need
+to do this since there are numerous base lexers you can subclass.
+
+
+RegexLexer
+==========
+
+A very powerful (but quite easy to use) lexer is the `RegexLexer`. This lexer
+base class allows you to define lexing rules in terms of *regular expressions*
+for different *states*.
+
+States are groups of regular expressions that are matched against the input
+string at the *current position*. If one of these expressions matches, a
+corresponding action is performed (normally yielding a token with a specific
+type), the current position is set to where the last match ended and the
+matching process continues with the first regex of the current state.
+
+Lexer states are kept in a state stack: each time a new state is entered, the
+new state is pushed onto the stack. The most basic lexers (like the
+`DiffLexer`) just need one state.
+
+Each state is defined as a list of tuples in the form (`regex`, `action`,
+`new_state`) where the last item is optional. In the most basic form, `action`
+is a token type (like `Name.Builtin`). That means: when `regex` matches, emit
+a token with the match text and the type given by `action`, and push
+`new_state` on the state stack. If the new state is ``'#pop'``, the topmost
+state is popped from the stack instead. (To pop more than one state, use
+``'#pop:2'`` and so on.) ``'#push'`` is a synonym for pushing the current
+state on the stack.
+
+The following example shows the `DiffLexer` from the builtin lexers. Note that
+it contains some additional attributes `name`, `aliases` and `filenames` which
+aren't required for a lexer. They are used by the builtin lexer lookup
+functions.
+
+.. sourcecode:: python
+
+ from pygments.lexer import RegexLexer
+ from pygments.token import \
+ Text, Comment, Keyword, Name, String, Generic
+
+ class DiffLexer(RegexLexer):
+ name = 'Diff'
+ aliases = ['diff']
+ filenames = ['*.diff']
+
+ tokens = {
+ 'root': [
+ (r' .*\n', Text),
+ (r'\+.*\n', Generic.Inserted),
+ (r'-.*\n', Generic.Deleted),
+ (r'@.*\n', Generic.Subheading),
+ (r'Index.*\n', Generic.Heading),
+ (r'=.*\n', Generic.Heading),
+ (r'.*\n', Text),
+ ]
+ }
+
+As you can see, this lexer only uses one state. When the lexer starts scanning
+the text, it first checks if the current character is a space. If this is true
+it scans everything until newline and returns the parsed data as a `Text`
+token.
+
+If this rule doesn't match, it checks if the current char is a plus sign. And
+so on.
+
+If no rule matches at the current position, the current char is emitted as an
+`Error` token that indicates a parsing error, and the position is increased by
+1.
+
+
+Regex Flags
+===========
+
+You can either define regex flags in the regex (``r'(?x)foo bar'``) or add
+a `flags` attribute to your lexer class. If no attribute is defined, it
+defaults to `re.MULTILINE`. For more information about regular expression
+flags see the `regular expressions`_ help page in the Python documentation.
+
+.. _regular expressions: http://docs.python.org/lib/re-syntax.html
+
+
+Scanning multiple tokens at once
+================================
+
+Here is a more complex lexer that highlights INI files. INI files consist of
+sections, comments and key = value pairs:
+
+.. sourcecode:: python
+
+    from pygments.lexer import RegexLexer, bygroups
+    from pygments.token import Text, Comment, Operator, Keyword, Name, String
+
+ class IniLexer(RegexLexer):
+ name = 'INI'
+ aliases = ['ini', 'cfg']
+ filenames = ['*.ini', '*.cfg']
+
+ tokens = {
+ 'root': [
+ (r'\s+', Text),
+ (r';.*?$', Comment),
+ (r'\[.*?\]$', Keyword),
+ (r'(.*?)(\s*)(=)(\s*)(.*?)$',
+ bygroups(Name.Attribute, Text, Operator, Text, String))
+ ]
+ }
+
+The lexer first looks for whitespace, comments and section names. Later it
+looks for a line that looks like a key, value pair, separated by an ``'='``
+sign, and optional whitespace.
+
+The `bygroups` helper makes sure that each group is yielded with a different
+token type. First the `Name.Attribute` token, then a `Text` token for the
+optional whitespace, after that an `Operator` token for the equals sign. Then
+a `Text` token for the whitespace again. The rest of the line is returned as
+`String`.
+
+Note that for this to work, every part of the match must be inside a capturing
+group (a ``(...)``), and there must not be any nested capturing groups. If you
+nevertheless need a group, use a non-capturing group defined using this syntax:
+``r'(?:some|words|here)'`` (note the ``?:`` after the beginning parenthesis).
+
+
+Changing states
+===============
+
+Many lexers need multiple states to work as expected. For example, some
+languages allow multiline comments to be nested. Since this is a recursive
+pattern it's impossible to lex just using regular expressions.
+
+Here is the solution:
+
+.. sourcecode:: python
+
+ class ExampleLexer(RegexLexer):
+ name = 'Example Lexer with states'
+
+ tokens = {
+ 'root': [
+ (r'[^/]+', Text),
+ (r'/\*', Comment.Multiline, 'comment'),
+ (r'//.*?$', Comment.Singleline),
+ (r'/', Text)
+ ],
+ 'comment': [
+ (r'[^*/]', Comment.Multiline),
+ (r'/\*', Comment.Multiline, '#push'),
+ (r'\*/', Comment.Multiline, '#pop'),
+ (r'[*/]', Comment.Multiline)
+ ]
+ }
+
+This lexer starts lexing in the ``'root'`` state. It tries to match as much as
+possible until it finds a slash (``'/'``). If the next character after the slash
+is a star (``'*'``) the `RegexLexer` sends those two characters to the output
+stream marked as `Comment.Multiline` and continues parsing with the rules
+defined in the ``'comment'`` state.
+
+If there wasn't a star after the slash, the `RegexLexer` checks if it's a
+single-line comment (i.e. the slash is followed by a second slash). If this
+also wasn't the case it must be a single slash (the separate regex for a
+single slash must also be given, else the slash would be marked as an error
+token).
+
+Inside the ``'comment'`` state, we do the same thing again. Scan until the lexer
+finds a star or slash. If it's the opening of a multiline comment, push the
+``'comment'`` state on the stack and continue scanning, again in the
+``'comment'`` state. Else, check if it's the end of the multiline comment. If
+yes, pop one state from the stack.
+
+Note: If you pop from an empty stack you'll get an `IndexError`. (There is an
+easy way to prevent this from happening: don't ``'#pop'`` in the root state).
+
+If the `RegexLexer` encounters a newline that is flagged as an error token,
+the stack is emptied and the lexer continues scanning in the ``'root'`` state.
+This helps produce error-tolerant highlighting for erroneous input, e.g. when
+a single-line string is not closed.
+
+
+Advanced state tricks
+=====================
+
+There are a few more things you can do with states:
+
+- You can push multiple states onto the stack if you give a tuple instead of a
+ simple string as the third item in a rule tuple. For example, if you want to
+ match a comment containing a directive, something like::
+
+ /* <processing directive> rest of comment */
+
+ you can use this rule:
+
+ .. sourcecode:: python
+
+ tokens = {
+ 'root': [
+ (r'/\* <', Comment, ('comment', 'directive')),
+ ...
+ ],
+ 'directive': [
+ (r'[^>]*', Comment.Directive),
+ (r'>', Comment, '#pop'),
+ ],
+ 'comment': [
+ (r'[^*]+', Comment),
+ (r'\*/', Comment, '#pop'),
+ (r'\*', Comment),
+ ]
+ }
+
+ When this encounters the above sample, first ``'comment'`` and ``'directive'``
+ are pushed onto the stack, then the lexer continues in the directive state
+ until it finds the closing ``>``, then it continues in the comment state until
+ the closing ``*/``. Then, both states are popped from the stack again and
+ lexing continues in the root state.
+
+
+- You can include the rules of a state in the definition of another. This is
+ done by using `include` from `pygments.lexer`:
+
+ .. sourcecode:: python
+
+ from pygments.lexer import RegexLexer, include
+
+ class ExampleLexer(RegexLexer):
+ tokens = {
+ 'comments': [
+ (r'/\*.*?\*/', Comment),
+ (r'//.*?\n', Comment),
+ ],
+ 'root': [
+ include('comments'),
+ (r'(function )(\w+)( {)',
+ (Keyword, Name, Keyword), 'function'),
+ (r'.', Text),
+ ],
+ 'function': [
+ (r'[^}/]+', Text),
+ include('comments'),
+ (r'/', Text),
+ (r'}', Keyword, '#pop'),
+ ]
+ }
+
+  This is a hypothetical lexer for a language that consists of functions and
+  comments. Because comments can occur at toplevel and in functions, we need
+  rules for comments in both states. As you can see, the `include` helper
+  saves repeating rules that occur more than once (in this example, the state
+  ``'comments'`` will never be entered by the lexer, as it's only there to be
+  included in ``'root'`` and ``'function'``).
+
+
+- Sometimes, you may want to "combine" a state from existing ones. This is
+  possible with the `combined` helper from `pygments.lexer`.
+
+ If you, instead of a new state, write ``combined('state1', 'state2')`` as the
+ third item of a rule tuple, a new anonymous state will be formed from state1
+ and state2 and if the rule matches, the lexer will enter this state.
+
+ This is not used very often, but can be helpful in some cases, such as the
+ `PythonLexer`'s string literal processing.
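+
+  For example, a hypothetical rule (the state names are made up):
+
+  .. sourcecode:: python
+
+      tokens = {
+          'root': [
+              # on a quote, enter an anonymous state built from
+              # 'escapes' and 'string'
+              (r'"', String, combined('escapes', 'string')),
+              ...
+          ],
+      }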
+
+- If you want your lexer to start lexing in a different state you can modify
+ the stack by overloading the `get_tokens_unprocessed` method:
+
+ .. sourcecode:: python
+
+ class MyLexer(RegexLexer):
+ tokens = {...}
+
+          def get_tokens_unprocessed(self, text):
+              stack = ['root', 'otherstate']
+              for item in RegexLexer.get_tokens_unprocessed(self, text, stack):
+                  yield item
+
+ Some lexers like the `PhpLexer` use this to make the leading ``<?php``
+ preprocessor comments optional. Note that you can crash the lexer easily
+ by putting values into the stack that don't exist in the token map. Also
+ removing ``'root'`` from the stack can result in strange errors!
+
+
+Using multiple lexers
+=====================
+
+Using multiple lexers for the same input can be tricky. One of the easiest
+combination techniques is shown here: You can replace the token type entry in a
+rule tuple (the second item) with a lexer class. The matched text will then be
+lexed with that lexer, and the resulting tokens will be yielded.
+
+For example, look at this stripped-down HTML lexer:
+
+.. sourcecode:: python
+
+    import re
+
+    from pygments.lexer import RegexLexer, bygroups, using
+    from pygments.token import Text, Name
+    from pygments.lexers.web import JavascriptLexer
+
+ class HtmlLexer(RegexLexer):
+ name = 'HTML'
+ aliases = ['html']
+ filenames = ['*.html', '*.htm']
+
+ flags = re.IGNORECASE | re.DOTALL
+ tokens = {
+ 'root': [
+ ('[^<&]+', Text),
+ ('&.*?;', Name.Entity),
+ (r'<\s*script\s*', Name.Tag, ('script-content', 'tag')),
+ (r'<\s*[a-zA-Z0-9:]+', Name.Tag, 'tag'),
+ (r'<\s*/\s*[a-zA-Z0-9:]+\s*>', Name.Tag),
+ ],
+ 'script-content': [
+ (r'(.+?)(<\s*/\s*script\s*>)',
+ bygroups(using(JavascriptLexer), Name.Tag),
+ '#pop'),
+ ]
+ }
+
+Here the content of a ``<script>`` tag is passed to a newly created instance of
+a `JavascriptLexer` and not processed by the `HtmlLexer`. This is done using the
+`using` helper that takes the other lexer class as its parameter.
+
+Note the combination of `bygroups` and `using`. This makes sure that the content
+up to the ``</script>`` end tag is processed by the `JavascriptLexer`, while the
+end tag is yielded as a normal token with the `Name.Tag` type.
+
+As an additional goodie, if the lexer class is replaced by `this` (imported from
+`pygments.lexer`), the "other" lexer will be the current one (because you cannot
+refer to the current class within the code that runs at class definition time).
+
+Also note the ``(r'<\s*script\s*', Name.Tag, ('script-content', 'tag'))`` rule.
+Here, two states are pushed onto the state stack, ``'script-content'`` and
+``'tag'``. That means that first ``'tag'`` is processed, which will parse
+attributes and the closing ``>``, then the ``'tag'`` state is popped and the
+next state on top of the stack will be ``'script-content'``.
+
+Any keyword arguments passed to ``using()`` are added to the keyword arguments
+used to create the lexer.
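+
+For example, a hypothetical rule that hands the text between made-up ``<%``
+and ``%>`` delimiters to a `PhpLexer` created with the `startinline` option
+(documented in the `lexer list <lexers.txt>`_):
+
+.. sourcecode:: python
+
+    (r'(<%)(.*?)(%>)',
+     bygroups(Comment.Preproc, using(PhpLexer, startinline=True),
+              Comment.Preproc)),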
+
+
+Delegating Lexer
+================
+
+Another approach for nested lexers is the `DelegatingLexer` which is for
+example used for the template engine lexers. It takes two lexers as
+arguments on initialisation: a `root_lexer` and a `language_lexer`.
+
+The input is processed as follows: First, the whole text is lexed with the
+`language_lexer`. All tokens yielded with a type of ``Other`` are then
+concatenated and given to the `root_lexer`. The language tokens of the
+`language_lexer` are then inserted into the `root_lexer`'s token stream
+at the appropriate positions.
+
+.. sourcecode:: python
+
+ from pygments.lexer import DelegatingLexer
+ from pygments.lexers.web import HtmlLexer, PhpLexer
+
+ class HtmlPhpLexer(DelegatingLexer):
+ def __init__(self, **options):
+ super(HtmlPhpLexer, self).__init__(HtmlLexer, PhpLexer, **options)
+
+This procedure ensures that e.g. HTML with template tags in it is highlighted
+correctly even if the template tags are put into HTML tags or attributes.
+
+If you want to change the needle token ``Other`` to something else, you can
+give the lexer another token type as the third parameter:
+
+.. sourcecode:: python
+
+    DelegatingLexer.__init__(self, MyLexer, OtherLexer, Text, **options)
+
+
+Callbacks
+=========
+
+Sometimes the grammar of a language is so complex that a lexer would be unable
+to parse it just by using regular expressions and stacks.
+
+For this, the `RegexLexer` allows callbacks to be given in rule tuples, instead
+of token types (`bygroups` and `using` are nothing else but preimplemented
+callbacks). The callback must be a function taking two arguments:
+
+* the lexer itself
+* the match object for the last matched rule
+
+The callback must then return an iterable of (or simply yield) ``(index,
+tokentype, value)`` tuples, which are then just passed through by
+`get_tokens_unprocessed()`. The ``index`` here is the position of the token in
+the input string, ``tokentype`` is the normal token type (like `Name.Builtin`),
+and ``value`` the associated part of the input string.
+
+You can see an example here:
+
+.. sourcecode:: python
+
+ class HypotheticLexer(RegexLexer):
+
+        def headline_callback(lexer, match):
+            equal_signs = match.group(1)
+            text = match.group(2)
+            yield match.start(), Generic.Headline, equal_signs + text + equal_signs
+
+ tokens = {
+ 'root': [
+ (r'(=+)(.*?)(\1)', headline_callback)
+ ]
+ }
+
+If the regex for the `headline_callback` rule matches, the function is called
+with the match object. Note that after the callback is done, processing
+continues normally, that is, after the end of the match. The callback has no
+way to influence the position.
+
+There are not really any simple examples for lexer callbacks, but you can see
+them in action e.g. in the `compiled.py`_ source code in the `CLexer` and
+`JavaLexer` classes.
+
+.. _compiled.py: http://trac.pocoo.org/repos/pygments/lexers/compiled.py
+
+
+The ExtendedRegexLexer class
+============================
+
+The `RegexLexer`, even with callbacks, unfortunately isn't powerful enough for
+the funky syntax rules of some languages that will go unnamed, such as Ruby.
+
+But fear not; even then you don't have to abandon the regular expression
+approach. For Pygments has a subclass of `RegexLexer`, the `ExtendedRegexLexer`.
+All features known from RegexLexers are available here too, and the tokens are
+specified in exactly the same way, *except* for one detail:
+
+The `get_tokens_unprocessed()` method holds its internal state data not as local
+variables, but in an instance of the `pygments.lexer.LexerContext` class, and
+that instance is passed to callbacks as a third argument. This means that you
+can modify the lexer state in callbacks.
+
+The `LexerContext` class has the following members:
+
+* `text` -- the input text
+* `pos` -- the current starting position that is used for matching regexes
+* `stack` -- a list containing the state stack
+* `end` -- the maximum position to which regexes are matched, this defaults to
+ the length of `text`
+
+Additionally, the `get_tokens_unprocessed()` method can be given a
+`LexerContext` instead of a string and will then process this context instead of
+creating a new one for the string argument.
+
+Note that because you can set the current position to anything in the
+callback, it won't automatically be set by the caller after the callback is
+finished. For example, this is how the hypothetical lexer above would be
+written with the `ExtendedRegexLexer`:
+
+.. sourcecode:: python
+
+ class ExHypotheticLexer(ExtendedRegexLexer):
+
+        def headline_callback(lexer, match, ctx):
+            equal_signs = match.group(1)
+            text = match.group(2)
+            yield match.start(), Generic.Headline, equal_signs + text + equal_signs
+            ctx.pos = match.end()
+
+ tokens = {
+ 'root': [
+ (r'(=+)(.*?)(\1)', headline_callback)
+ ]
+ }
+
+This might sound confusing (and it really can be). But it is needed, and for
+an example look at the Ruby lexer in `agile.py`_.
+
+.. _agile.py: http://trac.pocoo.org/repos/pygments/trunk/pygments/lexers/agile.py
diff --git a/docs/src/lexers.txt b/docs/src/lexers.txt
new file mode 100644
index 00000000..5fd8b19e
--- /dev/null
+++ b/docs/src/lexers.txt
@@ -0,0 +1,521 @@
+.. -*- mode: rst -*-
+
+================
+Available lexers
+================
+
+This page lists all available builtin lexers and the options they take.
+
+Currently, **all lexers** support these options:
+
+`stripnl`
+    Strip leading and trailing newlines from the input (default: ``True``).
+
+`stripall`
+ Strip all leading and trailing whitespace from the input (default:
+ ``False``).
+
+`tabsize`
+ If given and greater than 0, expand tabs in the input (default: ``0``).
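+
+For example (any lexer accepts these keyword arguments):
+
+.. sourcecode:: python
+
+    from pygments.lexers import PythonLexer
+
+    # strip surrounding whitespace and expand tabs to four spaces
+    lexer = PythonLexer(stripall=True, tabsize=4)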
+
+
+These lexers are builtin and can be imported from
+`pygments.lexers`:
+
+
+Special lexers
+==============
+
+`TextLexer`
+
+ "Null" lexer, doesn't highlight anything.
+
+ :Aliases: ``text``
+ :Filename patterns: ``*.txt``
+
+
+`RawTokenLexer`
+
+ Recreates a token stream formatted with the `RawTokenFormatter`.
+
+ Additional option:
+
+ `compress`
+ If set to ``'gz'`` or ``'bz2'``, decompress the token stream with
+ the given compression algorithm before lexing (default: '').
+
+ :Aliases: ``raw``
+ :Filename patterns: ``*.raw``
+
+
+Agile languages
+===============
+
+`PythonLexer`
+
+ For `Python <http://www.python.org>`_ source code.
+
+ :Aliases: ``python``, ``py``
+ :Filename patterns: ``*.py``, ``*.pyw``
+
+
+`PythonConsoleLexer`
+
+ For Python console output or doctests, such as:
+
+ .. sourcecode:: pycon
+
+ >>> a = 'foo'
+ >>> print a
+ 'foo'
+ >>> 1/0
+ Traceback (most recent call last):
+ ...
+
+ :Aliases: ``pycon``
+ :Filename patterns: None
+
+
+`RubyLexer`
+
+ For `Ruby <http://www.ruby-lang.org>`_ source code.
+
+ :Aliases: ``ruby``, ``rb``
+ :Filename patterns: ``*.rb``
+
+
+`RubyConsoleLexer`
+
+ For Ruby interactive console (**irb**) output like:
+
+ .. sourcecode:: rbcon
+
+ irb(main):001:0> a = 1
+ => 1
+ irb(main):002:0> puts a
+ 1
+ => nil
+
+ :Aliases: ``rbcon``, ``irb``
+ :Filename patterns: None
+
+
+`PerlLexer`
+
+ For `Perl <http://www.perl.org>`_ source code.
+
+ :Aliases: ``perl``, ``pl``
+ :Filename patterns: ``*.pl``, ``*.pm``
+
+
+`LuaLexer`
+
+ For `Lua <http://www.lua.org>`_ source code.
+
+ Additional options:
+
+ `func_name_highlighting`
+ If given and ``True``, highlight builtin function names
+ (default: ``True``).
+ `disabled_modules`
+ If given, must be a list of module names whose function names
+ should not be highlighted. By default all modules are highlighted.
+
+ To get a list of allowed modules have a look into the
+ `_luabuiltins` module:
+
+ .. sourcecode:: pycon
+
+ >>> from pygments.lexers._luabuiltins import MODULES
+ >>> MODULES.keys()
+ ['string', 'coroutine', 'modules', 'io', 'basic', ...]
+
+ :Aliases: ``lua``
+ :Filename patterns: ``*.lua``
+
+
+Compiled languages
+==================
+
+`CLexer`
+
+ For C source code with preprocessor directives.
+
+ :Aliases: ``c``
+ :Filename patterns: ``*.c``, ``*.h``
+
+
+`CppLexer`
+
+ For C++ source code with preprocessor directives.
+
+ :Aliases: ``cpp``, ``c++``
+ :Filename patterns: ``*.cpp``, ``*.hpp``, ``*.c++``, ``*.h++``
+
+
+`DelphiLexer`
+
+ For `Delphi <http://www.borland.com/delphi/>`_
+ (Borland Object Pascal) source code.
+
+ :Aliases: ``delphi``, ``pas``, ``pascal``, ``objectpascal``
+ :Filename patterns: ``*.pas``
+
+
+`JavaLexer`
+
+ For `Java <http://www.sun.com/java/>`_ source code.
+
+ :Aliases: ``java``
+ :Filename patterns: ``*.java``
+
+
+.NET languages
+==============
+
+`CSharpLexer`
+
+ For `C# <http://msdn2.microsoft.com/en-us/vcsharp/default.aspx>`_
+ source code.
+
+ :Aliases: ``c#``, ``csharp``
+ :Filename patterns: ``*.cs``
+
+`BooLexer`
+
+ For `Boo <http://boo.codehaus.org/>`_ source code.
+
+ :Aliases: ``boo``
+ :Filename patterns: ``*.boo``
+
+`VbNetLexer`
+
+ For
+ `Visual Basic.NET <http://msdn2.microsoft.com/en-us/vbasic/default.aspx>`_
+ source code.
+
+ :Aliases: ``vbnet``, ``vb.net``
+ :Filename patterns: ``*.vb``, ``*.bas``
+
+
+Web-related languages
+=====================
+
+`JavascriptLexer`
+
+ For JavaScript source code.
+
+ :Aliases: ``js``, ``javascript``
+ :Filename patterns: ``*.js``
+
+
+`CssLexer`
+
+ For CSS (Cascading Style Sheets).
+
+ :Aliases: ``css``
+ :Filename patterns: ``*.css``
+
+
+`HtmlLexer`
+
+ For HTML 4 and XHTML 1 markup. Nested JavaScript and CSS is highlighted
+ by the appropriate lexer.
+
+ :Aliases: ``html``
+ :Filename patterns: ``*.html``, ``*.htm``, ``*.xhtml``
+
+
+`PhpLexer`
+
+ For `PHP <http://www.php.net/>`_ source code.
+ For PHP embedded in HTML, use the `HtmlPhpLexer`.
+
+ Additional options:
+
+    `startinline`
+        If given and ``True``, the lexer starts highlighting with
+        php code (i.e. no starting ``<?php`` is required).
+ `funcnamehighlighting`
+ If given and ``True``, highlight builtin function names
+ (default: ``True``).
+ `disabledmodules`
+ If given, must be a list of module names whose function names
+ should not be highlighted. By default all modules are highlighted
+ except the special ``'unknown'`` module that includes functions
+ that are known to php but are undocumented.
+
+ To get a list of allowed modules have a look into the
+ `_phpbuiltins` module:
+
+ .. sourcecode:: pycon
+
+ >>> from pygments.lexers._phpbuiltins import MODULES
+ >>> MODULES.keys()
+ ['PHP Options/Info', 'Zip', 'dba', ...]
+
+ In fact the names of those modules match the module names from
+ the php documentation.
+
+ :Aliases: ``php``, ``php3``, ``php4``, ``php5``
+ :Filename patterns: ``*.php``, ``*.php[345]``
+
+
+`XmlLexer`
+
+ Generic lexer for XML (extensible markup language).
+
+ :Aliases: ``xml``
+ :Filename patterns: ``*.xml``
+
+
+Template languages
+==================
+
+`ErbLexer`
+
+ Generic `ERB <http://ruby-doc.org/core/classes/ERB.html>`_ (Ruby Templating)
+ lexer.
+
+ Just highlights ruby code between the preprocessor directives, other data
+ is left untouched by the lexer.
+
+ All options are also forwarded to the `RubyLexer`.
+
+ :Aliases: ``erb``
+ :Filename patterns: None
+
+
+`RhtmlLexer`
+
+ Subclass of the ERB lexer that highlights the unlexed data with the
+ html lexer.
+
+ Nested Javascript and CSS is highlighted too.
+
+ :Aliases: ``rhtml``, ``html+erb``, ``html+ruby``
+ :Filename patterns: ``*.rhtml``
+
+
+`XmlErbLexer`
+
+ Subclass of `ErbLexer` which highlights data outside preprocessor
+ directives with the `XmlLexer`.
+
+ :Aliases: ``xml+erb``, ``xml+ruby``
+ :Filename patterns: None
+
+
+`CssErbLexer`
+
+ Subclass of `ErbLexer` which highlights unlexed data with the `CssLexer`.
+
+ :Aliases: ``css+erb``, ``css+ruby``
+ :Filename patterns: None
+
+
+`JavascriptErbLexer`
+
+ Subclass of `ErbLexer` which highlights unlexed data with the
+ `JavascriptLexer`.
+
+ :Aliases: ``js+erb``, ``javascript+erb``, ``js+ruby``, ``javascript+ruby``
+ :Filename patterns: None
+
+
+`HtmlPhpLexer`
+
+ Subclass of `PhpLexer` that highlights unhandled data with the `HtmlLexer`.
+
+ Nested Javascript and CSS is highlighted too.
+
+ :Aliases: ``html+php``
+ :Filename patterns: ``*.phtml``
+
+
+`XmlPhpLexer`
+
+    Subclass of `PhpLexer` that highlights unhandled data with the `XmlLexer`.
+
+ :Aliases: ``xml+php``
+ :Filename patterns: None
+
+
+`CssPhpLexer`
+
+ Subclass of `PhpLexer` which highlights unmatched data with the `CssLexer`.
+
+ :Aliases: ``css+php``
+ :Filename patterns: None
+
+
+`JavascriptPhpLexer`
+
+ Subclass of `PhpLexer` which highlights unmatched data with the
+ `JavascriptLexer`.
+
+ :Aliases: ``js+php``, ``javascript+php``
+ :Filename patterns: None
+
+
+`DjangoLexer`
+
+ Generic `django <http://www.djangoproject.com/documentation/templates/>`_
+ template lexer.
+
+ It just highlights django code between the preprocessor directives, other
+ data is left untouched by the lexer.
+
+ :Aliases: ``django``
+ :Filename patterns: None
+
+
+`HtmlDjangoLexer`
+
+    Subclass of the `DjangoLexer` that highlights unlexed data with the
+    `HtmlLexer`.
+
+ Nested Javascript and CSS is highlighted too.
+
+ :Aliases: ``html+django``
+ :Filename patterns: None
+
+
+`XmlDjangoLexer`
+
+ Subclass of the `DjangoLexer` that highlights unlexed data with the
+ `XmlLexer`.
+
+ :Aliases: ``xml+django``
+ :Filename patterns: None
+
+
+`CssDjangoLexer`
+
+ Subclass of the `DjangoLexer` that highlights unlexed data with the
+ `CssLexer`.
+
+ :Aliases: ``css+django``
+ :Filename patterns: None
+
+
+`JavascriptDjangoLexer`
+
+ Subclass of the `DjangoLexer` that highlights unlexed data with the
+ `JavascriptLexer`.
+
+ :Aliases: ``javascript+django``
+ :Filename patterns: None
+
+
+`SmartyLexer`
+
+ Generic `Smarty <http://smarty.php.net/>`_ template lexer.
+
+ Just highlights smarty code between the preprocessor directives, other
+ data is left untouched by the lexer.
+
+ :Aliases: ``smarty``
+ :Filename patterns: None
+
+
+`HtmlSmartyLexer`
+
+    Subclass of the `SmartyLexer` that highlights unlexed data with the
+    `HtmlLexer`.
+
+ Nested Javascript and CSS is highlighted too.
+
+ :Aliases: ``html+smarty``
+ :Filename patterns: None
+
+
+`XmlSmartyLexer`
+
+ Subclass of the `SmartyLexer` that highlights unlexed data with the
+ `XmlLexer`.
+
+ :Aliases: ``xml+smarty``
+ :Filename patterns: None
+
+
+`CssSmartyLexer`
+
+ Subclass of the `SmartyLexer` that highlights unlexed data with the
+ `CssLexer`.
+
+ :Aliases: ``css+smarty``
+ :Filename patterns: None
+
+
+`JavascriptSmartyLexer`
+
+ Subclass of the `SmartyLexer` that highlights unlexed data with the
+ `JavascriptLexer`.
+
+ :Aliases: ``javascript+smarty``
+ :Filename patterns: None
+
+
+Other languages
+===============
+
+`SqlLexer`
+
+ Lexer for Structured Query Language. Currently, this lexer does
+ not recognize any special syntax except ANSI SQL.
+
+ :Aliases: ``sql``
+ :Filename patterns: ``*.sql``
+
+
+`BrainfuckLexer`
+
+ Lexer for the esoteric `BrainFuck <http://www.muppetlabs.com/~breadbox/bf/>`_
+ language.
+
+ :Aliases: ``brainfuck``
+ :Filename patterns: ``*.bf``, ``*.b``
+
+
+Text lexers
+===========
+
+`IniLexer`
+
+ Lexer for configuration files in INI style.
+
+ :Aliases: ``ini``, ``cfg``
+ :Filename patterns: ``*.ini``, ``*.cfg``
+
+
+`MakefileLexer`
+
+ Lexer for Makefiles.
+
+ :Aliases: ``make``, ``makefile``, ``mf``
+ :Filename patterns: ``*.mak``, ``Makefile``, ``makefile``
+
+
+`DiffLexer`
+
+ Lexer for unified or context-style diffs.
+
+ :Aliases: ``diff``
+ :Filename patterns: ``*.diff``, ``*.patch``
+
+
+`IrcLogsLexer`
+
+ Lexer for IRC logs in **irssi** or **xchat** style.
+
+ :Aliases: ``irc``
+ :Filename patterns: None
+
+
+`TexLexer`
+
+ Lexer for the TeX and LaTeX typesetting languages.
+
+ :Aliases: ``tex``, ``latex``
+ :Filename patterns: ``*.tex``, ``*.aux``, ``*.toc``
diff --git a/docs/src/quickstart.txt b/docs/src/quickstart.txt
new file mode 100644
index 00000000..749889df
--- /dev/null
+++ b/docs/src/quickstart.txt
@@ -0,0 +1,121 @@
+.. -*- mode: rst -*-
+
+==========
+Quickstart
+==========
+
+
+Pygments comes with a wide range of lexers for modern languages which are all
+accessible through the `pygments.lexers` package. A lexer enables Pygments to
+parse the source code into tokens which are passed to a formatter. Currently,
+formatters exist for HTML, LaTeX and ANSI sequences.
+
+
+Example
+=======
+
+Here is a small example for highlighting Python code:
+
+.. sourcecode:: python
+
+ from pygments import highlight
+ from pygments.lexers import PythonLexer
+ from pygments.formatters import HtmlFormatter
+
+ code = 'print "Hello World"'
+ print highlight(code, PythonLexer(), HtmlFormatter())
+
+which prints something like this:
+
+.. sourcecode:: html
+
+ <div class="highlight">
+ <pre><span class="k">print</span> <span class="l s">&quot;Hello World&quot;</span></pre>
+ </div>
+
+
+A CSS stylesheet which contains all CSS classes possibly used in the output can be
+produced by:
+
+.. sourcecode:: python
+
+ print HtmlFormatter().get_style_defs('.highlight')
+
+The argument is used as an additional CSS selector: the output may look like
+
+.. sourcecode:: css
+
+ .highlight .k { color: #AA22FF; font-weight: bold }
+ .highlight .s { color: #BB4444 }
+ ...
+
+
+Options
+=======
+
+The `highlight()` function supports a fourth argument called `outfile`; if
+given, it must be a file object. The formatted output will then be written to
+this file instead of being returned as a string.
+
+Lexers and formatters both support options. They are given to them as keyword
+arguments either to the class or to the lookup method:
+
+.. sourcecode:: python
+
+ from pygments import highlight
+ from pygments.lexers import get_lexer_by_name
+ from pygments.formatters import HtmlFormatter
+
+ lexer = get_lexer_by_name("python", stripall=True)
+ formatter = HtmlFormatter(linenos=True, cssclass="source")
+ result = highlight(code, lexer, formatter)
+
+This makes the lexer strip all leading and trailing whitespace from the input
+(`stripall` option), lets the formatter output line numbers (`linenos` option),
+and sets the wrapping ``<div>``'s class to ``source`` (instead of
+``highlight``).
+
+For an overview of builtin lexers and formatters and their options, visit the
+`lexer <lexers.txt>`_ and `formatters <formatters.txt>`_ lists.
+
+
+Lexer and formatter lookup
+==========================
+
+If you want to lookup a built-in lexer by its alias or a filename, you can use
+one of the following methods:
+
+.. sourcecode:: pycon
+
+ >>> from pygments.lexers import get_lexer_by_name, get_lexer_for_filename
+ >>> get_lexer_by_name('python')
+ <pygments.lexers.agile.PythonLexer object at 0xb7bd6d0c>
+ >>> get_lexer_for_filename('spam.py')
+ <pygments.lexers.agile.PythonLexer object at 0xb7bd6b2c>
+
+The same API is available for formatters: use `get_formatter_by_name` and
+`get_formatter_for_filename` from the `pygments.formatters` module
+for this purpose.
+
+
+Command line usage
+==================
+
+You can use Pygments from the command line, using the `pygmentize` script::
+
+ $ pygmentize test.py
+
+will highlight the Python file test.py using ANSI escape sequences
+(a.k.a. terminal colors) and print the result to standard output.
+
+To output HTML, use the ``-f`` option::
+
+    $ pygmentize -f html -o test.html test.py
+
+This writes an HTML-highlighted version of test.py to the file test.html.
+
+The stylesheet can be created with::
+
+ $ pygmentize -S default -f html > style.css
+
+More options and tricks can be found in the `command line reference <cmdline.txt>`_.
diff --git a/docs/src/rstdirective.txt b/docs/src/rstdirective.txt
new file mode 100644
index 00000000..60651319
--- /dev/null
+++ b/docs/src/rstdirective.txt
@@ -0,0 +1,42 @@
+================================
+Using Pygments in ReST documents
+================================
+
+Many Python people use `ReST`_ for documenting their source code, programs
+and so on. This also means that such documentation often includes source code
+samples.
+
+You can easily enable Pygments support for your ReST texts as long as you
+use your own build script.
+
+Just add this code to it:
+
+.. sourcecode:: python
+
+ from docutils import nodes
+ from docutils.parsers.rst import directives
+ from pygments import highlight
+ from pygments.lexers import get_lexer_by_name
+ from pygments.formatters import HtmlFormatter
+
+ PYGMENTS_FORMATTER = HtmlFormatter()
+
+ def pygments_directive(name, arguments, options, content, lineno,
+ content_offset, block_text, state, state_machine):
+ try:
+ lexer = get_lexer_by_name(arguments[0])
+ except ValueError:
+ # no lexer found
+ lexer = get_lexer_by_name('text')
+ parsed = highlight(u'\n'.join(content), lexer, PYGMENTS_FORMATTER)
+ return [nodes.raw('', parsed, format='html')]
+ pygments_directive.arguments = (1, 0, 1)
+ pygments_directive.content = 1
+ directives.register_directive('sourcecode', pygments_directive)
+
+Now you should be able to use Pygments in your ReST files with this syntax::
+
+ .. sourcecode:: language
+
+ your code here
+
+.. _ReST: http://docutils.sf.net/rst.html
diff --git a/docs/src/styles.txt b/docs/src/styles.txt
new file mode 100644
index 00000000..4fd6c297
--- /dev/null
+++ b/docs/src/styles.txt
@@ -0,0 +1,119 @@
+.. -*- mode: rst -*-
+
+======
+Styles
+======
+
+Pygments comes with some builtin styles that work with both the HTML and the
+LaTeX formatter.
+
+The builtin styles can be looked up with the `get_style_by_name` function:
+
+.. sourcecode:: pycon
+
+ >>> from pygments.styles import get_style_by_name
+ >>> get_style_by_name('colorful')
+ <class 'pygments.styles.colorful.ColorfulStyle'>
+
+You can pass the name of a builtin style to a formatter as the `style`
+option, in the form of a string:
+
+.. sourcecode:: pycon
+
+ >>> from pygments.styles import get_style_by_name
+ >>> HtmlFormatter(style='colorful').style
+ <class 'pygments.styles.colorful.ColorfulStyle'>
+
+You can also import your own style (which must be a subclass of
+`pygments.style.Style`) and pass it to the formatter:
+
+.. sourcecode:: pycon
+
+ >>> from yourapp.yourmodule import YourStyle
+ >>> HtmlFormatter(style=YourStyle).style
+ <class 'yourapp.yourmodule.YourStyle'>
+
+
+Creating Your Own Styles
+========================
+
+So, how do you create a style? All you have to do is subclass `Style` and
+define some token styles:
+
+.. sourcecode:: python
+
+ from pygments.style import Style
+ from pygments.token import Keyword, Name, Comment, String, Error, \
+ Number, Operator, Generic
+
+ class YourStyle(Style):
+ default_style = ""
+ styles = {
+ Comment: 'italic #888',
+ Keyword: 'bold #005',
+ Name: '#f00',
+ Name.Function: '#0f0',
+ Name.Class: 'bold #0f0',
+ String: 'bg:#eee #111'
+ }
+
+That's it. There are just a few rules. When you define a style for `Name`,
+it automatically also affects `Name.Function` and all other subtypes. If you
+define ``'bold'`` and don't want boldface for a subtoken, use ``'nobold'``.
+
+(Philosophy: the styles aren't written in CSS syntax since this way
+they can be used for a variety of formatters.)
+
+`default_style` is the style inherited by all token types.
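+
+For illustration, a hypothetical style that renders every token type without
+an explicit rule in gray italics (only a sketch of how `default_style` can be
+used):
+
+.. sourcecode:: python
+
+    from pygments.style import Style
+
+    class GrayDefaultStyle(Style):
+        # inherited by every token type without an explicit rule
+        default_style = 'italic #888888'
+        styles = {}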
+
+
+Style Rules
+===========
+
+Here is a small overview of all allowed style rules:
+
+``bold``
+    render text as bold
+``nobold``
+    don't render text as bold (to prevent subtokens from being rendered bold)
+``italic``
+    render text as italic
+``noitalic``
+    don't render text as italic
+``underline``
+    render text underlined
+``nounderline``
+    don't render text underlined
+``bg:``
+    transparent background
+``bg:#000000``
+    background color (black)
+``border:``
+    no border
+``border:#ffffff``
+    border color (white)
+``#ff0000``
+    text color (red)
+``noinherit``
+    don't inherit styles from supertoken
+
+Note that there must not be a space between ``bg:`` and the color value,
+since the style definition string is split at whitespace.
+Also, named colors are not allowed, since the supported color names vary
+between formatters.
+
+Furthermore, not every formatter supports every style rule.
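+
+As an illustration, here is a hypothetical style combining several of these
+rules (a sketch for demonstration only):
+
+.. sourcecode:: python
+
+    from pygments.style import Style
+    from pygments.token import Keyword, String
+
+    class DemoStyle(Style):
+        default_style = ""
+        styles = {
+            # all keywords bold and dark blue ...
+            Keyword:        'bold #000080',
+            # ... except pseudo keywords, which should not be bold
+            Keyword.Pseudo: 'nobold',
+            # strings: dark red text on a light grey background
+            String:         'bg:#eeeeee #a00000',
+        }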
+
+
+Builtin Styles
+==============
+
+Pygments ships some builtin styles which are maintained by the Pygments team.
+
+To get a list of known styles you can use this snippet:
+
+.. sourcecode:: pycon
+
+ >>> from pygments.styles import STYLE_MAP
+ >>> STYLE_MAP.keys()
+ ['default', 'emacs', 'friendly', 'colorful']
diff --git a/docs/src/tokens.txt b/docs/src/tokens.txt
new file mode 100644
index 00000000..47d8feea
--- /dev/null
+++ b/docs/src/tokens.txt
@@ -0,0 +1,284 @@
+.. -*- mode: rst -*-
+
+==============
+Builtin Tokens
+==============
+
+Inside the `pygments.token` module, there is a special object called `Token` that
+is used to create token types.
+
+You can create a new token type by accessing an attribute of `Token`:
+
+.. sourcecode:: pycon
+
+ >>> from pygments.token import Token
+ >>> Token.String
+ Token.String
+ >>> Token.String is Token.String
+ True
+
+Note that tokens are singletons so you can use the ``is`` operator for comparing
+token types.
+
+In principle, you can create an unlimited number of token types, but there is
+no guarantee that a style defines rules for every custom token type. For that
+reason, Pygments proposes a set of standard token types, defined in the
+`pygments.token.STANDARD_TYPES` dict.
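+
+This dict maps each standard token type to a short name that formatters can
+use, e.g. as a CSS class in HTML output. A quick sketch (``'k'`` is the class
+the HTML formatter emits for keywords):
+
+.. sourcecode:: pycon
+
+    >>> from pygments.token import STANDARD_TYPES, Keyword
+    >>> STANDARD_TYPES[Keyword]
+    'k'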
+
+For some token types, aliases are already defined:
+
+.. sourcecode:: pycon
+
+ >>> from pygments.token import String
+ >>> String
+ Token.Literal.String
+
+Inside the `pygments.token` module the following aliases are defined:
+
+=========== =============================== ==================================
+`Text`      `Token.Text`                    for any type of text data
+`Error`     `Token.Error`                   represents lexer errors
+`Other`     `Token.Other`                   special token for data not
+                                            matched by a parser (e.g. HTML
+                                            markup in PHP code)
+`Keyword`   `Token.Keyword`                 any kind of keywords
+`Name`      `Token.Name`                    variable/function names
+`Literal`   `Token.Literal`                 any literals
+`String`    `Token.Literal.String`          string literals
+`Number`    `Token.Literal.Number`          number literals
+`Operator`  `Token.Operator`                operators (``+``, ``not`` etc.)
+`Comment`   `Token.Comment`                 any kind of comments
+`Generic`   `Token.Generic`                 generic tokens (have a look at
+                                            the explanation below)
+=========== =============================== ==================================
+
+Normally you just create token types using the already defined aliases. For
+each of those token aliases, a number of subtypes exists (excluding the
+special tokens `Token.Text`, `Token.Error` and `Token.Other`).
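+
+Subtypes are created by attribute access, just like top-level token types,
+and they are singletons as well (a quick sketch):
+
+.. sourcecode:: pycon
+
+    >>> from pygments.token import Keyword
+    >>> Keyword.Constant
+    Token.Keyword.Constant
+    >>> Keyword.Constant is Keyword.Constant
+    True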
+
+
+Keyword Tokens
+==============
+
+`Keyword`
+    For any kind of keyword (especially when it doesn't match any of the
+    subtypes, of course).
+
+`Keyword.Constant`
+ For keywords that are constants (e.g. ``None`` in future Python versions).
+
+`Keyword.Declaration`
+ For keywords used for variable declaration (e.g. ``var`` in some programming
+ languages like JavaScript).
+
+`Keyword.Pseudo`
+ For keywords that aren't really keywords (e.g. ``None`` in old Python
+ versions).
+
+`Keyword.Reserved`
+ For reserved keywords.
+
+`Keyword.Type`
+ For builtin types that can't be used as identifiers (e.g. ``int``,
+ ``char`` etc. in C).
+
+
+Name Tokens
+===========
+
+`Name`
+ For any name (variable names, function names, classes).
+
+`Name.Attribute`
+ For all attributes (e.g. in HTML tags).
+
+`Name.Builtin`
+ Builtin names; names that are available in the global namespace.
+
+`Name.Builtin.Pseudo`
+ Builtin names that are implicit (e.g. ``self`` in Ruby, ``this`` in Java).
+
+`Name.Class`
+    Class names. Because no lexer can know whether a name refers to a class,
+    a function or something else, this token is meant for class declarations.
+
+`Name.Constant`
+    Token type for constants. In some languages you can recognise a constant
+    by the way it's defined (e.g. as the value after a ``const`` keyword). In
+    other languages constants are uppercase by definition (Ruby).
+
+`Name.Decorator`
+    Token type for decorators. Decorators are syntactic elements in the
+    Python language. Similar syntax elements exist in C# and Java.
+
+`Name.Entity`
+    Token type for special entities (e.g. ``&nbsp;`` in HTML).
+
+`Name.Exception`
+    Token type for exception names (e.g. ``RuntimeError`` in Python). Some
+    languages define exceptions in the function signature (Java); the
+    exception name can then be highlighted with this token.
+
+`Name.Function`
+ Token type for function names.
+
+`Name.Label`
+ Token type for label names (e.g. in languages that support ``goto``).
+
+`Name.Namespace`
+    Token type for namespaces (e.g. import paths in Java/Python, or names
+    following the ``module``/``namespace`` keyword in other languages).
+
+`Name.Other`
+ Other names. Normally unused.
+
+`Name.Tag`
+ Tag names (in HTML/XML markup or configuration files).
+
+`Name.Variable`
+ Token type for variables. Some languages have prefixes for variable names
+ (PHP, Ruby, Perl). You can highlight them using this token.
+
+`Name.Variable.Class`
+ same as `Name.Variable` but for class variables (also static variables).
+
+`Name.Variable.Global`
+ same as `Name.Variable` but for global variables (used in Ruby, for
+ example).
+
+`Name.Variable.Instance`
+ same as `Name.Variable` but for instance variables.
+
+
+Literals
+========
+
+`Literal`
+ For any literal (if not further defined).
+
+`Literal.Date`
+    For date literals (e.g. ``42d`` in Boo).
+
+
+`String`
+ For any string literal.
+
+`String.Backtick`
+ Token type for strings enclosed in backticks.
+
+`String.Char`
+ Token type for single characters (e.g. Java, C).
+
+`String.Doc`
+ Token type for documentation strings (for example Python).
+
+`String.Double`
+ Double quoted strings.
+
+`String.Escape`
+ Token type for escape sequences in strings.
+
+`String.Heredoc`
+ Token type for "heredoc" strings (e.g. in Ruby or Perl).
+
+`String.Interpol`
+ Token type for interpolated parts in strings (e.g. ``#{foo}`` in Ruby).
+
+`String.Other`
+ Token type for any other strings (for example ``%q{foo}`` string constructs
+ in Ruby).
+
+`String.Regex`
+ Token type for regular expression literals (e.g. ``/foo/`` in JavaScript).
+
+`String.Single`
+ Token type for single quoted strings.
+
+`String.Symbol`
+ Token type for symbols (e.g. ``:foo`` in LISP or Ruby).
+
+
+`Number`
+ Token type for any number literal.
+
+`Number.Float`
+ Token type for float literals (e.g. ``42.0``).
+
+`Number.Hex`
+ Token type for hexadecimal number literals (e.g. ``0xdeadbeef``).
+
+`Number.Integer`
+ Token type for integer literals (e.g. ``42``).
+
+`Number.Integer.Long`
+ Token type for long integer literals (e.g. ``42L`` in Python).
+
+`Number.Oct`
+ Token type for octal literals.
+
+
+Operators
+=========
+
+`Operator`
+ For any punctuation operator (e.g. ``+``, ``-``).
+
+`Operator.Word`
+ For any operator that is a word (e.g. ``not``).
+
+
+Comments
+========
+
+`Comment`
+ Token type for any comment.
+
+`Comment.Multiline`
+ Token type for multiline comments.
+
+`Comment.Preproc`
+ Token type for preprocessor comments (also ``<?php``/``<%`` constructs).
+
+`Comment.Single`
+ Token type for comments that end at the end of a line (e.g. ``# foo``).
+
+
+Generic Tokens
+==============
+
+Generic tokens are for special lexers like the `DiffLexer`, which doesn't
+really highlight a programming language but a patch file.
+
+
+`Generic`
+ A generic, unstyled token. Normally you don't use this token type.
+
+`Generic.Deleted`
+ Marks the token value as deleted.
+
+`Generic.Emph`
+ Marks the token value as emphasized.
+
+`Generic.Error`
+ Marks the token value as an error message.
+
+`Generic.Heading`
+    Marks the token value as a headline.
+
+`Generic.Inserted`
+ Marks the token value as inserted.
+
+`Generic.Output`
+    Marks the token value as program output (e.g. for the Python console
+    lexer).
+
+`Generic.Prompt`
+    Marks the token value as a command prompt (e.g. for the Bash lexer).
+
+`Generic.Strong`
+    Marks the token value as bold (e.g. for the reST lexer).
+
+`Generic.Subheading`
+    Marks the token value as a subheadline.
+
+`Generic.Traceback`
+ Marks the token value as a part of an error traceback.