1 files changed, 169 insertions, 0 deletions
diff --git a/doc/docs/formatterdevelopment.rst b/doc/docs/formatterdevelopment.rst
new file mode 100644
index 00000000..2bfac05c
--- /dev/null
+++ b/doc/docs/formatterdevelopment.rst
@@ -0,0 +1,169 @@
+.. -*- mode: rst -*-
+
+========================
+Write your own formatter
+========================
+
+As well as creating :doc:`your own lexer <lexerdevelopment>`, writing a new
+formatter for Pygments is easy and straightforward.
+
+A formatter is a class that is initialized with some keyword arguments (the
+formatter options) and that must provides a `format()` method.
+Additionally a formatter should provide a `get_style_defs()` method that
+returns the style definitions from the style in a form usable for the
+formatter's output format.
+
+
+Quickstart
+==========
+
+The most basic formatter shipped with Pygments is the `NullFormatter`. It just
+sends the value of a token to the output stream:
+
+.. sourcecode:: python
+
+    from pygments.formatter import Formatter
+
+    class NullFormatter(Formatter):
+        def format(self, tokensource, outfile):
+            for ttype, value in tokensource:
+                outfile.write(value)
+
+As you can see, the `format()` method is passed two parameters: `tokensource`
+and `outfile`. The first is an iterable of ``(token_type, value)`` tuples,
+the latter a file like object with a `write()` method.
+
+Because the formatter is that basic it doesn't overwrite the `get_style_defs()`
+method.
+
+
+Styles
+======
+
+Styles aren't instantiated but their metaclass provides some class functions
+so that you can access the style definitions easily.
+
+Styles are iterable and yield tuples in the form ``(ttype, d)`` where `ttype`
+is a token and `d` is a dict with the following keys:
+
+``'color'``
+    Hexadecimal color value (eg: ``'ff0000'`` for red) or `None` if not
+    defined.
+
+``'bold'``
+    `True` if the value should be bold
+
+``'italic'``
+    `True` if the value should be italic
+
+``'underline'``
+    `True` if the value should be underlined
+
+``'bgcolor'``
+    Hexadecimal color value for the background (eg: ``'eeeeeee'`` for light
+    gray) or `None` if not defined.
+
+``'border'``
+    Hexadecimal color value for the border (eg: ``'0000aa'`` for a dark
+    blue) or `None` for no border.
+
+Additional keys might appear in the future, formatters should ignore all keys
+they don't support.
+
+
+HTML 3.2 Formatter
+==================
+
+For an more complex example, let's implement a HTML 3.2 Formatter. We don't
+use CSS but inline markup (``<u>``, ``<font>``, etc). Because this isn't good
+style this formatter isn't in the standard library ;-)
+
+.. sourcecode:: python
+
+    from pygments.formatter import Formatter
+
+    class OldHtmlFormatter(Formatter):
+
+        def __init__(self, **options):
+            Formatter.__init__(self, **options)
+
+            # create a dict of (start, end) tuples that wrap the
+            # value of a token so that we can use it in the format
+            # method later
+            self.styles = {}
+
+            # we iterate over the `_styles` attribute of a style item
+            # that contains the parsed style values.
+            for token, style in self.style:
+                start = end = ''
+                # a style item is a tuple in the following form:
+                # colors are readily specified in hex: 'RRGGBB'
+                if style['color']:
+                    start += '<font color="#%s">' % style['color']
+                    end = '</font>' + end
+                if style['bold']:
+                    start += '<b>'
+                    end = '</b>' + end
+                if style['italic']:
+                    start += '<i>'
+                    end = '</i>' + end
+                if style['underline']:
+                    start += '<u>'
+                    end = '</u>' + end
+                self.styles[token] = (start, end)
+
+        def format(self, tokensource, outfile):
+            # lastval is a string we use for caching
+            # because it's possible that an lexer yields a number
+            # of consecutive tokens with the same token type.
+            # to minimize the size of the generated html markup we
+            # try to join the values of same-type tokens here
+            lastval = ''
+            lasttype = None
+
+            # wrap the whole output with <pre>
+            outfile.write('<pre>')
+
+            for ttype, value in tokensource:
+                # if the token type doesn't exist in the stylemap
+                # we try it with the parent of the token type
+                # eg: parent of Token.Literal.String.Double is
+                # Token.Literal.String
+                while ttype not in self.styles:
+                    ttype = ttype.parent
+                if ttype == lasttype:
+                    # the current token type is the same of the last
+                    # iteration. cache it
+                    lastval += value
+                else:
+                    # not the same token as last iteration, but we
+                    # have some data in the buffer. wrap it with the
+                    # defined style and write it to the output file
+                    if lastval:
+                        stylebegin, styleend = self.styles[lasttype]
+                        outfile.write(stylebegin + lastval + styleend)
+                    # set lastval/lasttype to current values
+                    lastval = value
+                    lasttype = ttype
+
+            # if something is left in the buffer, write it to the
+            # output file, then close the opened <pre> tag
+            if lastval:
+                stylebegin, styleend = self.styles[lasttype]
+                outfile.write(stylebegin + lastval + styleend)
+            outfile.write('</pre>\n')
+
+The comments should explain it. Again, this formatter doesn't override the
+`get_style_defs()` method. If we would have used CSS classes instead of
+inline HTML markup, we would need to generate the CSS first. For that
+purpose the `get_style_defs()` method exists:
+
+
+Generating Style Definitions
+============================
+
+Some formatters like the `LatexFormatter` and the `HtmlFormatter` don't
+output inline markup but reference either macros or css classes. Because
+the definitions of those are not part of the output, the `get_style_defs()`
+method exists. It is passed one parameter (if it's used and how it's used
+is up to the formatter) and has to return a string or ``None``.