.. -*- mode: rst -*-

========================
Write your own formatter
========================

As well as creating :doc:`your own lexer <lexerdevelopment>`, writing a new
formatter for Pygments is easy and straightforward.

A formatter is a class that is initialized with some keyword arguments (the
formatter options) and that must provide a `format()` method.
Additionally, a formatter should provide a `get_style_defs()` method that
returns the style definitions from the style in a form usable for the
formatter's output format.


Quickstart
==========

The most basic formatter shipped with Pygments is the `NullFormatter`. It just
sends the value of a token to the output stream:

.. sourcecode:: python

    from pygments.formatter import Formatter

    class NullFormatter(Formatter):
        def format(self, tokensource, outfile):
            for ttype, value in tokensource:
                outfile.write(value)

As you can see, the `format()` method is passed two parameters: `tokensource`
and `outfile`. The first is an iterable of ``(token_type, value)`` tuples,
the latter a file-like object with a `write()` method.

Because the formatter is that basic, it doesn't override the `get_style_defs()`
method.
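
To see the two parameters in action, the following minimal sketch feeds a
hand-written token list to the formatter and collects the output in an
`io.StringIO` buffer. The token list and the variable names are made up for
illustration; they are not taken from a real lexer run.

.. sourcecode:: python

    import io

    from pygments.formatter import Formatter
    from pygments.token import Token

    class NullFormatter(Formatter):
        def format(self, tokensource, outfile):
            for ttype, value in tokensource:
                outfile.write(value)

    # an illustrative token stream; a lexer would normally produce this
    tokens = [
        (Token.Keyword, 'def'),
        (Token.Text, ' '),
        (Token.Name.Function, 'f'),
    ]

    out = io.StringIO()
    NullFormatter().format(tokens, out)
    result = out.getvalue()  # the token values, joined in order

In normal use the token stream comes from a lexer, for example through
``pygments.highlight(code, lexer, formatter)``.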


Styles
======

Styles aren't instantiated but their metaclass provides some class functions
so that you can access the style definitions easily.

Styles are iterable and yield tuples in the form ``(ttype, d)`` where `ttype`
is a token and `d` is a dict with the following keys:

``'color'``
    Hexadecimal color value (eg: ``'ff0000'`` for red) or `None` if not
    defined.

``'bold'``
    `True` if the value should be bold

``'italic'``
    `True` if the value should be italic

``'underline'``
    `True` if the value should be underlined

``'bgcolor'``
    Hexadecimal color value for the background (eg: ``'eeeeee'`` for light
    gray) or `None` if not defined.

``'border'``
    Hexadecimal color value for the border (eg: ``'0000aa'`` for a dark
    blue) or `None` for no border.

Additional keys might appear in the future; formatters should ignore all keys
they don't support.
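
As a rough illustration of how a formatter might consume such a dict, the
sketch below turns one into a CSS declaration string. The helper name
`css_declarations` and the sample dict (including its extra key) are
hypothetical; the helper reads only the keys listed above and silently ignores
everything else.

.. sourcecode:: python

    def css_declarations(d):
        """Build a CSS declaration string from one style dict.

        Hypothetical helper, not part of Pygments.
        """
        parts = []
        if d.get('color'):
            parts.append('color: #%s' % d['color'])
        if d.get('bgcolor'):
            parts.append('background-color: #%s' % d['bgcolor'])
        if d.get('bold'):
            parts.append('font-weight: bold')
        if d.get('italic'):
            parts.append('font-style: italic')
        if d.get('underline'):
            parts.append('text-decoration: underline')
        if d.get('border'):
            parts.append('border: 1px solid #%s' % d['border'])
        return '; '.join(parts)

    # a hand-written dict in the documented shape, plus one unknown key
    sample = {
        'color': 'ff0000', 'bold': True, 'italic': False,
        'underline': False, 'bgcolor': None, 'border': None,
        'some_future_key': 'ignored',
    }
    declarations = css_declarations(sample)

The same helper could be applied to each `d` obtained by iterating over a
style class.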


HTML 3.2 Formatter
==================

For a more complex example, let's implement an HTML 3.2 formatter. We don't
use CSS but inline markup (``<u>``, ``<font>``, etc). Because this isn't good
style, this formatter isn't in the standard library ;-)

.. sourcecode:: python

    from pygments.formatter import Formatter

    class OldHtmlFormatter(Formatter):

        def __init__(self, **options):
            Formatter.__init__(self, **options)

            # create a dict of (start, end) tuples that wrap the
            # value of a token so that we can use it in the format
            # method later
            self.styles = {}

            # iterating over the style class yields (token, style)
            # pairs, where `style` is a dict of parsed style values
            for token, style in self.style:
                start = end = ''
                # colors are given as hex strings without the
                # leading '#': 'RRGGBB'
                if style['color']:
                    start += '<font color="#%s">' % style['color']
                    end = '</font>' + end
                if style['bold']:
                    start += '<b>'
                    end = '</b>' + end
                if style['italic']:
                    start += '<i>'
                    end = '</i>' + end
                if style['underline']:
                    start += '<u>'
                    end = '</u>' + end
                self.styles[token] = (start, end)

        def format(self, tokensource, outfile):
            # lastval is a string we use for caching
            # because it's possible that a lexer yields a number
            # of consecutive tokens with the same token type.
            # to minimize the size of the generated html markup we
            # try to join the values of same-type tokens here
            lastval = ''
            lasttype = None

            # wrap the whole output with <pre>
            outfile.write('<pre>')

            for ttype, value in tokensource:
                # if the token type doesn't exist in the stylemap
                # we try it with the parent of the token type
                # eg: parent of Token.Literal.String.Double is
                # Token.Literal.String
                while ttype not in self.styles:
                    ttype = ttype.parent
                if ttype == lasttype:
                    # the current token type is the same as in the
                    # last iteration. cache it
                    lastval += value
                else:
                    # not the same token as last iteration, but we
                    # have some data in the buffer. wrap it with the
                    # defined style and write it to the output file
                    if lastval:
                        stylebegin, styleend = self.styles[lasttype]
                        outfile.write(stylebegin + lastval + styleend)
                    # set lastval/lasttype to current values
                    lastval = value
                    lasttype = ttype

            # if something is left in the buffer, write it to the
            # output file, then close the opened <pre> tag
            if lastval:
                stylebegin, styleend = self.styles[lasttype]
                outfile.write(stylebegin + lastval + styleend)
            outfile.write('</pre>\n')

The comments should explain it. Again, this formatter doesn't override the
`get_style_defs()` method. If we had used CSS classes instead of
inline HTML markup, we would need to generate the CSS first. That is what
the `get_style_defs()` method is for.
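
The fallback loop in `format()` relies on every token type having a `parent`
attribute. The following small sketch walks that chain for one token type,
using only `pygments.token.Token`; the variable names are illustrative.

.. sourcecode:: python

    from pygments.token import Token

    # walk from a specific token type up to the root token type
    ttype = Token.Literal.String.Double
    chain = []
    while ttype is not None:
        chain.append(ttype)
        ttype = ttype.parent

    # chain now runs from Token.Literal.String.Double up to Token

The root `Token` has no parent (its `parent` is `None`), so the loop in
`format()` depends on the style map containing an entry for some ancestor of
every token type.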


Generating Style Definitions
============================

Some formatters like the `LatexFormatter` and the `HtmlFormatter` don't
output inline markup but reference either macros or CSS classes. Because
the definitions of those are not part of the output, the `get_style_defs()`
method exists. It is passed one parameter (whether and how it is used
is up to the formatter) and has to return a string or ``None``.
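
As a minimal sketch of what such a method might look like, the hypothetical
formatter below emits one CSS rule per styled token type. It assumes that the
short names in `pygments.token.STANDARD_TYPES` serve as CSS class names and
that the parameter is an optional selector prefix; both are choices made for
this example, not requirements. The class name `CssSketchFormatter` is made
up, and the `format()` method is omitted.

.. sourcecode:: python

    from pygments.formatter import Formatter
    from pygments.token import STANDARD_TYPES

    class CssSketchFormatter(Formatter):
        """Hypothetical formatter that only implements get_style_defs()."""

        def get_style_defs(self, arg=''):
            # in this sketch `arg` is an optional CSS selector prefix
            prefix = arg + ' ' if arg else ''
            rules = []
            for ttype, d in self.style:
                css_class = STANDARD_TYPES.get(ttype)
                if not css_class:
                    # skip the root token and types without a short name
                    continue
                decls = []
                if d['color']:
                    decls.append('color: #%s' % d['color'])
                if d['bgcolor']:
                    decls.append('background-color: #%s' % d['bgcolor'])
                if d['bold']:
                    decls.append('font-weight: bold')
                if d['italic']:
                    decls.append('font-style: italic')
                if d['underline']:
                    decls.append('text-decoration: underline')
                if decls:
                    rules.append('%s.%s { %s }'
                                 % (prefix, css_class, '; '.join(decls)))
            return '\n'.join(rules)

    css = CssSketchFormatter(style='default').get_style_defs('.highlight')

The returned string could then be written into a stylesheet that accompanies
the formatted output.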