Syntax Highlight


Syntax highlighting significantly enhances the readability of code, so it is almost a must for pretty-printing a literate program.

PyLit uses docutils as its pretty-printing back end. However, in the current version, docutils does not highlight literal blocks. This may change in the future: in a mail on "Questions about writing programming manuals and scientific documents", docutils' main developer David Goodger wrote:

I'd be happy to include Python source colouring support, and other languages would be welcome too. A multi-language solution would be useful, of course. My issue is providing support for all output formats -- HTML and LaTeX and XML and anything in the future -- simultaneously. Just HTML isn't good enough. Until there is a generic-output solution, this will be something users will have to put together themselves.

1   Existing highlighting additions to docutils

There are already docutils extensions providing syntax colouring.

Pygments seems to be the most promising docutils highlighter. For printed output, the listings package has its advantages too.

2   Pygments enhanced docutils front-ends

Here is a working example for syntax highlighting in HTML and LaTeX output with pygments.

The example code in "Using Pygments in ReST documents" defines a new "sourcecode" directive. The directive takes one argument language and uses the Pygments source highlighter to parse and render its content as a colourful source code block.

Combining the pygments example code with the standard docutils front-ends results in front-end scripts that generate output documents with syntax colour. For consistency with the majority of existing add-ons, the directive is renamed to "code-block".
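
The core of such a front-end can be sketched as follows. This is only an illustration, not the code of the front-end scripts: render_code and HAVE_PYGMENTS are names made up for this sketch, and the plain-text fallback is an assumption. It shows why the approach depends on the writer: every output format needs its own Pygments formatter.

```python
# Sketch only: render a code string with Pygments for one output format.
# `render_code` and `HAVE_PYGMENTS` are hypothetical names.
try:
    from pygments import highlight
    from pygments.formatters import HtmlFormatter, LatexFormatter
    from pygments.lexers import get_lexer_by_name
    HAVE_PYGMENTS = True
except ImportError:
    HAVE_PYGMENTS = False


def render_code(code, language, output_format):
    """Return `code` highlighted for "html" or "latex" output."""
    if not HAVE_PYGMENTS:
        return code  # assumption: pass the text through unchanged
    # One formatter class per output format: the result is format-specific.
    formatters = {"html": HtmlFormatter, "latex": LatexFormatter}
    formatter = formatters[output_format]()
    return highlight(code, get_lexer_by_name(language), formatter)


if __name__ == "__main__":
    print(render_code("greeting = 'hello'\n", "python", "html"))
```

A directive built on this would pass the returned string on in a format-specific "raw" node, which is the first disadvantage listed below.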

rst2html-pygments
enhances the standard docutils rst2html front-end to generate an HTML rendering with syntax highlighting.
rst2latex-pygments
enhances docutils' rst2latex to generate LaTeX with syntax highlighting.
Advantages:
  • Easy implementation with no changes to the stock docutils.
  • Separation of code blocks and ordinary literal blocks.
Disadvantages:
  • "code-block" content is formatted by pygments and inserted in the document tree as a "raw" node, making the approach writer-dependent.
  • Documents are incompatible with the standard docutils because of the locally defined directive.
  • More "invasive" markup distracts from the content.
  • There is no "minimal" code block marker: each code block needs three additional lines.

The latter disadvantages become an issue in literate programming, where a code block is the most used block markup (see the proposal for a configurable literal block directive below).

To support the .. code-block:: directive, the PyLit converter would need a configurable "code block marker" instead of the hard-coded :: presently in use. (See also the code-block directive section in pylit.py.)
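
What a configurable marker could look like is sketched below. This is not PyLit's implementation: marker_pattern and is_code_block_marker are hypothetical helpers, and matching the marker at the end of a line is an assumption made for this sketch.

```python
import re


def marker_pattern(marker="::"):
    """Compile a pattern matching a line that ends with `marker`."""
    # re.escape makes the marker literal, so ".." does not match any two chars.
    return re.compile(re.escape(marker) + r"\s*$")


def is_code_block_marker(line, marker="::"):
    """Tell whether `line` announces a code block for the given marker."""
    return marker_pattern(marker).search(line) is not None


if __name__ == "__main__":
    print(is_code_block_marker("A paragraph announcing code::"))
    print(is_code_block_marker(".. code-block:: python",
                               marker=".. code-block:: python"))
```

With the default marker, a paragraph ending in :: is recognised as before. With the marker set to ".. code-block:: python", only that directive line starts a code block.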

3   Proposal for a code-block directive in docutils

In a post to the docutils users list, David Goodger wrote (after an all too long discussion):

Here are my pronouncements:

  • If reST is to grow a code-block (or sourcecode or syntax-highlight or whatever) directive, it must be independent of the output format.
  • The result will be stored in a literal_block node in the document tree. There will be no new element.
  • There will be no "unparsed" code-block. It would make no sense.
  • There will be no special pass-through support for LaTeX to do its own syntax highlighting.

On 7.06.07, David Goodger wrote:

On 6/7/07, G. Milde suggested:

  3. Docutils will support optional features that are only available if a recommended package or module is installed.

    -> code-block directive content would be

    • rendered with syntax highlight if import pygments works,
    • output as "ordinary" literal-block (preserve space, mono-coloured fixed-width font) if import pygments fails.

+1 on number 3.

Implemented 2007-06-08.

3.1   Parsing

Felix Wiemann provided a proof of concept script that utilizes the pygments parser to parse a source code string and store the result in the document tree.

This concept is used in pygments_code_block_directive (source: pygments_code_block_directive.py) to define and register a "code-block" directive.

  • The DocutilsInterface class uses pygments to parse the content of the directive and classify the tokens using short CSS class names identical to pygments HTML output. If pygments is not available, the unparsed code is returned.
  • The code_block_directive function inserts the tokens in a "rich" <literal_block> element with "classified" <inline> nodes.
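
The parsing step might be sketched like this. It is a simplified illustration, not the DocutilsInterface class itself: classified_tokens is a hypothetical name, and looking the class up directly in Pygments' STANDARD_TYPES table (with an empty class for unknown token types) is an assumption of this sketch.

```python
# Sketch only: classify the tokens of a code string with short CSS class names.
try:
    import pygments
    from pygments.lexers import get_lexer_by_name
    from pygments.token import STANDARD_TYPES
    from pygments.util import ClassNotFound
except ImportError:
    pygments = None  # optional feature: fall back to unparsed output


def classified_tokens(code, language):
    """Yield (css_class, text) pairs for `code`."""
    if pygments is None:
        # Pygments is not installed: return the code as one unparsed token.
        yield ("", code)
        return
    try:
        lexer = get_lexer_by_name(language)
    except ClassNotFound:
        # Unknown language: use the plain-text lexer instead.
        lexer = get_lexer_by_name("text")
    for token_type, text in pygments.lex(code, lexer):
        # STANDARD_TYPES maps token types to short names such as "k" or "s2".
        yield (STANDARD_TYPES.get(token_type, ""), text)


if __name__ == "__main__":
    for css_class, text in classified_tokens("def f():\n    return 1\n", "python"):
        print(repr(css_class), repr(text))
```

A directive function could then wrap every pair with a non-empty class in an <inline> node inside the <literal_block> element and add the remaining text as is.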

The XML rendering of the small example file myfunction.py.txt looks like myfunction.py.xml.

3.2   Writing

The writers can use the class information in the <inline> elements to render the tokens. They should ignore the class information if they are unable to use it or to pass it on.
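
As a rough illustration of both cases, the sketch below renders a list of (class, text) pairs once with and once without the class information. The function names and the exact markup are assumptions made for this sketch; the real writers work on the docutils document tree, not on such pairs.

```python
from html import escape


def tokens_to_html(tokens):
    """Render (css_class, text) pairs as a <pre> block with <span> elements."""
    parts = ['<pre class="literal-block">']
    for css_class, text in tokens:
        if css_class:
            # Classified token: pass the class on to the style sheet.
            parts.append('<span class="%s">%s</span>' % (css_class, escape(text)))
        else:
            parts.append(escape(text))
    parts.append("</pre>")
    return "".join(parts)


def tokens_to_plain(tokens):
    """A writer that cannot use the class information simply ignores it."""
    return "".join(text for _, text in tokens)


if __name__ == "__main__":
    example = [("k", "def"), ("", " "), ("nf", "f"), ("", "(): a < b\n")]
    print(tokens_to_html(example))
    print(tokens_to_plain(example))
```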

HTML

The "html" writer works out of the box.

  • The rst2html-highlight front end registers the "code-block" directive and converts an input file to HTML.
  • Styling is done with the adapted CSS style sheet pygments-default.css based on docutils' default stylesheet and the output of pygmentize -S default -f html.
  • The result looks like myfunction.py.html.

The "s5" and "pep" writers are not tested yet.

XML

"xml" and "pseudoxml" work out of the box, too. See myfunction.py.xml and myfunction.py.pseudoxml.

LaTeX

LaTeX writers must be updated to handle the "rich" <literal_block> element correctly.

  • The "latex" writer currently fails to handle "classified" <inline> doctree elements. The output myfunction.py.tex contains undefined control sequences \docutilsroleNone.

  • The "newlatex2e" writer produces a valid LaTeX document (myfunction.py.newlatex2e.tex). However, the pdflatex output looks a bit mixed up (myfunction.py.newlatex2e.pdf).

    The pygments-produced style file will not currently work with "newlatex2e" output.

OpenOffice

The non-official "odtwriter" provides syntax highlighting with pygments but uses a different syntax.

3.3   TODO

  • fix the "latex" writer.
  • think about an interface for pygments' options (like "encoding" or "linenumbers").

4   Configurable literal block directive

4.1   Goal

A clean and simple syntax for highlighted code blocks -- preserving the space saving feature of the "minimised" literal block marker (:: at the end of a text paragraph). This is especially desirable in literate programs with many code blocks.

4.2   Inline analogue

The role of inline interpreted text can be customised with the "default-role" directive. This allows the use of the concise "backtick" syntax for the most often used role, e.g. in a chemical paper, one could use:

.. default-role:: subscript

The triple point of H\ `2`\ O is at 0.01°C.

This customisation is currently not possible for block markup.

4.3   Proposal: make the default "literal block" role configurable.

  • Define a new "literal" directive for an ordinary literal block. This would insert the block content into the document tree as "literal-block" element with no parsing.

  • Define a "literal-block" setting that controls which directive is called on a block following ::. Default would be the "literal" directive.

    Alternatively, define a new "default-literal-block" directive instead of a settings key.

  • From a syntax point of view, this would be analogous to the behaviour of the odtwriter. (I am not sure about the representation in the document tree, though.)
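
The dispatch behind such a setting could be sketched as follows. Everything here is hypothetical: the handler functions, the registry and the dictionaries standing in for document tree elements only illustrate the idea and do not exist in docutils.

```python
def literal(content, arguments):
    """Ordinary literal block: keep the content unparsed."""
    return {"element": "literal_block", "classes": [], "text": content}


def code_block(content, arguments):
    """Code block: record the language so that a highlighter could use it."""
    return {"element": "literal_block",
            "classes": ["code-block"] + list(arguments),
            "text": content}


# Hypothetical registry of directives that may handle a block following "::".
LITERAL_BLOCK_DIRECTIVES = {"literal": literal, "code-block": code_block}


def handle_literal_block(content, setting="literal"):
    """Dispatch a "::" block according to the "literal-block" setting."""
    # The setting holds a directive name, optionally followed by arguments.
    name, *arguments = setting.split()
    return LITERAL_BLOCK_DIRECTIVES[name](content, arguments)


if __name__ == "__main__":
    print(handle_literal_block("some text typeset in monospace"))
    print(handle_literal_block("def hello(): pass", "code-block python"))
```

With the default setting, a block following :: stays an ordinary literal block. After the setting is changed to "code-block python", the same markup yields a block marked for Python highlighting.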

4.3.1   Motivation

Analogous to customising the default role of "interpreted text" with the "default-role" directive, the concise :: literal-block markup could be used for e.g.

  • a "code-block" or "sourcecode" directive for colourful code (analogous to the one in the pygments enhanced docutils front-ends)
  • the "line-block" directive for poems or addresses
  • the "parsed-literal" directive

Example (using the upcoming "settings" directive):

ordinary literal block::

   some text typeset in monospace

.. settings::
   :literal-block:  code-block python

colourful Python code::

   def hello():
       print("hello world")

Along the same lines, a "default-block-quote" setting or directive could be considered to configure the role of a block quote.

5   Odtwriter

Dave Kuhlman's odtwriter extension can add syntax highlighting to ordinary literal blocks.

The --add-syntax-highlighting command line flag activates syntax highlighting in literal blocks. By default, the "python" lexer is used.

You can change this within your reST document with the sourcecode directive:

.. sourcecode:: off

ordinary literal block::

   content set in teletype

.. sourcecode:: on
.. sourcecode:: python

colourful Python code::

   def hello():
       print("hello world")

The "sourcecode" directive defined by the odtwriter is fundamentally different from the "code-block" directive of rst2html-pygments: as the example above shows, it has no content of its own but acts as a switch for the literal blocks that follow it.

I.e. the odtwriter implements a configurable literal block directive (but with a slightly different syntax than my proposal above).
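
The switching behaviour can be modelled with a small state machine. This is a sketch and not the odtwriter's code: the event tuples and the function name are invented, highlighting is assumed to be activated on the command line, and a language argument is assumed to change only the lexer, not the on/off state.

```python
def literal_block_lexers(events, default_language="python"):
    """Return (text, lexer) pairs; lexer is None where highlighting is off.

    `events` is a sequence of ("sourcecode", argument) switches and
    ("literal", text) blocks in document order.
    """
    enabled = True  # assumes --add-syntax-highlighting was given
    language = default_language
    result = []
    for kind, value in events:
        if kind == "sourcecode":
            if value == "off":
                enabled = False
            elif value == "on":
                enabled = True
            else:
                language = value  # assumption: only the lexer changes
        else:
            # The current state applies to every following literal block.
            result.append((value, language if enabled else None))
    return result


if __name__ == "__main__":
    document = [
        ("sourcecode", "off"),
        ("literal", "content set in teletype"),
        ("sourcecode", "on"),
        ("sourcecode", "python"),
        ("literal", "def hello(): ..."),
    ]
    print(literal_block_lexers(document))
```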

6   Syntax highlight with the listings.sty LaTeX package

Using the listings LaTeX package for syntax highlight is currently not possible with the standard latex writer output.

Support for the use of listings with docutils is an issue that must be settled separately from the proposal for a code-block directive in docutils. It would need