.. -*- rst-mode -*-

Syntax Highlight
================

.. contents::
.. sectnum::

Syntax highlighting significantly enhances the readability of code.
However, in the current version, docutils does not highlight literal blocks.

This sandbox project aims to add syntax highlighting of code blocks to the
capabilities of docutils. To find its way into the docutils core, it should
meet the requirements laid out in a mail on `Questions about writing
programming manuals and scientific documents`__ by docutils' main developer,
David Goodger:

  I'd be happy to include Python source colouring support, and other
  languages would be welcome too. A multi-language solution would be
  useful, of course. My issue is providing support for all output formats
  -- HTML and LaTeX and XML and anything in the future -- simultaneously.
  Just HTML isn't good enough. Until there is a generic-output solution,
  this will be something users will have to put together themselves.

__ http://sourceforge.net/mailarchive/message.php?msg_id=12921194


State of the art
----------------

There are already docutils extensions providing syntax colouring, e.g.:

SilverCity_,
  a C++ library and Python extension that can provide lexical analysis for
  over 20 different programming languages. A recipe__ for a "code-block"
  directive provides syntax highlighting by SilverCity.

  __ http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/252170

`listings`_,
  a LaTeX package providing highly customisable and advanced syntax
  highlighting, though only for LaTeX (and LaTeX derived PS|PDF).

  See also section `listings.sty`_ and a proposal__ by Gael Varoquaux.

  __ http://article.gmane.org/gmane.text.docutils.devel/3914

Trac_
  has `reStructuredText support`__ and offers syntax highlighting with a
  "code-block" directive using GNU Enscript_, SilverCity_, or Pygments_.

  __ http://trac.edgewall.org/wiki/WikiRestructuredText

rest2web_,
  the "site builder", provides the `colorize`__ macro (using the
  `Moin-Moin Python colorizer`_).

  __ http://www.voidspace.org.uk/python/rest2web/macros.html#colorize

Pygments_
  is a generic syntax highlighter written completely in Python.

  * Usable as a command-line tool and as a Python package.
  * A wide range of common `languages and markup formats`_ is supported.
  * Additionally, OpenOffice's ``*.odt`` is supported by the odtwriter_.
  * The layout is configurable by style sheets.
  * Several built-in styles and an option for line-numbering.
  * Built-in output formats include HTML, LaTeX, and RTF.
  * Support for new languages, formats, and styles is added easily (modular
    structure, Python code, existing documentation).
  * Well documented and actively maintained.
  * The web site provides a recipe for `using Pygments in ReST documents`_.
    It is used in the `Pygments enhanced docutils front-ends`_ below.

Odtwriter_,
  an experimental writer for Docutils OpenOffice export, supports syntax
  colouring using Pygments. (See section `Odtwriter syntax`_.)

Pygments_ seems to be the most promising docutils highlighter.

For printed output, the listings_ package has its advantages too.


Pygments enhanced docutils front-ends
-------------------------------------

Syntax highlighting can be achieved by `front-end scripts`_ combining
docutils and pygments.

Advantages:

+ Easy implementation with no changes to the stock docutils_.
+ Separation of code blocks and ordinary literal blocks.

Disadvantages:

1. "code-block" content is formatted by `pygments`_ and inserted in the
   document tree as a "raw" node, making the approach writer-dependent.
2. Documents are incompatible with the standard docutils because of the
   locally defined directive.
3. More "invasive" markup distracting from content (no "minimal" code block
   marker -- three additional lines per code block).

Points 1 and 2 lead to the `code-block directive proposal`_.
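
The tokenising step behind that proposal can be sketched as follows. This is
only a minimal illustration, not the sandbox implementation: the function
name ``classify_tokens`` is made up for this example, and it assumes the
Pygments functions ``lex`` and ``get_lexer_by_name`` and the
``STANDARD_TYPES`` mapping from token types to short CSS class names.
Instead of writer-specific "raw" output, it yields (class, text) pairs that
any writer could render or ignore, and it falls back to the unparsed code if
Pygments is not installed::

  try:
      from pygments import lex
      from pygments.lexers import get_lexer_by_name
      from pygments.token import STANDARD_TYPES
  except ImportError:  # Pygments missing: fall back to unparsed output
      lex = None


  def classify_tokens(code, language):
      """Yield (css_class, text) pairs for the string `code`.

      css_class is the short name Pygments uses in its HTML output
      (e.g. 'k' for a keyword); it is '' for unclassified text.
      """
      if lex is None:
          # Hypothetical fallback: hand back the code as one plain token.
          yield "", code
          return
      lexer = get_lexer_by_name(language)
      for token_type, text in lex(code, lexer):
          # Token types without a standard short name get no class.
          yield STANDARD_TYPES.get(token_type, ""), text


  SAMPLE = 'def hello():\n    print("hello world")\n'

  for css_class, text in classify_tokens(SAMPLE, "python"):
      print(repr(css_class), repr(text))

Because the pairs carry no output-specific markup, a directive could store
them in the document tree and leave the rendering decision to each writer.
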
Point 3 becomes an issue in literate programming, where a code block is the
most used block markup. It is addressed in the proposal for a `configurable
literal block directive`_.


`code-block` directive proposal
-------------------------------

Reading
"""""""

Felix Wiemann provided a `proof of concept`_ script that utilizes the
pygments_ parser to parse a source code string and store the result in the
document tree.

This concept is used in a `pygments_code_block_directive`_
(Source: `pygments_code_block_directive.py`_) to define and register a
"code-block" directive.

* The ``DocutilsInterface`` class uses pygments to parse the content of the
  directive and classify the tokens using short CSS class names identical to
  pygments' HTML output. If pygments is not available, the unparsed code is
  returned.

* The ``code_block_directive`` function inserts the tokens in a "rich"
  <literal_block> element with "classified" <inline> nodes.

The XML rendering of the small example file `myfunction.py.txt`_ looks like
`myfunction.py.xml`_.

Writing
"""""""

The writers can use the class information in the <inline> elements to render
the tokens. They should ignore the class information if they are unable to
use it or to pass it on.

HTML
  The "html" writer works out of the box.

  * The rst2html-highlight_ front end registers the "code-block" directive
    and converts an input file to HTML.

  * Styling is done with the adapted CSS style sheet `pygments-default.css`_
    based on docutils' default stylesheet and the output of
    ``pygmentize -S default -f html``.

  * The result looks like `myfunction.py.htm`_.

  The "s5" and "pep" writers are not tested yet.

XML
  "xml" and "pseudoxml" work out of the box, too. See `myfunction.py.xml`_
  and `myfunction.py.pseudoxml`_.

LaTeX
  LaTeX writers must be updated to handle the "rich" <literal_block> element
  correctly.

  * The "latex" writer currently fails to handle "classified" doctree
    elements. The output `myfunction.py.tex`_ contains undefined control
    sequences ``\docutilsroleNone``.
  * The "newlatex2e" writer produces a valid LaTeX document
    (`myfunction.py.newlatex2e.tex`_). However, the ``pdflatex`` output
    looks somewhat mixed up (`myfunction.py.newlatex2e.pdf`_).

    The pygments-produced style file will not currently work with
    "newlatex2e" output.

OpenOffice
  The non-official "odtwriter" provides syntax highlighting with pygments
  but uses a different syntax.

TODO
""""

* Fix the "latex" writers.
* Think about an interface for pygments' options (like "encoding" or
  "linenumbers").

.. _proof of concept:
     http://article.gmane.org/gmane.text.docutils.user/3689
.. _pygments_code_block_directive.py: ../pygments_code_block_directive.py
.. _pygments_code_block_directive: pygments_code_block_directive-bunt.py.htm
.. _pygments_docutils_interface.py: pygments_docutils_interface.py
.. _myfunction.py.txt: myfunction.py.txt
.. _myfunction.py.xml: myfunction.py.xml
.. _myfunction.py.htm: myfunction.py.htm
.. _myfunction.py.pseudoxml: myfunction.py.pseudoxml
.. _myfunction.py.tex: myfunction.py.tex
.. _myfunction.py.newlatex2e.tex: myfunction.py.newlatex2e.tex
.. _myfunction.py.newlatex2e.pdf: myfunction.py.newlatex2e.pdf
.. _rst2html-highlight: ../rst2html-highlight
.. _pygments-long.css: ../data/pygments-long.css


Configurable literal block directive
------------------------------------

Goal
""""

A clean and simple syntax for highlighted code blocks -- preserving the
space saving feature of the "minimised" literal block marker (``::`` at the
end of a text paragraph). This is especially desirable in documents with
many code blocks like tutorials or literate programs.

Inline analogue
"""""""""""""""

The *role* of inline `interpreted text` can be customised with the
"default-role" directive. This allows the use of the concise "backtick"
syntax for the most often used role, e.g. in a chemical paper, one could
use::

  .. default-role:: subscript

  The triple point of H\ `2`\O is at 0.01 °C.

.. default-role:: subscript

to produce

  The triple point of H\ `2`\O is at 0.01 °C.

This customisation is currently not possible for block markup.

Proposal
""""""""

* Define a new "literal" directive for an ordinary literal block. This would
  insert the block content into the document tree as a "literal-block"
  element with no parsing.

* Define a "literal-block" setting that controls which directive is called
  on a block following ``::``. The default would be the "literal" directive.

  Alternatively, define a new "default-literal-block" directive instead of a
  settings key.

* From a syntax point of view, this would be analogous to the behaviour of
  the odtwriter_. (I am not sure about the representation in the document
  tree, though.)

Motive
""""""

Analogous to customising the default role of "interpreted text" with the
"default-role" directive, the concise ``::`` literal-block markup could be
used for e.g.

* a "code-block" or "sourcecode" directive for colourful code
  (analogous to the one in the `pygments enhanced docutils front-ends`_)
* the "line-block" directive for poems or addresses
* the "parsed-literal" directive

Example (using the upcoming "settings" directive)::

  ordinary literal block::

     some text typeset in monospace

  .. settings::
     :literal-block: code-block python

  colourful Python code::

     def hello():
         print("hello world")

Along the same lines, a "default-block-quote" setting or directive could be
considered to configure the role of a block quote.


Odtwriter syntax
----------------

Dave Kuhlman's odtwriter_ extension can add syntax highlighting to ordinary
literal blocks.

The ``--add-syntax-highlighting`` command line flag activates syntax
highlighting in literal blocks. By default, the "python" lexer is used.

You can change this within your reST document with the ``sourcecode``
directive::

  .. sourcecode:: off

  ordinary literal block::

     content set in teletype

  .. sourcecode:: on
  .. sourcecode:: python

  colourful Python code::

     def hello():
         print("hello world")

The "sourcecode" directive defined by the odtwriter is fundamentally
different from the "code-block" directive of ``rst2html-pygments``:

* The odtwriter directive does not have content. It is a switch.

* The syntax highlighting state and language/lexer set by this directive
  remain in effect until the next sourcecode directive is encountered in the
  reST document.

  ``.. sourcecode:: <newstate>``
      makes highlighting active or inactive.
      <newstate> is either ``on`` or ``off``.

  ``.. sourcecode:: <lexer>``
      changes the lexer parsing literal code blocks.
      <lexer> should be one of the aliases listed at Pygments'
      `languages and markup formats`_.

I.e. the odtwriter implements a `configurable literal block directive`_
(but with a slightly different syntax than the proposal above).


``listings.sty``
----------------

Using the listings_ LaTeX package for syntax highlighting is currently not
possible with the standard latex writer output.

Support for the use of listings_ with docutils is an issue that must be
settled separately from the `code-block directive proposal`_.

It needs

* a new, specialized docutils latex writer, or
* a new option (and behaviour) for the existing latex writer.

Ideas and experimental code are in the Sandbox under `latex-variants`_.

.. External links
.. _pylit: http://pylit.berlios.de
.. _docutils: http://docutils.sourceforge.net/
.. _rest2web: http://www.voidspace.org.uk/python/rest2web/
.. _Enscript: http://www.gnu.org/software/enscript/enscript.html
.. _SilverCity: http://silvercity.sourceforge.net/
.. _Trac: http://trac.edgewall.org/
.. _Moin-Moin Python colorizer:
     http://www.standards-schmandards.com/2005/fangs-093/
.. _odtwriter: http://www.rexx.com/~dkuhlman/odtwriter.html
.. _pygments: http://pygments.org/
.. _listings:
     http://www.ctan.org/tex-archive/help/Catalogue/entries/listings.html
.. _fancyvrb:
     http://www.ctan.org/tex-archive/help/Catalogue/entries/fancyvrb.html
.. _alltt:
     http://www.ctan.org/tex-archive/help/Catalogue/entries/alltt.html
.. _moreverb:
     http://www.ctan.org/tex-archive/help/Catalogue/entries/moreverb.html
.. _verbatim:
     http://www.ctan.org/tex-archive/help/Catalogue/entries/verbatim.html
.. _languages and markup formats: http://pygments.org/languages
.. _Using Pygments in ReST documents: http://pygments.org/docs/rstdirective/
.. _Docutils Document Tree:
     http://docutils.sf.net/docs/ref/doctree.html#classes
.. _latex-variants: http://docutils.sourceforge.net/sandbox/latex-variants/

.. Internal links
.. _front-end scripts: ../tools/pygments-enhanced-front-ends
.. _pygments-default.css: ../data/pygments-default.css