diff options
author | wiemann <wiemann@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> | 2006-01-09 20:44:25 +0000 |
---|---|---|
committer | wiemann <wiemann@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> | 2006-01-09 20:44:25 +0000 |
commit | d77fdfef70e08114f57cbef5d91707df8717ea9f (patch) | |
tree | 49444e3486c0c333cb7b33dfa721296c08ee4ece /docs/dev/hacking.txt | |
parent | 53cd16ca6ca5f638cbe5956988e88f9339e355cf (diff) | |
parent | 3993c4097756e9885bcfbd07cb1cc1e4e95e50e4 (diff) | |
download | docutils-0.4.tar.gz |
Release 0.4: tagging released revisiondocutils-0.4
git-svn-id: http://svn.code.sf.net/p/docutils/code/tags/docutils-0.4@4268 929543f6-e4f2-0310-98a6-ba3bd3dd1d04
Diffstat (limited to 'docs/dev/hacking.txt')
-rw-r--r-- | docs/dev/hacking.txt | 264 |
1 files changed, 264 insertions, 0 deletions
diff --git a/docs/dev/hacking.txt b/docs/dev/hacking.txt new file mode 100644 index 000000000..d0ec9a3fb --- /dev/null +++ b/docs/dev/hacking.txt @@ -0,0 +1,264 @@ +========================== + Docutils_ Hacker's Guide +========================== + +:Author: Felix Wiemann +:Contact: Felix.Wiemann@ososo.de +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This document has been placed in the public domain. + +:Abstract: This is the introduction to Docutils for all persons who + want to extend Docutils in some way. +:Prerequisites: You have used reStructuredText_ and played around with + the `Docutils front-end tools`_ before. Some (basic) Python + knowledge is certainly helpful (though not necessary, strictly + speaking). + +.. _Docutils: http://docutils.sourceforge.net/ +.. _reStructuredText: http://docutils.sourceforge.net/rst.html +.. _Docutils front-end tools: ../user/tools.html + +.. contents:: + + +Overview of the Docutils Architecture +===================================== + +To give you an understanding of the Docutils architecture, we'll dive +right into the internals using a practical example. + +Consider the following reStructuredText file:: + + My *favorite* language is Python_. + + .. _Python: http://www.python.org/ + +Using the ``rst2html.py`` front-end tool, you would get an HTML output +which looks like this:: + + [uninteresting HTML code removed] + <body> + <div class="document"> + <p>My <em>favorite</em> language is <a class="reference" href="http://www.python.org/">Python</a>.</p> + </div> + </body> + </html> + +While this looks very simple, it's enough to illustrate all internal +processing stages of Docutils. Let's see how this document is +processed from the reStructuredText source to the final HTML output: + + +Reading the Document +-------------------- + +The **Reader** reads the document from the source file and passes it +to the parser (see below). The default reader is the standalone +reader (``docutils/readers/standalone.py``) which just reads the input +data from a single text file. Unless you want to do really fancy +things, there is no need to change that. + +Since you probably won't need to touch readers, we will just move on +to the next stage: + + +Parsing the Document +-------------------- + +The **Parser** analyzes the the input document and creates a **node +tree** representation. In this case we are using the +**reStructuredText parser** (``docutils/parsers/rst/__init__.py``). +To see what that node tree looks like, we call ``quicktest.py`` (which +can be found in the ``tools/`` directory of the Docutils distribution) +with our example file (``test.txt``) as first parameter (Windows users +might need to type ``python quicktest.py test.txt``):: + + $ quicktest.py test.txt + <document source="test.txt"> + <paragraph> + My + <emphasis> + favorite + language is + <reference name="Python" refname="python"> + Python + . + <target ids="python" names="python" refuri="http://www.python.org/"> + +Let us now examine the node tree: + +The top-level node is ``document``. It has a ``source`` attribute +whose value is ``text.txt``. There are two children: A ``paragraph`` +node and a ``target`` node. The ``paragraph`` in turn has children: A +text node ("My "), an ``emphasis`` node, a text node (" language is "), +a ``reference`` node, and again a ``Text`` node ("."). + +These node types (``document``, ``paragraph``, ``emphasis``, etc.) are +all defined in ``docutils/nodes.py``. The node types are internally +arranged as a class hierarchy (for example, both ``emphasis`` and +``reference`` have the common superclass ``Inline``). To get an +overview of the node class hierarchy, use epydoc (type ``epydoc +nodes.py``) and look at the class hierarchy tree. + + +Transforming the Document +------------------------- + +In the node tree above, the ``reference`` node does not contain the +target URI (``http://www.python.org/``) yet. + +Assigning the target URI (from the ``target`` node) to the +``reference`` node is *not* done by the parser (the parser only +translates the input document into a node tree). + +Instead, it's done by a **Transform**. In this case (resolving a +reference), it's done by the ``ExternalTargets`` transform in +``docutils/transforms/references.py``. + +In fact, there are quite a lot of Transforms, which do various useful +things like creating the table of contents, applying substitution +references or resolving auto-numbered footnotes. + +The Transforms are applied after parsing. To see how the node tree +has changed after applying the Transforms, we use the +``rst2pseudoxml.py`` tool: + +.. parsed-literal:: + + $ rst2pseudoxml.py test.txt + <document source="test.txt"> + <paragraph> + My + <emphasis> + favorite + language is + <reference name="Python" **refuri="http://www.python.org/"**> + Python + . + <target ids="python" names="python" ``refuri="http://www.python.org/"``> + +For our small test document, the only change is that the ``refname`` +attribute of the reference has been replaced by a ``refuri`` +attribute |---| the reference has been resolved. + +While this does not look very exciting, transforms are a powerful tool +to apply any kind of transformation on the node tree. + +By the way, you can also get a "real" XML representation of the node +tree by using ``rst2xml.py`` instead of ``rst2pseudoxml.py``. + + +Writing the Document +-------------------- + +To get an HTML document out of the node tree, we use a **Writer**, the +HTML writer in this case (``docutils/writers/html4css1.py``). + +The writer receives the node tree and returns the output document. +For HTML output, we can test this using the ``rst2html.py`` tool:: + + $ rst2html.py --link-stylesheet test.txt + <?xml version="1.0" encoding="utf-8" ?> + <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> + <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> + <meta name="generator" content="Docutils 0.3.10: http://docutils.sourceforge.net/" /> + <title></title> + <link rel="stylesheet" href="../docutils/writers/html4css1/html4css1.css" type="text/css" /> + </head> + <body> + <div class="document"> + <p>My <em>favorite</em> language is <a class="reference" href="http://www.python.org/">Python</a>.</p> + </div> + </body> + </html> + +So here we finally have our HTML output. The actual document contents +are in the fourth-last line. Note, by the way, that the HTML writer +did not render the (invisible) ``target`` node |---| only the +``paragraph`` node and its children appear in the HTML output. + + +Extending Docutils +================== + +Now you'll ask, "how do I actually extend Docutils?" + +First of all, once you are clear about *what* you want to achieve, you +have to decide *where* to implement it |---| in the Parser (e.g. by +adding a directive or role to the reStructuredText parser), as a +Transform, or in the Writer. There is often one obvious choice among +those three (Parser, Transform, Writer). If you are unsure, ask on +the Docutils-develop_ mailing list. + +In order to find out how to start, it is often helpful to look at +similar features which are already implemented. For example, if you +want to add a new directive to the reStructuredText parser, look at +the implementation of a similar directive in +``docutils/parsers/rst/directives/``. + + +Modifying the Document Tree Before It Is Written +------------------------------------------------ + +You can modify the document tree right before the writer is called. +One possibility is to use the publish_doctree_ and +publish_from_doctree_ functions. + +To retrieve the document tree, call:: + + document = docutils.core.publish_doctree(...) + +Please see the docstring of publish_doctree for a list of parameters. + +.. XXX Need to write a well-readable list of (commonly used) options + of the publish_* functions. Probably in api/publisher.txt. + +``document`` is the root node of the document tree. You can now +change the document by accessing the ``document`` node and its +children |---| see `The Node Interface`_ below. + +When you're done with modifying the document tree, you can write it +out by calling:: + + output = docutils.core.publish_from_doctree(document, ...) + +.. _publish_doctree: ../api/publisher.html#publish_doctree +.. _publish_from_doctree: ../api/publisher.html#publish_from_doctree + + +The Node Interface +------------------ + +As described in the overview above, Docutils' internal representation +of a document is a tree of nodes. We'll now have a look at the +interface of these nodes. + +(To be completed.) + + +What Now? +========= + +This document is not complete. Many topics could (and should) be +covered here. To find out with which topics we should write about +first, we are awaiting *your* feedback. So please ask your questions +on the Docutils-develop_ mailing list. + + +.. _Docutils-develop: ../user/mailing-lists.html#docutils-develop + + +.. |---| unicode:: 8212 .. em-dash + :trim: + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: |