Diffstat (limited to 'docutils/docs/dev/hacking.txt')
-rw-r--r-- | docutils/docs/dev/hacking.txt | 264 |
1 files changed, 0 insertions, 264 deletions
diff --git a/docutils/docs/dev/hacking.txt b/docutils/docs/dev/hacking.txt
deleted file mode 100644
index d0ec9a3fb..000000000
--- a/docutils/docs/dev/hacking.txt
+++ /dev/null
@@ -1,264 +0,0 @@
-==========================
- Docutils_ Hacker's Guide
-==========================
-
-:Author: Felix Wiemann
-:Contact: Felix.Wiemann@ososo.de
-:Revision: $Revision$
-:Date: $Date$
-:Copyright: This document has been placed in the public domain.
-
-:Abstract: This is the introduction to Docutils for all persons who
-    want to extend Docutils in some way.
-:Prerequisites: You have used reStructuredText_ and played around with
-    the `Docutils front-end tools`_ before. Some (basic) Python
-    knowledge is certainly helpful (though not necessary, strictly
-    speaking).
-
-.. _Docutils: http://docutils.sourceforge.net/
-.. _reStructuredText: http://docutils.sourceforge.net/rst.html
-.. _Docutils front-end tools: ../user/tools.html
-
-.. contents::
-
-
-Overview of the Docutils Architecture
-=====================================
-
-To give you an understanding of the Docutils architecture, we'll dive
-right into the internals using a practical example.
-
-Consider the following reStructuredText file::
-
-    My *favorite* language is Python_.
-
-    .. _Python: http://www.python.org/
-
-Using the ``rst2html.py`` front-end tool, you would get an HTML output
-which looks like this::
-
-    [uninteresting HTML code removed]
-    <body>
-    <div class="document">
-    <p>My <em>favorite</em> language is <a class="reference" href="http://www.python.org/">Python</a>.</p>
-    </div>
-    </body>
-    </html>
-
-While this looks very simple, it's enough to illustrate all internal
-processing stages of Docutils. Let's see how this document is
-processed from the reStructuredText source to the final HTML output:
-
-
-Reading the Document
---------------------
-
-The **Reader** reads the document from the source file and passes it
-to the parser (see below). The default reader is the standalone
-reader (``docutils/readers/standalone.py``) which just reads the input
-data from a single text file. Unless you want to do really fancy
-things, there is no need to change that.
-
-Since you probably won't need to touch readers, we will just move on
-to the next stage:
-
-
-Parsing the Document
---------------------
-
-The **Parser** analyzes the the input document and creates a **node
-tree** representation. In this case we are using the
-**reStructuredText parser** (``docutils/parsers/rst/__init__.py``).
-To see what that node tree looks like, we call ``quicktest.py`` (which
-can be found in the ``tools/`` directory of the Docutils distribution)
-with our example file (``test.txt``) as first parameter (Windows users
-might need to type ``python quicktest.py test.txt``)::
-
-    $ quicktest.py test.txt
-    <document source="test.txt">
-        <paragraph>
-            My
-            <emphasis>
-                favorite
-             language is
-            <reference name="Python" refname="python">
-                Python
-            .
-        <target ids="python" names="python" refuri="http://www.python.org/">
-
-Let us now examine the node tree:
-
-The top-level node is ``document``. It has a ``source`` attribute
-whose value is ``text.txt``. There are two children: A ``paragraph``
-node and a ``target`` node. The ``paragraph`` in turn has children: A
-text node ("My "), an ``emphasis`` node, a text node (" language is "),
-a ``reference`` node, and again a ``Text`` node (".").
-
-These node types (``document``, ``paragraph``, ``emphasis``, etc.) are
-all defined in ``docutils/nodes.py``. The node types are internally
-arranged as a class hierarchy (for example, both ``emphasis`` and
-``reference`` have the common superclass ``Inline``). To get an
-overview of the node class hierarchy, use epydoc (type ``epydoc
-nodes.py``) and look at the class hierarchy tree.
-
-
-Transforming the Document
--------------------------
-
-In the node tree above, the ``reference`` node does not contain the
-target URI (``http://www.python.org/``) yet.
-
-Assigning the target URI (from the ``target`` node) to the
-``reference`` node is *not* done by the parser (the parser only
-translates the input document into a node tree).
-
-Instead, it's done by a **Transform**. In this case (resolving a
-reference), it's done by the ``ExternalTargets`` transform in
-``docutils/transforms/references.py``.
-
-In fact, there are quite a lot of Transforms, which do various useful
-things like creating the table of contents, applying substitution
-references or resolving auto-numbered footnotes.
-
-The Transforms are applied after parsing. To see how the node tree
-has changed after applying the Transforms, we use the
-``rst2pseudoxml.py`` tool:
-
-.. parsed-literal::
-
-    $ rst2pseudoxml.py test.txt
-    <document source="test.txt">
-        <paragraph>
-            My
-            <emphasis>
-                favorite
-             language is
-            <reference name="Python" **refuri="http://www.python.org/"**>
-                Python
-            .
-        <target ids="python" names="python" ``refuri="http://www.python.org/"``>
-
-For our small test document, the only change is that the ``refname``
-attribute of the reference has been replaced by a ``refuri``
-attribute |---| the reference has been resolved.
-
-While this does not look very exciting, transforms are a powerful tool
-to apply any kind of transformation on the node tree.
-
-By the way, you can also get a "real" XML representation of the node
-tree by using ``rst2xml.py`` instead of ``rst2pseudoxml.py``.
-
-
-Writing the Document
---------------------
-
-To get an HTML document out of the node tree, we use a **Writer**, the
-HTML writer in this case (``docutils/writers/html4css1.py``).
-
-The writer receives the node tree and returns the output document.
-For HTML output, we can test this using the ``rst2html.py`` tool::
-
-    $ rst2html.py --link-stylesheet test.txt
-    <?xml version="1.0" encoding="utf-8" ?>
-    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
-    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
-    <head>
-    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
-    <meta name="generator" content="Docutils 0.3.10: http://docutils.sourceforge.net/" />
-    <title></title>
-    <link rel="stylesheet" href="../docutils/writers/html4css1/html4css1.css" type="text/css" />
-    </head>
-    <body>
-    <div class="document">
-    <p>My <em>favorite</em> language is <a class="reference" href="http://www.python.org/">Python</a>.</p>
-    </div>
-    </body>
-    </html>
-
-So here we finally have our HTML output. The actual document contents
-are in the fourth-last line. Note, by the way, that the HTML writer
-did not render the (invisible) ``target`` node |---| only the
-``paragraph`` node and its children appear in the HTML output.
-
-
-Extending Docutils
-==================
-
-Now you'll ask, "how do I actually extend Docutils?"
-
-First of all, once you are clear about *what* you want to achieve, you
-have to decide *where* to implement it |---| in the Parser (e.g. by
-adding a directive or role to the reStructuredText parser), as a
-Transform, or in the Writer. There is often one obvious choice among
-those three (Parser, Transform, Writer). If you are unsure, ask on
-the Docutils-develop_ mailing list.
-
-In order to find out how to start, it is often helpful to look at
-similar features which are already implemented. For example, if you
-want to add a new directive to the reStructuredText parser, look at
-the implementation of a similar directive in
-``docutils/parsers/rst/directives/``.
-
-
-Modifying the Document Tree Before It Is Written
-------------------------------------------------
-
-You can modify the document tree right before the writer is called.
-One possibility is to use the publish_doctree_ and
-publish_from_doctree_ functions.
-
-To retrieve the document tree, call::
-
-    document = docutils.core.publish_doctree(...)
-
-Please see the docstring of publish_doctree for a list of parameters.
-
-.. XXX Need to write a well-readable list of (commonly used) options
-   of the publish_* functions. Probably in api/publisher.txt.
-
-``document`` is the root node of the document tree. You can now
-change the document by accessing the ``document`` node and its
-children |---| see `The Node Interface`_ below.
-
-When you're done with modifying the document tree, you can write it
-out by calling::
-
-    output = docutils.core.publish_from_doctree(document, ...)
-
-.. _publish_doctree: ../api/publisher.html#publish_doctree
-.. _publish_from_doctree: ../api/publisher.html#publish_from_doctree
-
-
-The Node Interface
-------------------
-
-As described in the overview above, Docutils' internal representation
-of a document is a tree of nodes. We'll now have a look at the
-interface of these nodes.
-
-(To be completed.)
-
-
-What Now?
-=========
-
-This document is not complete. Many topics could (and should) be
-covered here. To find out with which topics we should write about
-first, we are awaiting *your* feedback. So please ask your questions
-on the Docutils-develop_ mailing list.
-
-
-.. _Docutils-develop: ../user/mailing-lists.html#docutils-develop
-
-
-.. |---| unicode:: 8212 .. em-dash
-   :trim:
-
-
-..
-   Local Variables:
-   mode: indented-text
-   indent-tabs-mode: nil
-   sentence-end-double-space: t
-   fill-column: 70
-   End:
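
The reference resolution described under "Transforming the Document" in the removed guide can be illustrated with a toy model. The sketch below is not Docutils code and does not use the Docutils API: plain dictionaries stand in for node objects, and the names ``build_parsed_tree``, ``walk`` and ``resolve_external_targets`` are hypothetical, chosen only for this illustration. It mimics the one observable effect the guide describes for the example document, namely that the ``refname`` attribute of the ``reference`` node is replaced by the ``refuri`` taken from the matching ``target`` node.

```python
# Toy model only: plain dicts stand in for Docutils node objects.
# None of these helpers exist in Docutils; they are assumptions made
# for illustration.


def build_parsed_tree():
    """Return the example node tree as it looks right after parsing."""
    return {
        "tag": "document",
        "attrs": {"source": "test.txt"},
        "children": [
            {
                "tag": "paragraph",
                "attrs": {},
                "children": [
                    "My ",
                    {"tag": "emphasis", "attrs": {}, "children": ["favorite"]},
                    " language is ",
                    {
                        "tag": "reference",
                        "attrs": {"name": "Python", "refname": "python"},
                        "children": ["Python"],
                    },
                    ".",
                ],
            },
            {
                "tag": "target",
                "attrs": {
                    "ids": ["python"],
                    "names": ["python"],
                    "refuri": "http://www.python.org/",
                },
                "children": [],
            },
        ],
    }


def walk(node):
    """Yield every element node in document order, skipping text strings."""
    yield node
    for child in node["children"]:
        if isinstance(child, dict):
            yield from walk(child)


def resolve_external_targets(document):
    """Swap each resolvable ``refname`` for the matching target's ``refuri``."""
    # First pass: collect the URI of every target, keyed by target name.
    uri_by_name = {}
    for node in walk(document):
        if node["tag"] == "target" and "refuri" in node["attrs"]:
            for name in node["attrs"]["names"]:
                uri_by_name[name] = node["attrs"]["refuri"]
    # Second pass: rewrite references whose name has a known target.
    for node in walk(document):
        attrs = node["attrs"]
        if node["tag"] == "reference" and attrs.get("refname") in uri_by_name:
            attrs["refuri"] = uri_by_name[attrs.pop("refname")]


tree = build_parsed_tree()
resolve_external_targets(tree)
reference = tree["children"][0]["children"][3]
print(reference["attrs"])
```

The two passes keep the sketch independent of where the target appears relative to the reference, which matches the example, where the target follows the paragraph that refers to it. The real transform operates on Docutils node classes and handles cases this model ignores.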