[eggs2000] "Spam, Spam, Spam, Eggs,
+ Bacon, and Spam"
+
+1. No markup::
+
+ A URI http://spam.org, see eggs2000 (in Bacon [Publisher]).
+ Also see http://eggs.org.
+
+ eggs2000 "Spam, Spam, Spam, Eggs, Bacon, and Spam"
+
+2. StructuredText absolute/relative URI syntax
+ ("text":http://www.url.org)::
+
+ A "URI":http://spam.org, see [eggs2000] (in Bacon [Publisher]).
+ Also see "http://eggs.org":http://eggs.org.
+
+ .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"
+
+ Note that StructuredText does not recognize standalone URIs,
+ forcing doubling up as shown in the second line of the example
+ above.
+
+3. StructuredText absolute-only URI syntax
+ ("text", mailto:you@your.com)::
+
+ A "URI", http://spam.org, see [eggs2000] (in Bacon
+ [Publisher]). Also see "http://eggs.org", http://eggs.org.
+
+ .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"
+
+4. reStructuredText syntax::
+
+ 4. A URI_, see [eggs2000]_ (in Bacon [Publisher]).
+ Also see http://eggs.org.
+
+    .. _URI: http://spam.org
+ .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"
+
+The bracketed text '[Publisher]' may be problematic with
+StructuredText (syntax 2 & 3).
+
+reStructuredText's syntax (#4) is definitely the most readable. The
+text is separated from the link URI and the footnote, keeping the
+running text clean.
+
+.. _StructuredText:
+ http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage
+.. _Setext: http://docutils.sourceforge.net/mirror/setext.html
+.. _reStructuredText: http://docutils.sourceforge.net/rst.html
+.. _detailed description:
+ http://www.tibsnjoan.demon.co.uk/STNG-format.html
+.. _STMinus: http://www.cis.upenn.edu/~edloper/pydoc/stminus.html
+.. _StructuredTextNG:
+ http://dev.zope.org/Members/jim/StructuredTextWiki/StructuredTextNG
+.. _README: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/
+ python/python/dist/src/README
+.. _Emacs table mode: http://table.sourceforge.net/
+.. _reStructuredText Markup Specification: reStructuredText.html
+
+
+..
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End:
diff --git a/docs/dev/todo.txt b/docs/dev/todo.txt
new file mode 100644
index 000000000..311751f37
--- /dev/null
+++ b/docs/dev/todo.txt
@@ -0,0 +1,385 @@
+================
+ Docutils Notes
+================
+:Date: $Date$
+:Revision: $Revision$
+
+.. contents::
+
+To Do
+=====
+
+General
+-------
+
+- Document!
+
+ - Internal module documentation.
+
+ - User docs.
+
+ - Doctree nodes (DTD element) semantics:
+
+ - External (public) attributes (node.attributes).
+ - Internal attributes (node.*).
+ - Linking mechanism.
+
+- Refactor
+
+ - Rename methods & variables according to the `coding conventions`_
+ below.
+
+ - The name->id conversion and hyperlink resolution code needs to be
+ checked for correctness and refactored. I'm afraid it's a bit of
+ a spaghetti mess now.
+
+- Add validation? See http://pytrex.sourceforge.net, RELAX NG.
+
+- Ask Python-dev for opinions (GvR for a pronouncement) on special
+ variables (__author__, __version__, etc.): convenience vs. namespace
+ pollution. Ask opinions on whether or not Docutils should recognize
+ & use them.
+
+- Provide a mechanism to pass options to Readers, Writers, and Parsers
+ through docutils.core.publish/Publisher? Or create custom
+ Reader/Writer/Parser objects first, and pass *them* to
+ publish/Publisher?
+
+- In reader.get_reader_class (& parser & writer too), should we be
+ importing 'standalone' or 'docutils.readers.standalone'? (This would
+ avoid importing top-level modules if the module name is not in
+ docutils/readers. Potential nastiness.)
+
+- Perhaps store a name->id mapping file? This could be stored
+ permanently, read by subsequent processing runs, and updated with
+ new entries. ("Persistent ID mapping"?)
+
+- The "Docutils System Messages" section appears even if no actual
+ system messages are there. They must be below the threshold. The
+ transform should be fixed.
+
+- TOC transform: use alt-text for inline images.
+
+
+Specification
+-------------
+
+- Complete PEP 258 Docutils Design Specification.
+
+ - Fill in the blanks in API details.
+
+ - Specify the nodes.py internal data structure implementation.
+
+ [Tibs:] Eventually we need to have direct documentation in
+ there on how it all hangs together - the DTD is not enough
+ (indeed, is it still meant to be correct? [Yes, it is.]).
+
+- Rework PEP 257, separating style from spec from tools, wrt Docutils?
+ See Doc-SIG from 2001-06-19/20.
+
+- Add layout component to framework? Or part of the formatter?
+
+- Once doctree.txt is fleshed out, how about breaking (most of) it up
+ and putting it into nodes.py as docstrings?
+
+
+reStructuredText Parser
+-----------------------
+
+- Add motivation sections for constructs in spec.
+
+- Allow very long titles (on two or more lines)?
+
+- And for the sake of completeness, should definition list terms be
+ allowed to be very long (two or more lines) also?
+
+- Allow hyperlink references to targets in other documents? Not in an
+ HTML-centric way, though (it's trivial to say
+ ``http://www.whatever.com/doc#name``, and useless in non-HTML
+ contexts). XLink/XPointer? ``.. baseref::``? See Doc-SIG
+ 2001-08-10.
+
+- Add character processing? For example:
+
+ - ``--`` -> em-dash (or ``--`` -> en-dash, and ``---`` -> em-dash).
+ (Look for pre-existing conventions.)
+ - Convert quotes to curly quote entities. (Essentially impossible
+ for HTML? Unnecessary for TeX. An output issue?)
+ - Various forms of ``:-)`` to smiley icons.
+   - ``"\ "`` -> a no-break space (``&nbsp;`` in HTML).
+   - Escaped newlines -> a forced line break (``<br>`` in HTML).
+ - Escaped period or quote as a disappearing catalyst to allow
+ character-level inline markup?
+ - Others?
+
+ How to represent character entities in the text though? Probably as
+ Unicode.
+
+ Which component is responsible for this, the parser, the reader, or
+ the writer?
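Whichever component ends up owning it, the dash conversion itself is
small. A hedged sketch (the precedence rule here is one possible
convention, not a settled decision):

```python
import re

def smart_dashes(text):
    # One possible convention: "---" becomes an em-dash, "--" an
    # en-dash.  The longer run is handled first, with lookarounds so
    # that "---" is not consumed as "--" plus a stray hyphen.
    text = re.sub(r'(?<!-)---(?!-)', '\u2014', text)   # em-dash
    text = re.sub(r'(?<!-)--(?!-)', '\u2013', text)    # en-dash
    return text

print(smart_dashes('pages 10--20 --- as noted'))
```

Representing the results as Unicode, as suggested above, sidesteps the
question of output-format entities entirely.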
+
+- Implement the header row separator modification to table.el. (Wrote
+ to Takaaki Ota & the table.el mailing list on 2001-08-12, suggesting
+ support for '=====' header rows. On 2001-08-17 he replied, saying
+ he'd put it on his to-do list, but "don't hold your breath".)
+
+- Tony says inline markup rule 7 could do with a *little* more
+ exposition in the spec, to make clear what is going on for people
+ with head colds.
+
+- Alan Jaffray suggested (and I agree) that it would be sensible to:
+
+ - have a directive to specify a default role for interpreted text
+ - allow the reST processor to take an argument for the default role
+ - issue a warning when processing documents with no default role
+ which contain interpreted text with no explicitly specified role
+
+- Fix the parser's indentation handling to conform with the stricter
+ definition in the spec. (Explicit markup blocks should be strict or
+ forgiving?)
+
+- Tighten up the spec for indentation of "constructs using complex
+ markers": field lists and option lists? Bodies may begin on the
+ same line as the marker or on a subsequent line (with blank lines
+ optional). Require that for bodies beginning on the same line as
+ the marker, all lines be in strict alignment. Currently, this is
+ acceptable::
+
+ :Field-name-of-medium-length: Field body beginning on the same
+ line as the field name.
+
+ This proposal would make the above example illegal, instead
+ requiring strict alignment. A field body may either begin on the
+ same line::
+
+ :Field-name-of-medium-length: Field body beginning on the same
+ line as the field name.
+
+ Or it may begin on a subsequent line::
+
+ :Field-name-of-medium-length:
+ Field body beginning on a line subsequent to that of the
+ field name.
+
+ This would be especially relevant in degenerate cases like this::
+
+      :Number-of-African-swallows-required-to-carry-a-coconut:
+ It would be very difficult to align the field body with
+ the left edge of the first line if it began on the same
+ line as the field name.
+
+- Allow syntax constructs to be added or disabled at run-time.
+
+- Make footnotes two-way, GNU-style? What if there are multiple
+ references to a single footnote?
+
+- Add RFC-2822 header parsing (for PEP, email Readers).
+
+- Change ``.. meta::`` to use a "pending" element, only activated for
+ HTML writers.
+
+- Allow for variant styles by interpreting indented lists as if they
+ weren't indented? For example, currently the list below will be
+ parsed as a list within a block quote::
+
+ paragraph
+
+ * list item 1
+ * list item 2
+
+ But a lot of people seem to write that way, and HTML browsers make
+ it look as if that's the way it should be. The parser could check
+ the contents of block quotes, and if they contain only a single
+ list, remove the block quote wrapper. There would be two problems:
+
+ 1. What if we actually *do* want a list inside a block quote?
+
+ 2. What if such a list comes immediately after an indented
+ construct, such as a literal block?
+
+ Both could be solved using empty comments (problem 2 already exists
+ for a block quote after a literal block). But that's a hack.
+
+ See the Doc-SIG discussion starting 2001-04-18 with Ed Loper's
+ "Structuring: a summary; and an attempt at EBNF", item 4.
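The block-quote check described above might look like this on a toy
node tree (the ``Node`` class here is a stand-in for illustration, not
the real ``docutils.nodes`` API):

```python
class Node:
    """Minimal stand-in for a doctree node (illustration only)."""
    def __init__(self, tagname, children=()):
        self.tagname = tagname
        self.children = list(children)

def unwrap_list_only_quotes(node):
    # Depth-first: replace any block_quote whose only child is a list
    # with that list, undoing the unwanted indentation.
    for i, child in enumerate(node.children):
        unwrap_list_only_quotes(child)
        if (child.tagname == 'block_quote'
                and len(child.children) == 1
                and child.children[0].tagname in ('bullet_list',
                                                  'enumerated_list')):
            node.children[i] = child.children[0]

doc = Node('document', [
    Node('paragraph'),
    Node('block_quote', [Node('bullet_list',
                              [Node('list_item'), Node('list_item')])]),
])
unwrap_list_only_quotes(doc)
print([child.tagname for child in doc.children])
```

A block quote with anything besides a single list is left alone, which
is exactly why problem 1 (an intentional list-in-quote) has no clean
answer under this scheme.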
+
+- Produce a better system message when a list ends abruptly. Input::
+
+ -1 Option "1"
+ -2
+
+ Produces::
+
+ Reporter: WARNING (2) Unindent without blank line at line 2.
+
+ But it should produce::
+
+ Reporter: WARNING (2) List ends without blank line at line 2.
+
+
+Directives
+``````````
+
+- Allow directives to be added at run-time.
+
+- Use the language module for directive attribute names?
+
+- Add more attributes to the image directive: align, border?
+
+- Implement directives:
+
+ - html.imagemap
+
+ - components.endnotes, .citations, .topic, .sectnum (section
+ numbering; add support to .contents; could be cmdline option also)
+
+ - misc.raw
+
+ - misc.include: ``#include`` one file in another. But how to
+ parse wrt sections, reference names, conflicts?
+
+ - misc.exec: Execute Python code & insert the results. Perhaps
+ dangerous?
+
+ - misc.eval: Evaluate an expression & insert the text. At parse
+ time or at substitution time?
+
+ - block.qa: Questions & Answers. Implement as a generic two-column
+ marked list? Or as a standalone construct?
+
+ - block.columns: Multi-column table/list, with number of columns as
+ argument.
+
+ - block.verse: Paragraphs with linebreaks preserved. A directive
+ would be easy; what about a literal-block-like prefix, perhaps
+ ';;'? E.g.::
+
+ Take it away, Eric the orchestra leader! ;;
+
+ Half a bee,
+ Philosophically,
+ Must ipso-facto
+ Half not be.
+ You see?
+
+ ...
+
+ - colorize.python: Colorize Python code. Fine for HTML output, but
+ what about other formats? Revert to a literal block? Do we need
+ some kind of "alternate" mechanism? Perhaps use a "pending"
+ transform, which could switch its output based on the "format" in
+ use. Use a factory function "transformFF()" which returns either
+ "HTMLTransform()" instance or "GenericTransform" instance?
+
+ - text.date: Datestamp. For substitutions.
+
+ - Combined with misc.include, implement canned macros?
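The factory idea in the colorize.python item could be sketched as
follows. The names ``transformFF``, ``HTMLTransform``, and
``GenericTransform`` come from the text above, but the bodies are
invented for illustration only:

```python
class GenericTransform:
    """Fallback: leave colorizable text as a plain literal block."""
    def apply(self, text):
        return text

class HTMLTransform(GenericTransform):
    """HTML-capable output: wrap the text for later colorizing."""
    def apply(self, text):
        return '<pre class="python">%s</pre>' % text

def transformFF(output_format):
    # Factory function: pick a transform based on the format in use.
    if output_format == 'html':
        return HTMLTransform()
    return GenericTransform()

print(transformFF('html').apply('x = 1'))
print(transformFF('latex').apply('x = 1'))
```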
+
+
+Unimplemented Transforms
+------------------------
+
+- Footnote Gathering
+
+ Collect and move footnotes to the end of a document.
+
+- Hyperlink Target Gathering
+
+ It probably comes in two phases, because in a Python context we need
+ to *resolve* them on a per-docstring basis [do we? --DG], but if the
+ user is trying to do the callout form of presentation, they would
+ then want to group them all at the end of the document.
+
+- Reference Merging
+
+ When merging two or more subdocuments (such as docstrings),
+ conflicting references may need to be resolved. There may be:
+
+ - duplicate reference and/or substitution names that need to be made
+ unique; and/or
+ - duplicate footnote numbers that need to be renumbered.
+
+ Should this be done before or after reference-resolving transforms
+ are applied? What about references from within one subdocument to
+ inside another?
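For the footnote-number half of the problem, a hedged sketch
(subdocuments reduced to bare lists of footnote numbers; the real
transform would walk doctrees):

```python
import itertools

def renumber_footnotes(subdocs):
    # Assign fresh sequential numbers across all subdocuments and
    # return one old->new mapping per subdocument; references within
    # each subdocument would then be rewritten through its mapping.
    counter = itertools.count(1)
    return [{old: next(counter) for old in footnotes}
            for footnotes in subdocs]

print(renumber_footnotes([[1, 2], [1, 2, 3]]))
```

Cross-subdocument references, as noted above, are a separate and
harder question.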
+
+- Document Splitting
+
+ If the processed document is written to multiple files (possibly in
+ a directory tree), it will need to be split up. References will
+ have to be adjusted.
+
+ (HTML only?)
+
+- Navigation
+
+ If a document is split up, each segment will need navigation links:
+ parent, children (small TOC), previous (preorder), next (preorder).
+
+- Index
+
+
+HTML Writer
+-----------
+
+- Considerations for an HTML Writer [#]_:
+
+  - Boolean attributes. The minimized form (e.g. ``<option selected>``)
+    is good; the expanded form (``<option selected="selected">``) is
+    bad. Use a special value in attribute mappings, such as ``None``?
+
+ - Escape double-dashes inside comments.
+
+ - Put the language code into an appropriate element's LANG
+ attribute (?).
+
+ - Docutils identifiers (the "class" and "id" attributes) will
+ conform to the regular expression ``[a-z][-a-z0-9]*``. See
+ ``docutils.utils.id()``.
+
+ .. _HTML 4.01 spec: http://www.w3.org/TR/html401
+ .. _CSS1 spec: http://www.w3.org/TR/REC-CSS1
+ .. [#] Source: `HTML 4.0 in Netscape and Explorer`__.
+ __ http://www.webreference.com/dev/html4nsie/index.html
+
+- Allow for style sheet info to be passed in, either as a ``<link>``,
+  or as embedded style info.
+
+- Construct a templating system, as in ht2html/yaptu, using directives
+ and substitutions for dynamic stuff.
+
+- Improve the granularity of document parts in the HTML writer, so
+ that one could just grab the parts needed.
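The ``None`` sentinel idea from the considerations above can be
sketched like this (the function name and dict layout are
illustrative, not an existing API):

```python
def format_attributes(attributes):
    # A value of None marks a boolean attribute: emit the minimized
    # form (just the name) instead of name="name".
    parts = []
    for name, value in sorted(attributes.items()):
        if value is None:
            parts.append(name)
        else:
            parts.append('%s="%s"' % (name, value))
    return ' '.join(parts)

print('<option %s>' % format_attributes({'selected': None, 'value': '1'}))
```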
+
+
+Coding Conventions
+==================
+
+This project shall follow the generic coding conventions as specified
+in the `Style Guide for Python Code`__ and `Docstring Conventions`__
+PEPs, with the following clarifications:
+
+- 4 spaces per indentation level. No tabs.
+- No one-liner compound statements (i.e., no ``if x: return``: use two
+ lines & indentation), except for degenerate class or method
+ definitions (i.e., ``class X: pass`` is O.K.).
+- Lines should be no more than 78 or 79 characters long.
+- "CamelCase" shall be used for class names.
+- Use "lowercase" or "lowercase_with_underscores" for function,
+ method, and variable names. For short names, maximum two joined
+ words, use lowercase (e.g. 'tagname'). For long names with three or
+ more joined words, or where it's hard to parse the split between two
+ words, use lowercase_with_underscores (e.g., 'note_explicit_target',
+ 'explicit_target').
+
+__ http://www.python.org/peps/pep-0008.html
+__ http://www.python.org/peps/pep-0257.html
+
+
+..
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End:
diff --git a/docs/peps/pep-0256.txt b/docs/peps/pep-0256.txt
new file mode 100644
index 000000000..92c8e7f61
--- /dev/null
+++ b/docs/peps/pep-0256.txt
@@ -0,0 +1,253 @@
+PEP: 256
+Title: Docstring Processing System Framework
+Version: $Revision$
+Last-Modified: $Date$
+Author: goodger@users.sourceforge.net (David Goodger)
+Discussions-To: doc-sig@python.org
+Status: Draft
+Type: Standards Track
+Created: 01-Jun-2001
+Post-History: 13-Jun-2001
+
+
+Abstract
+
+ Python lends itself to inline documentation. With its built-in
+ docstring syntax, a limited form of Literate Programming [1]_ is
+ easy to do in Python. However, there are no satisfactory standard
+ tools for extracting and processing Python docstrings. The lack
+ of a standard toolset is a significant gap in Python's
+ infrastructure; this PEP aims to fill the gap.
+
+ The issues surrounding docstring processing have been contentious
+ and difficult to resolve. This PEP proposes a generic Docstring
+ Processing System (DPS) framework, which separates out the
+ components (program and conceptual), enabling the resolution of
+ individual issues either through consensus (one solution) or
+ through divergence (many). It promotes standard interfaces which
+ will allow a variety of plug-in components (input context readers,
+ markup parsers, and output format writers) to be used.
+
+ The concepts of a DPS framework are presented independently of
+ implementation details.
+
+
+Rationale
+
+ There are standard inline documentation systems for some other
+ languages. For example, Perl has POD [2]_ and Java has Javadoc
+ [3]_, but neither of these mesh with the Pythonic way. POD syntax
+ is very explicit, but takes after Perl in terms of readability.
+ Javadoc is HTML-centric; except for '@field' tags, raw HTML is
+ used for markup. There are also general tools such as Autoduck
+ [4]_ and Web (Tangle & Weave) [5]_, useful for multiple languages.
+
+ There have been many attempts to write auto-documentation systems
+ for Python (not an exhaustive list):
+
+ - Marc-Andre Lemburg's doc.py [6]_
+
+ - Daniel Larsson's pythondoc & gendoc [7]_
+
+ - Doug Hellmann's HappyDoc [8]_
+
+ - Laurence Tratt's Crystal [9]_
+
+ - Ka-Ping Yee's htmldoc & pydoc [10]_ (pydoc.py is now part of the
+ Python standard library; see below)
+
+ - Tony Ibbs' docutils [11]_
+
+ - Edward Loper's STminus formalization and related efforts [12]_
+
+ These systems, each with different goals, have had varying degrees
+ of success. A problem with many of the above systems was
+ over-ambition combined with inflexibility. They provided a
+ self-contained set of components: a docstring extraction system,
+ a markup parser, an internal processing system and one or more
+ output format writers. Inevitably, one or more aspects of each
+ system had serious shortcomings, and they were not easily extended
+ or modified, preventing them from being adopted as standard tools.
+
+ It has become clear (to this author, at least) that the "all or
+ nothing" approach cannot succeed, since no monolithic
+ self-contained system could possibly be agreed upon by all
+ interested parties. A modular component approach designed for
+ extension, where components may be multiply implemented, may be
+ the only chance for success. By separating out the issues, we can
+ form consensus more easily (smaller fights ;-), and accept
+ divergence more readily.
+
+ Each of the components of a docstring processing system should be
+ developed independently. A 'best of breed' system should be
+ chosen, either merged from existing systems, and/or developed
+ anew. This system should be included in Python's standard
+ library.
+
+
+PyDoc & Other Existing Systems
+
+ PyDoc became part of the Python standard library as of release
+ 2.1. It extracts and displays docstrings from within the Python
+ interactive interpreter, from the shell command line, and from a
+ GUI window into a web browser (HTML). Although a very useful
+ tool, PyDoc has several deficiencies, including:
+
+ - In the case of the GUI/HTML, except for some heuristic
+ hyperlinking of identifier names, no formatting of the
+      docstrings is done. They are presented within preformatted
+      (fixed-width) markup to avoid unwanted line wrapping.
+      Unfortunately, the result is not attractive.
+
+ - PyDoc extracts docstrings and structural information (class
+ identifiers, method signatures, etc.) from imported module
+ objects. There are security issues involved with importing
+ untrusted code. Also, information from the source is lost when
+ importing, such as comments, "additional docstrings" (string
+ literals in non-docstring contexts; see PEP 258 [13]_), and the
+ order of definitions.
+
+ The functionality proposed in this PEP could be added to or used
+ by PyDoc when serving HTML pages. The proposed docstring
+ processing system's functionality is much more than PyDoc needs in
+ its current form. Either an independent tool will be developed
+ (which PyDoc may or may not use), or PyDoc could be expanded to
+ encompass this functionality and *become* the docstring processing
+ system (or one such system). That decision is beyond the scope of
+ this PEP.
+
+ Similarly for other existing docstring processing systems, their
+ authors may or may not choose compatibility with this framework.
+ However, if this framework is accepted and adopted as the Python
+ standard, compatibility will become an important consideration in
+ these systems' future.
+
+
+Specification
+
+ The docstring processing system framework consists of components,
+ as follows::
+
+ 1. Docstring conventions. Documents issues such as:
+
+ - What should be documented where.
+
+ - First line is a one-line synopsis.
+
+ PEP 257, Docstring Conventions [14]_, documents some of these
+ issues.
+
+ 2. Docstring processing system design specification. Documents
+ issues such as:
+
+ - High-level spec: what a DPS does.
+
+ - Command-line interface for executable script.
+
+ - System Python API.
+
+ - Docstring extraction rules.
+
+     - Readers, which encapsulate the input context.
+
+ - Parsers.
+
+ - Document tree: the intermediate internal data structure. The
+ output of the Parser and Reader, and the input to the Writer
+ all share the same data structure.
+
+ - Transforms, which modify the document tree.
+
+ - Writers for output formats.
+
+ - Distributors, which handle output management (one file, many
+ files, or objects in memory).
+
+ These issues are applicable to any docstring processing system
+     implementation. PEP 258, Docutils Design Specification [13]_,
+ documents these issues.
+
+ 3. Docstring processing system implementation.
+
+ 4. Input markup specifications: docstring syntax. PEP 2xx,
+ reStructuredText Standard Docstring Format [15]_, proposes a
+ standard syntax.
+
+ 5. Input parser implementations.
+
+ 6. Input context readers ("modes": Python source code, PEP,
+ standalone text file, email, etc.) and implementations.
+
+ 7. Output formats (HTML, XML, TeX, DocBook, info, etc.) and writer
+ implementations.
+
+ Components 1, 2/3, and 4/5 are the subject of individual companion
+ PEPs. If there is another implementation of the framework or
+ syntax/parser, additional PEPs may be required. Multiple
+ implementations of each of components 6 and 7 will be required;
+ the PEP mechanism may be overkill for these components.
+
+
+Project Web Site
+
+ A SourceForge project has been set up for this work at
+ http://docutils.sourceforge.net/.
+
+
+References and Footnotes
+
+ [1] http://www.literateprogramming.com/
+
+ [2] Perl "Plain Old Documentation"
+ http://www.perldoc.com/perl5.6/pod/perlpod.html
+
+ [3] http://java.sun.com/j2se/javadoc/
+
+ [4] http://www.helpmaster.com/hlp-developmentaids-autoduck.htm
+
+ [5] http://www-cs-faculty.stanford.edu/~knuth/cweb.html
+
+ [6] http://www.lemburg.com/files/python/SoftwareDescriptions.html#doc.py
+
+ [7] http://starship.python.net/crew/danilo/pythondoc/
+
+ [8] http://happydoc.sourceforge.net/
+
+ [9] http://www.btinternet.com/~tratt/comp/python/crystal/
+
+ [10] http://www.python.org/doc/current/lib/module-pydoc.html
+
+ [11] http://homepage.ntlworld.com/tibsnjoan/docutils/
+
+ [12] http://www.cis.upenn.edu/~edloper/pydoc/
+
+ [13] PEP 258, Docutils Design Specification, Goodger
+ http://www.python.org/peps/pep-0258.html
+
+ [14] PEP 257, Docstring Conventions, Goodger, Van Rossum
+ http://www.python.org/peps/pep-0257.html
+
+ [15] PEP 287, reStructuredText Standard Docstring Format, Goodger
+ http://www.python.org/peps/pep-0287.html
+
+ [16] http://www.python.org/sigs/doc-sig/
+
+
+Copyright
+
+ This document has been placed in the public domain.
+
+
+Acknowledgements
+
+ This document borrows ideas from the archives of the Python
+ Doc-SIG [16]_. Thanks to all members past & present.
+
+
+
+Local Variables:
+mode: indented-text
+indent-tabs-mode: nil
+fill-column: 70
+sentence-end-double-space: t
+End:
diff --git a/docs/peps/pep-0257.txt b/docs/peps/pep-0257.txt
new file mode 100644
index 000000000..48425d9cc
--- /dev/null
+++ b/docs/peps/pep-0257.txt
@@ -0,0 +1,248 @@
+PEP: 257
+Title: Docstring Conventions
+Version: $Revision$
+Last-Modified: $Date$
+Author: goodger@users.sourceforge.net (David Goodger),
+ guido@python.org (Guido van Rossum)
+Discussions-To: doc-sig@python.org
+Status: Active
+Type: Informational
+Created: 29-May-2001
+Post-History: 13-Jun-2001
+
+
+Abstract
+
+ This PEP documents the semantics and conventions associated with
+ Python docstrings.
+
+
+Rationale
+
+ The aim of this PEP is to standardize the high-level structure of
+ docstrings: what they should contain, and how to say it (without
+ touching on any markup syntax within docstrings). The PEP
+ contains conventions, not laws or syntax.
+
+ "A universal convention supplies all of maintainability,
+ clarity, consistency, and a foundation for good programming
+ habits too. What it doesn't do is insist that you follow it
+ against your will. That's Python!"
+
+ --Tim Peters on comp.lang.python, 2001-06-16
+
+ If you violate the conventions, the worst you'll get is some dirty
+ looks. But some software (such as the Docutils docstring
+ processing system [1] [2]) will be aware of the conventions, so
+ following them will get you the best results.
+
+
+Specification
+
+ What is a Docstring?
+ --------------------
+
+ A docstring is a string literal that occurs as the first statement
+ in a module, function, class, or method definition. Such a
+ docstring becomes the __doc__ special attribute of that object.
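The binding described above is directly observable (the function name
here is made up for the demonstration):

```python
def kos_demo():
    """Return nothing; exists to show the __doc__ binding."""

# The first statement of the body, a string literal, became the
# function's __doc__ attribute.
print(kos_demo.__doc__)
```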
+
+ All modules should normally have docstrings, and all functions and
+ classes exported by a module should also have docstrings. Public
+ methods (including the __init__ constructor) should also have
+ docstrings. A package may be documented in the module docstring
+ of the __init__.py file in the package directory.
+
+ String literals occurring elsewhere in Python code may also act as
+ documentation. They are not recognized by the Python bytecode
+ compiler and are not accessible as runtime object attributes
+ (i.e. not assigned to __doc__), but two types of extra docstrings
+ may be extracted by software tools:
+
+ 1. String literals occurring immediately after a simple assignment
+ at the top level of a module, class, or __init__ method
+ are called "attribute docstrings".
+
+ 2. String literals occurring immediately after another docstring
+ are called "additional docstrings".
+
+ Please see PEP 258 "Docutils Design Specification" [2] for a
+ detailed description of attribute and additional docstrings.
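Since the bytecode compiler discards them, extra docstrings must be
recovered from the source text. A modern sketch using the standard
``ast`` module (which did not exist when this PEP was written):

```python
import ast

source = '''
class Point:
    """Class docstring."""
    """Additional docstring."""
    x = 0
    """Attribute docstring for x."""
'''

class_body = ast.parse(source).body[0].body
# Collect every bare string-literal statement in the class body.
# The first is the ordinary class docstring; the rest are the
# "additional" and "attribute" docstrings a tool could attach.
extra = [stmt.value.value for stmt in class_body
         if isinstance(stmt, ast.Expr)
         and isinstance(stmt.value, ast.Constant)
         and isinstance(stmt.value.value, str)]
print(extra)
```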
+
+ XXX Mention docstrings of 2.2 properties.
+
+ For consistency, always use """triple double quotes""" around
+ docstrings. Use r"""raw triple double quotes""" if you use any
+ backslashes in your docstrings. For Unicode docstrings, use
+ u"""Unicode triple-quoted strings""".
+
+ There are two forms of docstrings: one-liners and multi-line
+ docstrings.
+
+ One-line Docstrings
+ --------------------
+
+ One-liners are for really obvious cases. They should really fit
+ on one line. For example::
+
+ def kos_root():
+ """Return the pathname of the KOS root directory."""
+ global _kos_root
+ if _kos_root: return _kos_root
+ ...
+
+ Notes:
+
+ - Triple quotes are used even though the string fits on one line.
+ This makes it easy to later expand it.
+
+ - The closing quotes are on the same line as the opening quotes.
+ This looks better for one-liners.
+
+ - There's no blank line either before or after the docstring.
+
+ - The docstring is a phrase ending in a period. It prescribes the
+ function or method's effect as a command ("Do this", "Return
+ that"), not as a description: e.g. don't write "Returns the
+ pathname ..."
+
+ - The one-line docstring should NOT be a "signature" reiterating
+ the function/method parameters (which can be obtained by
+ introspection). Don't do::
+
+ def function(a, b):
+ """function(a, b) -> list"""
+
+ This type of docstring is only appropriate for C functions (such
+ as built-ins), where introspection is not possible.
+
+ Multi-line Docstrings
+ ----------------------
+
+ Multi-line docstrings consist of a summary line just like a
+ one-line docstring, followed by a blank line, followed by a more
+ elaborate description. The summary line may be used by automatic
+ indexing tools; it is important that it fits on one line and is
+ separated from the rest of the docstring by a blank line. The
+ summary line may be on the same line as the opening quotes or on
+ the next line.
+
+ The entire docstring is indented the same as the quotes at its
+ first line (see example below). Docstring processing tools will
+ strip an amount of indentation from the second and further lines
+ of the docstring equal to the indentation of the first non-blank
+ line after the first line of the docstring. Relative indentation
+ of later lines in the docstring is retained.
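The stripping rule above can be sketched as follows (a simplified,
non-normative implementation):

```python
import sys

def trim(docstring):
    """Strip docstring indentation per the rule described above."""
    if not docstring:
        return ''
    lines = docstring.expandtabs().splitlines()
    # Minimum indentation of all non-blank lines after the first.
    indent = sys.maxsize
    for line in lines[1:]:
        stripped = line.lstrip()
        if stripped:
            indent = min(indent, len(line) - len(stripped))
    trimmed = [lines[0].strip()]
    if indent < sys.maxsize:
        for line in lines[1:]:
            trimmed.append(line[indent:].rstrip())
    # Drop leading and trailing blank lines.
    while trimmed and not trimmed[-1]:
        trimmed.pop()
    while trimmed and not trimmed[0]:
        trimmed.pop(0)
    return '\n'.join(trimmed)

print(trim("""Summary line.

    Detail line.
        Indented detail.
    """))
```

Note that relative indentation of later lines survives: only the
common leading indentation is removed.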
+
+ Insert a blank line before and after all docstrings (one-line or
+ multi-line) that document a class -- generally speaking, the
+ class's methods are separated from each other by a single blank
+ line, and the docstring needs to be offset from the first method
+ by a blank line; for symmetry, put a blank line between the class
+ header and the docstring. Docstrings documenting functions or
+ methods generally don't have this requirement, unless the function
+ or method's body is written as a number of blank-line separated
+ sections -- in this case, treat the docstring as another section,
+ and precede it with a blank line.
+
+ The docstring of a script (a stand-alone program) should be usable
+ as its "usage" message, printed when the script is invoked with
+ incorrect or missing arguments (or perhaps with a "-h" option, for
+ "help"). Such a docstring should document the script's function
+ and command line syntax, environment variables, and files. Usage
+ messages can be fairly elaborate (several screens full) and should
+ be sufficient for a new user to use the command properly, as well
+ as a complete quick reference to all options and arguments for the
+ sophisticated user.
+
+ The docstring for a module should generally list the classes,
+ exceptions and functions (and any other objects) that are exported
+ by the module, with a one-line summary of each. (These summaries
+ generally give less detail than the summary line in the object's
+ docstring.) The docstring for a package (i.e., the docstring of
+ the package's __init__.py module) should also list the modules and
+ subpackages exported by the package.
+
+ The docstring for a function or method should summarize its
+ behavior and document its arguments, return value(s), side
+ effects, exceptions raised, and restrictions on when it can be
+ called (all if applicable). Optional arguments should be
+ indicated. It should be documented whether keyword arguments are
+ part of the interface.
+
+ The docstring for a class should summarize its behavior and list
+ the public methods and instance variables. If the class is
+ intended to be subclassed, and has an additional interface for
+ subclasses, this interface should be listed separately (in the
+ docstring). The class constructor should be documented in the
+ docstring for its __init__ method. Individual methods should be
+ documented by their own docstring.
+
+ If a class subclasses another class and its behavior is mostly
+ inherited from that class, its docstring should mention this and
+ summarize the differences. Use the verb "override" to indicate
+ that a subclass method replaces a superclass method and does not
+ call the superclass method; use the verb "extend" to indicate that
+ a subclass method calls the superclass method (in addition to its
+ own behavior).
+
+ *Do not* use the Emacs convention of mentioning the arguments of
+ functions or methods in upper case in running text. Python is
+ case sensitive and the argument names can be used for keyword
+ arguments, so the docstring should document the correct argument
+ names. It is best to list each argument on a separate line. For
+ example::
+
+ def complex(real=0.0, imag=0.0):
+ """Form a complex number.
+
+ Keyword arguments:
+ real -- the real part (default 0.0)
+ imag -- the imaginary part (default 0.0)
+
+ """
+ if imag == 0.0 and real == 0.0: return complex_zero
+ ...
+
+ The BDFL [3] recommends inserting a blank line between the last
+ paragraph in a multi-line docstring and its closing quotes,
+ placing the closing quotes on a line by themselves. This way,
+ Emacs' fill-paragraph command can be used on it.
+
+
+References and Footnotes
+
+ [1] PEP 256, Docstring Processing System Framework, Goodger
+ http://www.python.org/peps/pep-0256.html
+
+ [2] PEP 258, Docutils Design Specification, Goodger
+ http://www.python.org/peps/pep-0258.html
+
+ [3] Guido van Rossum, Python's creator and Benevolent Dictator For
+ Life.
+
+ [4] http://www.python.org/doc/essays/styleguide.html
+
+ [5] http://www.python.org/sigs/doc-sig/
+
+
+Copyright
+
+ This document has been placed in the public domain.
+
+
+Acknowledgements
+
+ The "Specification" text comes mostly verbatim from the Python
+ Style Guide essay by Guido van Rossum [4].
+
+ This document borrows ideas from the archives of the Python
+ Doc-SIG [5]. Thanks to all members past and present.
+
+
+
+Local Variables:
+mode: indented-text
+indent-tabs-mode: nil
+fill-column: 70
+sentence-end-double-space: t
+End:
diff --git a/docs/peps/pep-0258.txt b/docs/peps/pep-0258.txt
new file mode 100644
index 000000000..6a55e20de
--- /dev/null
+++ b/docs/peps/pep-0258.txt
@@ -0,0 +1,662 @@
+PEP: 258
+Title: Docutils Design Specification
+Version: $Revision$
+Last-Modified: $Date$
+Author: goodger@users.sourceforge.net (David Goodger)
+Discussions-To: doc-sig@python.org
+Status: Draft
+Type: Standards Track
+Requires: 256, 257
+Created: 31-May-2001
+Post-History: 13-Jun-2001
+
+
+Abstract
+
+ This PEP documents design issues and implementation details for
+ Docutils, a Python Docstring Processing System (DPS). The
+ rationale and high-level concepts of a DPS are documented in PEP
+ 256, "Docstring Processing System Framework" [1].
+
+ No changes to the core Python language are required by this PEP.
+ Its deliverables consist of a package for the standard library and
+ its documentation.
+
+
+Specification
+
+ Docstring Extraction Rules
+ ==========================
+
+ 1. What to examine:
+
+ a) If the "__all__" variable is present in the module being
+ documented, only identifiers listed in "__all__" are
+ examined for docstrings.
+
+ b) In the absence of "__all__", all identifiers are examined,
+ except those whose names are private (names begin with "_"
+ but don't begin and end with "__").
+
+ c) 1a and 1b can be overridden by a parameter or command-line
+ option.
+
+ 2. Where:
+
+ Docstrings are string literal expressions, and are recognized
+ in the following places within Python modules:
+
+ a) At the beginning of a module, function definition, class
+ definition, or method definition, after any comments. This
+ is the standard for Python __doc__ attributes.
+
+ b) Immediately following a simple assignment at the top level
+ of a module, class definition, or __init__ method
+ definition, after any comments. See "Attribute Docstrings"
+ below.
+
+ c) Additional string literals found immediately after the
+ docstrings in (a) and (b) will be recognized, extracted, and
+ concatenated. See "Additional Docstrings" below.
+
+ d) @@@ 2.2-style "properties" with attribute docstrings?
+
+ 3. How:
+
+ Whenever possible, Python modules should be parsed by Docutils,
+ not imported. There are security reasons for not importing
+ untrusted code. Introspection of an imported module also loses
+ information present only in the source, such as comments and
+ the order of definitions. Moreover, docstrings are to be
+ recognized in places where the bytecode compiler ignores string
+ literal expressions (2b and 2c above), meaning importing the
+ module will lose these docstrings. Of course, standard Python
+ parsing tools such as the "parser" library module may be used.
+
+ When the Python source code for a module is not available
+ (i.e. only the .pyc file exists) or for C extension modules, to
+ access docstrings the module can only be imported, and any
+ limitations must be lived with.
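The selection rules in 1a and 1b above can be sketched as follows. This is an illustrative sketch only; the function name and signature are hypothetical, not part of the Docutils API:

```python
def names_to_examine(names, all_list=None):
    """Select identifiers to examine for docstrings (rules 1a & 1b).

    Illustrative sketch only, not the Docutils implementation.
    """
    if all_list is not None:
        # Rule 1a: __all__ is authoritative when present.
        return list(all_list)
    # Rule 1b: skip private names ("_spam", but keep "__spam__").
    return [name for name in names
            if not name.startswith('_')
            or (name.startswith('__') and name.endswith('__'))]
```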
+
+ Since attribute docstrings and additional docstrings are ignored
+ by the Python bytecode compiler, no namespace pollution or runtime
+ bloat will result from their use. They are not assigned to
+ __doc__ or to any other attribute. The initial parsing of a
+ module may take a slight performance hit.
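For example, the module docstring and any additional docstrings (rule 2c) can be recovered by parsing rather than importing. This sketch uses the standard "ast" module as a stand-in for whatever parsing tools an implementation actually uses; the function name is hypothetical:

```python
import ast

def module_docstrings(source):
    """Return the module docstring plus any additional docstrings.

    Parses the source instead of importing it, collecting the string
    literal expressions at the start of the module body (rules 2a
    and 2c above).  Illustrative sketch only.
    """
    docstrings = []
    for node in ast.parse(source).body:
        if (isinstance(node, ast.Expr)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            docstrings.append(node.value.value)
        else:
            break  # first non-string statement ends the sequence
    return docstrings
```

Note that only the first of the returned strings would be available as __doc__ on import; the rest are invisible to the bytecode compiler.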
+
+
+ Attribute Docstrings
+ --------------------
+
+ (This is a simplified version of PEP 224 [2] by Marc-Andre
+ Lemburg.)
+
+ A string literal immediately following an assignment statement is
+ interpreted by the docstring extraction machinery as the docstring
+ of the target of the assignment statement, under the following
+ conditions:
+
+ 1. The assignment must be in one of the following contexts:
+
+ a) At the top level of a module (i.e., not nested inside a
+ compound statement such as a loop or conditional): a module
+ attribute.
+
+ b) At the top level of a class definition: a class attribute.
+
+ c) At the top level of the "__init__" method definition of a
+ class: an instance attribute.
+
+ Since each of the above contexts is at the top level (i.e., in
+ the outermost suite of a definition), it may be necessary to
+ place dummy assignments for attributes assigned conditionally
+ or in a loop.
+
+ 2. The assignment must be to a single target, not to a list or a
+ tuple of targets.
+
+ 3. The form of the target:
+
+ a) For contexts 1a and 1b above, the target must be a simple
+ identifier (not a dotted identifier, a subscripted
+ expression, or a sliced expression).
+
+ b) For context 1c above, the target must be of the form
+ "self.attrib", where "self" matches the "__init__" method's
+ first parameter (the instance parameter) and "attrib" is a
+ simple identifier as in 3a.
+
+ Blank lines may be used after attribute docstrings to emphasize
+ the connection between the assignment and the docstring.
+
+ Examples::
+
+ g = 'module attribute (module-global variable)'
+ """This is g's docstring."""
+
+ class AClass:
+
+ c = 'class attribute'
+ """This is AClass.c's docstring."""
+
+ def __init__(self):
+ self.i = 'instance attribute'
+ """This is self.i's docstring."""
+
+
+ Additional Docstrings
+ ---------------------
+
+ (This idea was adapted from PEP 216, Docstring Format [3], by
+ Moshe Zadka.)
+
+ Many programmers would like to make extensive use of docstrings
+ for API documentation. However, docstrings do take up space in
+ the running program, so some of these programmers are reluctant to
+ "bloat up" their code. Also, not all API documentation is
+ applicable to interactive environments, where __doc__ would be
+ displayed.
+
+ The docstring processing system's extraction tools will
+ concatenate all string literal expressions which appear at the
+ beginning of a definition or after a simple assignment. Only the
+ first strings in definitions will be available as __doc__, and can
+ be used for brief usage text suitable for interactive sessions;
+ subsequent string literals and all attribute docstrings are
+ ignored by the Python bytecode compiler and may contain more
+ extensive API information.
+
+ Example::
+
+ def function(arg):
+ """This is __doc__, function's docstring."""
+ """
+ This is an additional docstring, ignored by the bytecode
+ compiler, but extracted by Docutils.
+ """
+ pass
+
+ Issue: This breaks "from __future__ import" statements in Python
+ 2.1 for multiple module docstrings. The Python Reference Manual
+ specifies:
+
+ A future statement must appear near the top of the module.
+ The only lines that can appear before a future statement are:
+
+ * the module docstring (if any),
+ * comments,
+ * blank lines, and
+ * other future statements.
+
+ Resolution?
+
+ 1. Should we search for docstrings after a __future__ statement?
+ Very ugly.
+
+ 2. Redefine __future__ statements to allow multiple preceding
+ string literals?
+
+ 3. Or should we not even worry about this? There shouldn't be
+ __future__ statements in production code, after all. Will
+ modules with __future__ statements simply have to put up with
+ the single-docstring limitation?
+
+
+ Choice of Docstring Format
+ ==========================
+
+ Rather than force everyone to use a single docstring format,
+ multiple input formats are allowed by the processing system. A
+ special variable, __docformat__, may appear at the top level of a
+ module before any function or class definitions. Over time or
+ through decree, a standard format or set of formats should emerge.
+
+ The __docformat__ variable is a string containing the name of the
+ format being used, a case-insensitive string matching the input
+ parser's module or package name (i.e., the same name as required
+ to "import" the module or package), or a registered alias. If no
+ __docformat__ is specified, the default format is "plaintext" for
+ now; this may be changed to the standard format once determined.
+
+ The __docformat__ string may contain an optional second field,
+ separated from the format name (first field) by a single space: a
+ case-insensitive language identifier as defined in RFC 1766 [4].
+ A typical language identifier consists of a 2-letter language code
+ from ISO 639 [5] (3-letter codes used only if no 2-letter code
+ exists; RFC 1766 is currently being revised to allow 3-letter
+ codes). If no language identifier is specified, the default is
+ "en" for English. The language identifier is passed to the parser
+ and can be used for language-dependent markup features.
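Splitting a __docformat__ value into its two fields, with the defaults described above, could look like this (an illustrative sketch; the function name is not part of any specified API):

```python
def parse_docformat(docformat):
    """Split a __docformat__ value into (format, language).

    Applies the defaults described above: "plaintext" when no
    __docformat__ is given, "en" when no language field is given.
    Illustrative sketch; the function name is hypothetical.
    """
    if not docformat:
        return ('plaintext', 'en')
    fields = docformat.lower().split(' ', 1)
    if len(fields) == 1:
        return (fields[0], 'en')
    return (fields[0], fields[1])
```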
+
+
+ Docutils Project Model
+ ======================
+
+ ::
+
+ +--------------------------+
+ | Docutils: |
+ | docutils.core.Publisher, |
+ | docutils.core.publish() |
+ +--------------------------+
+ / \
+ / \
+ 1,3,5 / \ 6,8
+ +--------+ +--------+
+ | READER | =======================> | WRITER |
+ +--------+ +--------+
+ // \ / \
+ // \ / \
+ 2 // 4 \ 7 / 9 \
+ +--------+ +------------+ +------------+ +--------------+
+ | PARSER |...| reader | | writer |...| DISTRIBUTOR? |
+ +--------+ | transforms | | transforms | | |
+ | | | | | - one file |
+ | - docinfo | | - styling | | - many files |
+ | - titles | | - writer- | | - objects in |
+ | - linking | | specific | | memory |
+ | - lookups | | - etc. | +--------------+
+ | - reader- | +------------+
+ | specific |
+ | - parser- |
+ | specific |
+ | - layout |
+ | - etc. |
+ +------------+
+
+ The numbers indicate the path a document would take through the
+ code. Double-width lines run between reader & parser and between
+ reader & writer, indicating that data sent along these paths
+ should be standard (pure & unextended) Docutils doc trees.
+ Single-width lines signify that internal tree extensions or
+ completely unrelated representations are possible, but they must
+ be supported internally at both ends.
+
+
+ Publisher
+ ---------
+
+ The "docutils.core" module contains a "Publisher" facade class and
+ "publish" convenience function. Publisher encapsulates the
+ high-level logic of a Docutils system. The Publisher.publish()
+ method passes its input to its Reader, then passes the resulting
+ document tree through its Writer to its destination.
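The data flow can be sketched with stand-in classes. These are hypothetical simplifications for illustration, not the actual docutils.core code:

```python
class EchoParser:
    """Stand-in parser: wraps the input in a trivial 'doctree'."""
    def parse(self, source):
        return {'document': source}

class Reader:
    """Stand-in reader: hands the whole input to the parser."""
    def read(self, source, parser):
        return parser.parse(source)

class Writer:
    """Stand-in writer: renders the doctree to the destination."""
    def write(self, doctree, destination):
        destination.append(str(doctree))

class Publisher:
    """Minimal facade: source -> Reader -> doctree -> Writer."""
    def __init__(self, reader, parser, writer):
        self.reader, self.parser, self.writer = reader, parser, writer

    def publish(self, source, destination):
        doctree = self.reader.read(source, self.parser)
        self.writer.write(doctree, destination)
        return destination
```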
+
+
+ Readers
+ -------
+
+ Readers understand the input context (where the data is coming
+ from), send the whole input or discrete "chunks" to the parser,
+ and provide the context to bind the chunks together back into a
+ cohesive whole. Using transforms_, Readers also resolve
+ references, footnote numbers, interpreted text processing, and
+ anything else that requires context-sensitive computation.
+
+ Each reader is a module or package exporting a "Reader" class with
+ a "read" method. The base "Reader" class can be found in the
+ docutils/readers/__init__.py module.
+
+ Most Readers will have to be told what parser to use. So far (see
+ the list of examples below), only the Python Source Reader
+ (PySource) will be able to determine the syntax on its own.
+
+ Responsibilities:
+
+ - Do raw input on the source ("Reader.scan()").
+
+ - Pass the raw text to the parser, along with a fresh doctree
+ root ("Reader.parse()").
+
+ - Run transforms over the doctree(s) ("Reader.transform()").
+
+ Examples:
+
+ - Standalone/Raw/Plain: Just read a text file and process it. The
+ reader needs to be told which parser to use. Parser-specific
+ readers?
+
+ - Python Source: See `Python Source Reader`_ above.
+
+ - Email: RFC-822 headers, quoted excerpts, signatures, MIME parts.
+
+ - PEP: RFC-822 headers, "PEP xxxx" and "RFC xxxx" conversion to
+ URIs. Either interpret PEPs' indented sections or convert
+ existing PEPs to reStructuredText (or both?).
+
+ - Wiki: Global reference lookups of "wiki links" incorporated into
+ transforms. (CamelCase only or unrestricted?) Lazy
+ indentation?
+
+ - Web Page: As standalone, but recognize meta fields as meta tags.
+ Support for templates of some sort? (After <body>, before
+ </body>?)
+
+ - FAQ: Structured "question & answer(s)" constructs.
+
+ - Compound document: Merge chapters into a book. Master TOC file?
+
+
+ Parsers
+ -------
+
+ Parsers analyze their input and produce a Docutils `document
+ tree`_. They don't know or care anything about the source or
+ destination of the data.
+
+ Each input parser is a module or package exporting a "Parser"
+ class with a "parse" method. The base "Parser" class can be found
+ in the docutils/parsers/__init__.py module.
+
+ Responsibilities: Given raw input text and a doctree root node,
+ populate the doctree by parsing the input text.
+
+ Example: The only parser implemented so far is for the
+ reStructuredText markup.
+
+
+ Transforms
+ ----------
+
+ Transforms change the document tree from one form to another, add
+ to the tree, or prune it. Transforms are run by Reader and Writer
+ objects. Some transforms are Reader-specific, some are
+ Parser-specific, and others are Writer-specific. The choice and
+ order of transforms is specified in the Reader and Writer objects.
+
+ Each transform is a class in a module in the docutils/transforms
+ package, a subclass of docutils.transforms.Transform.
+
+ Responsibilities:
+
+ - Modify a doctree in-place, either purely transforming one
+ structure into another, or adding new structures based on the
+ doctree and/or external data.
+
+ Examples (in "docutils.transforms"):
+
+ - frontmatter.DocInfo: conversion of document metadata
+ (bibliographic information).
+
+ - references.Hyperlinks: resolution of hyperlinks.
+
+ - document.Merger: combining multiple populated doctrees into one.
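A transform's in-place contract can be sketched with a toy example; the stand-in base class and the dict "doctree" are hypothetical simplifications of docutils.transforms.Transform and the real document tree:

```python
class Transform:
    """Stand-in for docutils.transforms.Transform (hypothetical)."""
    def __init__(self, doctree):
        self.doctree = doctree

    def apply(self):
        raise NotImplementedError

class Uppercase(Transform):
    """Toy transform: rewrites the tree's text in place."""
    def apply(self):
        self.doctree['text'] = self.doctree['text'].upper()
```

A Reader or Writer would instantiate each transform with the doctree and call apply(); the tree is modified in place, as described above.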
+
+
+ Writers
+ -------
+
+ Writers produce the final output (HTML, XML, TeX, etc.). Writers
+ translate the internal document tree structure into the final data
+ format, possibly running output-specific transforms_ first.
+
+ Each writer is a module or package exporting a "Writer" class with
+ a "write" method. The base "Writer" class can be found in the
+ docutils/writers/__init__.py module.
+
+ Responsibilities:
+
+ - Run transforms over the doctree(s).
+
+ - Translate doctree(s) into specific output formats.
+
+ - Transform references into format-native forms.
+
+ - Write output to the destination (possibly via a "Distributor").
+
+ Examples:
+
+ - XML: Various forms, such as DocBook. Also, raw doctree XML.
+
+ - HTML
+
+ - TeX
+
+ - Plain text
+
+ - reStructuredText?
+
+
+ Distributors
+ ------------
+
+ Distributors will exist for each method of storing the results of
+ processing:
+
+ - In a single file on disk.
+
+ - In a tree of directories and files on disk.
+
+ - In a single tree-shaped data structure in memory.
+
+ - Some other set of data structures in memory.
+
+ @@@ Distributors are currently just an idea; they may or may not
+ be practical. Issues:
+
+ Is it better for the writer to control the distributor, or
+ vice versa? Or should they be equals?
+
+ Looking at the list of writers, it seems that only HTML would
+ require anything other than monolithic output. Perhaps merge
+ the HTML "distributor" into "writer" variants?
+
+ Perhaps translator/writer instead of writer/distributor?
+
+ Responsibilities:
+
+ - Do raw output to the destination.
+
+ - Transform references per incarnation (method of distribution).
+
+ Examples:
+
+ - Single file.
+
+ - Multiple files & directories.
+
+ - Objects in memory.
+
+
+ Docutils Package Structure
+ ==========================
+
+ - Package "docutils".
+
+ - Module "docutils.core" contains facade class "Publisher" and
+ convenience function "publish()". See `Publisher API`_ below.
+
+ - Module "docutils.nodes" contains the Docutils document tree
+ element class library plus Visitor pattern base classes. See
+ `Document Tree`_ below.
+
+ - Module "docutils.roman" contains Roman numeral conversion
+ routines.
+
+ - Module "docutils.statemachine" contains a finite state machine
+ specialized for regular-expression-based text filters. The
+ reStructuredText parser implementation is based on this
+ module.
+
+ - Module "docutils.urischemes" contains a mapping of known URI
+ schemes ("http", "ftp", "mail", etc.).
+
+ - Module "docutils.utils" contains utility functions and
+ classes, including a logger class ("Reporter"; see `Error
+ Handling`_ below).
+
+ - Package "docutils.parsers": markup parsers_.
+
+ - Function "get_parser_class(parsername)" returns a parser
+ module by name. Class "Parser" is the base class of
+ specific parsers. (docutils/parsers/__init__.py)
+
+ - Package "docutils.parsers.rst": the reStructuredText parser.
+
+ - Alternate markup parsers may be added.
+
+ - Package "docutils.readers": context-aware input readers.
+
+ - Function "get_reader_class(readername)" returns a reader
+ module by name or alias. Class "Reader" is the base class
+ of specific readers. (docutils/readers/__init__.py)
+
+ - Module "docutils.readers.standalone": reads independent
+ document files.
+
+ - Readers to be added for: Python source code (structure &
+ docstrings), PEPs, email, FAQ, and perhaps Wiki and others.
+
+ - Package "docutils.writers": output format writers.
+
+ - Function "get_writer_class(writername)" returns a writer
+ module by name. Class "Writer" is the base class of
+ specific writers. (docutils/writers/__init__.py)
+
+ - Module "docutils.writers.pprint" is a simple internal
+ document tree writer; it writes indented pseudo-XML.
+
+ - Module "docutils.writers.html4css1" is a simple HyperText
+ Markup Language document tree writer for HTML 4.01 and CSS1.
+
+ - Writers to be added: HTML 3.2 or 4.01-loose, XML (various
+ forms, such as DocBook and the raw internal doctree), TeX,
+ plaintext, reStructuredText, and perhaps others.
+
+ - Package "docutils.transforms": tree transform classes.
+
+ - Class "Transform" is the base class of specific transforms;
+ see `Transform API`_ below.
+ (docutils/transforms/__init__.py)
+
+ - Each module contains related transform classes.
+
+ - Package "docutils.languages": Language modules contain
+ language-dependent strings and mappings. They are named for
+ their language identifier (as defined in `Choice of Docstring
+ Format`_ above), converting dashes to underscores.
+
+ - Function "getlanguage(languagecode)" returns the matching
+ language module. (docutils/languages/__init__.py)
+
+ - Module "docutils.languages.en" (English).
+
+ - Other languages to be added.
+
+
+ Front-End Tools
+ ===============
+
+ @@@ To be determined.
+
+ @@@ Document tools & summarize their command-line interfaces.
+
+
+ Document Tree
+ =============
+
+ A single intermediate data structure is used internally by
+ Docutils, in the interfaces between components; it is defined in
+ the docutils.nodes module. It is not required that this data
+ structure be used *internally* by any of the components, just
+ *between* components. This data structure is similar to a DOM
+ tree whose schema is documented in an XML DTD (eXtensible Markup
+ Language Document Type Definition), which comes in two parts:
+
+ - the Docutils Generic DTD, docutils.dtd [6], and
+
+ - the OASIS Exchange Table Model, soextblx.dtd [7].
+
+ The DTD defines a rich set of elements, suitable for many input
+ and output formats. The DTD retains all information necessary to
+ reconstruct the original input text, or a reasonable facsimile
+ thereof.
+
+
+ Error Handling
+ ==============
+
+ When the parser encounters an error in markup, it inserts a system
+ message (DTD element "system_message"). There are five levels of
+ system messages:
+
+ - Level-0, "DEBUG": an internal reporting issue. There is no
+ effect on the processing. Level-0 system messages are
+ handled separately from the others.
+
+ - Level-1, "INFO": a minor issue that can be ignored. There is
+ little or no effect on the processing. Typically level-1 system
+ messages are not reported.
+
+ - Level-2, "WARNING": an issue that should be addressed. If
+ ignored, there may be unpredictable problems with the output.
+ Typically level-2 system messages are reported but do not halt
+ processing.
+
+ - Level-3, "ERROR": a major issue that should be addressed. If
+ ignored, the output will contain errors. Typically level-3
+ system messages are reported but do not halt processing.
+
+ - Level-4, "SEVERE": a critical error that must be addressed.
+ Typically level-4 system messages are turned into exceptions
+ which halt processing. If ignored, the output will contain
+ severe errors.
+
+ Although the initial message levels were devised independently,
+ they have a strong correspondence to VMS error condition severity
+ levels [8]; the names in quotes for levels 1 through 4 were
+ borrowed from VMS. Error handling has since been influenced by
+ the log4j project [9].
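The reporting policy might be sketched as follows; the parameter names and threshold defaults are illustrative assumptions, not the actual Docutils settings:

```python
LEVEL_NAMES = {0: 'DEBUG', 1: 'INFO', 2: 'WARNING', 3: 'ERROR',
               4: 'SEVERE'}

def handle_message(level, text, report_level=2, halt_level=4):
    """Apply the reporting policy sketched above.

    Messages below report_level are dropped; messages at or above
    halt_level raise an exception, halting processing.  Parameter
    names and defaults are illustrative assumptions.
    """
    message = '%s: %s' % (LEVEL_NAMES[level], text)
    if level >= halt_level:
        raise RuntimeError(message)
    if level >= report_level:
        return message
    return None
```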
+
+
+References and Footnotes
+
+ [1] PEP 256, Docstring Processing System Framework, Goodger
+ http://www.python.org/peps/pep-0256.html
+
+ [2] PEP 224, Attribute Docstrings, Lemburg
+ http://www.python.org/peps/pep-0224.html
+
+ [3] PEP 216, Docstring Format, Zadka
+ http://www.python.org/peps/pep-0216.html
+
+ [4] http://www.rfc-editor.org/rfc/rfc1766.txt
+
+ [5] http://lcweb.loc.gov/standards/iso639-2/englangn.html
+
+ [6] http://docutils.sourceforge.net/spec/docutils.dtd
+
+ [7] http://docstring.sourceforge.net/spec/soextblx.dtd
+
+ [8] http://www.openvms.compaq.com:8000/73final/5841/
+ 5841pro_027.html#error_cond_severity
+
+ [9] http://jakarta.apache.org/log4j/
+
+ [10] http://www.python.org/sigs/doc-sig/
+
+
+Project Web Site
+
+ A SourceForge project has been set up for this work at
+ http://docutils.sourceforge.net/.
+
+
+Copyright
+
+ This document has been placed in the public domain.
+
+
+Acknowledgements
+
+ This document borrows ideas from the archives of the Python
+ Doc-SIG [10]. Thanks to all members past & present.
+
+
+
+Local Variables:
+mode: indented-text
+indent-tabs-mode: nil
+fill-column: 70
+sentence-end-double-space: t
+End:
diff --git a/docs/ref/doctree.txt b/docs/ref/doctree.txt
new file mode 100644
index 000000000..90aea7054
--- /dev/null
+++ b/docs/ref/doctree.txt
@@ -0,0 +1,344 @@
+==================================
+ Docutils Document Tree Structure
+==================================
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+
+This document describes the internal data structure representing
+document trees in Docutils. The data structure is defined by the
+hierarchy of classes in the ``docutils.nodes`` module. It is also
+formally described by the `Docutils Generic DTD`_ XML document type
+definition, docutils.dtd_, which is the definitive source for element
+hierarchy details.
+
+Below is a simplified diagram of the hierarchy of element types in the
+Docutils document tree structure. An element may contain any other
+elements immediately below it in the diagram. Text in square
+brackets indicates notes. Element types in parentheses indicate
+recursive or one-to-many relationships; sections may contain
+(sub)sections, tables contain further body elements, etc. ::
+
+ +--------------------------------------------------------------------+
+ | document [may begin with a title, subtitle, docinfo] |
+ | +--------------------------------------+
+ | | sections [each begins with a title] |
+ +-----------------------------+-------------------------+------------+
+ | [body elements:] | (sections) |
+ | | - literal | - lists | | - hyperlink +------------+
+ | | blocks | - tables | | targets |
+ | para- | - doctest | - block | foot- | - sub. defs |
+ | graphs | blocks | quotes | notes | - comments |
+ +---------+-----------+----------+-------+--------------+
+ | [text]+ | [text] | (body elements) | [text] |
+ | (inline +-----------+------------------+--------------+
+ | markup) |
+ +---------+
+
+
+-------------------
+ Element Hierarchy
+-------------------
+
+A class hierarchy has been implemented in nodes.py in which the
+position of an element (the level at which it can occur) is
+significant: e.g., Root, Structural, Body, and Inline classes.
+Certain transformations are easier because isinstance() can be
+used on these categories.
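A miniature stand-in for that hierarchy (hypothetical; the real classes live in docutils.nodes) shows why isinstance() helps:

```python
# Hypothetical mini-hierarchy mirroring the idea in nodes.py.
class Node: pass
class Element(Node): pass
class Body(Element): pass        # category base
class Inline(Element): pass      # category base
class paragraph(Body): pass      # concrete element types
class emphasis(Inline): pass

def body_elements(children):
    # Category membership is a single isinstance() test.
    return [child for child in children if isinstance(child, Body)]
```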
+
+The elements making up Docutils document trees can be categorized into
+the following groups:
+
+- _`Root element`: document_
+
+- _`Title elements`: title_, subtitle_
+
+- _`Bibliographic elements`: docinfo_, author_, authors_,
+ organization_, contact_, version_, revision_, status_, date_,
+ copyright_
+
+- _`Structural elements`: document_, section_, topic_, transition_
+
+- _`Body elements`:
+
+ - _`General body elements`: paragraph_, literal_block_,
+ block_quote_, doctest_block_, table_, figure_, image_, footnote_
+
+ - _`Lists`: bullet_list_, enumerated_list_, definition_list_,
+ field_list_, option_list_
+
+ - _`Admonitions`: note_, tip_, warning_, error_, caution_, danger_,
+ important_
+
+ - _`Special body elements`: target_, substitution_definition_,
+ comment_, system_warning_
+
+- _`Inline elements`: emphasis_, strong_, interpreted_, literal_,
+ reference_, target_, footnote_reference_, substitution_reference_,
+ image_, problematic_
+
+
+``Node``
+========
+
+
+``Text``
+========
+
+
+``Element``
+===========
+
+
+``TextElement``
+===============
+
+
+-------------------
+ Element Reference
+-------------------
+
+``document``
+============
+description
+
+contents
+
+External attributes
+-------------------
+`Common external attributes`_.
+
+
+Internal attributes
+-------------------
+- `Common internal attributes`_.
+- ``explicittargets``
+- ``implicittargets``
+- ``externaltargets``
+- ``indirecttargets``
+- ``refnames``
+- ``anonymoustargets``
+- ``anonymousrefs``
+- ``autofootnotes``
+- ``autofootnoterefs``
+- ``reporter``
+
+
+---------------------
+ Attribute Reference
+---------------------
+
+External Attributes
+===================
+
+Through the `%basic.atts;`_ parameter entity, all elements share the
+following _`common external attributes`: id_, name_, dupname_,
+source_.
+
+
+``anonymous``
+-------------
+The ``anonymous`` attribute
+
+
+``auto``
+--------
+The ``auto`` attribute
+
+
+``dupname``
+-----------
+The ``dupname`` attribute
+
+
+``id``
+------
+The ``id`` attribute
+
+
+``name``
+--------
+The ``name`` attribute
+
+
+``refid``
+---------
+The ``refid`` attribute
+
+
+``refname``
+-----------
+The ``refname`` attribute
+
+
+``refuri``
+----------
+The ``refuri`` attribute
+
+
+``source``
+----------
+The ``source`` attribute
+
+
+``xml:space``
+-------------
+The ``xml:space`` attribute
+
+
+Internal Attributes
+===================
+
+All element objects share the following _`common internal attributes`:
+rawsource_, children_, attributes_, tagname_.
+
+
+------------------------
+ DTD Parameter Entities
+------------------------
+
+``%basic.atts;``
+================
+The ``%basic.atts;`` parameter entity lists attributes common to all
+elements. See the `common external attributes`_ above.
+
+
+``%body.elements;``
+===================
+The ``%body.elements;`` parameter entity
+
+
+``%inline.elements;``
+=====================
+The ``%inline.elements;`` parameter entity
+
+
+``%reference.atts;``
+====================
+The ``%reference.atts;`` parameter entity
+
+
+``%structure.model;``
+=====================
+The ``%structure.model;`` parameter entity
+
+
+``%text.model;``
+================
+The ``%text.model;`` parameter entity
+
+
+--------------------------------
+ Appendix: Miscellaneous Topics
+--------------------------------
+
+Representation of Horizontal Rules
+==================================
+
+Having added the "horizontal rule" construct to the reStructuredText_
+spec, a decision had to be made as to how to reflect the construct in
+the implementation of the document tree. Given this source::
+
+ Document
+ ========
+
+ Paragraph
+
+ --------
+
+ Paragraph
+
+The horizontal rule indicates a "transition" (in prose terms) or the
+start of a new "division". Before implementation, the parsed document
+tree would be::
+
+
+    <document>
+        <title>
+            Document
+        <paragraph>
+            Paragraph
+        --------                <--- error here
+        <paragraph>
+            Paragraph
+
+There are several possibilities for the implementation. Solution 3
+was chosen.
+
+1. Implement horizontal rules as "divisions" or segments. A
+ "division" is a title-less, non-hierarchical section. The first
+ try at an implementation looked like this::
+
+    <document>
+        <title>
+            Document
+        <paragraph>
+            Paragraph
+        <division>
+            <paragraph>
+                Paragraph
+
+ But the two paragraphs are really at the same level; they shouldn't
+ appear to be at different levels. There's really an invisible
+ "first division". The horizontal rule splits the document body
+ into two segments, which should be treated uniformly.
+
+2. Treating "divisions" uniformly brings us to the second
+ possibility::
+
+
+    <document>
+        <title>
+            Document
+        <division>
+            <paragraph>
+                Paragraph
+        <division>
+            <paragraph>
+                Paragraph
+
+ With this change, documents and sections will directly contain
+ divisions and sections, but not body elements. Only divisions will
+ directly contain body elements. Even without a horizontal rule
+ anywhere, the body elements of a document or section would be
+ contained within a division element. This makes the document tree
+ deeper. This is similar to the way HTML treats document contents:
+ grouped within a <body> element.
+
+3. Implement them as "transitions", empty elements::
+
+    <document>
+        <title>
+            Document
+        <paragraph>
+            Paragraph
+        <transition>
+        <paragraph>
+            Paragraph
+
+ A transition would be a "point element", not containing anything,
+ only identifying a point within the document structure. This keeps
+ the document tree flatter, but the idea of a "point element" like
+ "transition" smells bad. A transition isn't a thing itself, it's
+ the space between two divisions.
+
+ This solution has been chosen for incorporation into the document
+ tree.
+
+
+.. _Docutils Generic DTD:
+.. _docutils.dtd: http://docutils.sourceforge.net/spec/docutils.dtd
+.. _reStructuredText:
+ http://docutils.sourceforge.net/spec/rst/reStructuredText.html
+
+
+..
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End:
diff --git a/docs/ref/docutils.dtd b/docs/ref/docutils.dtd
new file mode 100644
index 000000000..d47238b4d
--- /dev/null
+++ b/docs/ref/docutils.dtd
@@ -0,0 +1,514 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+%calstblx;
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/docs/ref/rst/directives.txt b/docs/ref/rst/directives.txt
new file mode 100644
index 000000000..cbb8b4609
--- /dev/null
+++ b/docs/ref/rst/directives.txt
@@ -0,0 +1,360 @@
+=============================
+ reStructuredText Directives
+=============================
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+
+This document describes the directives implemented in the reference
+reStructuredText parser.
+
+
+.. contents::
+
+
+-------------
+ Admonitions
+-------------
+
+DTD elements: attention, caution, danger, error, hint, important,
+note, tip, warning.
+
+Directive block: directive data and all following indented text
+are interpreted as body elements.
+
+Admonitions are specially marked "topics" that can appear anywhere an
+ordinary body element can. They contain arbitrary body elements.
+Typically, an admonition is rendered as an offset block in a document,
+sometimes outlined or shaded, with a title matching the admonition
+type. For example::
+
+ .. DANGER::
+ Beware killer rabbits!
+
+This directive might be rendered something like this::
+
+ +------------------------+
+ | !DANGER! |
+ | |
+ | Beware killer rabbits! |
+ +------------------------+
+
+The following admonition directives have been implemented:
+
+- attention
+- caution
+- danger
+- error
+- hint
+- important
+- note
+- tip
+- warning
+
+Any text immediately following the directive indicator (on the same
+line and/or indented on following lines) is interpreted as a directive
+block and is parsed for normal body elements. For example, the
+following "note" admonition directive contains one paragraph and a
+bullet list consisting of two list items::
+
+ .. note:: This is a note admonition.
+ This is the second line of the first paragraph.
+
+ - The note contains all indented body elements
+ following.
+ - It includes this bullet list.
+
+
+--------
+ Images
+--------
+
+There are two image directives: "image" and "figure".
+
+
+Image
+=====
+
+DTD element: image.
+
+Directive block: directive data and following indented lines (up to
+the first blank line) are interpreted as image URI and optional
+attributes.
+
+An "image" is a simple picture::
+
+ .. image:: picture.png
+
+The URI for the image source file is specified in the directive data.
+As with hyperlink targets, the image URI may begin on the same line as
+the explicit markup start and target name, or it may begin in an
+indented text block immediately following, with no intervening blank
+lines. If there are multiple lines in the link block, they are
+stripped of leading and trailing whitespace and joined together.
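
The line-joining rule just described can be sketched in Python (an
illustrative helper, not Docutils' actual code):

```python
def join_link_block(lines):
    """Strip leading and trailing whitespace from each line of a
    multi-line link block and join the pieces into a single URI."""
    return "".join(line.strip() for line in lines)

# A URI wrapped across two indented lines becomes one string:
uri = join_link_block(["   http://example.org/a/very/long",
                       "   /path/to/picture.png"])
```

Joining without any separator matters here, since URIs may not
contain whitespace.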
+
+Optionally, the image link block may end with a flat field list, the
+_`image attributes`. For example::
+
+ .. image:: picture.png
+ :height: 100
+ :width: 200
+ :scale: 50
+ :alt: alternate text
+
+The following attributes are recognized:
+
+``alt`` : text
+ Alternate text: a short description of the image, displayed by
+ applications that cannot display images, or spoken by applications
+ for visually impaired users.
+``height`` : integer
+ The height of the image in pixels, used to reserve space or scale
+ the image vertically.
+``width`` : integer
+ The width of the image in pixels, used to reserve space or scale
+ the image horizontally.
+``scale`` : integer
+ The uniform scaling factor of the image, a percentage (but no "%"
+ symbol is required or allowed). "100" means full-size.
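
How these attributes might combine can be illustrated with a small
calculation (a sketch of plausible semantics; the exact rounding
behavior is an assumption):

```python
def display_size(height, width, scale=100):
    """Return (height, width) in pixels after applying the uniform
    percentage scale factor; scale=100 means full size."""
    return (height * scale // 100, width * scale // 100)

# :height: 100, :width: 200, :scale: 50  ->  rendered at 50 x 100
size = display_size(100, 200, scale=50)
```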
+
+
+Figure
+======
+
+DTD elements: figure, image, caption, legend.
+
+Directive block: directive data and all following indented text are
+interpreted as an image URI, optional attributes, a caption, and an
+optional legend.
+
+A "figure" consists of image_ data (optionally including `image
+attributes`_), an optional caption (a single paragraph), and an
+optional legend (arbitrary body elements)::
+
+ .. figure:: picture.png
+ :scale: 50
+ :alt: map to buried treasure
+
+ This is the caption of the figure (a simple paragraph).
+
+ The legend consists of all elements after the caption. In this
+ case, the legend consists of this paragraph and the following
+ table:
+
+ +-----------------------+-----------------------+
+ | Symbol | Meaning |
+ +=======================+=======================+
+ | .. image:: tent.png | Campground |
+ +-----------------------+-----------------------+
+ | .. image:: waves.png | Lake |
+ +-----------------------+-----------------------+
+ | .. image:: peak.png | Mountain |
+ +-----------------------+-----------------------+
+
+There must be a blank line before the caption paragraph and before the
+legend. To specify a legend without a caption, use an empty comment
+("..") in place of the caption.
+
+
+---------------------
+ Document Components
+---------------------
+
+Table of Contents
+=================
+
+DTD elements: pending, topic.
+
+Directive block: directive data and following indented lines (up to
+the first blank line) are interpreted as the topic title and optional
+attributes.
+
+The "contents" directive inserts a table of contents (TOC) in two
+passes: initial parse and transform. During the initial parse, a
+"pending" element is generated which acts as a placeholder, storing
+the TOC title and any attributes internally. At a later stage in the
+processing, the "pending" element is replaced by a "topic" element, a
+title and the table of contents proper.
+
+The directive in its simplest form::
+
+ .. contents::
+
+Language-dependent boilerplate text will be used for the title. The
+English default title text is "Contents".
+
+An explicit title may be specified::
+
+ .. contents:: Table of Contents
+
+The title may span lines, although it is not recommended::
+
+ .. contents:: Here's a very long Table of
+ Contents title
+
+Attributes may be specified for the directive, using a field list::
+
+ .. contents:: Table of Contents
+ :depth: 2
+
+If the default title is to be used, the attribute field list may begin
+on the same line as the directive marker::
+
+ .. contents:: :depth: 2
+
+The following attributes are recognized:
+
+``depth`` : integer
+ The number of section levels that are collected in the table of
+ contents.
+``local`` : empty
+ Generate a local table of contents. Entries will only include
+ subsections of the section in which the directive is given. If no
+ explicit title is given, the table of contents will not be titled.
+
+
+Footnotes
+=========
+
+DTD elements: pending, topic.
+
+@@@
+
+
+Citations
+=========
+
+DTD elements: pending, topic.
+
+@@@
+
+
+Topic
+=====
+
+DTD element: topic.
+
+@@@
+
+
+---------------
+ HTML-Specific
+---------------
+
+Meta
+====
+
+Non-standard element: meta.
+
+Directive block: directive data and following indented lines (up to
+the first blank line) are parsed for a flat field list.
+
+The "meta" directive is used to specify HTML metadata stored in HTML
+META tags. "Metadata" is data about data, in this case data about web
+pages. Metadata is used to describe and classify web pages in the
+World Wide Web, in a form that is easy for search engines to extract
+and collate.
+
+Within the directive block, a flat field list provides the syntax for
+metadata. The field name becomes the contents of the "name" attribute
+of the META tag, and the field body (interpreted as a single string
+without inline markup) becomes the contents of the "content"
+attribute. For example::
+
+ .. meta::
+ :description: The reStructuredText plaintext markup language
+ :keywords: plaintext, markup language
+
+This would be converted to the following HTML::
+
+    <meta name="description"
+          content="The reStructuredText plaintext markup language">
+    <meta name="keywords" content="plaintext, markup language">
+
+Support for other META attributes ("http-equiv", "scheme", "lang",
+"dir") is provided through field arguments, which must be of the form
+"attr=value"::
+
+ .. meta::
+ :description lang=en: An amusing story
+ :description lang=fr: Un histoire amusant
+
+And their HTML equivalents::
+
+    <meta name="description" lang="en" content="An amusing story">
+    <meta name="description" lang="fr" content="Un histoire amusant">
+
+Some META tags use an "http-equiv" attribute instead of the "name"
+attribute. To specify "http-equiv" META tags, simply omit the name::
+
+ .. meta::
+ :http-equiv=Content-Type: text/html; charset=ISO-8859-1
+
+HTML equivalent::
+
+    <meta http-equiv="Content-Type"
+          content="text/html; charset=ISO-8859-1">
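
The field-name-to-attribute mapping described in this section can be
sketched as a small helper (hypothetical code; attribute quoting and
ordering here are assumptions, not the HTML writer's actual output):

```python
def meta_tag(field_name, content, **attrs):
    """Build an HTML META tag from a field name and field body.
    A name of the form "http-equiv=VALUE" yields an http-equiv
    attribute instead of a name attribute."""
    if field_name.startswith("http-equiv="):
        first = 'http-equiv="%s"' % field_name.split("=", 1)[1]
    else:
        first = 'name="%s"' % field_name
    # Field arguments ("attr=value") become additional attributes.
    extra = "".join(' %s="%s"' % (k, v) for k, v in sorted(attrs.items()))
    return '<meta %s%s content="%s">' % (first, extra, content)

# :keywords: plaintext, markup language
tag = meta_tag("keywords", "plaintext, markup language")
```

The same helper covers the "lang=en" and "http-equiv" variants shown
above, since only the first attribute's name changes.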
+
+Imagemap
+========
+
+Non-standard element: imagemap.
+
+
+---------------
+ Miscellaneous
+---------------
+
+Raw Data Pass-Through
+=====================
+
+DTD element: pending.
+
+Directive block: the directive data is interpreted as an output format
+type, and all following indented text is stored verbatim,
+uninterpreted.
+
+The "raw" directive indicates non-reStructuredText data that is to be
+passed untouched to the Writer. The name of the output format is
+given in the directive data. During the initial parse, a "pending"
+element is generated which acts as a placeholder, storing the format
+and raw data internally. The interpretation of the code is up to the
+Writer. A Writer may ignore any raw output not matching its format.
+
+For example, the following input would be passed untouched by an HTML
+Writer::
+
+ .. raw:: html
+
+       <hr width="50" size="10">
+
+A LaTeX Writer could insert the following raw content into its
+output stream::
+
+ .. raw:: latex
+ \documentclass[twocolumn]{article}
+
+
+Restructuredtext-Test-Directive
+===============================
+
+DTD element: system_warning.
+
+Directive block: directive data is stored, and all following indented
+text is interpreted as a literal block.
+
+This directive is provided for test purposes only. (Nobody is
+expected to type in a name *that* long!) It is converted into a
+level-1 (info) system message showing the directive data, possibly
+followed by a literal block containing the rest of the directive
+block.
+
+
+..
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End:
diff --git a/docs/ref/rst/introduction.txt b/docs/ref/rst/introduction.txt
new file mode 100644
index 000000000..3d7cfc5f8
--- /dev/null
+++ b/docs/ref/rst/introduction.txt
@@ -0,0 +1,307 @@
+=====================================
+ An Introduction to reStructuredText
+=====================================
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+
+reStructuredText_ is an easy-to-read, what-you-see-is-what-you-get
+plaintext markup syntax and parser system. It is useful for in-line
+program documentation (such as Python docstrings), for quickly
+creating simple web pages, and for standalone documents.
+reStructuredText_ is a proposed revision and reinterpretation of the
+StructuredText_ and Setext_ lightweight markup systems.
+
+reStructuredText is designed for extensibility for specific
+application domains. Its parser is a component of Docutils_.
+
+This document defines the goals_ of reStructuredText and provides a
+history_ of the project. It is written using the reStructuredText
+markup, and therefore serves as an example of its use. Please also
+see an analysis of the `problems with StructuredText`_ and the
+`reStructuredText markup specification`_ itself at the project's web
+page, http://docutils.sourceforge.net/rst.html.
+
+.. _reStructuredText: http://docutils.sourceforge.net/rst.html
+.. _StructuredText:
+ http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage
+.. _Setext: http://docutils.sourceforge.net/mirror/setext.html
+.. _Docutils: http://docutils.sourceforge.net/
+.. _Problems with StructuredText: problems.html
+.. _reStructuredText Markup Specification: reStructuredText.html
+
+
+Goals
+=====
+
+The primary goal of reStructuredText_ is to define a markup syntax for
+use in Python docstrings and other documentation domains, that is
+readable and simple, yet powerful enough for non-trivial use. The
+intended purpose of the reStructuredText markup is twofold:
+
+- the establishment of a set of standard conventions allowing the
+ expression of structure within plaintext, and
+
+- the conversion of such documents into useful structured data
+ formats.
+
+The secondary goal of reStructuredText is to be accepted by the Python
+community (by way of being blessed by PythonLabs and the BDFL [#]_) as
+a standard for Python inline documentation (possibly one of several
+standards, to account for taste).
+
+.. [#] Python's creator and "Benevolent Dictator For Life",
+ Guido van Rossum.
+
+To clarify the primary goal, here are specific design goals, in order,
+beginning with the most important:
+
+1. Readable. The marked-up text must be easy to read without any
+ prior knowledge of the markup language. It should be as easily
+ read in raw form as in processed form.
+
+2. Unobtrusive. The markup that is used should be as simple and
+ unobtrusive as possible. The simplicity of markup constructs
+   should be roughly proportional to their frequency of use. The most
+   common constructs, with natural and obvious markup, should be the
+   simplest and most unobtrusive. Less common constructs, for which
+ there is no natural or obvious markup, should be distinctive.
+
+3. Unambiguous. The rules for markup must not be open for
+ interpretation. For any given input, there should be one and only
+ one possible output (including error output).
+
+4. Unsurprising. Markup constructs should not cause unexpected output
+ upon processing. As a fallback, there must be a way to prevent
+ unwanted markup processing when a markup construct is used in a
+ non-markup context (for example, when documenting the markup syntax
+ itself).
+
+5. Intuitive. Markup should be as obvious and easily remembered as
+ possible, for the author as well as for the reader. Constructs
+ should take their cues from such naturally occurring sources as
+ plaintext email messages, newsgroup postings, and text
+ documentation such as README.txt files.
+
+6. Easy. It should be easy to mark up text using any ordinary text
+ editor.
+
+7. Scalable. The markup should be applicable regardless of the length
+ of the text.
+
+8. Powerful. The markup should provide enough constructs to produce a
+ reasonably rich structured document.
+
+9. Language-neutral. The markup should apply to multiple natural (as
+ well as artificial) languages, not only English.
+
+10. Extensible. The markup should provide a simple syntax and
+ interface for adding more complex general markup, and custom
+ markup.
+
+11. Output-format-neutral. The markup will be appropriate for
+ processing to multiple output formats, and will not be biased
+ toward any particular format.
+
+The design goals above were used as criteria for accepting or
+rejecting syntax, or selecting between alternatives.
+
+It is emphatically *not* the goal of reStructuredText to define
+docstring semantics, such as docstring contents or docstring length.
+These issues are orthogonal to the markup syntax and beyond the scope
+of this specification.
+
+Also, it is not the goal of reStructuredText to maintain compatibility
+with StructuredText_ or Setext_. reStructuredText shamelessly steals
+their great ideas and ignores the not-so-great.
+
+Author's note:
+
+ Due to the nature of the problem we're trying to solve (or,
+ perhaps, due to the nature of the proposed solution), the above
+ goals unavoidably conflict. I have tried to extract and distill
+ the wisdom accumulated over the years in the Python Doc-SIG_
+ mailing list and elsewhere, to come up with a coherent and
+ consistent set of syntax rules, and the above goals by which to
+ measure them.
+
+ There will inevitably be people who disagree with my particular
+ choices. Some desire finer control over their markup, others
+ prefer less. Some are concerned with very short docstrings,
+ others with full-length documents. This specification is an
+ effort to provide a reasonably rich set of markup constructs in a
+ reasonably simple form, that should satisfy a reasonably large
+ group of reasonable people.
+
+ David Goodger (goodger@users.sourceforge.net), 2001-04-20
+
+.. _Doc-SIG: http://www.python.org/sigs/doc-sig/
+
+
+History
+=======
+
+reStructuredText_, the specification, is based on StructuredText_ and
+Setext_. StructuredText was developed by Jim Fulton of `Zope
+Corporation`_ (formerly Digital Creations) and first released in 1996.
+It is now released as a part of the open-source 'Z Object Publishing
+Environment' (ZOPE_). Ian Feldman's and Tony Sanders' earlier Setext_
+specification was either an influence on StructuredText or, by their
+similarities, at least evidence of the correctness of this approach.
+
+I discovered StructuredText_ in late 1999 while searching for a way to
+document the Python modules in one of my projects. Version 1.1 of
+StructuredText was included in Daniel Larsson's pythondoc_. Although
+I was not able to get pythondoc to work for me, I found StructuredText
+to be almost ideal for my needs. I joined the Python Doc-SIG_
+(Documentation Special Interest Group) mailing list and found an
+ongoing discussion of the shortcomings of the StructuredText
+'standard'. This discussion has been going on since the inception of
+the mailing list in 1996, and possibly predates it.
+
+I decided to modify the original module with my own extensions and
+some suggested by the Doc-SIG members. I soon realized that the
+module was not written with extension in mind, so I embarked upon a
+general reworking, including adapting it to the 're' regular
+expression module (the original inspiration for the name of this
+project). Soon after I completed the modifications, I discovered that
+StructuredText.py was up to version 1.23 in the ZOPE distribution.
+Implementing the new syntax extensions from version 1.23 proved to be
+an exercise in frustration, as the complexity of the module had become
+overwhelming.
+
+In 2000, development on StructuredTextNG_ ("Next Generation") began at
+`Zope Corporation`_ (then Digital Creations). It seems to have many
+improvements, but still suffers from many of the problems of classic
+StructuredText.
+
+I decided that a complete rewrite was in order, and even started a
+`reStructuredText SourceForge project`_ (now inactive). My
+motivations (the 'itches' I aim to 'scratch') are as follows:
+
+- I need a standard format for inline documentation of the programs I
+ write. This inline documentation has to be convertible to other
+ useful formats, such as HTML. I believe many others have the same
+ need.
+
+- I believe in the Setext/StructuredText idea and want to help
+ formalize the standard. However, I feel the current specifications
+ and implementations have flaws that desperately need fixing.
+
+- reStructuredText could form part of the foundation for a
+  documentation extraction and processing system, greatly benefiting
+ Python. But it is only a part, not the whole. reStructuredText is
+ a markup language specification and a reference parser
+ implementation, but it does not aspire to be the entire system. I
+ don't want reStructuredText or a hypothetical Python documentation
+ processor to die stillborn because of overambition.
+
+- Most of all, I want to help ease the documentation chore, the bane
+ of many a programmer.
+
+Unfortunately I was sidetracked and stopped working on this project.
+In November 2000 I made the time to enumerate the problems of
+StructuredText and possible solutions, and complete the first draft of
+a specification. This first draft was posted to the Doc-SIG in three
+parts:
+
+- `A Plan for Structured Text`__
+- `Problems With StructuredText`__
+- `reStructuredText: Revised Structured Text Specification`__
+
+__ http://mail.python.org/pipermail/doc-sig/2000-November/001239.html
+__ http://mail.python.org/pipermail/doc-sig/2000-November/001240.html
+__ http://mail.python.org/pipermail/doc-sig/2000-November/001241.html
+
+In March 2001 a flurry of activity on the Doc-SIG spurred me to
+further revise and refine my specification, the result of which you
+are now reading. An offshoot of the reStructuredText project has been
+the realization that a single markup scheme, no matter how well
+thought out, may not be enough. In order to tame the endless debates
+on Doc-SIG, a flexible `Docstring Processing System framework`_ needed
+to be constructed. This framework has become the more important of
+the two projects; reStructuredText_ has found its place as one
+possible choice for a single component of the larger framework.
+
+The project web site and the first project release were rolled out in
+June 2001, including posting the second draft of the spec [#spec-2]_
+and the first draft of PEPs 256, 257, and 258 [#peps-1]_ to the
+Doc-SIG. These documents and the project implementation proceeded to
+evolve at a rapid pace. Implementation history details can be found
+in the project file, HISTORY.txt_.
+
+In November 2001, the reStructuredText parser was nearing completion.
+Development of the parser continued with the addition of small
+convenience features, improvements to the syntax, the filling in of
+gaps, and bug fixes. After a long holiday break, in early 2002 most
+development moved over to the other Docutils components, the
+"Readers", "Writers", and "Transforms". A "standalone" reader
+(processes standalone text file documents) was completed in February,
+and a basic HTML writer (producing HTML 4.01, using CSS-1) was
+completed in early March.
+
+`PEP 287`_, "reStructuredText Standard Docstring Format", was created
+to formally propose reStructuredText as a standard format for Python
+docstrings, PEPs, and other files. It was first posted to
+comp.lang.python_ and the Python-dev_ mailing list on 2002-04-02.
+
+Version 0.4 of the reStructuredText__ and `Docstring Processing
+System`_ projects was released in April 2002. The two projects were
+immediately merged, renamed to "Docutils_", and a 0.1 release soon
+followed.
+
+.. __: `reStructuredText SourceForge project`_
+
+.. [#spec-2]
+ - `An Introduction to reStructuredText`__
+ - `Problems With StructuredText`__
+ - `reStructuredText Markup Specification`__
+ - `Python Extensions to the reStructuredText Markup
+ Specification`__
+
+ __ http://mail.python.org/pipermail/doc-sig/2001-June/001858.html
+ __ http://mail.python.org/pipermail/doc-sig/2001-June/001859.html
+ __ http://mail.python.org/pipermail/doc-sig/2001-June/001860.html
+ __ http://mail.python.org/pipermail/doc-sig/2001-June/001861.html
+
+.. [#peps-1]
+ - `PEP 256: Docstring Processing System Framework`__
+ - `PEP 258: DPS Generic Implementation Details`__
+ - `PEP 257: Docstring Conventions`__
+
+ Current working versions of the PEPs can be found in
+ http://docutils.sourceforge.net/spec/, and official versions can be
+ found in the `master PEP repository`_.
+
+ __ http://mail.python.org/pipermail/doc-sig/2001-June/001855.html
+ __ http://mail.python.org/pipermail/doc-sig/2001-June/001856.html
+ __ http://mail.python.org/pipermail/doc-sig/2001-June/001857.html
+
+
+.. _Zope Corporation: http://www.zope.com
+.. _ZOPE: http://www.zope.org
+.. _reStructuredText SourceForge project:
+ http://structuredtext.sourceforge.net/
+.. _pythondoc: http://starship.python.net/crew/danilo/pythondoc/
+.. _StructuredTextNG:
+ http://dev.zope.org/Members/jim/StructuredTextWiki/StructuredTextNG
+.. _HISTORY.txt:
+ http://docutils.sourceforge.net/HISTORY.txt
+.. _PEP 287: http://docutils.sourceforge.net/spec/pep-0287.txt
+.. _Docstring Processing System framework:
+ http://docutils.sourceforge.net/spec/pep-0256.txt
+.. _comp.lang.python: news:comp.lang.python
+.. _Python-dev: http://mail.python.org/pipermail/python-dev/
+.. _Docstring Processing System: http://docstring.sourceforge.net/
+.. _Docutils: http://docutils.sourceforge.net/
+.. _master PEP repository: http://www.python.org/peps/
+
+
+..
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End:
diff --git a/docs/ref/rst/restructuredtext.txt b/docs/ref/rst/restructuredtext.txt
new file mode 100644
index 000000000..149ef3fd4
--- /dev/null
+++ b/docs/ref/rst/restructuredtext.txt
@@ -0,0 +1,2344 @@
+=======================================
+ reStructuredText Markup Specification
+=======================================
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+
+reStructuredText_ is plain text that uses simple and intuitive
+constructs to indicate the structure of a document. These constructs
+are equally easy to read in raw and processed forms. This document is
+itself an example of reStructuredText (raw, if you are reading the
+text file, or processed, if you are reading an HTML document, for
+example). The reStructuredText parser is a component of Docutils_.
+
+Simple, implicit markup is used to indicate special constructs, such
+as section headings, bullet lists, and emphasis. The markup used is
+as minimal and unobtrusive as possible. Less often-used constructs
+and extensions to the basic reStructuredText syntax may have more
+elaborate or explicit markup.
+
+reStructuredText is applicable to documents of any length, from the
+very small (such as inline program documentation fragments, e.g.
+Python docstrings) to the quite large (this document).
+
+The first section gives a quick overview of the syntax of the
+reStructuredText markup by example. A complete specification is given
+in the `Syntax Details`_ section.
+
+`Literal blocks`_ (in which no markup processing is done) are used for
+examples throughout this document, to illustrate the plain text
+markup.
+
+
+.. contents::
+
+
+-----------------------
+ Quick Syntax Overview
+-----------------------
+
+A reStructuredText document is made up of body or block-level
+elements, and may be structured into sections. Sections_ are
+indicated through title style (underlines & optional overlines).
+Sections contain body elements and/or subsections. Some body elements
+contain further elements, such as lists containing list items, which
+in turn may contain paragraphs and other body elements. Others, such
+as paragraphs, contain text and `inline markup`_ elements.
+
+Here are examples of `body elements`_:
+
+- Paragraphs_ (and `inline markup`_)::
+
+ Paragraphs contain text and may contain inline markup:
+ *emphasis*, **strong emphasis**, `interpreted text`, ``inline
+ literals``, standalone hyperlinks (http://www.python.org),
+ external hyperlinks (Python_), internal cross-references
+ (example_), footnote references ([1]_), citation references
+ ([CIT2002]_), substitution references (|example|), and _`inline
+ hyperlink targets`.
+
+ Paragraphs are separated by blank lines and are left-aligned.
+
+- Five types of lists:
+
+ 1. `Bullet lists`_::
+
+ - This is a bullet list.
+
+ - Bullets can be "-", "*", or "+".
+
+ 2. `Enumerated lists`_::
+
+ 1. This is an enumerated list.
+
+ 2. Enumerators may be arabic numbers, letters, or roman
+ numerals.
+
+ 3. `Definition lists`_::
+
+ what
+ Definition lists associate a term with a definition.
+
+ how
+ The term is a one-line phrase, and the definition is one
+ or more paragraphs or body elements, indented relative to
+ the term.
+
+ 4. `Field lists`_::
+
+ :what: Field lists map field names to field bodies, like
+ database records. They are often part of an extension
+ syntax.
+
+ :how: The field marker is a colon, the field name, optional
+ field arguments, and a colon.
+
+ The field body may contain one or more body elements,
+ indented relative to the field marker.
+
+ 5. `Option lists`_, for listing command-line options::
+
+ -a command-line option "a"
+ -b file options can have arguments
+ and long descriptions
+ --long options can be long also
+ --input=file long options can also have
+ arguments
+ /V DOS/VMS-style options too
+
+ There must be at least two spaces between the option and the
+ description.
+
+- `Literal blocks`_::
+
+ Literal blocks are indented, and indicated with a double-colon
+ ("::") at the end of the preceding paragraph (right here -->)::
+
+ if literal_block:
+ text = 'is left as-is'
+ spaces_and_linebreaks = 'are preserved'
+ markup_processing = None
+
+- `Block quotes`_::
+
+ Block quotes consist of indented body elements:
+
+ This theory, that is mine, is mine.
+
+ Anne Elk (Miss)
+
+- `Doctest blocks`_::
+
+ >>> print 'Python-specific usage examples; begun with ">>>"'
+ Python-specific usage examples; begun with ">>>"
+ >>> print '(cut and pasted from interactive Python sessions)'
+ (cut and pasted from interactive Python sessions)
+
+- Tables_::
+
+ +------------------------+------------+----------+
+ | Header row, column 1 | Header 2 | Header 3 |
+ +========================+============+==========+
+ | body row 1, column 1 | column 2 | column 3 |
+ +------------------------+------------+----------+
+ | body row 2 | Cells may span |
+ +------------------------+-----------------------+
+
+- `Explicit markup blocks`_ all begin with an explicit block marker,
+ two periods and a space:
+
+ - Footnotes_::
+
+ .. [1] A footnote contains body elements, consistently
+ indented by at least 3 spaces.
+
+ - Citations_::
+
+ .. [CIT2002] Just like a footnote, except the label is
+ textual.
+
+ - `Hyperlink targets`_::
+
+ .. _Python: http://www.python.org
+
+ .. _example:
+
+ The "_example" target above points to this paragraph.
+
+ - Directives_::
+
+ .. image:: mylogo.png
+
+ - `Substitution definitions`_::
+
+ .. |symbol here| image:: symbol.png
+
+ - Comments_::
+
+ .. Comments begin with two dots and a space. Anything may
+ follow, except for the syntax of footnotes/citations,
+ hyperlink targets, directives, or substitution definitions.
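
Several of the rules above are mechanical enough to sketch in code.
For example, the option-list requirement of at least two spaces
between option and description makes such lines easy to split (an
illustrative sketch, not the parser's actual grammar):

```python
import re

def split_option_line(line):
    """Split an option-list line at the first run of two or more
    spaces, returning (option, description), or None if the line
    has no such separator."""
    m = re.match(r"\s*(\S.*?)\s{2,}(\S.*)", line)
    return (m.group(1), m.group(2)) if m else None
```

The lazy group lets an option carry a single-space-separated
argument ("-b file") while the two-space run still marks where the
description begins.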
+
+
+----------------
+ Syntax Details
+----------------
+
+Descriptions below list "DTD elements" (XML "generic identifiers")
+corresponding to syntax constructs. For details on the hierarchy of
+elements, please see `Docutils Document Tree Structure`_ and the
+`Generic Plaintext Document Interface DTD`_ XML document type
+definition.
+
+
+Whitespace
+==========
+
+Spaces are recommended for indentation_, but tabs may also be used.
+Tabs will be converted to spaces. Tab stops are at every 8th column.
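
Python's built-in ``str.expandtabs`` performs exactly this
conversion, advancing each tab to the next multiple of the tab-stop
width:

```python
line = "name:\tvalue"
# A tab after column 5 advances to column 8 (the next multiple of 8).
expanded = line.expandtabs(8)   # "name:   value"
```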
+
+Other whitespace characters (form feeds [chr(12)] and vertical tabs
+[chr(11)]) are converted to single spaces before processing.
+
+
+Blank Lines
+-----------
+
+Blank lines are used to separate paragraphs and other elements.
+Multiple successive blank lines are equivalent to a single blank line,
+except within literal blocks (where all whitespace is preserved).
+Blank lines may be omitted when the markup makes element separation
+unambiguous, in conjunction with indentation. The first line of a
+document is treated as if it is preceded by a blank line, and the last
+line of a document is treated as if it is followed by a blank line.
+
+
+Indentation
+-----------
+
+Indentation is used to indicate, and is only significant in
+indicating:
+
+- multi-line contents of list items,
+- multiple body elements within a list item (including nested lists),
+- the definition part of a definition list item,
+- block quotes,
+- the extent of literal blocks, and
+- the extent of explicit markup blocks.
+
+Any text whose indentation is less than that of the current level
+(i.e., unindented text or "dedents") ends the current level of
+indentation.
+
+Since all indentation is significant, the level of indentation must be
+consistent. For example, indentation is the sole markup indicator for
+`block quotes`_::
+
+ This is a top-level paragraph.
+
+ This paragraph belongs to a first-level block quote.
+
+ Paragraph 2 of the first-level block quote.
+
+Multiple levels of indentation within a block quote will result in
+more complex structures::
+
+ This is a top-level paragraph.
+
+ This paragraph belongs to a first-level block quote.
+
+ This paragraph belongs to a second-level block quote.
+
+ Another top-level paragraph.
+
+ This paragraph belongs to a second-level block quote.
+
+ This paragraph belongs to a first-level block quote. The
+ second-level block quote above is inside this first-level
+ block quote.
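
The nesting in the example above can be modeled with the indentation
stack used by many indentation-sensitive parsers (an illustrative
sketch, not Docutils' actual implementation):

```python
def nesting_levels(lines):
    """Yield (level, text) pairs, where level counts how many
    indentation levels enclose each non-blank line."""
    stack = [0]  # indentation column of each open level
    for line in lines:
        text = line.lstrip()
        if not text:
            continue
        indent = len(line) - len(text)
        while indent < stack[-1]:   # a dedent closes open levels
            stack.pop()
        if indent > stack[-1]:      # deeper indent opens a new level
            stack.append(indent)
        yield len(stack) - 1, text
```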
+
+When a paragraph or other construct consists of more than one line of
+text, the lines must be left-aligned::
+
+ This is a paragraph. The lines of
+ this paragraph are aligned at the left.
+
+ This paragraph has problems. The
+ lines are not left-aligned. In addition
+ to potential misinterpretation, warning
+ and/or error messages will be generated
+ by the parser.
+
+Several constructs begin with a marker, and the body of the construct
+must be indented relative to the marker. For constructs using simple
+markers (`bullet lists`_, `enumerated lists`_, footnotes_, citations_,
+`hyperlink targets`_, directives_, and comments_), the level of
+indentation of the body is determined by the position of the first
+line of text, which begins on the same line as the marker. For
+example, bullet list bodies must be indented by at least two columns
+relative to the left edge of the bullet::
+
+ - This is the first line of a bullet list
+ item's paragraph. All lines must align
+ relative to the first line. [1]_
+
+ This indented paragraph is interpreted
+ as a block quote.
+
+ Because it is not sufficiently indented,
+ this paragraph does not belong to the list
+ item.
+
+ .. [1] Here's a footnote. The second line is aligned
+ with the beginning of the footnote label. The ".."
+ marker is what determines the indentation.
+
+For constructs using complex markers (`field lists`_ and `option
+lists`_), where the marker may contain arbitrary text, the indentation
+of the first line *after* the marker determines the left edge of the
+body. For example, field lists may have very long markers (containing
+the field names)::
+
+ :Hello: This field has a short field name, so aligning the field
+ body with the first line is feasible.
+
+   :Number-of-African-swallows-required-to-carry-a-coconut: It would
+ be very difficult to align the field body with the left edge
+ of the first line. It may even be preferable not to begin the
+ body on the same line as the marker.
+
+
+Escaping Mechanism
+==================
+
+The character set universally available to plain text documents, 7-bit
+ASCII, is limited. No matter what characters are used for markup,
+they will already have multiple meanings in written text. Therefore
+markup characters *will* sometimes appear in text **without being
+intended as markup**. Any serious markup system requires an escaping
+mechanism to override the default meaning of the characters used for
+the markup. In reStructuredText we use the backslash, commonly used
+as an escaping character in other domains.
+
+A backslash followed by any character escapes that character. The
+escaped character represents the character itself, and is prevented
+from playing a role in any markup interpretation. The backslash is
+removed from the output. A literal backslash is represented by two
+backslashes in a row (the first backslash "escapes" the second,
+preventing it from being interpreted in an "escaping" role).
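
As an illustration, the escaping rule can be sketched in Python (a
hypothetical helper, not part of any parser API):

```python
def unescape(text):
    """Remove escaping backslashes: each backslash causes the next
    character to stand for itself and is removed from the output;
    two backslashes in a row yield one literal backslash."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == '\\' and i + 1 < len(text):
            out.append(text[i + 1])  # escaped character, kept literally
            i += 2
        else:
            out.append(text[i])
            i += 1
    return ''.join(out)
```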
+
+There are two contexts in which backslashes have no special meaning:
+literal blocks and inline literals. In these contexts, a single
+backslash represents a literal backslash, without having to double up.
+
+Please note that the reStructuredText specification and parser do not
+address the issue of the representation or extraction of text input
+(how and in what form the text actually *reaches* the parser).
+Backslashes and other characters may serve a character-escaping
+purpose in certain contexts and must be dealt with appropriately. For
+example, Python uses backslashes in strings to escape certain
+characters, but not others. The simplest solution when backslashes
+appear in Python docstrings is to use raw docstrings::
+
+ r"""This is a raw docstring. Backslashes (\) are not touched."""
+
+
+Reference Names
+===============
+
+Simple reference names are single words consisting of alphanumerics
+plus internal hyphens, underscores, and periods; no whitespace or other
+characters are allowed. Footnote labels (Footnotes_ & `Footnote
+References`_), citation labels (Citations_ & `Citation References`_),
+`interpreted text`_ roles, and some `hyperlink references`_ use the
+simple reference name syntax.
+
+Reference names using punctuation or whose names are phrases (two or
+more space-separated words) are called "phrase-references".
+Phrase-references are expressed by enclosing the phrase in backquotes
+and treating the backquoted text as a reference name::
+
+ Want to learn about `my favorite programming language`_?
+
+ .. _my favorite programming language: http://www.python.org
+
+Simple reference names may also optionally use backquotes.
+
+Reference names are whitespace-neutral and case-insensitive. When
+resolving reference names internally:
+
+- whitespace is normalized (one or more spaces, horizontal or vertical
+ tabs, newlines, carriage returns, or form feeds, are interpreted as
+ a single space), and
+
+- case is normalized (all alphabetic characters are converted to
+ lowercase).
+
+For example, the following `hyperlink references`_ are equivalent::
+
+ - `A HYPERLINK`_
+ - `a hyperlink`_
+ - `A
+ Hyperlink`_
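
The two normalization steps can be sketched in Python (the function is
illustrative, not the docutils API):

```python
def normalize_name(name):
    """Whitespace-normalize (collapse any run of whitespace, including
    newlines and tabs, to a single space) and case-normalize
    (lowercase) a reference name."""
    return ' '.join(name.lower().split())
```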
+
+Hyperlinks_, footnotes_, and citations_ all share the same namespace
+for reference names. The labels of citations (simple reference names)
+and manually-numbered footnotes (numbers) are entered into the same
+database as other hyperlink names. This means that a footnote
+(defined as "``.. [1]``") which can be referred to by a footnote
+reference (``[1]_``), can also be referred to by a plain hyperlink
+reference (1_). Of course, each type of reference (hyperlink,
+footnote, citation) may be processed and rendered differently. Some
+care should be taken to avoid reference name conflicts.
+
+
+Document Structure
+==================
+
+Document
+--------
+
+DTD element: document.
+
+The top-level element of a parsed reStructuredText document is the
+"document" element. After initial parsing, the document element is a
+simple container for a document fragment, consisting of `body
+elements`_, transitions_, and sections_, but lacking a document title
+or other bibliographic elements. The code that calls the parser may
+choose to run one or more optional post-parse transforms_,
+rearranging the document fragment into a complete document with a
+title and possibly other metadata elements (author, date, etc.; see
+`Bibliographic Fields`_).
+
+Specifically, there is no way to specify a document title and subtitle
+explicitly in reStructuredText. Instead, a lone top-level section
+title (see Sections_ below) can be treated as the document
+title. Similarly, a lone second-level section title immediately after
+the "document title" can become the document subtitle. See the
+`DocTitle transform`_ for details.
+
+
+Sections
+--------
+
+DTD elements: section, title.
+
+Sections are identified through their titles, which are marked up with
+adornment: "underlines" below the title text, and, in some cases,
+matching "overlines" above the title. An underline/overline is a
+single repeated punctuation character that begins in column 1 and
+forms a line extending at least as far as the right edge of the title
+text. Specifically, an underline/overline character may be any
+non-alphanumeric printable 7-bit ASCII character [#]_. An
+underline/overline must be at least 4 characters long (to avoid
+mistaking ellipses ["..."] for overlines). When an overline is used,
+the length and character used must match the underline. There may be
+any number of levels of section titles, although some output formats
+may have limits (HTML has 6 levels).
+
+.. [#] The following are all valid section title adornment
+ characters::
+
+ ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
+
+ Some characters are more suitable than others. The following are
+ recommended::
+
+ = - ` : ' " ~ ^ _ * + # < >
+
+Rather than imposing a fixed number and order of section title
+adornment styles, the order enforced will be the order as encountered.
+The first style encountered will be an outermost title (like HTML H1),
+the second style will be a subtitle, the third will be a subsubtitle,
+and so on.
+
+Below are examples of section title styles::
+
+ ===============
+ Section Title
+ ===============
+
+ ---------------
+ Section Title
+ ---------------
+
+ Section Title
+ =============
+
+ Section Title
+ -------------
+
+ Section Title
+ `````````````
+
+ Section Title
+ '''''''''''''
+
+ Section Title
+ .............
+
+ Section Title
+ ~~~~~~~~~~~~~
+
+ Section Title
+ *************
+
+ Section Title
+ +++++++++++++
+
+ Section Title
+ ^^^^^^^^^^^^^
+
+When a title has both an underline and an overline, the title text may
+be inset, as in the first two examples above. This is merely
+aesthetic and not significant. Underline-only title text may *not* be
+inset.
+
+A blank line after a title is optional. All text blocks up to the
+next title of the same or higher level are included in a section (or
+subsection, etc.).
+
+All section title styles need not be used, nor need any specific
+section title style be used. However, a document must be consistent
+in its use of section titles: once a hierarchy of title styles is
+established, sections must use that hierarchy.
+
+Each section title automatically generates a hyperlink target pointing
+to the section. The text of the hyperlink target (the "reference
+name") is the same as that of the section title. See `Implicit
+Hyperlink Targets`_ for a complete description.
+
+Sections may contain `body elements`_, transitions_, and nested
+sections.
+
+
+Transitions
+-----------
+
+DTD element: transition.
+
+ Instead of subheads, extra space or a type ornament between
+ paragraphs may be used to mark text divisions or to signal
+ changes in subject or emphasis.
+
+ (The Chicago Manual of Style, 14th edition, section 1.80)
+
+Transitions are commonly seen in novels and short fiction, as a gap
+spanning one or more lines, with or without a type ornament such as a
+row of asterisks. Transitions separate other body elements. A
+transition should not begin or end a section or document, nor should
+two transitions be immediately adjacent.
+
+The syntax for a transition marker is a horizontal line of 4 or more
+repeated punctuation characters. The syntax is the same as section
+title underlines without title text. Transition markers require blank
+lines before and after::
+
+ Para.
+
+ ----------
+
+ Para.
+
+Unlike section title underlines, no hierarchy of transition markers is
+enforced, nor do differences in transition markers accomplish
+anything. It is recommended that a single consistent style be used.
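
A rough pattern for recognizing a transition marker line (an
illustrative sketch, not the parser's actual implementation):

```python
import re

# Four or more repetitions of a single punctuation character,
# alone on a line (optionally followed by trailing whitespace).
TRANSITION = re.compile(r'^([!-/:-@\[-`{-~])\1{3,}\s*$')
```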
+
+The processing system is free to render transitions in output in any
+way it likes. For example, horizontal rules (``<hr>``) in HTML output
+would be an obvious choice.
+
+
+Body Elements
+=============
+
+Paragraphs
+----------
+
+DTD element: paragraph.
+
+Paragraphs consist of blocks of left-aligned text with no markup
+indicating any other body element. Blank lines separate paragraphs
+from each other and from other body elements. Paragraphs may contain
+`inline markup`_.
+
+Syntax diagram::
+
+ +------------------------------+
+ | paragraph |
+ | |
+ +------------------------------+
+
+ +------------------------------+
+ | paragraph |
+ | |
+ +------------------------------+
+
+
+Bullet Lists
+------------
+
+DTD elements: bullet_list, list_item.
+
+A text block which begins with a "-", "*", or "+", followed by
+whitespace, is a bullet list item (a.k.a. "unordered" list item).
+List item bodies must be left-aligned and indented relative to the
+bullet; the text immediately after the bullet determines the
+indentation. For example::
+
+ - This is the first bullet list item. The blank line above the
+ first list item is required; blank lines between list items
+ (such as below this paragraph) are optional.
+
+ - This is the first paragraph in the second item in the list.
+
+ This is the second paragraph in the second item in the list.
+ The blank line above this paragraph is required. The left edge
+ of this paragraph lines up with the paragraph above, both
+ indented relative to the bullet.
+
+ - This is a sublist. The bullet lines up with the left edge of
+ the text blocks above. A sublist is a new list so requires a
+ blank line above and below.
+
+ - This is the third item of the main list.
+
+ This paragraph is not part of the list.
+
+Here are examples of **incorrectly** formatted bullet lists::
+
+ - This first line is fine.
+ A blank line is required between list items and paragraphs.
+ (Warning)
+
+ - The following line appears to be a new sublist, but it is not:
+     - This is a paragraph continuation, not a sublist (since there's
+ no blank line). This line is also incorrectly indented.
+ - Warnings may be issued by the implementation.
+
+Syntax diagram::
+
+ +------+-----------------------+
+ | "- " | list item |
+ +------| (body elements)+ |
+ +-----------------------+
+
+
+Enumerated Lists
+----------------
+
+DTD elements: enumerated_list, list_item.
+
+Enumerated lists (a.k.a. "ordered" lists) are similar to bullet lists,
+but use enumerators instead of bullets. An enumerator consists of an
+enumeration sequence member and formatting, followed by whitespace.
+The following enumeration sequences are recognized:
+
+- arabic numerals: 1, 2, 3, ... (no upper limit).
+- uppercase alphabet characters: A, B, C, ..., Z.
+- lower-case alphabet characters: a, b, c, ..., z.
+- uppercase Roman numerals: I, II, III, IV, ..., MMMMCMXCIX (4999).
+- lowercase Roman numerals: i, ii, iii, iv, ..., mmmmcmxcix (4999).
+
+The following formatting types are recognized:
+
+- suffixed with a period: "1.", "A.", "a.", "I.", "i.".
+- surrounded by parentheses: "(1)", "(A)", "(a)", "(I)", "(i)".
+- suffixed with a right-parenthesis: "1)", "A)", "a)", "I)", "i)".
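
These sequences and formats can be approximated with a single pattern
(a simplification; the parser's own patterns are stricter, e.g. about
which Roman numerals are accepted):

```python
import re

# A sequence member: arabic numerals, a single letter, or Roman numerals.
member = r'(?:\d+|[A-Za-z]|[IVXLCDMivxlcdm]+)'
# Formatting: "1.", "(1)", or "1)", followed by whitespace.
ENUMERATOR = re.compile(r'^(?:{0}\.|\({0}\)|{0}\))(?=\s)'.format(member))
```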
+
+A system message will be generated for each of the following cases:
+
+- The enumerators do not all have the same format and sequence type.
+
+- The enumerators are not in sequence (i.e., "1.", "3." generates a
+ level-1 [info] system message and produces two separate lists).
+
+It is recommended that the enumerator of the first list item be
+ordinal-1 ("1", "A", "a", "I", or "i"). Although other start-values
+will be recognized, they may not be supported by the output format.
+
+Lists using Roman numerals must begin with "I"/"i" or a
+multi-character value, such as "II" or "XV". Any other
+single-character Roman numeral ("V", "X", etc.) will be interpreted as
+a letter of the alphabet, not as a Roman numeral. Likewise, lists
+using letters of the alphabet may not begin with "I"/"i", since these
+are recognized as Roman numeral 1.
+
+Nested enumerated lists must be created with indentation. For
+example::
+
+ 1. Item 1.
+
+ a) Item 1a.
+ b) Item 1b.
+
+Example syntax diagram::
+
+ +-------+----------------------+
+ | "1. " | list item |
+ +-------| (body elements)+ |
+ +----------------------+
+
+
+Definition Lists
+----------------
+
+DTD elements: definition_list, definition_list_item, term, classifier,
+definition.
+
+Each definition list item contains a term, an optional classifier, and
+a definition. A term is a simple one-line word or phrase. An
+optional classifier may follow the term on the same line, after " : "
+(space, colon, space). A definition is a block indented relative to
+the term, and may contain multiple paragraphs and other body elements.
+There may be no blank line between a term and a definition (this
+distinguishes definition lists from `block quotes`_). Blank lines are
+required before the first and after the last definition list item, but
+are optional in-between. For example::
+
+ term 1
+ Definition 1.
+
+ term 2
+ Definition 2, paragraph 1.
+
+ Definition 2, paragraph 2.
+
+ term 3 : classifier
+ Definition 3.
+
+A definition list may be used in various ways, including:
+
+- As a dictionary or glossary. The term is the word itself, a
+ classifier may be used to indicate the usage of the term (noun,
+ verb, etc.), and the definition follows.
+
+- To describe program variables. The term is the variable name, a
+ classifier may be used to indicate the type of the variable (string,
+ integer, etc.), and the definition describes the variable's use in
+ the program. This usage of definition lists supports the classifier
+ syntax of Grouch_, a system for describing and enforcing a Python
+ object schema.
+
+Syntax diagram::
+
+ +---------------------------+
+ | term [ " : " classifier ] |
+ +--+------------------------+--+
+ | definition |
+ | (body elements)+ |
+ +---------------------------+
+
+
+Field Lists
+-----------
+
+DTD elements: field_list, field, field_name, field_argument,
+field_body.
+
+Field lists are mappings from field names to field bodies, modeled on
+RFC822_ headers. A field name is made up of one or more letters,
+numbers, and punctuation, except colons (":") and whitespace. Field
+names are case-insensitive. There may be additional data separated
+from the field name, called field arguments. The field name and
+optional field argument(s), along with a single colon prefix and
+suffix, together form the field marker. The field marker is followed
+by whitespace and the field body. The field body may contain multiple
+body elements, indented relative to the field marker. The first line
+after the field name marker determines the indentation of the field
+body. For example::
+
+ :Date: 2001-08-16
+ :Version: 1
+ :Authors: - Me
+ - Myself
+ - I
+ :Indentation: Since the field marker may be quite long, the second
+ and subsequent lines of the field body do not have to line up
+ with the first line, but they must be indented relative to the
+ field name marker, and they must line up with each other.
+ :Parameter i: integer
+
+Field arguments are separated from the field name and each other by
+whitespace, and may not contain colons (":"). The interpretation of
+field arguments is up to the application. For example::
+
+ :name1 word number=1:
+ Both "word" and "number=1" are single words.
+
+The syntax for field arguments may be extended in the future. For
+example, quoted phrases may be treated as a single argument, and
+direct support for the "name=value" syntax may be added.
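
For illustration, a simplified field marker pattern (not the parser's
actual regular expression):

```python
import re

# ":", a name containing no colons or whitespace, zero or more
# whitespace-separated arguments, a closing ":", then whitespace
# or end of line.
FIELD_MARKER = re.compile(r'^:([^:\s]+)((?: +[^:\s]+)*):(?: +|$)')
```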
+
+Applications of reStructuredText may recognize field names and
+transform fields or field bodies in certain contexts; they are often
+used as part of an extension syntax. See `Bibliographic Fields`_
+below for one example, or the "image" directive in `reStructuredText
+Directives`_ for another.
+
+Standard RFC822 headers cannot be used for this construct because they
+are ambiguous. A word followed by a colon at the beginning of a line
+is common in written text. However, in well-defined contexts such as
+when a field list invariably occurs at the beginning of a document
+(PEPs and email messages), standard RFC822 headers could be used.
+
+Syntax diagram (simplified)::
+
+ +------------------------------+------------+
+ | ":" name (" " argument)* ":" | field body |
+ +-------+----------------------+ |
+ | (body elements)+ |
+ +-----------------------------------+
+
+
+Bibliographic Fields
+````````````````````
+
+DTD elements: docinfo, author, authors, organization, contact,
+version, status, date, copyright, topic.
+
+When a field list is the first non-comment element in a document
+(after the document title, if there is one), it may have certain
+specific fields transformed to document bibliographic data. This
+bibliographic data corresponds to the front matter of a book, such as
+the title page and copyright page.
+
+Certain field names (listed below) are recognized and transformed to
+the corresponding DTD elements, most becoming child elements of the
+"docinfo" element. No ordering is required of these fields, although
+they may be rearranged to fit the document structure, as noted.
+Unless otherwise indicated in the list below, each of the
+bibliographic elements' field bodies may contain a single paragraph
+only. Field bodies may be checked for `RCS keywords`_ and cleaned up.
+Any unrecognized fields will remain in a generic field list in the
+document body.
+
+The registered bibliographic field names and their corresponding DTD
+elements are as follows:
+
+- Field name "Author": author element.
+- "Authors": authors. May contain either: a single paragraph
+ consisting of a list of authors, separated by ";" or ","; or a
+ bullet list whose elements each contain a single paragraph per
+ author.
+- "Organization": organization.
+- "Contact": contact.
+- "Version": version.
+- "Status": status.
+- "Date": date.
+- "Copyright": copyright.
+- "Abstract": topic. May contain arbitrary body elements. Only one
+ abstract is allowed. The abstract becomes a topic element with
+ title "Abstract" (or language equivalent) immediately following the
+ docinfo element.
+
+This field-name-to-element mapping can be extended, or replaced for
+other languages. See the `DocInfo transform`_ implementation
+documentation for details.
+
+
+RCS Keywords
+````````````
+
+`Bibliographic fields`_ recognized by the parser are normally checked
+for RCS [#]_ keywords and cleaned up [#]_. RCS keywords may be
+entered into source files as "$keyword$", and once stored under RCS or
+CVS [#]_, they are expanded to "$keyword: expansion text $". For
+example, a "Status" field will be transformed to a "status" element::
+
+ :Status: $keyword: expansion text $
+
+.. [#] Revision Control System.
+.. [#] RCS keyword processing can be turned off (unimplemented).
+.. [#] Concurrent Versions System. CVS uses the same keywords as RCS.
+
+Once processed, the "status" element's text will become simply "expansion
+text". The dollar sign delimiters and leading RCS keyword name are
+removed.
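
This cleanup can be sketched as follows (a hypothetical helper; the
actual transform is more thorough):

```python
import re

# Matches an expanded RCS/CVS keyword: "$Keyword: expansion text $".
RCS_KEYWORD = re.compile(r'^\$\w+: (.*?) \$$')

def cleanup_rcs(text):
    """Reduce an expanded RCS keyword to its expansion text; any
    other text is returned unchanged."""
    match = RCS_KEYWORD.match(text)
    return match.group(1) if match else text
```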
+
+The RCS keyword processing only kicks in when all of these conditions
+hold:
+
+1. The field list is in bibliographic context (first non-comment
+   construct in the document, after a document title if there is
+ one).
+
+2. The field name is a recognized bibliographic field name.
+
+3. The sole contents of the field is an expanded RCS keyword, of the
+ form "$Keyword: data $".
+
+
+Option Lists
+------------
+
+DTD elements: option_list, option_list_item, option_group, option,
+option_string, option_argument, description.
+
+Option lists are two-column lists of command-line options and
+descriptions, documenting a program's options. For example::
+
+ -a Output all.
+ -b Output both (this description is
+ quite long).
+ -c arg Output just arg.
+ --long Output all day long.
+
+ -p This option has two paragraphs in the description.
+ This is the first.
+
+ This is the second. Blank lines may be omitted between
+ options (as above) or left in (as here and below).
+
+    --very-long-option  A VMS-style option.  Note the adjustment for
+ the required two spaces.
+
+ --an-even-longer-option
+ The description can also start on the next line.
+
+ -2, --two This option has two variants.
+
+ -f FILE, --file=FILE These two options are synonyms; both have
+ arguments.
+
+ /V A VMS/DOS-style option.
+
+There are several types of options recognized by reStructuredText:
+
+- Short POSIX options consist of one dash and an option letter.
+- Long POSIX options consist of two dashes and an option word; some
+ systems use a single dash.
+- Old GNU-style "plus" options consist of one plus and an option
+  letter ("plus" options are deprecated and their use is discouraged).
+- DOS/VMS options consist of a slash and an option letter or word.
+
+Please note that both POSIX-style and DOS/VMS-style options may be
+used by DOS or Windows software. These and other variations are
+sometimes used mixed together. The names above have been chosen for
+convenience only.
+
+The syntax for short and long POSIX options is based on the syntax
+supported by Python's getopt.py_ module, which implements an option
+parser similar to the `GNU libc getopt_long()`_ function but with some
+restrictions. There are many variant option systems, and
+reStructuredText option lists do not support all of them.
+
+Although long POSIX and DOS/VMS option words may be allowed to be
+truncated by the operating system or the application when used on the
+command line, reStructuredText option lists do not show or support
+this with any special syntax. The complete option word should be
+given, supported by notes about truncation if and when applicable.
+
+Options may be followed by an argument placeholder, whose role and
+syntax should be explained in the description text. Either a space or
+an equals sign may be used as a delimiter between options and option
+argument placeholders.
+
+Multiple option "synonyms" may be listed, sharing a single
+description. They must be separated by comma-space.
+
+There must be at least two spaces between the option(s) and the
+description. The description may contain multiple body elements. The
+first line after the option marker determines the indentation of the
+description. As with other types of lists, blank lines are required
+before the first option list item and after the last, but are optional
+between option entries.
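
For illustration, simplified patterns for the option styles described
above (the parser's recognizer handles many more variations):

```python
import re

# Short POSIX option, optionally with an argument placeholder: -a, -c arg
SHORT_OPTION = re.compile(r'^-[a-zA-Z0-9](?: [a-zA-Z][\w-]*)?$')
# Long POSIX option, argument delimited by "=" or space: --long, --file=FILE
LONG_OPTION = re.compile(r'^--[a-zA-Z][\w-]+(?:[= ][a-zA-Z][\w-]*)?$')
# DOS/VMS option: /V
VMS_OPTION = re.compile(r'^/[a-zA-Z][\w-]*$')
```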
+
+Syntax diagram (simplified)::
+
+ +----------------------------+-------------+
+ | option [" " argument] " " | description |
+ +-------+--------------------+ |
+ | (body elements)+ |
+ +----------------------------------+
+
+
+Literal Blocks
+--------------
+
+DTD element: literal_block.
+
+A paragraph consisting of two colons ("::") signifies that all
+following **indented** text blocks comprise a literal block. No
+markup processing is done within a literal block. It is left as-is,
+and is typically rendered in a monospaced typeface::
+
+ This is a typical paragraph. A literal block follows.
+
+ ::
+
+ for a in [5,4,3,2,1]: # this is program code, shown as-is
+ print a
+ print "it's..."
+ # a literal block continues until the indentation ends
+
+ This text has returned to the indentation of the first paragraph,
+ is outside of the literal block, and is therefore treated as an
+ ordinary paragraph.
+
+The paragraph containing only "::" will be completely removed from the
+output; no empty paragraph will remain.
+
+As a convenience, the "::" is recognized at the end of any paragraph.
+If immediately preceded by whitespace, both colons will be removed
+from the output (this is the "partially minimized" form). When text
+immediately precedes the "::", *one* colon will be removed from the
+output, leaving only one colon visible (i.e., "::" will be replaced by
+":"; this is the "fully minimized" form).
+
+In other words, these are all equivalent (please pay attention to the
+colons after "Paragraph"):
+
+1. Expanded form::
+
+ Paragraph:
+
+ ::
+
+ Literal block
+
+2. Partially minimized form::
+
+ Paragraph: ::
+
+ Literal block
+
+3. Fully minimized form::
+
+ Paragraph::
+
+ Literal block
+
+The minimum leading whitespace will be removed from each line of the
+literal block. Other than that, all whitespace (including line
+breaks) is preserved. Blank lines are required before and after a
+literal block, but these blank lines are not included as part of the
+literal block.
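
The three forms reduce to a simple end-of-paragraph rule, sketched here
as a hypothetical helper:

```python
def minimize_marker(paragraph):
    """Return the paragraph text as it should appear in the output,
    given a trailing "::" literal-block marker; None means the
    paragraph is removed entirely (expanded form)."""
    if paragraph == '::':
        return None               # expanded form: paragraph vanishes
    if paragraph.endswith(' ::'):
        return paragraph[:-3]     # partially minimized: " ::" removed
    if paragraph.endswith('::'):
        return paragraph[:-1]     # fully minimized: "::" becomes ":"
    return paragraph
```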
+
+Syntax diagram::
+
+ +------------------------------+
+ | paragraph |
+ | (ends with "::") |
+ +------------------------------+
+ +---------------------------+
+ | literal block |
+ +---------------------------+
+
+
+Block Quotes
+------------
+
+DTD element: block_quote.
+
+A text block that is indented relative to the preceding text, without
+markup indicating it to be a literal block, is a block quote. All
+markup processing (for body elements and inline markup) continues
+within the block quote::
+
+ This is an ordinary paragraph, introducing a block quote.
+
+ "It is my business to know things. That is my trade."
+
+ -- Sherlock Holmes
+
+Blank lines are required before and after a block quote, but these
+blank lines are not included as part of the block quote.
+
+Syntax diagram::
+
+ +------------------------------+
+ | (current level of |
+ | indentation) |
+ +------------------------------+
+ +---------------------------+
+ | block quote |
+ | (body elements)+ |
+ +---------------------------+
+
+
+Doctest Blocks
+--------------
+
+DTD element: doctest_block.
+
+Doctest blocks are interactive Python sessions cut-and-pasted into
+docstrings. They are meant to illustrate usage by example, and
+provide an elegant and powerful testing environment via the `doctest
+module`_ in the Python standard library.
+
+Doctest blocks are text blocks which begin with ``">>> "``, the Python
+interactive interpreter main prompt, and end with a blank line.
+Doctest blocks are treated as a special case of literal blocks,
+without requiring the literal block syntax. If both are present, the
+literal block syntax takes priority over Doctest block syntax::
+
+ This is an ordinary paragraph.
+
+ >>> print 'this is a Doctest block'
+ this is a Doctest block
+
+ The following is a literal block::
+
+ >>> This is not recognized as a doctest block by
+ reStructuredText. It *will* be recognized by the doctest
+ module, though!
+
+Indentation is not required for doctest blocks.
+
+
+Tables
+------
+
+DTD elements: table, tgroup, colspec, thead, tbody, row, entry.
+
+Tables are described with a visual outline made up of the characters
+"-", "=", "|", and "+". The hyphen ("-") is used for horizontal lines
+(row separators). The equals sign ("=") may be used to separate
+optional header rows from the table body. The vertical bar ("|") is
+used for vertical lines (column separators). The plus sign ("+") is
+used for intersections of horizontal and vertical lines.
+
+Each table cell is treated as a miniature document; the top and bottom
+cell boundaries act as delimiting blank lines. Each cell contains
+zero or more body elements. Cell contents may include left and/or
+right margins, which are removed before processing. Example::
+
+ +------------------------+------------+----------+----------+
+ | Header row, column 1 | Header 2 | Header 3 | Header 4 |
+ | (header rows optional) | | | |
+ +========================+============+==========+==========+
+ | body row 1, column 1 | column 2 | column 3 | column 4 |
+ +------------------------+------------+----------+----------+
+ | body row 2 | Cells may span columns. |
+ +------------------------+------------+---------------------+
+ | body row 3 | Cells may | - Table cells |
+ +------------------------+ span rows. | - contain |
+ | body row 4 | | - body elements. |
+ +------------------------+------------+---------------------+
+
+As with other body elements, blank lines are required before and after
+tables. Tables' left edges should align with the left edge of
+preceding text blocks; otherwise, the table is considered to be part
+of a block quote.
+
+Some care must be taken with tables to avoid undesired interactions
+with cell text in rare cases. For example, the following table
+contains a cell in row 2 spanning from column 2 to column 4::
+
+ +--------------+----------+-----------+-----------+
+ | row 1, col 1 | column 2 | column 3 | column 4 |
+ +--------------+----------+-----------+-----------+
+ | row 2 | |
+ +--------------+----------+-----------+-----------+
+ | row 3 | | | |
+ +--------------+----------+-----------+-----------+
+
+If a vertical bar is used in the text of that cell, it could have
+unintended effects if accidentally aligned with column boundaries::
+
+ +--------------+----------+-----------+-----------+
+ | row 1, col 1 | column 2 | column 3 | column 4 |
+ +--------------+----------+-----------+-----------+
+ | row 2 | Use the command ``ls | more``. |
+ +--------------+----------+-----------+-----------+
+ | row 3 | | | |
+ +--------------+----------+-----------+-----------+
+
+Several solutions are possible. All that is needed is to break the
+continuity of the cell outline rectangle. One possibility is to shift
+the text by adding an extra space before::
+
+ +--------------+----------+-----------+-----------+
+ | row 1, col 1 | column 2 | column 3 | column 4 |
+ +--------------+----------+-----------+-----------+
+ | row 2 | Use the command ``ls | more``. |
+ +--------------+----------+-----------+-----------+
+ | row 3 | | | |
+ +--------------+----------+-----------+-----------+
+
+Another possibility is to add an extra line to row 2::
+
+ +--------------+----------+-----------+-----------+
+ | row 1, col 1 | column 2 | column 3 | column 4 |
+ +--------------+----------+-----------+-----------+
+ | row 2 | Use the command ``ls | more``. |
+ | | |
+ +--------------+----------+-----------+-----------+
+ | row 3 | | | |
+ +--------------+----------+-----------+-----------+
+
+
+Explicit Markup Blocks
+----------------------
+
+An explicit markup block is a text block:
+
+- whose first line begins with ".." followed by whitespace (the
+ "explicit markup start"),
+- whose second and subsequent lines (if any) are indented relative to
+ the first, and
+- which ends before an unindented line.
+
+Explicit markup blocks are analogous to bullet list items, with ".."
+as the bullet. The text immediately after the explicit markup start
+determines the indentation of the block body. Blank lines are
+required between explicit markup blocks and other elements, but are
+optional between explicit markup blocks where unambiguous.
+
+The explicit markup syntax is used for footnotes, citations, hyperlink
+targets, directives, and comments.
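
The recognition rule above can be sketched as a small predicate. This is an illustrative helper, not part of any parser API; the name is hypothetical:

```python
def starts_explicit_markup(line):
    """Return True if `line` opens an explicit markup block:
    ".." followed by whitespace, or ".." alone (an empty comment)."""
    stripped = line.rstrip()
    return (stripped == ".."
            or line.startswith(".. ")
            or line.startswith("..\t"))

# The block body then extends over all following lines indented
# relative to the first, ending before the next unindented line.
```

Note that three or more dots (an ellipsis) do not qualify, because the third character is neither whitespace nor end-of-line.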
+
+
+Footnotes
+`````````
+
+DTD elements: footnote, label.
+
+Each footnote consists of an explicit markup start (".. "), a left
+square bracket, the footnote label, a right square bracket, and
+whitespace, followed by indented body elements. A footnote label can
+be:
+
+- a whole decimal number consisting of one or more digits,
+
+- a single "#" (denoting `auto-numbered footnotes`_),
+
+- a "#" followed by a simple reference name (an `autonumber label`_),
+ or
+
+- a single "*" (denoting `auto-symbol footnotes`_).
+
+If the first body element within a footnote is a simple paragraph, it
+may begin on the same line as the footnote label. Other elements must
+begin on a new line, consistently indented (by at least 3 spaces) and
+left-aligned.
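
The four label forms can be distinguished mechanically. The following sketch is illustrative only (the function name and return values are hypothetical, not Docutils internals):

```python
import re

def classify_footnote_label(label):
    """Classify a footnote label per the four forms listed above."""
    if re.fullmatch(r"\d+", label):
        return "manual"                 # e.g. [1]
    if label == "#":
        return "auto-numbered"          # [#]
    if label == "*":
        return "auto-symbol"            # [*]
    # "#" followed by a simple reference name: alphanumerics plus
    # internal hyphens, underscores, and periods.
    if re.fullmatch(r"#[A-Za-z0-9]+(?:[-._][A-Za-z0-9]+)*", label):
        return "autonumber-label"       # [#note]
    return "invalid"
```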
+
+Footnotes may occur anywhere in the document, not only at the end.
+Where or how they appear in the processed output depends on the
+processing system.
+
+Here is a manually numbered footnote::
+
+ .. [1] Body elements go here.
+
+Each footnote automatically generates a hyperlink target pointing to
+itself. The text of the hyperlink target name is the same as that of
+the footnote label. `Auto-numbered footnotes`_ generate a number as
+their footnote label and reference name. See `Implicit Hyperlink
+Targets`_ for a complete description of the mechanism.
+
+Syntax diagram::
+
+ +-------+-------------------------+
+ | ".. " | "[" label "]" footnote |
+ +-------+ |
+ | (body elements)+ |
+ +-------------------------+
+
+
+Auto-Numbered Footnotes
+.......................
+
+A number sign ("#") may be used as the first character of a footnote
+label to request automatic numbering of the footnote or footnote
+reference.
+
+The first footnote to request automatic numbering is assigned the
+label "1", the second is assigned the label "2", and so on (assuming
+there are no manually numbered footnotes present; see `Mixed Manual
+and Auto-Numbered Footnotes`_ below). A footnote which has
+automatically received a label "1" generates an implicit hyperlink
+target with name "1", just as if the label was explicitly specified.
+
+.. _autonumber label: `autonumber labels`_
+
+A footnote may specify a label explicitly while at the same time
+requesting automatic numbering: ``[#label]``. These labels are called
+_`autonumber labels`. Autonumber labels do two things:
+
+- On the footnote itself, they generate a hyperlink target whose name
+ is the autonumber label (doesn't include the "#").
+
+- They allow an automatically numbered footnote to be referred to more
+ than once, as a footnote reference or hyperlink reference. For
+ example::
+
+ If [#note]_ is the first footnote reference, it will show up as
+ "[1]". We can refer to it again as [#note]_ and again see
+ "[1]". We can also refer to it as note_ (an ordinary internal
+ hyperlink reference).
+
+ .. [#note] This is the footnote labeled "note".
+
+The numbering is determined by the order of the footnotes, not by the
+order of the references. For footnote references without autonumber
+labels (``[#]_``), the footnotes and footnote references must be in
+the same relative order but need not alternate in lock-step. For
+example::
+
+ [#]_ is a reference to footnote 1, and [#]_ is a reference to
+ footnote 2.
+
+ .. [#] This is footnote 1.
+ .. [#] This is footnote 2.
+ .. [#] This is footnote 3.
+
+ [#]_ is a reference to footnote 3.
+
+Special care must be taken if footnotes themselves contain
+auto-numbered footnote references, or if multiple references are made
+in close proximity. Footnotes and references are noted in the order
+they are encountered in the document, which is not necessarily the
+same as the order in which a person would read them.
+
+
+Auto-Symbol Footnotes
+.....................
+
+An asterisk ("*") may be used for footnote labels to request automatic
+symbol generation for footnotes and footnote references. The asterisk
+may be the only character in the label. For example::
+
+ Here is a symbolic footnote reference: [*]_.
+
+ .. [*] This is the footnote.
+
+A transform will insert symbols as labels into corresponding footnotes
+and footnote references.
+
+The standard Docutils system uses the following symbols for footnote
+marks [#]_:
+
+- asterisk/star ("*")
+- dagger (HTML character entity "†")
+- double dagger ("‡")
+- section mark ("§")
+- pilcrow or paragraph mark ("¶")
+- number sign ("#")
+- spade suit ("♠")
+- heart suit ("♥")
+- diamond suit ("♦")
+- club suit ("♣")
+
+.. [#] This list was inspired by the list of symbols for "Note
+ Reference Marks" in The Chicago Manual of Style, 14th edition,
+ section 12.51. "Parallels" ("\|\|") were given in CMoS instead of
+ the pilcrow. The last four symbols (the card suits) were added
+ arbitrarily.
+
+If more than ten symbols are required, the same sequence will be
+reused, doubled and then tripled, and so on ("**" etc.).
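
The symbol sequence, including the doubling and tripling behavior, can be modeled as follows (a sketch; the function name is hypothetical):

```python
# The ten footnote symbols listed above, in order.
SYMBOLS = ["*", "\u2020", "\u2021", "\u00a7", "\u00b6",
           "#", "\u2660", "\u2665", "\u2666", "\u2663"]

def footnote_symbol(index):
    """Return the label for the index-th auto-symbol footnote
    (0-based).  After the tenth footnote, the sequence repeats
    doubled ("**"), then tripled, and so on."""
    return SYMBOLS[index % 10] * (index // 10 + 1)
```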
+
+
+Mixed Manual and Auto-Numbered Footnotes
+........................................
+
+Manual and automatic footnote numbering may both be used within a
+single document, although the results may not be what you expect.
+Manual numbering takes priority. Only unused footnote numbers are
+assigned to auto-numbered footnotes. The following example should be
+illustrative::
+
+ [2]_ will be "2" (manually numbered),
+ [#]_ will be "3" (anonymous auto-numbered), and
+ [#label]_ will be "1" (labeled auto-numbered).
+
+ .. [2] This footnote is labeled manually, so its number is fixed.
+
+ .. [#label] This autonumber-labeled footnote will be labeled "1".
+ It is the first auto-numbered footnote and no other footnote
+ with label "1" exists. The order of the footnotes is used to
+ determine numbering, not the order of the footnote references.
+
+ .. [#] This footnote will be labeled "3". It is the second
+ auto-numbered footnote, but footnote label "2" is already used.
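
The assignment procedure can be sketched as a simplified model (not the Docutils implementation; the function name is illustrative). Manual numbers are reserved first, then auto-numbered footnotes take the lowest unused numbers in document order:

```python
def assign_footnote_numbers(labels):
    """Assign numbers to footnotes, given their labels in document
    order: "2" (manual), "#" (anonymous auto-numbered), or "#name"
    (labeled auto-numbered)."""
    used = {int(label) for label in labels if label.isdigit()}
    result, next_num = [], 1
    for label in labels:
        if label.isdigit():
            result.append(int(label))       # manual: number is fixed
        else:
            while next_num in used:         # skip reserved numbers
                next_num += 1
            used.add(next_num)
            result.append(next_num)
            next_num += 1
    return result
```

Applied to the example above (footnotes in order ``[2]``, ``[#label]``, ``[#]``), this yields 2, 1, and 3 respectively.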
+
+
+Citations
+`````````
+
+Citations are identical to footnotes except that they use only
+non-numeric labels such as ``[note]`` or ``[GVR2001]``. Citation
+labels are simple `reference names`_ (case-insensitive single words
+consisting of alphanumerics plus internal hyphens, underscores, and
+periods; no whitespace). Citations may be rendered separately and
+differently from footnotes. For example::
+
+ Here is a citation reference: [CIT2002]_.
+
+ .. [CIT2002] This is the citation. It's just like a footnote,
+ except the label is textual.
+
+
+.. _hyperlinks:
+
+Hyperlink Targets
+`````````````````
+
+DTD element: target.
+
+These are also called _`explicit hyperlink targets`, to differentiate
+them from `implicit hyperlink targets`_ defined below.
+
+Hyperlink targets identify a location within or outside of a document,
+which may be linked to by `hyperlink references`_.
+
+Hyperlink targets may be named or anonymous. Named hyperlink targets
+consist of an explicit markup start (".. "), an underscore, the
+reference name (no trailing underscore), a colon, whitespace, and a
+link block::
+
+ .. _hyperlink-name: link-block
+
+Reference names are whitespace-neutral and case-insensitive. See
+`Reference Names`_ for details and examples.
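
Whitespace-neutral, case-insensitive matching amounts to a simple normalization step, sketched here (the helper name is illustrative):

```python
def normalize_name(name):
    """Normalize a reference name for matching: fold internal runs
    of whitespace to a single space and fold case to lowercase."""
    return " ".join(name.split()).lower()
```

Two names match if and only if their normalized forms are equal.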
+
+Anonymous hyperlink targets consist of an explicit markup start
+(".. "), two underscores, a colon, whitespace, and a link block; there
+is no reference name::
+
+ .. __: anonymous-hyperlink-target-link-block
+
+An alternate syntax for anonymous hyperlinks consists of two
+underscores, a space, and a link block::
+
+ __ anonymous-hyperlink-target-link-block
+
+See `Anonymous Hyperlinks`_ below.
+
+There are three types of hyperlink targets: internal, external, and
+indirect.
+
+1. _`Internal hyperlink targets` have empty link blocks. They provide
+ an end point allowing a hyperlink to connect one place to another
+ within a document. An internal hyperlink target points to the
+ element following the target. For example::
+
+ Clicking on this internal hyperlink will take us to the target_
+ below.
+
+ .. _target:
+
+ The hyperlink target above points to this paragraph.
+
+ Internal hyperlink targets may be "chained". Multiple adjacent
+ internal hyperlink targets all point to the same element::
+
+ .. _target1:
+ .. _target2:
+
+ The targets "target1" and "target2" are synonyms; they both
+ point to this paragraph.
+
+ If the element "pointed to" is an external hyperlink target (with a
+ URI in its link block; see #2 below) the URI from the external
+ hyperlink target is propagated to the internal hyperlink targets;
+ they will all "point to" the same URI. There is no need to
+ duplicate a URI. For example, all three of the following hyperlink
+ targets refer to the same URI::
+
+ .. _Python DOC-SIG mailing list archive:
+ .. _archive:
+ .. _Doc-SIG: http://mail.python.org/pipermail/doc-sig/
+
+ An inline form of internal hyperlink target is available; see
+ `Inline Hyperlink Targets`_.
+
+2. _`External hyperlink targets` have an absolute or relative URI in
+ their link blocks. For example, take the following input::
+
+ See the Python_ home page for info.
+
+ .. _Python: http://www.python.org
+
+ After processing into HTML, the hyperlink might be expressed as::
+
+ See the <a href="http://www.python.org">Python</a> home page
+ for info.
+
+ An external hyperlink's URI may begin on the same line as the
+ explicit markup start and target name, or it may begin in an
+ indented text block immediately following, with no intervening
+ blank lines. If there are multiple lines in the link block, they
+ are stripped of leading and trailing whitespace and concatenated.
+ The following external hyperlink targets are equivalent::
+
+ .. _one-liner: http://docutils.sourceforge.net/rst.html
+
+ .. _starts-on-this-line: http://
+ docutils.sourceforge.net/rst.html
+
+ .. _entirely-below:
+ http://docutils.
+ sourceforge.net/rst.html
+
+ If an external hyperlink target's URI contains an underscore as its
+ last character, it must be escaped to avoid being mistaken for an
+ indirect hyperlink target::
+
+ This link_ refers to a file called ``underscore_``.
+
+ .. _link: underscore\_
+
+3. _`Indirect hyperlink targets` have a hyperlink reference in their
+ link blocks. In the following example, target "one" indirectly
+ references whatever target "two" references, and target "two"
+ references target "three", an internal hyperlink target. In
+ effect, all three reference the same thing::
+
+ .. _one: two_
+ .. _two: three_
+ .. _three:
+
+ Just as with `hyperlink references`_ anywhere else in a document,
+ if a phrase-reference is used in the link block it must be enclosed
+ in backquotes. As with `external hyperlink targets`_, the link
+ block of an indirect hyperlink target may begin on the same line as
+ the explicit markup start or the next line. It may also be split
+ over multiple lines, in which case the lines are joined with
+ whitespace before being normalized.
+
+ For example, the following indirect hyperlink targets are
+ equivalent::
+
+ .. _one-liner: `A HYPERLINK`_
+ .. _entirely-below:
+ `a hyperlink`_
+ .. _split: `A
+ Hyperlink`_
+
+If a reference name contains a colon followed by whitespace, either:
+
+- the phrase must be enclosed in backquotes::
+
+ .. _`FAQTS: Computers: Programming: Languages: Python`:
+ http://python.faqts.com/
+
+- or the colon(s) must be backslash-escaped in the link target::
+
+ .. _Chapter One\: "Tadpole Days":
+
+ It's not easy being green...
+
+See `Implicit Hyperlink Targets`_ below for the resolution of
+duplicate reference names.
+
+Syntax diagram::
+
+ +-------+----------------------+
+ | ".. " | "_" name ":" link |
+ +-------+ block |
+ | |
+ +----------------------+
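
The two multi-line link block joining rules described above differ in one detail: external target URIs are concatenated with nothing in between, while indirect target names are joined with whitespace and then normalized like any reference name. A sketch (hypothetical helper names, markup already stripped from the input lines):

```python
def external_target_uri(link_block_lines):
    """Join a multi-line external target link block: each line is
    stripped of leading/trailing whitespace, then concatenated."""
    return "".join(line.strip() for line in link_block_lines)

def indirect_target_name(link_block_lines):
    """Join an indirect target's link block: lines are joined with
    whitespace, then the result is normalized (whitespace-neutral,
    case-insensitive) like any reference name."""
    joined = " ".join(line.strip() for line in link_block_lines)
    return " ".join(joined.split()).lower()
```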
+
+
+Anonymous Hyperlinks
+....................
+
+The `World Wide Web Consortium`_ recommends in its `HTML Techniques
+for Web Content Accessibility Guidelines`_ that authors should
+"clearly identify the target of each link." Hyperlink references
+should be as verbose as possible, but duplicating a verbose hyperlink
+name in the target is onerous and error-prone. Anonymous hyperlinks
+are designed to allow convenient verbose hyperlink references, and are
+analogous to `Auto-Numbered Footnotes`_. They are particularly useful
+in short or one-off documents.
+
+Anonymous `hyperlink references`_ are specified with two underscores
+instead of one::
+
+ See `the web site of my favorite programming language`__.
+
+Anonymous targets begin with ".. __:"; no reference name is required
+or allowed::
+
+ .. __: http://www.python.org
+
+As a convenient alternative, anonymous targets may begin with "__"
+only::
+
+ __ http://www.python.org
+
+The reference name of the reference is not used to match the reference
+to its target. Instead, the order of anonymous hyperlink references
+and targets within the document is significant: the first anonymous
+reference will link to the first anonymous target. The number of
+anonymous hyperlink references in a document must match the number of
+anonymous targets.
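
The order-based matching amounts to pairing the two sequences positionally, with a mismatch in counts being an error. A sketch (the function name is illustrative):

```python
def pair_anonymous(references, targets):
    """Pair anonymous hyperlink references with anonymous targets
    strictly by document order: first reference to first target.
    The counts must match, or it is an error."""
    if len(references) != len(targets):
        raise ValueError("mismatched anonymous references and targets")
    return list(zip(references, targets))
```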
+
+
+Directives
+``````````
+
+DTD elements: depend on the directive.
+
+Directives are indicated by an explicit markup start (".. ") followed
+by the directive type, two colons, and whitespace. Directive types
+are case-insensitive single words (alphanumerics plus internal
+hyphens, underscores, and periods; no whitespace). Two colons are
+used after the directive type for these reasons:
+
+- To avoid clashes with common comment text like::
+
+ .. Danger: modify at your own risk!
+
+- If an implementation of reStructuredText does not recognize a
+ directive (i.e., the directive-handler is not installed), the entire
+ directive block (including the directive itself) will be treated as
+ a literal block, and a level-3 (error) system message generated.
+ Thus "::" is a natural choice.
+
+Any text on the first line after the directive indicator is directive
+data. The interpretation of directive data is up to the directive
+code. Directive data may be interpreted as arguments to the
+directive, or simply as the first line of the directive's text block.
+
+Actions taken in response to directives and the interpretation of text
+in the directive block or subsequent text block(s) are
+directive-dependent. Indented text following a directive may be
+interpreted as a directive block. Simple directives may not require
+any text beyond the directive data (if that), and will not process any
+following indented text.
+
+Directives which have been implemented and registered in the reference
+reStructuredText parser are described in the `reStructuredText
+Directives`_ document. Below are examples of implemented directives.
+
+Directives are meant for the arbitrary processing of their contents
+(the directive data & text block), which can be transformed into
+something possibly unrelated to the original text. Directives are
+used as an extension mechanism for reStructuredText, a way of adding
+support for new constructs without adding new syntax. For example,
+here's how an image may be placed::
+
+ .. image:: mylogo.png
+
+A figure (a graphic with a caption) may be placed like this::
+
+ .. figure:: larch.png
+ The larch.
+
+An admonition (note, caution, etc.) contains other body elements::
+
+ .. note:: This is a paragraph
+
+ - Here is a bullet list.
+
+It may also be possible for directives to be used as pragmas, to
+modify the behavior of the parser, such as to experiment with
+alternate syntax. There is no parser support for this functionality
+at present; if a reasonable need for pragma directives is found, they
+may be supported.
+
+Directives normally do not survive as "directive" elements past the
+parsing stage; they are a *parser construct* only, and have no
+intrinsic meaning outside of reStructuredText. Instead, the parser
+will transform recognized directives into (possibly specialized)
+document elements. Unknown directives will trigger level-3 (error)
+system messages.
+
+Syntax diagram::
+
+ +-------+--------------------------+
+ | ".. " | directive type "::" data |
+ +-------+ directive block |
+ | |
+ +--------------------------+
+
+
+Substitution Definitions
+````````````````````````
+
+DTD element: substitution_definition.
+
+Substitution definitions are indicated by an explicit markup start
+(".. ") followed by a vertical bar, the substitution text, another
+vertical bar, whitespace, and the definition block. Substitution text
+may not begin or end with whitespace. A substitution definition block
+contains an embedded inline-compatible directive (without the leading
+".. "), such as an image. For example::
+
+ The |biohazard| symbol must be used on containers used to
+ dispose of medical waste.
+
+ .. |biohazard| image:: biohazard.png
+
+It is an error for a substitution definition block to directly or
+indirectly contain a circular substitution reference.
+
+`Substitution references`_ are replaced in-line by the processed
+contents of the corresponding definition (linked by matching
+substitution text). Substitution definitions allow the power and
+flexibility of block-level directives_ to be shared by inline text.
+They are a way to include arbitrarily complex inline structures within
+text, while keeping the details out of the flow of text. They are the
+equivalent of SGML/XML's named entities or programming language
+macros.
+
+Without the substitution mechanism, every time someone wanted an
+application-specific new inline structure, they would have to petition
+for a syntax change. In combination with existing directive syntax,
+any inline structure can be coded without new syntax (except possibly
+a new directive).
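
At the level of plain strings, the matching-by-substitution-text mechanism looks like the sketch below. (A real parser substitutes processed document nodes, not strings; this string-level model, with a hypothetical function name, shows only the matching. Substitution text may not begin or end with whitespace, which the pattern enforces.)

```python
import re

def substitute(text, definitions):
    """Replace |name| substitution references with the text of the
    matching definition; an undefined reference is an error."""
    def repl(match):
        name = match.group(1)
        if name not in definitions:
            raise KeyError("undefined substitution: " + name)
        return definitions[name]
    # Substitution text: no leading/trailing whitespace, no "|".
    return re.sub(r"\|([^|\s](?:[^|]*[^|\s])?)\|", repl, text)
```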
+
+Syntax diagram::
+
+ +-------+-----------------------------------------------------+
+ | ".. " | "|" substitution text "| " directive type "::" data |
+ +-------+ directive block |
+ | |
+ +-----------------------------------------------------+
+
+Following are some use cases for the substitution mechanism. Please
+note that most of the embedded directives shown are examples only and
+have not been implemented.
+
+Objects
+ Substitution references may be used to associate ambiguous text
+ with a unique object identifier.
+
+ For example, many sites may wish to implement an inline "user"
+ directive::
+
+ |Michael| and |Jon| are our widget-wranglers.
+
+ .. |Michael| user:: mjones
+ .. |Jon| user:: jhl
+
+ Depending on the needs of the site, this may be used to index the
+ document for later searching, to hyperlink the inline text in
+ various ways (mailto, homepage, mouseover Javascript with profile
+ and contact information, etc.), or to customize presentation of
+ the text (include username in the inline text, include an icon
+ image with a link next to the text, make the text bold or a
+ different color, etc.).
+
+ The same approach can be used in documents which frequently refer
+ to a particular type of object with unique identifiers but
+ ambiguous common names. Movies, albums, books, photos, court
+ cases, and laws are possible. For example::
+
+ |The Transparent Society| offers a fascinating alternate view
+ on privacy issues.
+
+ .. |The Transparent Society| book:: isbn=0738201448
+
+ Classes or functions, in contexts where the module or class names
+ are unclear and/or interpreted text cannot be used, are another
+ possibility::
+
+ 4XSLT has the convenience method |runString|, so you don't
+ have to mess with DOM objects if all you want is the
+ transformed output.
+
+ .. |runString| function:: module=xml.xslt class=Processor
+
+Images
+ Images are a common use for substitution references::
+
+ West led the |H| 3, covered by dummy's |H| Q, East's |H| K,
+ and trumped in hand with the |S| 2.
+
+ .. |H| image:: /images/heart.png
+ :height: 11
+ :width: 11
+ .. |S| image:: /images/spade.png
+ :height: 11
+ :width: 11
+
+ * |Red light| means stop.
+ * |Green light| means go.
+ * |Yellow light| means go really fast.
+
+ .. |Red light| image:: red_light.png
+ .. |Green light| image:: green_light.png
+ .. |Yellow light| image:: yellow_light.png
+
+ |-><-| is the official symbol of POEE_.
+
+ .. |-><-| image:: discord.png
+ .. _POEE: http://www.poee.org/
+
+ The "image" directive has been implemented.
+
+Styles [#]_
+ Substitution references may be used to associate inline text with
+ an externally defined presentation style::
+
+ Even |the text in Texas| is big.
+
+ .. |the text in Texas| style:: big
+
+ The style name may be meaningful in the context of some particular
+ output format (CSS class name for HTML output, LaTeX style name
+ for LaTeX, etc.), or may be ignored for other output formats (often
+ for plain text).
+
+ .. @@@ This needs to be rethought & rewritten or removed:
+
+ Interpreted text is unsuitable for this purpose because the set
+ of style names cannot be predefined - it is the domain of the
+ content author, not the author of the parser and output
+ formatter - and there is no way to associate a stylename
+ argument with an interpreted text style role. Also, it may be
+ desirable to use the same mechanism for styling blocks::
+
+ .. style:: motto
+ At Bob's Underwear Shop, we'll do anything to get in
+ your pants.
+
+ .. style:: disclaimer
+ All rights reversed. Reprint what you like.
+
+ .. [#] There may be sufficient need for a "style" mechanism to
+ warrant simpler syntax such as an extension to the interpreted
+ text role syntax. The substitution mechanism is cumbersome for
+ simple text styling.
+
+Templates
+ Inline markup may be used for later processing by a template
+ engine. For example, a Zope_ author might write::
+
+ Welcome back, |name|!
+
+ .. |name| tal:: replace user/getUserName
+
+ After processing, this ZPT output would result::
+
+ Welcome back,
+ <span tal:replace="user/getUserName">name</span>!
+
+ Zope would then transform this to something like "Welcome back,
+ David!" during a session with an actual user.
+
+Replacement text
+ The substitution mechanism may be used for simple macro
+ substitution. This may be appropriate when the replacement text
+ is repeated many times throughout one or more documents,
+ especially if it may need to change later. A short example is
+ unavoidably contrived::
+
+ |RST| is a little annoying to type over and over, especially
+ when writing about |RST| itself, and spelling out the
+ bicapitalized word |RST| every time isn't really necessary for
+ |RST| source readability.
+
+ .. |RST| replace:: reStructuredText_
+ .. _reStructuredText: http://docutils.sourceforge.net/rst.html
+
+ Substitution is also appropriate when the replacement text cannot
+ be represented using other inline constructs, or is obtrusively
+ long::
+
+ But still, that's nothing compared to a name like
+ |j2ee-cas|__.
+
+ .. |j2ee-cas| replace::
+ the Java `TM`:super: 2 Platform, Enterprise Edition Client
+ Access Services
+ __ http://developer.java.sun.com/developer/earlyAccess/
+ j2eecas/
+
+
+Comments
+````````
+
+DTD element: comment.
+
+Arbitrary indented text may follow the explicit markup start and will
+be processed as a comment element. No further processing is done on
+the comment block text; a comment contains a single "text blob".
+Depending on the output formatter, comments may be removed from the
+processed output. The only restriction on comments is that they not
+use the same syntax as directives, footnotes, citations, or hyperlink
+targets.
+
+An explicit markup start followed by a blank line and nothing else
+(apart from whitespace) is an "empty comment". It serves to terminate
+a preceding construct, and does **not** consume any indented text
+following. To have a block quote follow a list or any indented
+construct, insert an unindented empty comment in-between.
+
+Syntax diagram::
+
+ +-------+----------------------+
+ | ".. " | comment |
+ +-------+ block |
+ | |
+ +----------------------+
+
+
+Implicit Hyperlink Targets
+==========================
+
+Implicit hyperlink targets are generated by section titles, footnotes,
+and citations, and may also be generated by extension constructs.
+Implicit hyperlink targets otherwise behave identically to explicit
+`hyperlink targets`_.
+
+Problems of ambiguity due to conflicting duplicate implicit and
+explicit reference names are avoided by following this procedure:
+
+1. `Explicit hyperlink targets`_ override any implicit targets having
+ the same reference name. The implicit hyperlink targets are
+ removed, and level-1 (info) system messages are inserted.
+
+2. Duplicate implicit hyperlink targets are removed, and level-1
+ (info) system messages inserted. For example, if two or more
+ sections have the same title (such as "Introduction" subsections of
+ a rigidly-structured document), there will be duplicate implicit
+ hyperlink targets.
+
+3. Duplicate explicit hyperlink targets are removed, and level-2
+ (warning) system messages are inserted. Exception: duplicate
+ `external hyperlink targets`_ (identical hyperlink names and
+ referenced URIs) do not conflict, and are not removed.
+
+System messages are inserted where target links have been removed.
+See "Error Handling" in `PEP 258`_.
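
The three-step procedure above can be modeled as follows. This is a simplified sketch with hypothetical names, not the Docutils transform; each argument maps a normalized reference name to the list of candidate targets found under that name:

```python
def resolve_targets(implicit, explicit):
    """Merge implicit and explicit hyperlink targets, returning
    (unique_targets, messages).  Explicit targets override implicit
    ones (info); duplicate implicit targets are dropped (info);
    duplicate explicit targets are dropped (warning) unless they
    all reference the same URI."""
    targets, messages = {}, []
    for name, candidates in implicit.items():
        if name in explicit:                       # step 1
            messages.append(("INFO", "implicit target %r overridden" % name))
        elif len(candidates) > 1:                  # step 2
            messages.append(("INFO", "duplicate implicit target %r" % name))
        else:
            targets[name] = candidates[0]
    for name, candidates in explicit.items():      # step 3
        if len(set(candidates)) > 1:
            messages.append(("WARNING", "duplicate explicit target %r" % name))
        else:
            targets[name] = candidates[0]
    return targets, messages
```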
+
+The parser must return a set of *unique* hyperlink targets. The
+calling software (such as Docutils_) can warn of unresolvable links,
+giving reasons for the messages.
+
+
+Inline Markup
+=============
+
+In reStructuredText, inline markup applies to words or phrases within
+a text block. The same whitespace and punctuation that serves to
+delimit words in written text is used to delimit the inline markup
+syntax constructs. The text within inline markup may not begin or end
+with whitespace. Arbitrary character-level markup is not supported;
+it is not possible to mark up individual characters within a word.
+Inline markup cannot be nested.
+
+There are nine inline markup constructs. Five of the constructs use
+identical start-strings and end-strings to indicate the markup:
+
+- emphasis_: "*"
+- `strong emphasis`_: "**"
+- `interpreted text`_: "`"
+- `inline literals`_: "``"
+- `substitution references`_: "|"
+
+Three constructs use different start-strings and end-strings:
+
+- `inline hyperlink targets`_: "_`" and "`"
+- `footnote references`_: "[" and "]_"
+- `hyperlink references`_: "`" and "\`_" (phrases), or just a
+ trailing "_" (single words)
+
+`Standalone hyperlinks`_ are recognized implicitly, and use no extra
+markup.
+
+The inline markup start-string and end-string recognition rules are as
+follows. If any of the conditions are not met, the start-string or
+end-string will not be recognized or processed.
+
+1. Inline markup start-strings must start a text block or be
+ immediately preceded by whitespace, single or double quotes, "(",
+ "[", "{", or "<".
+
+2. Inline markup start-strings must be immediately followed by
+ non-whitespace.
+
+3. Inline markup end-strings must be immediately preceded by
+ non-whitespace.
+
+4. Inline markup end-strings must end a text block or be immediately
+ followed by whitespace or one of::
+
+ ' " . , : ; ! ? - ) ] } >
+
+5. If an inline markup start-string is immediately preceded by a
+ single or double quote, "(", "[", "{", or "<", it must not be
+ immediately followed by the corresponding single or double quote,
+ ")", "]", "}", or ">".
+
+6. An inline markup end-string must be separated by at least one
+ character from the start-string.
+
+7. An unescaped backslash preceding a start-string or end-string will
+ disable markup recognition, except for the end-string of `inline
+ literals`_. See `Escaping Mechanism`_ above for details.
+
+For example, none of the following are recognized as containing inline
+markup start-strings: " * ", '"*"', "'*'", "(*)", "(* ", "[*]", "{*}",
+"\*", " ` ", etc.
+
+The inline markup recognition rules were devised intentionally to
+allow 90% of non-markup uses of "*", "`", "_", and "|" *without*
+resorting to backslashes. For 9 of the remaining 10%, use inline
+literals or literal blocks::
+
+ "``\*``" -> "\*" (possibly in another font or quoted)
+
+Only those who understand the escaping and inline markup rules should
+attempt the remaining 1%. ;-)
+
+Inline markup delimiter characters are used for multiple constructs,
+so to avoid ambiguity there must be a specific recognition order for
+each character. The inline markup recognition order is as follows:
+
+- Asterisks: `Strong emphasis`_ ("**") is recognized before emphasis_
+ ("*").
+
+- Backquotes: `Inline literals`_ ("``"), `inline hyperlink targets`_
+ (leading "_`", trailing "`"), are mutually independent, and are
+ recognized before phrase `hyperlink references`_ (leading "`",
+ trailing "\`_") and `interpreted text`_ ("`").
+
+- Trailing underscores: Footnote references ("[" + label + "]_") and
+ simple `hyperlink references`_ (name + trailing "_") are mutually
+ independent.
+
+- Vertical bars: `Substitution references`_ ("|") are independently
+ recognized.
+
+- `Standalone hyperlinks`_ are the last to be recognized.
+
+
+Emphasis
+--------
+
+DTD element: emphasis.
+
+Start-string = end-string = "*".
+
+Text enclosed by single asterisk characters is emphasized::
+
+ This is *emphasized text*.
+
+Emphasized text is typically displayed in italics.
+
+
+Strong Emphasis
+---------------
+
+DTD element: strong.
+
+Start-string = end-string = "**".
+
+Text enclosed by double-asterisks is emphasized strongly::
+
+ This is **strong text**.
+
+Strongly emphasized text is typically displayed in boldface.
+
+
+Interpreted Text
+----------------
+
+DTD element: interpreted.
+
+Start-string = end-string = "`".
+
+Text enclosed by single backquote characters is interpreted::
+
+ This is `interpreted text`.
+
+Interpreted text is text that is meant to be related, indexed, linked,
+summarized, or otherwise processed, but the text itself is left
+alone. The text is "tagged" directly, in-place. The semantics of
+interpreted text are domain-dependent. It can be used as implicit or
+explicit descriptive markup (such as for program identifiers, as in
+the `Python Source Reader`_), for cross-reference interpretation (such
+as index entries), or for other applications where context can be
+inferred.
+
+The role of the interpreted text determines how the text is
+interpreted. It is normally inferred implicitly. The role of the
+interpreted text may also be indicated explicitly, using a role
+marker, either as a prefix or as a suffix to the interpreted text,
+depending on which reads better::
+
+ :role:`interpreted text`
+
+ `interpreted text`:role:
+
+Roles are simply extensions of the available inline constructs; to
+emphasis_, `strong emphasis`_, `inline literals`_, and `hyperlink
+references`_, we can add "index entry", "acronym", "class", "red",
+"blinking" or anything else we want.
+
+A role marker consists of a colon, the role name, and another colon.
+A role name is a single word consisting of alphanumerics plus internal
+hyphens, underscores, and periods; no whitespace or other characters
+are allowed.
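
Splitting an explicit role marker off interpreted text amounts to matching the two marker positions, sketched here (the function name is illustrative; implicit role inference is a separate, context-dependent step not modeled):

```python
import re

# Role name: alphanumerics plus internal hyphens, underscores,
# and periods.  Prefix form :role:`text` or suffix form `text`:role:.
ROLE_RE = re.compile(
    r"^:([A-Za-z0-9]+(?:[-._][A-Za-z0-9]+)*):`(.+)`$"
    r"|^`(.+)`:([A-Za-z0-9]+(?:[-._][A-Za-z0-9]+)*):$")

def parse_interpreted(text):
    """Return (role, text); role is None when no explicit marker
    is present (the role would then be inferred implicitly)."""
    m = ROLE_RE.match(text)
    if not m:
        return (None, text)
    if m.group(1):                      # prefix form
        return (m.group(1), m.group(2))
    return (m.group(4), m.group(3))    # suffix form
```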
+
+
+Inline Literals
+---------------
+
+DTD element: literal.
+
+Start-string = end-string = "``".
+
+Text enclosed by double-backquotes is treated as inline literals::
+
+ This text is an example of ``inline literals``.
+
+Inline literals may contain any characters except two adjacent
+backquotes in an end-string context (according to the recognition
+rules above). No markup interpretation (including backslash-escape
+interpretation) is done within inline literals.
+
+Line breaks are *not* preserved in inline literals. Although a
+reStructuredText parser will preserve runs of spaces in its output,
+the final representation of the processed document is dependent on the
+output formatter, thus the preservation of whitespace cannot be
+guaranteed. If the preservation of line breaks and/or other
+whitespace is important, `literal blocks`_ should be used.
+
+Inline literals are useful for short code snippets. For example::
+
+ The regular expression ``[+-]?(\d+(\.\d*)?|\.\d+)`` matches
+ floating-point numbers (without exponents).
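The pattern quoted in that example can be exercised directly with Python's `re` module, using `fullmatch` so the whole string must be a floating-point literal:

```python
import re

# The regular expression from the inline-literal example above.
FLOAT = re.compile(r'[+-]?(\d+(\.\d*)?|\.\d+)')

for s in ['3.14', '-.5', '+42', '1.', 'abc']:
    # fullmatch returns None unless the entire string matches.
    print(s, bool(FLOAT.fullmatch(s)))
```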
+
+
+Hyperlink References
+--------------------
+
+DTD element: reference.
+
+- Named hyperlink references:
+
+ - Start-string = "" (empty string), end-string = "_".
+ - Start-string = "`", end-string = "\`_". (Phrase references.)
+
+- Anonymous hyperlink references:
+
+ - Start-string = "" (empty string), end-string = "__".
+ - Start-string = "`", end-string = "\`__". (Phrase references.)
+
+Hyperlink references are indicated by a trailing underscore, "_",
+except for `standalone hyperlinks`_ which are recognized
+independently. The underscore can be thought of as a right-pointing
+arrow. The trailing underscores point away from hyperlink references,
+and the leading underscores point toward `hyperlink targets`_.
+
+Hyperlinks consist of two parts. In the text body, there is a source
+link, a reference name with a trailing underscore (or two underscores
+for `anonymous hyperlinks`_)::
+
+ See the Python_ home page for info.
+
+A target link with a matching reference name must exist somewhere else
+in the document (see `Hyperlink Targets`_ for a full description).
+
+`Anonymous hyperlinks`_ (which see) do not use reference names to
+match references to targets, but otherwise behave similarly to named
+hyperlinks.
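The matching of named references to targets can be pictured with a toy lookup; reference names are matched case-insensitively. This is a hypothetical sketch, not the actual Docutils resolution, which operates on the document tree:

```python
# Map each reference (trailing-underscore name in the text) to the URI of
# the target with the same name, ignoring case; unmatched names map to None.
def resolve(references, targets):
    table = {name.lower(): uri for name, uri in targets.items()}
    return {ref: table.get(ref.lower()) for ref in references}

print(resolve(['Python'], {'python': 'http://www.python.org'}))
```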
+
+
+Inline Hyperlink Targets
+------------------------
+
+DTD element: target.
+
+Start-string = "_`", end-string = "`".
+
+Inline hyperlink targets are the equivalent of explicit `internal
+hyperlink targets`_, but may appear within running text. The syntax
+begins with an underscore and a backquote, is followed by a hyperlink
+name or phrase, and ends with a backquote. Inline hyperlink targets
+may not be anonymous.
+
+For example, the following paragraph contains a hyperlink target named
+"Norwegian Blue"::
+
+ Oh yes, the _`Norwegian Blue`. What's, um, what's wrong with it?
+
+See `Implicit Hyperlink Targets`_ for the resolution of duplicate
+reference names.
+
+
+Footnote References
+-------------------
+
+DTD element: footnote_reference.
+
+Start-string = "[", end-string = "]_".
+
+Each footnote reference consists of a square-bracketed label followed
+by a trailing underscore. Footnote labels are one of:
+
+- one or more digits (i.e., a number),
+
+- a single "#" (denoting `auto-numbered footnotes`_),
+
+- a "#" followed by a simple reference name (an `autonumber label`_),
+ or
+
+- a single "*" (denoting `auto-symbol footnotes`_).
+
+For example::
+
+ Please RTFM [1]_.
+
+ .. [1] Read The Fine Manual
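The four label forms listed above can be told apart with a rough classifier. This is a hypothetical helper for illustration; the real parser's handling is more involved:

```python
import re

# Simple reference name: alphanumerics plus internal hyphens,
# underscores, and periods.
_NAME = r'[a-zA-Z0-9]+(?:[-._][a-zA-Z0-9]+)*'

def label_kind(label):
    if re.fullmatch(r'\d+', label):
        return 'numbered'
    if label == '#':
        return 'auto-numbered'
    if re.fullmatch('#' + _NAME, label):
        return 'autonumber label'
    if label == '*':
        return 'auto-symbol'
    return 'invalid'

print(label_kind('1'), label_kind('#'), label_kind('#note'), label_kind('*'))
```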
+
+
+Citation References
+-------------------
+
+DTD element: citation_reference.
+
+Start-string = "[", end-string = "]_".
+
+Each citation reference consists of a square-bracketed label followed
+by a trailing underscore. Citation labels are simple `reference
+names`_ (case-insensitive single words, consisting of alphanumerics
+plus internal hyphens, underscores, and periods; no whitespace).
+
+For example::
+
+ Here is a citation reference: [CIT2002]_.
+
+See Citations_ for the citation itself.
+
+
+Substitution References
+-----------------------
+
+DTD element: substitution_reference, reference.
+
+Start-string = "|", end-string = "|" (optionally followed by "_" or
+"__").
+
+Vertical bars are used to bracket the substitution reference text. A
+substitution reference may also be a hyperlink reference by appending
+a "_" (named) or "__" (anonymous) suffix; the substitution text is
+used for the reference text in the named case.
+
+The processing system replaces substitution references with the
+processed contents of the corresponding `substitution definitions`_.
+Substitution definitions produce inline-compatible elements.
+
+Examples::
+
+ This is a simple |substitution reference|. It will be replaced by
+ the processing system.
+
+ This is a combination |substitution and hyperlink reference|_. In
+ addition to being replaced, the replacement text or element will
+ refer to the "substitution and hyperlink reference" target.
+
+
+Standalone Hyperlinks
+---------------------
+
+DTD element: link.
+
+Start-string = end-string = "" (empty string).
+
+A URI (absolute URI [#URI]_ or standalone email address) within a text
+block is treated as a general external hyperlink with the URI itself
+as the link's text. For example::
+
+ See http://www.python.org for info.
+
+would be marked up in HTML as::
+
+ See <a href="http://www.python.org">http://www.python.org</a> for
+ info.
+
+Two forms of URI are recognized:
+
+1. Absolute URIs. These consist of a scheme, a colon (":"), and a
+ scheme-specific part whose interpretation depends on the scheme.
+
+ The scheme is the name of the protocol, such as "http", "ftp",
+ "mailto", or "telnet". The scheme consists of an initial letter,
+ followed by letters, numbers, and/or "+", "-", ".". Recognition is
+ limited to known schemes, per the W3C's `Index of WWW Addressing
+ Schemes`_.
+
+ The scheme-specific part of the resource identifier may be either
+ hierarchical or opaque:
+
+ - Hierarchical identifiers begin with one or two slashes and may
+ use slashes to separate hierarchical components of the path.
+ Examples are web pages and FTP sites::
+
+ http://www.python.org
+
+ ftp://ftp.python.org/pub/python
+
+ - Opaque identifiers do not begin with slashes. Examples are
+ email addresses and newsgroups::
+
+ mailto:someone@somewhere.com
+
+ news:comp.lang.python
+
+ With queries, fragments, and %-escape sequences, URIs can become
+ quite complicated. A reStructuredText parser must be able to
+ recognize any absolute URI, as defined in RFC2396_ and RFC2732_.
+
+2. Standalone email addresses, which are treated as if they were
+ absolute URIs with a "mailto:" scheme. Example::
+
+ someone@somewhere.com
+
+Punctuation at the end of a URI is not considered part of the URI.
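A much-simplified recognizer illustrates both points above: match a known scheme followed by non-whitespace, then strip trailing punctuation. The real parser must recognize the full RFC 2396/2732 grammar; the scheme list and helper name here are illustrative assumptions:

```python
import re

# A few known schemes, standing in for the W3C index mentioned above.
KNOWN_SCHEMES = ('https', 'http', 'ftp', 'mailto', 'news', 'telnet')
URI = re.compile(r'\b(%s):[^\s]+' % '|'.join(KNOWN_SCHEMES))

def find_uris(text):
    # Trailing punctuation is not considered part of the URI.
    return [m.group(0).rstrip('.,;:!?)') for m in URI.finditer(text)]

print(find_uris('See http://www.python.org, or ftp://ftp.python.org/pub.'))
```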
+
+.. [#URI] Uniform Resource Identifier. URIs are a general form of
+ URLs (Uniform Resource Locators). For the syntax of URIs see
+ RFC2396_ and RFC2732_.
+
+
+----------------
+ Error Handling
+----------------
+
+DTD element: system_message, problematic.
+
+Markup errors are handled according to the specification in `PEP
+258`_.
+
+
+.. _reStructuredText: http://docutils.sourceforge.net/rst.html
+.. _Docutils: http://docutils.sourceforge.net/
+.. _Docutils Document Tree Structure:
+ http://docutils.sourceforge.net/spec/doctree.txt
+.. _Generic Plaintext Document Interface DTD:
+ http://docutils.sourceforge.net/spec/gpdi.dtd
+.. _transforms:
+ http://docutils.sourceforge.net/docutils/transforms/
+.. _Grouch: http://www.mems-exchange.org/software/grouch/
+.. _RFC822: http://www.rfc-editor.org/rfc/rfc822.txt
+.. _DocTitle transform:
+.. _DocInfo transform:
+ http://docutils.sourceforge.net/docutils/transforms/frontmatter.py
+.. _doctest module:
+ http://www.python.org/doc/current/lib/module-doctest.html
+.. _getopt.py:
+ http://www.python.org/doc/current/lib/module-getopt.html
+.. _GNU libc getopt_long():
+ http://www.gnu.org/manual/glibc-2.2.3/html_node/libc_516.html
+.. _Index of WWW Addressing Schemes:
+ http://www.w3.org/Addressing/schemes.html
+.. _World Wide Web Consortium: http://www.w3.org/
+.. _HTML Techniques for Web Content Accessibility Guidelines:
+ http://www.w3.org/TR/WCAG10-HTML-TECHS/#link-text
+.. _reStructuredText Directives: directives.html
+.. _Python Source Reader:
+ http://docutils.sourceforge.net/spec/pysource.txt
+.. _RFC2396: http://www.rfc-editor.org/rfc/rfc2396.txt
+.. _RFC2732: http://www.rfc-editor.org/rfc/rfc2732.txt
+.. _Zope: http://www.zope.com/
+.. _PEP 258: http://docutils.sourceforge.net/spec/pep-0258.txt
+
+
+..
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End:
diff --git a/docs/ref/soextblx.dtd b/docs/ref/soextblx.dtd
new file mode 100644
index 000000000..56ba311ba
--- /dev/null
+++ b/docs/ref/soextblx.dtd
@@ -0,0 +1,312 @@
diff --git a/docs/user/rst/images/ball1.gif b/docs/user/rst/images/ball1.gif
new file mode 100644
index 000000000..3e14441d9
Binary files /dev/null and b/docs/user/rst/images/ball1.gif differ
diff --git a/docs/user/rst/images/biohazard.bmp b/docs/user/rst/images/biohazard.bmp
new file mode 100644
index 000000000..aceb52948
Binary files /dev/null and b/docs/user/rst/images/biohazard.bmp differ
diff --git a/docs/user/rst/images/biohazard.gif b/docs/user/rst/images/biohazard.gif
new file mode 100644
index 000000000..7e1ea34ed
Binary files /dev/null and b/docs/user/rst/images/biohazard.gif differ
diff --git a/docs/user/rst/images/biohazard.png b/docs/user/rst/images/biohazard.png
new file mode 100644
index 000000000..ae4629d8b
Binary files /dev/null and b/docs/user/rst/images/biohazard.png differ
diff --git a/docs/user/rst/images/title.png b/docs/user/rst/images/title.png
new file mode 100644
index 000000000..cc6218efe
Binary files /dev/null and b/docs/user/rst/images/title.png differ
diff --git a/docs/user/rst/quickref.html b/docs/user/rst/quickref.html
new file mode 100644
index 000000000..886a02107
--- /dev/null
+++ b/docs/user/rst/quickref.html
@@ -0,0 +1,1096 @@
+
+
+
+ Quick reStructuredText
+
+
+
+
+
Asterisk, backquote, vertical bar, and underscore are inline
+ delimiter characters. Asterisk, backquote, and vertical bar act
+ like quote marks; matching characters surround the marked-up word
+ or phrase, whitespace or other quoting is required outside them,
+ and there can't be whitespace just inside them. If you want to use
+ inline delimiter characters literally, escape
+ (with backslash) or quote them (with double backquotes; i.e.
+ use inline literals).
+
+
In detail, the reStructuredText specification says that in
+ inline markup:
+
+
The start-string must start a text block or be
+ immediately preceded by whitespace,
+ ' " ( [ { or <.
+
The start-string must be immediately followed by non-whitespace.
+
The end-string must be immediately preceded by non-whitespace.
+
The end-string must end a text block or be immediately
+ followed by whitespace,
+ ' " . , : ; ! ? - ) ] } or >.
+
If a start-string is immediately preceded by one of
+ ' " ( [ { or <, it must not be
+ immediately followed by the corresponding character from
+ ' " ) ] } or >.
+
An end-string must be separated by at least one
+ character from the start-string.
+
An unescaped backslash preceding a start-string or end-string will
+ disable markup recognition, except for the end-string of inline
+ literals.
+
+
+
Also remember that inline markup may not be nested (well,
+ except that inline literals can contain any of the other inline
+ markup delimiter characters, but that doesn't count because
+ nothing is processed).
+
+
reStructuredText uses backslashes ("\") to override the special
+ meaning given to markup characters and get the literal characters
+ themselves. To get a literal backslash, use an escaped backslash
+ ("\\"). For example:
+
+
+
+
+
Raw reStructuredText
+
Typical result
+
+
+
+ *escape* ``with`` "\"
+
escape with ""
+
+ \*escape* \``with`` "\\"
+
*escape* ``with`` "\"
+
+
+
In Python strings it will, of course, be necessary
+ to escape any backslash characters so that they actually
+ reach reStructuredText.
+ The simplest way to do this is to use raw strings:
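A minimal demonstration of that point: in an ordinary Python string literal each backslash must be doubled, while a raw string is written exactly as the reStructuredText source should read:

```python
# Both spellings produce the same string; the raw form is easier to read.
plain = '\\*escape*'   # backslash doubled in the source
raw = r'\*escape*'     # raw string: written once
print(plain == raw)
print(raw)
```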
+
+
3. This is the first item
+ 4. This is the second item
+ 5. Enumerators are arabic numbers,
+ single letters, or roman numerals
+ 6. List items should be sequentially
+ numbered, but need not start at 1
+ (although not all formatters will
+ honour the first index).
+
Enumerated lists:
+
+
This is the first item
+
This is the second item
+
Enumerators are arabic numbers, single letters,
+ or roman numerals
+
List items should be sequentially numbered,
+ but need not start at 1 (although not all
+ formatters will honour the first index).
+
+Definition lists:
+
+ what
+ Definition lists associate a term with
+ a definition.
+
+ how
+ The term is a one-line phrase, and the
+ definition is one or more paragraphs or
+ body elements, indented relative to the
+ term. Blank lines are not allowed
+ between term and definition.
+
Definition lists:
+
+
what
+
Definition lists associate a term with
+ a definition.
+
+
how
+
The term is a one-line phrase, and the
+ definition is one or more paragraphs or
+ body elements, indented relative to the
+ term. Blank lines are not allowed
+ between term and definition.
+
+-a command-line option "a"
+ -b file options can have arguments
+ and long descriptions
+ --long options can be long also
+ --input=file long options can also have
+ arguments
+ /V DOS/VMS-style options too
+
+
+
+
+
+
+
-a
+
command-line option "a"
+
+
-b file
+
options can have arguments and long descriptions
+
+
--long
+
options can be long also
+
+
--input=file
+
long options can also have arguments
+
+
/V
+
DOS/VMS-style options too
+
+
+
+
There must be at least two spaces between the option and the
+ description.
+
+
+A paragraph containing only two colons
+ indicates that the following indented
+ text is a literal block.
+
+ ::
+
+ Whitespace, newlines, blank lines, and
+ all kinds of markup (like *this* or
+ \this) is preserved by literal blocks.
+
+ The paragraph containing only '::'
+ will be omitted from the result.
+
+ The ``::`` may be tacked onto the very
+ end of any paragraph. The ``::`` will be
+ omitted if it is preceded by whitespace.
+ The ``::`` will be converted to a single
+ colon if preceded by text, like this::
+
+ It's very convenient to use this form.
+
+ Literal blocks end when text returns to
+ the preceding paragraph's indentation.
+ This means that something like::
+
+ We start here
+ and continue here
+ and end here.
+
+ is possible.
+
+
+
A paragraph containing only two colons
+indicates that the following indented
+text is a literal block.
+
+
+ Whitespace, newlines, blank lines, and
+ all kinds of markup (like *this* or
+ \this) is preserved by literal blocks.
+
+ The paragraph containing only '::'
+ will be omitted from the result.
+
+
The :: may be tacked onto the very
+end of any paragraph. The :: will be
+omitted if it is preceded by whitespace.
+The :: will be converted to a single
+colon if preceded by text, like this:
+
+
+ It's very convenient to use this form.
+
+
Literal blocks end when text returns to
+the preceding paragraph's indentation.
+This means that something like:
+
+
+ We start here
+ and continue here
+ and end here.
+
Doctest blocks are interactive
+ Python sessions. They begin with
+ "``>>>``" and end with a blank line.
+
+
>>> print "This is a doctest block."
+ This is a doctest block.
+
+
+
Doctest blocks are interactive
+ Python sessions. They begin with
+ ">>>" and end with a blank line.
+
+
>>> print "This is a doctest block."
+ This is a doctest block.
+
+
+
"The doctest
+ module searches a module's docstrings for text that looks like an
+ interactive Python session, then executes all such sessions to
+ verify they still work exactly as shown." (From the doctest docs.)
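The behaviour quoted from the doctest docs can be shown with the stdlib module itself. Note the Python 3 `print()` syntax here, whereas the historical examples above use the Python 2 print statement:

```python
import doctest

def double(n):
    """
    >>> double(21)
    42
    """
    return n * 2

# testmod finds the session in double's docstring, runs it, and reports
# how many examples failed.
results = doctest.testmod()
print(results.failed)
```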
+
+
+A transition marker is a horizontal line
+ of 4 or more repeated punctuation
+ characters.
+
+
------------
+
+
A transition should not begin or end a
+ section or document, nor should two
+ transitions be immediately adjacent.
+
+
+
A transition marker is a horizontal line
+ of 4 or more repeated punctuation
+ characters.
+
+
+
+
A transition should not begin or end a
+ section or document, nor should two
+ transitions be immediately adjacent.
+
+
+
Transitions are commonly seen in novels and short fiction, as a
+ gap spanning one or more lines, marking text divisions or
+ signaling changes in subject, time, point of view, or emphasis.
+
+
The numbering of auto-numbered footnotes is determined by the
+ order of the footnotes, not of the references. For auto-numbered
+ footnote references without autonumber labels
+ ("[#]_"), the references and footnotes must be in the
+ same relative order. Similarly for auto-symbol footnotes
+ ("[*]_").
+
+
"Fold-in" is the representation typically used in HTML
+ documents (think of the indirect hyperlink being "folded in" like
+ ingredients into a cake), and "call-out" is more suitable for
+ printed documents, where the link needs to be presented explicitly, for
+ example as a footnote.
+
+
The second hyperlink target (the line beginning with
+ "__") is both an indirect hyperlink target
+ (indirectly pointing at the Python website via the
+ "Python_" reference) and an anonymous hyperlink
+ target. In the text, a double-underscore suffix is used to
+ indicate an anonymous hyperlink reference.
+
+
Section titles, footnotes, and citations automatically generate
+ hyperlink targets (the title text or footnote/citation label is
+ used as the hyperlink name).
+
+
+
Plain text
+
Typical result
+
+
+
+
+
+ Titles are targets, too
+ =======================
+ Implicit references, like `Titles are
+ targets, too`_.
+
+
+
+
diff --git a/docs/user/rst/quickstart.txt b/docs/user/rst/quickstart.txt
new file mode 100644
index 000000000..be9139d60
--- /dev/null
+++ b/docs/user/rst/quickstart.txt
@@ -0,0 +1,301 @@
+A ReStructuredText Primer
+=========================
+
+:Author: Richard Jones
+:Version: $Revision$
+
+The text below contains links that look like "(quickref__)". These
+are relative links that point to the `Quick reStructuredText`_ user
+reference. If these links don't work, please refer to the `master
+quick reference`_ document.
+
+__
+.. _Quick reStructuredText: quickref.html
+.. _master quick reference:
+ http://docutils.sourceforge.net/docs/rst/quickref.html
+
+
+Structure
+---------
+
+From the outset, let me say that "Structured Text" is probably a bit
+of a misnomer. It's more like "Relaxed Text" that uses certain
+consistent patterns. These patterns are interpreted by an HTML
+converter to produce "Very Structured Text" that can be used by a web
+browser.
+
+The most basic pattern recognised is a **paragraph** (quickref__).
+That's a chunk of text that is separated by blank lines (one is
+enough). Paragraphs must have the same indentation -- that is, line
+up at their left edge. Paragraphs that start indented will result in
+indented quote paragraphs. For example::
+
+ This is a paragraph. It's quite
+ short.
+
+ This paragraph will result in an indented block of
+ text, typically used for quoting other text.
+
+ This is another one.
+
+Results in:
+
+ This is a paragraph. It's quite
+ short.
+
+ This paragraph will result in an indented block of
+ text, typically used for quoting other text.
+
+ This is another one.
+
+__ quickref.html#paragraphs
+
+Text styles
+-----------
+
+(quickref__)
+
+__ quickref.html#inline-markup
+
+Inside paragraphs and other bodies of text, you may additionally mark
+text for *italics* with "``*italics*``" or **bold** with
+"``**bold**``".
+
+If you want something to appear as a fixed-space literal, use
+"````double back-quotes````". Note that no further fiddling is done
+inside the double back-quotes -- so asterisks "``*``" etc. are left
+alone.
+
+If you find that you want to use one of the "special" characters in
+text, it will generally be OK -- ReST is pretty smart. For example,
+this * asterisk is handled just fine. If you actually want text
+\*surrounded by asterisks* to **not** be italicised, then you need to
+indicate that the asterisk is not special. You do this by placing a
+backslash just before it, like so "``\*``" (quickref__).
+
+__ quickref.html#escaping
+
+Lists
+-----
+
+Lists of items come in three main flavours: **enumerated**,
+**bulleted** and **definitions**. In all list cases, you may have as
+many paragraphs, sublists, etc. as you want, as long as the left-hand
+side of the paragraph or whatever aligns with the first line of text
+in the list item.
+
+Lists must always start a new paragraph -- that is, they must appear
+after a blank line.
+
+**enumerated** lists (numbers, letters or roman numerals; quickref__)
+ __ quickref.html#enumerated-lists
+
+ Start a line off with a number or letter followed by a period ".",
+ right bracket ")" or surrounded by brackets "( )" -- whatever you're
+ comfortable with. All of the following forms are recognised::
+
+ 1. numbers
+
+ A. upper-case letters
+ and it goes over many lines
+
+ with two paragraphs and all!
+
+ a. lower-case letters
+
+ 3. with a sub-list starting at a different number
+ 4. make sure the numbers are in the correct sequence though!
+
+ I. upper-case roman numerals
+
+ i. lower-case roman numerals
+
+ (1) numbers again
+
+ 1) and again
+
+ Results in (note: the different enumerated list styles are not
+ always supported by every web browser, so you may not get the full
+ effect here):
+
+ 1. numbers
+
+ A. upper-case letters
+ and it goes over many lines
+
+ with two paragraphs and all!
+
+ a. lower-case letters
+
+ 3. with a sub-list starting at a different number
+ 4. make sure the numbers are in the correct sequence though!
+
+ I. upper-case roman numerals
+
+ i. lower-case roman numerals
+
+ (1) numbers again
+
+ 1) and again
+
+**bulleted** lists (quickref__)
+ __ quickref.html#bullet-lists
+
+ Just like enumerated lists, start the line off with a bullet point
+ character - either "-", "+" or "*"::
+
+ * a bullet point using "*"
+
+ - a sub-list using "-"
+
+ + yet another sub-list
+
+ - another item
+
+ Results in:
+
+ * a bullet point using "*"
+
+ - a sub-list using "-"
+
+ + yet another sub-list
+
+ - another item
+
+**definition** lists (quickref__)
+ __ quickref.html#definition-lists
+
+ Unlike the other two, the definition lists consist of a term, and
+ the definition of that term. The format of a definition list is::
+
+ what
+ Definition lists associate a term with a definition.
+
+ *how*
+ The term is a one-line phrase, and the definition is one or more
+ paragraphs or body elements, indented relative to the term.
+ Blank lines are not allowed between term and definition.
+
+ Results in:
+
+ what
+ Definition lists associate a term with a definition.
+
+ *how*
+ The term is a one-line phrase, and the definition is one or more
+ paragraphs or body elements, indented relative to the term.
+ Blank lines are not allowed between term and definition.
+
+Preformatting (code samples)
+----------------------------
+(quickref__)
+
+__ quickref.html#literal-blocks
+
+To just include a chunk of preformatted, never-to-be-fiddled-with
+text, finish the prior paragraph with "``::``". The preformatted
+block is finished when the text falls back to the same indentation
+level as a paragraph prior to the preformatted block. For example::
+
+ An example::
+
+ Whitespace, newlines, blank lines, and all kinds of markup
+ (like *this* or \this) is preserved by literal blocks.
+ Lookie here, I've dropped an indentation level
+ (but not far enough)
+
+ no more example
+
+Results in:
+
+ An example::
+
+ Whitespace, newlines, blank lines, and all kinds of markup
+ (like *this* or \this) is preserved by literal blocks.
+ Lookie here, I've dropped an indentation level
+ (but not far enough)
+
+ no more example
+
+Note that if a paragraph consists only of "``::``", then it's removed
+from the output::
+
+ ::
+
+ This is preformatted text, and the
+ last "::" paragraph is removed
+
+Results in:
+
+::
+
+ This is preformatted text, and the
+ last "::" paragraph is removed
+
+Sections
+--------
+
+(quickref__)
+
+__ quickref.html#section-structure
+
+To break longer text up into sections, you use **section headers**.
+These are a single line of text (one or more words) with an underline
+(and optionally an overline) in dashes "``-----``", equals
+"``======``", tildes "``~~~~~~``" or any of the non-alphanumeric
+characters ``= - ` : ' " ~ ^ _ * + # < >`` that you feel comfortable
+with. Be consistent though, since all sections marked with the same
+underline style are deemed to be at the same level::
+
+ Chapter 1 Title
+ ===============
+
+ Section 1.1 Title
+ -----------------
+
+ Subsection 1.1.1 Title
+ ~~~~~~~~~~~~~~~~~~~~~~
+
+ Section 1.2 Title
+ -----------------
+
+ Chapter 2 Title
+ ===============
+
+results in:
+
+.. sorry, I change the heading style here, but it's only an example :)
+
+Chapter 1 Title
+~~~~~~~~~~~~~~~
+
+Section 1.1 Title
+'''''''''''''''''
+
+Subsection 1.1.1 Title
+""""""""""""""""""""""
+
+Section 1.2 Title
+'''''''''''''''''
+
+Chapter 2 Title
+~~~~~~~~~~~~~~~
+
+Note that section headers are available as link targets, just using
+their name. To link to the Lists_ heading, I write "``Lists_``". If
+the heading has a space in it like `text styles`_, we need to quote
+the heading "```text styles`_``".
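The consistency rule above can be pictured with a small sketch: each new underline style, in order of first appearance, opens a deeper level. This is a hypothetical illustration, not the actual Docutils algorithm:

```python
# Given the underline character of each section title in document order,
# assign a nesting level: first style seen is level 1, next is 2, etc.
def section_levels(styles):
    seen = []
    levels = []
    for style in styles:
        if style not in seen:
            seen.append(style)
        levels.append(seen.index(style) + 1)
    return levels

# '=' then '-' then '~', as in the chapter/section example above:
print(section_levels(['=', '-', '~', '-', '=']))
```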
+
+What Next?
+----------
+
+This primer introduces the most common features of reStructuredText,
+but there is a lot more to explore. The `Quick reStructuredText`_
+user reference is a good place to go next. For complete details, the
+`reStructuredText Markup Specification`_ is the place to go [#]_.
+
+.. _reStructuredText Markup Specification:
+ ../../spec/rst/reStructuredText.html
+
+.. [#] If that relative link doesn't work, try the master document:
+ http://docutils.sourceforge.net/spec/rst/reStructuredText.html.
diff --git a/docutils/__init__.py b/docutils/__init__.py
new file mode 100644
index 000000000..0ee88d94a
--- /dev/null
+++ b/docutils/__init__.py
@@ -0,0 +1,51 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+This is the Docutils (Python Documentation Utilities) package.
+
+Package Structure
+=================
+
+Modules:
+
+- __init__.py: Contains the package docstring only (this text).
+
+- core.py: Contains the ``Publisher`` class and ``publish()`` convenience
+ function.
+
+- nodes.py: DPS document tree (doctree) node class library.
+
+- roman.py: Conversion to and from Roman numerals. Courtesy of Mark
+ Pilgrim (http://diveintopython.org/).
+
+- statemachine.py: A finite state machine specialized for
+ regular-expression-based text filters.
+
+- urischemes.py: Contains a complete mapping of known URI addressing
+ scheme names to descriptions.
+
+- utils.py: Contains the ``Reporter`` system warning class and miscellaneous
+ utilities.
+
+Subpackages:
+
+- languages: Language-specific mappings of terms.
+
+- parsers: Syntax-specific input parser modules or packages.
+
+- readers: Context-specific input handlers which understand the data
+ source and manage a parser.
+
+- transforms: Modules used by readers and writers to modify DPS
+ doctrees.
+
+- writers: Format-specific output translators.
+"""
+
+__docformat__ = 'reStructuredText'
diff --git a/docutils/core.py b/docutils/core.py
new file mode 100644
index 000000000..b553b07b7
--- /dev/null
+++ b/docutils/core.py
@@ -0,0 +1,85 @@
+#! /usr/bin/env python
+
+"""
+:Authors: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+import readers, parsers, writers, utils
+
+
+class Publisher:
+
+ """
+ Publisher encapsulates the high-level logic of a Docutils system.
+ """
+
+ reporter = None
+ """A `utils.Reporter` instance used for all document processing."""
+
+ def __init__(self, reader=None, parser=None, writer=None, reporter=None,
+ languagecode='en', warninglevel=2, errorlevel=4,
+ warningstream=None, debug=0):
+ """
+ Initial setup. If any of `reader`, `parser`, or `writer` are
+ not specified, the corresponding 'set*' method should be
+ called.
+ """
+ self.reader = reader
+ self.parser = parser
+ self.writer = writer
+ if not reporter:
+ reporter = utils.Reporter(warninglevel, errorlevel, warningstream,
+ debug)
+ self.reporter = reporter
+ self.languagecode = languagecode
+
+ def setreader(self, readername, languagecode=None):
+ """Set `self.reader` by name."""
+ readerclass = readers.get_reader_class(readername)
+ self.reader = readerclass(self.reporter,
+ languagecode or self.languagecode)
+
+ def setparser(self, parsername):
+ """Set `self.parser` by name."""
+ parserclass = parsers.get_parser_class(parsername)
+ self.parser = parserclass()
+
+ def setwriter(self, writername):
+ """Set `self.writer` by name."""
+ writerclass = writers.get_writer_class(writername)
+ self.writer = writerclass()
+
+ def publish(self, source, destination):
+ """
+ Run `source` through `self.reader`, then through `self.writer` to
+ `destination`.
+ """
+ document = self.reader.read(source, self.parser)
+ self.writer.write(document, destination)
+
+
+def publish(source=None, destination=None,
+ reader=None, readername='standalone',
+ parser=None, parsername='restructuredtext',
+ writer=None, writername='pprint',
+ reporter=None, languagecode='en',
+ warninglevel=2, errorlevel=4, warningstream=None, debug=0):
+ """Set up & run a `Publisher`."""
+ pub = Publisher(reader, parser, writer, reporter, languagecode,
+ warninglevel, errorlevel, warningstream, debug)
+ if reader is None:
+ pub.setreader(readername)
+ if parser is None:
+ pub.setparser(parsername)
+ if writer is None:
+ pub.setwriter(writername)
+ pub.publish(source, destination)
diff --git a/docutils/languages/__init__.py b/docutils/languages/__init__.py
new file mode 100644
index 000000000..4c10d9124
--- /dev/null
+++ b/docutils/languages/__init__.py
@@ -0,0 +1,22 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+This package contains modules for language-dependent features of Docutils.
+"""
+
+__docformat__ = 'reStructuredText'
+
+_languages = {}
+
+def getlanguage(languagecode):
+ if _languages.has_key(languagecode):
+ return _languages[languagecode]
+ module = __import__(languagecode, globals(), locals())
+ _languages[languagecode] = module
+ return module
diff --git a/docutils/languages/en.py b/docutils/languages/en.py
new file mode 100644
index 000000000..5b97dadb7
--- /dev/null
+++ b/docutils/languages/en.py
@@ -0,0 +1,58 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+English-language mappings for language-dependent features of Docutils.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+from docutils import nodes
+
+
+labels = {
+ 'author': 'Author',
+ 'authors': 'Authors',
+ 'organization': 'Organization',
+ 'contact': 'Contact',
+ 'version': 'Version',
+ 'revision': 'Revision',
+ 'status': 'Status',
+ 'date': 'Date',
+ 'copyright': 'Copyright',
+ 'abstract': 'Abstract',
+ 'attention': 'Attention!',
+ 'caution': 'Caution!',
+ 'danger': '!DANGER!',
+ 'error': 'Error',
+ 'hint': 'Hint',
+ 'important': 'Important',
+ 'note': 'Note',
+ 'tip': 'Tip',
+ 'warning': 'Warning',
+ 'contents': 'Contents'}
+"""Mapping of node class name to label text."""
+
+bibliographic_fields = {
+ 'author': nodes.author,
+ 'authors': nodes.authors,
+ 'organization': nodes.organization,
+ 'contact': nodes.contact,
+ 'version': nodes.version,
+ 'revision': nodes.revision,
+ 'status': nodes.status,
+ 'date': nodes.date,
+ 'copyright': nodes.copyright,
+ 'abstract': nodes.topic}
+"""Field name (lowercased) to node class name mapping for bibliographic fields
+(field_list)."""
+
+author_separators = [';', ',']
+"""List of separator strings for the 'Authors' bibliographic field. Tried in
+order."""
diff --git a/docutils/nodes.py b/docutils/nodes.py
new file mode 100644
index 000000000..ece182c85
--- /dev/null
+++ b/docutils/nodes.py
@@ -0,0 +1,1112 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Docutils document tree element class library.
+
+Classes in CamelCase are abstract base classes or auxiliary classes. The one
+exception is `Text`, for a text node; uppercase is used to differentiate from
+element classes.
+
+Classes in lower_case_with_underscores are element classes, matching the XML
+element generic identifiers in the DTD_.
+
+.. _DTD: http://docstring.sourceforge.net/spec/gpdi.dtd
+"""
+
+import sys, os
+import xml.dom.minidom
+from types import IntType, SliceType, StringType, TupleType, ListType
+from UserString import MutableString
+import utils
+import docutils
+
+
+# ==============================
+# Functional Node Base Classes
+# ==============================
+
+class Node:
+
+ """Abstract base class of nodes in a document tree."""
+
+ parent = None
+ """Back-reference to the `Node` containing this `Node`."""
+
+ def __nonzero__(self):
+ """Node instances are always true."""
+ return 1
+
+ def asdom(self, dom=xml.dom.minidom):
+ """Return a DOM representation of this Node."""
+ return self._dom_node(dom)
+
+ def pformat(self, indent=' ', level=0):
+ """Return an indented pseudo-XML representation, for test purposes."""
+ raise NotImplementedError
+
+ def walk(self, visitor):
+ """
+ Traverse a tree of `Node` objects, calling ``visit_...`` methods of
+ `visitor` when entering each node. If there is no
+ ``visit_particular_node`` method for a node of type
+ ``particular_node``, the ``unknown_visit`` method is called.
+
+        Doesn't handle arbitrary in-place modification during the
+        traversal; replacing one element with a single element is OK.
+
+ Parameter `visitor`: A `NodeVisitor` object, containing a
+ ``visit_...`` method for each `Node` subclass encountered.
+ """
+ name = 'visit_' + self.__class__.__name__
+ method = getattr(visitor, name, visitor.unknown_visit)
+ visitor.doctree.reporter.debug(name, category='nodes.Node.walk')
+ try:
+ method(self)
+ children = self.getchildren()
+ try:
+ for i in range(len(children)):
+ children[i].walk(visitor)
+ except SkipSiblings:
+ pass
+ except (SkipChildren, SkipNode):
+ pass
+
+ def walkabout(self, visitor):
+ """
+ Perform a tree traversal similarly to `Node.walk()`, except also call
+ ``depart_...`` methods before exiting each node. If there is no
+ ``depart_particular_node`` method for a node of type
+ ``particular_node``, the ``unknown_departure`` method is called.
+
+ Parameter `visitor`: A `NodeVisitor` object, containing ``visit_...``
+ and ``depart_...`` methods for each `Node` subclass encountered.
+ """
+ name = 'visit_' + self.__class__.__name__
+ method = getattr(visitor, name, visitor.unknown_visit)
+ visitor.doctree.reporter.debug(name, category='nodes.Node.walkabout')
+ try:
+ method(self)
+ children = self.getchildren()
+ try:
+ for i in range(len(children)):
+ children[i].walkabout(visitor)
+ except SkipSiblings:
+ pass
+ except SkipChildren:
+ pass
+ except SkipNode:
+ return
+ name = 'depart_' + self.__class__.__name__
+ method = getattr(visitor, name, visitor.unknown_departure)
+ visitor.doctree.reporter.debug(name, category='nodes.Node.walkabout')
+ method(self)
+
+
+class Text(Node, MutableString):
+
+ tagname = '#text'
+
+ def __repr__(self):
+ data = repr(self.data)
+ if len(data) > 70:
+ data = repr(self.data[:64] + ' ...')
+ return '<%s: %s>' % (self.tagname, data)
+
+ def shortrepr(self):
+ data = repr(self.data)
+ if len(data) > 20:
+ data = repr(self.data[:16] + ' ...')
+ return '<%s: %s>' % (self.tagname, data)
+
+ def _dom_node(self, dom):
+ return dom.Text(self.data)
+
+ def _rooted_dom_node(self, domroot):
+ return domroot.createTextNode(self.data)
+
+ def astext(self):
+ return self.data
+
+ def pformat(self, indent=' ', level=0):
+ result = []
+ indent = indent * level
+ for line in self.data.splitlines():
+ result.append(indent + line + '\n')
+ return ''.join(result)
+
+ def getchildren(self):
+ """Text nodes have no children. Return []."""
+ return []
+
+
+class Element(Node):
+
+ """
+ `Element` is the superclass to all specific elements.
+
+ Elements contain attributes and child nodes. Elements emulate dictionaries
+ for attributes, indexing by attribute name (a string). To set the
+ attribute 'att' to 'value', do::
+
+ element['att'] = 'value'
+
+ Elements also emulate lists for child nodes (element nodes and/or text
+ nodes), indexing by integer. To get the first child node, use::
+
+ element[0]
+
+ Elements may be constructed using the ``+=`` operator. To add one new
+ child node to element, do::
+
+ element += node
+
+ To add a list of multiple child nodes at once, use the same ``+=``
+ operator::
+
+ element += [node1, node2]
+ """
+
+ tagname = None
+ """The element generic identifier. If None, it is set as an instance
+ attribute to the name of the class."""
+
+ child_text_separator = '\n\n'
+ """Separator for child nodes, used by `astext()` method."""
+
+ def __init__(self, rawsource='', *children, **attributes):
+ self.rawsource = rawsource
+ """The raw text from which this element was constructed."""
+
+ self.children = []
+ """List of child nodes (elements and/or `Text`)."""
+
+        self.extend(children)           # extend self.children with children
+
+ self.attributes = {}
+ """Dictionary of attribute {name: value}."""
+
+ for att, value in attributes.items():
+ self.attributes[att.lower()] = value
+
+ if self.tagname is None:
+ self.tagname = self.__class__.__name__
+
+ def _dom_node(self, dom):
+ element = dom.Element(self.tagname)
+ for attribute, value in self.attributes.items():
+ element.setAttribute(attribute, str(value))
+ for child in self.children:
+ element.appendChild(child._dom_node(dom))
+ return element
+
+ def _rooted_dom_node(self, domroot):
+ element = domroot.createElement(self.tagname)
+ for attribute, value in self.attributes.items():
+ element.setAttribute(attribute, str(value))
+ for child in self.children:
+ element.appendChild(child._rooted_dom_node(domroot))
+ return element
+
+ def __repr__(self):
+ data = ''
+ for c in self.children:
+ data += c.shortrepr()
+ if len(data) > 60:
+ data = data[:56] + ' ...'
+ break
+ if self.hasattr('name'):
+ return '<%s "%s": %s>' % (self.__class__.__name__,
+ self.attributes['name'], data)
+ else:
+ return '<%s: %s>' % (self.__class__.__name__, data)
+
+ def shortrepr(self):
+ if self.hasattr('name'):
+ return '<%s "%s"...>' % (self.__class__.__name__,
+ self.attributes['name'])
+ else:
+ return '<%s...>' % self.tagname
+
+ def __str__(self):
+ if self.children:
+ return '%s%s%s' % (self.starttag(),
+ ''.join([str(c) for c in self.children]),
+ self.endtag())
+ else:
+ return self.emptytag()
+
+ def starttag(self):
+ parts = [self.tagname]
+ for name, value in self.attlist():
+ if value is None: # boolean attribute
+ parts.append(name)
+ elif isinstance(value, ListType):
+ values = [str(v) for v in value]
+ parts.append('%s="%s"' % (name, ' '.join(values)))
+ else:
+ parts.append('%s="%s"' % (name, str(value)))
+ return '<%s>' % ' '.join(parts)
+
+ def endtag(self):
+ return '%s>' % self.tagname
+
+ def emptytag(self):
+ return '<%s/>' % ' '.join([self.tagname] +
+ ['%s="%s"' % (n, v)
+ for n, v in self.attlist()])
+
+ def __len__(self):
+ return len(self.children)
+
+ def __getitem__(self, key):
+ if isinstance(key, StringType):
+ return self.attributes[key]
+ elif isinstance(key, IntType):
+ return self.children[key]
+ elif isinstance(key, SliceType):
+ assert key.step is None, 'cannot handle slice with stride'
+ return self.children[key.start:key.stop]
+ else:
+ raise TypeError, ('element index must be an integer, a slice, or '
+ 'an attribute name string')
+
+ def __setitem__(self, key, item):
+ if isinstance(key, StringType):
+ self.attributes[key] = item
+ elif isinstance(key, IntType):
+ item.parent = self
+ self.children[key] = item
+ elif isinstance(key, SliceType):
+ assert key.step is None, 'cannot handle slice with stride'
+ for node in item:
+ node.parent = self
+ self.children[key.start:key.stop] = item
+ else:
+ raise TypeError, ('element index must be an integer, a slice, or '
+ 'an attribute name string')
+
+ def __delitem__(self, key):
+ if isinstance(key, StringType):
+ del self.attributes[key]
+ elif isinstance(key, IntType):
+ del self.children[key]
+ elif isinstance(key, SliceType):
+ assert key.step is None, 'cannot handle slice with stride'
+ del self.children[key.start:key.stop]
+ else:
+ raise TypeError, ('element index must be an integer, a simple '
+ 'slice, or an attribute name string')
+
+ def __add__(self, other):
+ return self.children + other
+
+ def __radd__(self, other):
+ return other + self.children
+
+ def __iadd__(self, other):
+ """Append a node or a list of nodes to `self.children`."""
+ if isinstance(other, Node):
+ other.parent = self
+ self.children.append(other)
+ elif other is not None:
+ for node in other:
+ node.parent = self
+ self.children.extend(other)
+ return self
+
+ def astext(self):
+ return self.child_text_separator.join(
+ [child.astext() for child in self.children])
+
+ def attlist(self):
+ attlist = self.attributes.items()
+ attlist.sort()
+ return attlist
+
+ def get(self, key, failobj=None):
+ return self.attributes.get(key, failobj)
+
+ def hasattr(self, attr):
+ return self.attributes.has_key(attr)
+
+ def delattr(self, attr):
+ if self.attributes.has_key(attr):
+ del self.attributes[attr]
+
+ def setdefault(self, key, failobj=None):
+ return self.attributes.setdefault(key, failobj)
+
+ has_key = hasattr
+
+ def append(self, item):
+ item.parent = self
+ self.children.append(item)
+
+ def extend(self, item):
+ for node in item:
+ node.parent = self
+ self.children.extend(item)
+
+ def insert(self, i, item):
+ assert isinstance(item, Node)
+ item.parent = self
+ self.children.insert(i, item)
+
+ def pop(self, i=-1):
+ return self.children.pop(i)
+
+ def remove(self, item):
+ self.children.remove(item)
+
+ def index(self, item):
+ return self.children.index(item)
+
+ def replace(self, old, new):
+ """Replace one child `Node` with another child or children."""
+ index = self.index(old)
+ if isinstance(new, Node):
+ self[index] = new
+ elif new is not None:
+ self[index:index+1] = new
+
+ def findclass(self, childclass, start=0, end=sys.maxint):
+ """
+ Return the index of the first child whose class exactly matches.
+
+ Parameters:
+
+ - `childclass`: A `Node` subclass to search for, or a tuple of `Node`
+ classes. If a tuple, any of the classes may match.
+ - `start`: Initial index to check.
+        - `end`: First index to *not* check.
+ """
+ if not isinstance(childclass, TupleType):
+ childclass = (childclass,)
+ for index in range(start, min(len(self), end)):
+ for c in childclass:
+ if isinstance(self[index], c):
+ return index
+ return None
+
+ def findnonclass(self, childclass, start=0, end=sys.maxint):
+ """
+ Return the index of the first child whose class does *not* match.
+
+ Parameters:
+
+ - `childclass`: A `Node` subclass to skip, or a tuple of `Node`
+ classes. If a tuple, none of the classes may match.
+ - `start`: Initial index to check.
+        - `end`: First index to *not* check.
+ """
+ if not isinstance(childclass, TupleType):
+ childclass = (childclass,)
+ for index in range(start, min(len(self), end)):
+ match = 0
+ for c in childclass:
+ if isinstance(self.children[index], c):
+ match = 1
+ if not match:
+ return index
+ return None
+
+ def pformat(self, indent=' ', level=0):
+ return ''.join(['%s%s\n' % (indent * level, self.starttag())] +
+ [child.pformat(indent, level+1)
+ for child in self.children])
+
+ def getchildren(self):
+ """Return this element's children."""
+ return self.children
+
+
+class TextElement(Element):
+
+ """
+ An element which directly contains text.
+
+ Its children are all Text or TextElement nodes.
+ """
+
+ child_text_separator = ''
+ """Separator for child nodes, used by `astext()` method."""
+
+ def __init__(self, rawsource='', text='', *children, **attributes):
+ if text != '':
+ textnode = Text(text)
+ Element.__init__(self, rawsource, textnode, *children,
+ **attributes)
+ else:
+ Element.__init__(self, rawsource, *children, **attributes)
+
+
+# ========
+# Mixins
+# ========
+
+class Resolvable:
+
+ resolved = 0
+
+
+class BackLinkable:
+
+ def add_backref(self, refid):
+ self.setdefault('backrefs', []).append(refid)
+
+
+# ====================
+# Element Categories
+# ====================
+
+class Root: pass
+
+class Titular: pass
+
+class Bibliographic: pass
+
+
+class PreBibliographic:
+ """Category of Node which may occur before Bibliographic Nodes."""
+ pass
+
+
+class Structural: pass
+
+class Body: pass
+
+class General(Body): pass
+
+class Sequential(Body): pass
+
+class Admonition(Body): pass
+
+
+class Special(Body):
+ """Special internal body elements, not true document components."""
+ pass
+
+
+class Component: pass
+
+class Inline: pass
+
+class Referential(Resolvable): pass
+ #refnode = None
+ #"""Resolved reference to a node."""
+
+
+class Targetable(Resolvable):
+
+ referenced = 0
+
+
+# ==============
+# Root Element
+# ==============
+
+class document(Root, Structural, Element):
+
+ def __init__(self, reporter, languagecode, *args, **kwargs):
+ Element.__init__(self, *args, **kwargs)
+
+ self.reporter = reporter
+ """System message generator."""
+
+ self.languagecode = languagecode
+ """ISO 639 2-letter language identifier."""
+
+ self.explicit_targets = {}
+ """Mapping of target names to explicit target nodes."""
+
+ self.implicit_targets = {}
+ """Mapping of target names to implicit (internal) target
+ nodes."""
+
+ self.external_targets = []
+ """List of external target nodes."""
+
+ self.internal_targets = []
+ """List of internal target nodes."""
+
+ self.indirect_targets = []
+ """List of indirect target nodes."""
+
+ self.substitution_defs = {}
+ """Mapping of substitution names to substitution_definition nodes."""
+
+ self.refnames = {}
+ """Mapping of names to lists of referencing nodes."""
+
+ self.refids = {}
+ """Mapping of ids to lists of referencing nodes."""
+
+ self.nameids = {}
+ """Mapping of names to unique id's."""
+
+ self.ids = {}
+ """Mapping of ids to nodes."""
+
+ self.substitution_refs = {}
+ """Mapping of substitution names to lists of substitution_reference
+ nodes."""
+
+ self.footnote_refs = {}
+ """Mapping of footnote labels to lists of footnote_reference nodes."""
+
+ self.citation_refs = {}
+ """Mapping of citation labels to lists of citation_reference nodes."""
+
+ self.anonymous_targets = []
+ """List of anonymous target nodes."""
+
+ self.anonymous_refs = []
+ """List of anonymous reference nodes."""
+
+ self.autofootnotes = []
+ """List of auto-numbered footnote nodes."""
+
+ self.autofootnote_refs = []
+ """List of auto-numbered footnote_reference nodes."""
+
+ self.symbol_footnotes = []
+ """List of symbol footnote nodes."""
+
+ self.symbol_footnote_refs = []
+ """List of symbol footnote_reference nodes."""
+
+ self.footnotes = []
+ """List of manually-numbered footnote nodes."""
+
+ self.citations = []
+ """List of citation nodes."""
+
+ self.pending = []
+ """List of pending elements @@@."""
+
+ self.autofootnote_start = 1
+ """Initial auto-numbered footnote number."""
+
+ self.symbol_footnote_start = 0
+ """Initial symbol footnote symbol index."""
+
+ self.id_start = 1
+ """Initial ID number."""
+
+ self.messages = Element()
+ """System messages generated after parsing."""
+
+ def asdom(self, dom=xml.dom.minidom):
+ domroot = dom.Document()
+ domroot.appendChild(Element._rooted_dom_node(self, domroot))
+ return domroot
+
+ def set_id(self, node, msgnode=None):
+ if msgnode == None:
+ msgnode = self.messages
+ if node.has_key('id'):
+ id = node['id']
+ if self.ids.has_key(id) and self.ids[id] is not node:
+ msg = self.reporter.severe('Duplicate ID: "%s".' % id)
+ msgnode += msg
+ else:
+ if node.has_key('name'):
+ id = utils.id(node['name'])
+ else:
+ id = ''
+ while not id or self.ids.has_key(id):
+ id = 'id%s' % self.id_start
+ self.id_start += 1
+ node['id'] = id
+ self.ids[id] = node
+ if node.has_key('name'):
+ self.nameids[node['name']] = id
+ return id
+
+ def note_implicit_target(self, target, msgnode=None):
+ if msgnode == None:
+ msgnode = self.messages
+ id = self.set_id(target, msgnode)
+ name = target['name']
+ if self.explicit_targets.has_key(name) \
+ or self.implicit_targets.has_key(name):
+ msg = self.reporter.info(
+ 'Duplicate implicit target name: "%s".' % name, backrefs=[id])
+ msgnode += msg
+ self.clear_target_names(name, self.implicit_targets)
+ del target['name']
+ target['dupname'] = name
+ self.implicit_targets[name] = None
+ else:
+ self.implicit_targets[name] = target
+
+ def note_explicit_target(self, target, msgnode=None):
+ if msgnode == None:
+ msgnode = self.messages
+ id = self.set_id(target, msgnode)
+ name = target['name']
+ if self.explicit_targets.has_key(name):
+ level = 2
+ if target.has_key('refuri'): # external target, dups OK
+ refuri = target['refuri']
+ t = self.explicit_targets[name]
+ if t.has_key('name') and t.has_key('refuri') \
+ and t['refuri'] == refuri:
+ level = 1 # just inform if refuri's identical
+ msg = self.reporter.system_message(
+ level, 'Duplicate explicit target name: "%s".' % name,
+ backrefs=[id])
+ msgnode += msg
+ self.clear_target_names(name, self.explicit_targets,
+ self.implicit_targets)
+ if level > 1:
+ del target['name']
+ target['dupname'] = name
+ elif self.implicit_targets.has_key(name):
+ msg = self.reporter.info(
+ 'Duplicate implicit target name: "%s".' % name, backrefs=[id])
+ msgnode += msg
+ self.clear_target_names(name, self.implicit_targets)
+ self.explicit_targets[name] = target
+
+ def clear_target_names(self, name, *targetdicts):
+ for targetdict in targetdicts:
+ if not targetdict.has_key(name):
+ continue
+ node = targetdict[name]
+ if node.has_key('name'):
+ node['dupname'] = node['name']
+ del node['name']
+
+ def note_refname(self, node):
+ self.refnames.setdefault(node['refname'], []).append(node)
+
+ def note_refid(self, node):
+ self.refids.setdefault(node['refid'], []).append(node)
+
+ def note_external_target(self, target):
+ self.external_targets.append(target)
+
+ def note_internal_target(self, target):
+ self.internal_targets.append(target)
+
+ def note_indirect_target(self, target):
+ self.indirect_targets.append(target)
+ if target.has_key('name'):
+ self.note_refname(target)
+
+ def note_anonymous_target(self, target):
+ self.set_id(target)
+ self.anonymous_targets.append(target)
+
+ def note_anonymous_ref(self, ref):
+ self.anonymous_refs.append(ref)
+
+ def note_autofootnote(self, footnote):
+ self.set_id(footnote)
+ self.autofootnotes.append(footnote)
+
+ def note_autofootnote_ref(self, ref):
+ self.set_id(ref)
+ self.autofootnote_refs.append(ref)
+
+ def note_symbol_footnote(self, footnote):
+ self.set_id(footnote)
+ self.symbol_footnotes.append(footnote)
+
+ def note_symbol_footnote_ref(self, ref):
+ self.set_id(ref)
+ self.symbol_footnote_refs.append(ref)
+
+ def note_footnote(self, footnote):
+ self.set_id(footnote)
+ self.footnotes.append(footnote)
+
+ def note_footnote_ref(self, ref):
+ self.set_id(ref)
+ self.footnote_refs.setdefault(ref['refname'], []).append(ref)
+ self.note_refname(ref)
+
+ def note_citation(self, citation):
+ self.set_id(citation)
+ self.citations.append(citation)
+
+ def note_citation_ref(self, ref):
+ self.set_id(ref)
+ self.citation_refs.setdefault(ref['refname'], []).append(ref)
+ self.note_refname(ref)
+
+ def note_substitution_def(self, subdef, msgnode=None):
+ name = subdef['name']
+ if self.substitution_defs.has_key(name):
+ msg = self.reporter.error(
+ 'Duplicate substitution definition name: "%s".' % name)
+ if msgnode == None:
+ msgnode = self.messages
+ msgnode += msg
+ oldnode = self.substitution_defs[name]
+ oldnode['dupname'] = oldnode['name']
+ del oldnode['name']
+ # keep only the last definition
+ self.substitution_defs[name] = subdef
+
+ def note_substitution_ref(self, subref):
+ self.substitution_refs.setdefault(
+ subref['refname'], []).append(subref)
+
+ def note_pending(self, pending):
+ self.pending.append(pending)
+
+
+# ================
+# Title Elements
+# ================
+
+class title(Titular, PreBibliographic, TextElement): pass
+class subtitle(Titular, PreBibliographic, TextElement): pass
+
+
+# ========================
+# Bibliographic Elements
+# ========================
+
+class docinfo(Bibliographic, Element): pass
+class author(Bibliographic, TextElement): pass
+class authors(Bibliographic, Element): pass
+class organization(Bibliographic, TextElement): pass
+class contact(Bibliographic, TextElement): pass
+class version(Bibliographic, TextElement): pass
+class revision(Bibliographic, TextElement): pass
+class status(Bibliographic, TextElement): pass
+class date(Bibliographic, TextElement): pass
+class copyright(Bibliographic, TextElement): pass
+
+
+# =====================
+# Structural Elements
+# =====================
+
+class section(Structural, Element): pass
+
+class topic(Structural, Element):
+
+ """
+ Topics are terminal, "leaf" mini-sections, like block quotes with titles,
+ or textual figures. A topic is just like a section, except that it has no
+ subsections, and it doesn't have to conform to section placement rules.
+
+ Topics are allowed wherever body elements (list, table, etc.) are allowed,
+ but only at the top level of a section or document. Topics cannot nest
+ inside topics or body elements; you can't have a topic inside a table,
+ list, block quote, etc.
+ """
+
+ pass
+
+
+class transition(Structural, Element): pass
+
+
+# ===============
+# Body Elements
+# ===============
+
+class paragraph(General, TextElement): pass
+class bullet_list(Sequential, Element): pass
+class enumerated_list(Sequential, Element): pass
+class list_item(Component, Element): pass
+class definition_list(Sequential, Element): pass
+class definition_list_item(Component, Element): pass
+class term(Component, TextElement): pass
+class classifier(Component, TextElement): pass
+class definition(Component, Element): pass
+class field_list(Sequential, Element): pass
+class field(Component, Element): pass
+class field_name(Component, TextElement): pass
+class field_argument(Component, TextElement): pass
+class field_body(Component, Element): pass
+
+
+class option(Component, Element):
+
+ child_text_separator = ''
+
+
+class option_argument(Component, TextElement):
+
+ def astext(self):
+ return self.get('delimiter', ' ') + TextElement.astext(self)
+
+
+class option_group(Component, Element):
+
+ child_text_separator = ', '
+
+
+class option_list(Sequential, Element): pass
+
+
+class option_list_item(Component, Element):
+
+ child_text_separator = ' '
+
+
+class option_string(Component, TextElement): pass
+class description(Component, Element): pass
+class literal_block(General, TextElement): pass
+class block_quote(General, Element): pass
+class doctest_block(General, TextElement): pass
+class attention(Admonition, Element): pass
+class caution(Admonition, Element): pass
+class danger(Admonition, Element): pass
+class error(Admonition, Element): pass
+class important(Admonition, Element): pass
+class note(Admonition, Element): pass
+class tip(Admonition, Element): pass
+class hint(Admonition, Element): pass
+class warning(Admonition, Element): pass
+class comment(Special, PreBibliographic, TextElement): pass
+class substitution_definition(Special, TextElement): pass
+class target(Special, Inline, TextElement, Targetable): pass
+class footnote(General, Element, BackLinkable): pass
+class citation(General, Element, BackLinkable): pass
+class label(Component, TextElement): pass
+class figure(General, Element): pass
+class caption(Component, TextElement): pass
+class legend(Component, Element): pass
+class table(General, Element): pass
+class tgroup(Component, Element): pass
+class colspec(Component, Element): pass
+class thead(Component, Element): pass
+class tbody(Component, Element): pass
+class row(Component, Element): pass
+class entry(Component, Element): pass
+
+
+class system_message(Special, PreBibliographic, Element, BackLinkable):
+
+ def __init__(self, comment=None, *children, **attributes):
+ if comment:
+ p = paragraph('', comment)
+ children = (p,) + children
+ Element.__init__(self, '', *children, **attributes)
+
+ def astext(self):
+ return '%s (%s) %s' % (self['type'], self['level'],
+ Element.astext(self))
+
+
+class pending(Special, PreBibliographic, Element):
+
+ """
+ The "pending" element is used to encapsulate a pending operation: the
+ operation, the point at which to apply it, and any data it requires. Only
+ the pending operation's location within the document is stored in the
+ public document tree; the operation itself and its data are stored in
+ internal instance attributes.
+
+ For example, say you want a table of contents in your reStructuredText
+ document. The easiest way to specify where to put it is from within the
+ document, with a directive::
+
+ .. contents::
+
+ But the "contents" directive can't do its work until the entire document
+ has been parsed (and possibly transformed to some extent). So the
+ directive code leaves a placeholder behind that will trigger the second
+phase of its processing, something like this::
+
+ + internal attributes
+
+ The "pending" node is also appended to `document.pending`, so that a later
+ stage of processing can easily run all pending transforms.
+ """
+
+ def __init__(self, transform, stage, details,
+ rawsource='', *children, **attributes):
+ Element.__init__(self, rawsource, *children, **attributes)
+
+ self.transform = transform
+ """The `docutils.transforms.Transform` class implementing the pending
+ operation."""
+
+ self.stage = stage
+ """The stage of processing when the function will be called."""
+
+ self.details = details
+ """Detail data (dictionary) required by the pending operation."""
+
+ def pformat(self, indent=' ', level=0):
+ internals = [
+ '.. internal attributes:',
+ ' .transform: %s.%s' % (self.transform.__module__,
+ self.transform.__name__),
+ ' .stage: %r' % self.stage,
+ ' .details:']
+ details = self.details.items()
+ details.sort()
+ for key, value in details:
+ if isinstance(value, Node):
+ internals.append('%7s%s:' % ('', key))
+ internals.extend(['%9s%s' % ('', line)
+ for line in value.pformat().splitlines()])
+ else:
+ internals.append('%7s%s: %r' % ('', key, value))
+ return (Element.pformat(self, indent, level)
+ + ''.join([(' %s%s\n' % (indent * level, line))
+ for line in internals]))
+
+
+class raw(Special, Inline, PreBibliographic, TextElement):
+
+ """
+ Raw data that is to be passed untouched to the Writer.
+ """
+
+ pass
+
+
+# =================
+# Inline Elements
+# =================
+
+class emphasis(Inline, TextElement): pass
+class strong(Inline, TextElement): pass
+class interpreted(Inline, Referential, TextElement): pass
+class literal(Inline, TextElement): pass
+class reference(Inline, Referential, TextElement): pass
+class footnote_reference(Inline, Referential, TextElement): pass
+class citation_reference(Inline, Referential, TextElement): pass
+class substitution_reference(Inline, TextElement): pass
+
+
+class image(General, Inline, TextElement):
+
+ def astext(self):
+ return self.get('alt', '')
+
+
+class problematic(Inline, TextElement): pass
+
+
+# ========================================
+# Auxiliary Classes, Functions, and Data
+# ========================================
+
+node_class_names = """
+ Text
+ attention author authors
+ block_quote bullet_list
+ caption caution citation citation_reference classifier colspec
+ comment contact copyright
+ danger date definition definition_list definition_list_item
+ description docinfo doctest_block document
+ emphasis entry enumerated_list error
+ field field_argument field_body field_list field_name figure
+ footnote footnote_reference
+ hint
+ image important interpreted
+ label legend list_item literal literal_block
+ note
+ option option_argument option_group option_list option_list_item
+ option_string organization
+ paragraph pending problematic
+ raw reference revision row
+ section status strong substitution_definition
+ substitution_reference subtitle system_message
+ table target tbody term tgroup thead tip title topic transition
+ version
+ warning""".split()
+"""A list of names of all concrete Node subclasses."""
+
+
+class NodeVisitor:
+
+ """
+ "Visitor" pattern [GoF95]_ abstract superclass implementation for document
+ tree traversals.
+
+ Each node class has corresponding methods, doing nothing by default;
+ override individual methods for specific and useful behaviour. The
+ "``visit_`` + node class name" method is called by `Node.walk()` upon
+ entering a node. `Node.walkabout()` also calls the "``depart_`` + node
+ class name" method before exiting a node.
+
+ .. [GoF95] Gamma, Helm, Johnson, Vlissides. *Design Patterns: Elements of
+ Reusable Object-Oriented Software*. Addison-Wesley, Reading, MA, USA,
+ 1995.
+ """
+
+ def __init__(self, doctree):
+ self.doctree = doctree
+
+ def unknown_visit(self, node):
+ """
+ Called when entering unknown `Node` types.
+
+ Raise an exception unless overridden.
+ """
+ raise NotImplementedError('visiting unknown node type: %s'
+ % node.__class__.__name__)
+
+ def unknown_departure(self, node):
+ """
+ Called before exiting unknown `Node` types.
+
+        Raise an exception unless overridden.
+ """
+ raise NotImplementedError('departing unknown node type: %s'
+ % node.__class__.__name__)
+
+ # Save typing with dynamic definitions.
+ for name in node_class_names:
+ exec """def visit_%s(self, node): pass\n""" % name
+ exec """def depart_%s(self, node): pass\n""" % name
+ del name
+
+
+class GenericNodeVisitor(NodeVisitor):
+
+ """
+ Generic "Visitor" abstract superclass, for simple traversals.
+
+ Unless overridden, each ``visit_...`` method calls `default_visit()`, and
+ each ``depart_...`` method (when using `Node.walkabout()`) calls
+ `default_departure()`. `default_visit()` (`default_departure()`) must be
+ overridden in subclasses.
+
+ Define fully generic visitors by overriding `default_visit()`
+ (`default_departure()`) only. Define semi-generic visitors by overriding
+ individual ``visit_...()`` (``depart_...()``) methods also.
+
+ `NodeVisitor.unknown_visit()` (`NodeVisitor.unknown_departure()`) should
+ be overridden for default behavior.
+ """
+
+ def default_visit(self, node):
+ """Override for generic, uniform traversals."""
+ raise NotImplementedError
+
+ def default_departure(self, node):
+ """Override for generic, uniform traversals."""
+ raise NotImplementedError
+
+ # Save typing with dynamic definitions.
+ for name in node_class_names:
+ exec """def visit_%s(self, node):
+ self.default_visit(node)\n""" % name
+ exec """def depart_%s(self, node):
+ self.default_departure(node)\n""" % name
+ del name
+
+
+class VisitorException(Exception): pass
+class SkipChildren(VisitorException): pass
+class SkipSiblings(VisitorException): pass
+class SkipNode(VisitorException): pass
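The interplay between `Element`'s list emulation and `NodeVisitor` dispatch can be summarized in a standalone sketch (modern Python; the class names mirror this module, but the bodies are drastically simplified and omit the `Skip*` exceptions and reporter debugging):

```python
# Minimal sketch of the nodes.py design: elements emulate lists of
# children via +=, and walkabout() dispatches visit_/depart_ methods
# by node class name, falling back to unknown_visit/unknown_departure.
class Node:
    parent = None

    def walkabout(self, visitor):
        name = type(self).__name__
        getattr(visitor, 'visit_' + name, visitor.unknown_visit)(self)
        for child in getattr(self, 'children', []):   # Text has no children
            child.walkabout(visitor)
        getattr(visitor, 'depart_' + name, visitor.unknown_departure)(self)

class Element(Node):
    def __init__(self, *children):
        self.children = []
        for child in children:
            self += child                 # list emulation via __iadd__

    def __iadd__(self, node):
        node.parent = self                # back-reference, as in Element
        self.children.append(node)
        return self

class Text(Node):
    def __init__(self, data):
        self.data = data

class section(Element): pass
class paragraph(Element): pass

class Collector:
    """Visitor recording traversal order as '+name' / '-name' events."""
    def __init__(self):
        self.events = []
    def unknown_visit(self, node):
        self.events.append('+' + type(node).__name__)
    def unknown_departure(self, node):
        self.events.append('-' + type(node).__name__)

tree = section(paragraph(Text('hello')))
v = Collector()
tree.walkabout(v)
print(v.events)
# ['+section', '+paragraph', '+Text', '-Text', '-paragraph', '-section']
```

The depart events fire in reverse nesting order, which is what lets writers emit matching close tags.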
diff --git a/docutils/parsers/__init__.py b/docutils/parsers/__init__.py
new file mode 100644
index 000000000..72e2e4e49
--- /dev/null
+++ b/docutils/parsers/__init__.py
@@ -0,0 +1,37 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+class Parser:
+
+ def parse(self, inputstring, docroot):
+ """Override to parse `inputstring` into document tree `docroot`."""
+ raise NotImplementedError('subclass must override this method')
+
+ def setup_parse(self, inputstring, docroot):
+ """Initial setup, used by `parse()`."""
+ self.inputstring = inputstring
+ self.docroot = docroot
+
+
+_parser_aliases = {
+ 'restructuredtext': 'rst',
+ 'rest': 'rst',
+ 'rtxt': 'rst',}
+
+def get_parser_class(parsername):
+ """Return the Parser class from the `parsername` module."""
+ parsername = parsername.lower()
+ if _parser_aliases.has_key(parsername):
+ parsername = _parser_aliases[parsername]
+ module = __import__(parsername, globals(), locals())
+ return module.Parser
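The alias-resolution step inside `get_parser_class()` can be shown on its own, without the `__import__` call (the registry literal copies `_parser_aliases` above; the `resolve_parser_name` helper name is illustrative):

```python
# Standalone sketch of the name-normalization step that precedes the
# module import in get_parser_class(): lowercase the requested name,
# then map known aliases onto the canonical module name.
_parser_aliases = {
    'restructuredtext': 'rst',
    'rest': 'rst',
    'rtxt': 'rst',
}

def resolve_parser_name(parsername):
    """Return the canonical parser module name for a user-supplied
    name, leaving unrecognized names unchanged."""
    parsername = parsername.lower()
    return _parser_aliases.get(parsername, parsername)

print(resolve_parser_name('reStructuredText'))  # rst
print(resolve_parser_name('rst'))               # rst
```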
diff --git a/docutils/parsers/rst/__init__.py b/docutils/parsers/rst/__init__.py
new file mode 100644
index 000000000..06589513b
--- /dev/null
+++ b/docutils/parsers/rst/__init__.py
@@ -0,0 +1,68 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+This is the ``docutils.parsers.restructuredtext`` package. It exports a
+single class, `Parser`.
+
+Usage
+=====
+
+1. Create a parser::
+
+ parser = docutils.parsers.restructuredtext.Parser()
+
+ Several optional arguments may be passed to modify the parser's behavior.
+ Please see `docutils.parsers.Parser` for details.
+
+2. Gather input (a multi-line string), by reading a file or the standard
+ input::
+
+ input = sys.stdin.read()
+
+3. Create a new empty `docutils.nodes.document` tree::
+
+ docroot = docutils.utils.newdocument()
+
+ See `docutils.utils.newdocument()` for parameter details.
+
+4. Run the parser, populating the document tree::
+
+ document = parser.parse(input, docroot)
+
+Parser Overview
+===============
+
+The reStructuredText parser is implemented as a state machine, examining its
+input one line at a time. To understand how the parser works, please first
+become familiar with the `docutils.statemachine` module, then see the
+`states` module.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+import docutils.parsers
+import docutils.statemachine
+import states
+
+
+class Parser(docutils.parsers.Parser):
+
+ """The reStructuredText parser."""
+
+ def parse(self, inputstring, docroot):
+ """Parse `inputstring` and populate `docroot`, a document tree."""
+ self.setup_parse(inputstring, docroot)
+ debug = docroot.reporter[''].debug
+ self.statemachine = states.RSTStateMachine(
+ stateclasses=states.stateclasses, initialstate='Body',
+ debug=debug)
+ inputlines = docutils.statemachine.string2lines(
+ inputstring, convertwhitespace=1)
+ self.statemachine.run(inputlines, docroot)
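The four usage steps in the docstring above can be walked through end to end with stand-in classes (these mocks are hypothetical, not the real `docutils.nodes.document` or `Parser`; the real parser runs a state machine rather than splitting lines):

```python
class Document:
    """Minimal stand-in for a docutils.nodes.document tree."""
    def __init__(self):
        self.children = []

class Parser:
    """Stand-in parser following the setup_parse()/parse() shape above."""
    def setup_parse(self, inputstring, docroot):
        self.inputstring = inputstring
        self.docroot = docroot

    def parse(self, inputstring, docroot):
        self.setup_parse(inputstring, docroot)
        # Toy "parse": each input line becomes a child of the tree.
        docroot.children.extend(inputstring.splitlines())

parser = Parser()                  # 1. create a parser
source = 'line one\nline two'     # 2. gather input (normally a file/stdin)
docroot = Document()               # 3. create a new empty document tree
parser.parse(source, docroot)      # 4. run the parser, populating the tree
```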
diff --git a/docutils/parsers/rst/directives/__init__.py b/docutils/parsers/rst/directives/__init__.py
new file mode 100644
index 000000000..43b0c1dd3
--- /dev/null
+++ b/docutils/parsers/rst/directives/__init__.py
@@ -0,0 +1,88 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+This package contains directive implementation modules.
+
+The interface for directive functions is as follows::
+
+ def directivefn(match, type, data, state, statemachine, attributes)
+
+Where:
+
+- ``match`` is a regular expression match object which matched the first line
+ of the directive. ``match.group(1)`` gives the directive name.
+- ``type`` is the directive type or name.
+- ``data`` contains the remainder of the first line of the directive after the
+ "::".
+- ``state`` is the state which called the directive function.
+- ``statemachine`` is the state machine which controls the state which called
+ the directive function.
+- ``attributes`` is a dictionary of extra attributes which may be added to the
+ element the directive produces. Currently, only an "alt" attribute is passed
+ by substitution definitions (value: the substitution name), which may be
+ used by an embedded image directive.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+_directive_registry = {
+ 'attention': ('admonitions', 'attention'),
+ 'caution': ('admonitions', 'caution'),
+ 'danger': ('admonitions', 'danger'),
+ 'error': ('admonitions', 'error'),
+ 'important': ('admonitions', 'important'),
+ 'note': ('admonitions', 'note'),
+ 'tip': ('admonitions', 'tip'),
+ 'hint': ('admonitions', 'hint'),
+ 'warning': ('admonitions', 'warning'),
+ 'image': ('images', 'image'),
+ 'figure': ('images', 'figure'),
+ 'contents': ('components', 'contents'),
+ 'footnotes': ('components', 'footnotes'),
+ 'citations': ('components', 'citations'),
+ 'topic': ('components', 'topic'),
+ 'meta': ('html', 'meta'),
+ 'imagemap': ('html', 'imagemap'),
+ 'raw': ('misc', 'raw'),
+ 'restructuredtext-test-directive': ('misc', 'directive_test_function'),}
+"""Mapping of directive name to (module name, function name). The directive
+'name' is canonical & must be lowercase; language-dependent names are defined
+in the language package."""
+
+_modules = {}
+"""Cache of imported directive modules."""
+
+_directives = {}
+"""Cache of imported directive functions."""
+
+def directive(directivename, languagemodule):
+ """
+ Locate and return a directive function from its language-dependent name.
+ """
+ normname = directivename.lower()
+ if _directives.has_key(normname):
+ return _directives[normname]
+ try:
+ canonicalname = languagemodule.directives[normname]
+ modulename, functionname = _directive_registry[canonicalname]
+ except KeyError:
+ return None
+ if _modules.has_key(modulename):
+ module = _modules[modulename]
+ else:
+ try:
+ module = __import__(modulename, globals(), locals())
+ except ImportError:
+ return None
+ try:
+ function = getattr(module, functionname)
+ except AttributeError:
+ return None
+ return function
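The `directive()` function above performs a two-stage lookup (language-dependent name to canonical name to module/function) with caching. A miniature sketch of the same flow, where a dict of dicts stands in for the dynamic imports (the directive function and module table here are hypothetical):

```python
# Canonical name -> (module name, function name), as in the registry above.
_directive_registry = {'note': ('admonitions', 'note')}
_directives = {}                       # cache of resolved functions

def _note_directive():                 # hypothetical stand-in for admonitions.note
    return 'note-node'

# Stand-in for __import__ plus the _modules cache.
_module_table = {'admonitions': {'note': _note_directive}}

def directive(directivename, language_map):
    """Return the function for a language-dependent directive name."""
    normname = directivename.lower()
    if normname in _directives:
        return _directives[normname]
    try:
        canonicalname = language_map[normname]
        modulename, functionname = _directive_registry[canonicalname]
        function = _module_table[modulename][functionname]
    except KeyError:
        return None                    # unknown name at any stage
    _directives[normname] = function   # cache for the next lookup
    return function

fn = directive('Note', {'note': 'note'})
```

Note the single `except KeyError` covering every stage of the lookup: any unknown name, at any level, degrades to `None` rather than an exception.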
diff --git a/docutils/parsers/rst/directives/admonitions.py b/docutils/parsers/rst/directives/admonitions.py
new file mode 100644
index 000000000..f594cd431
--- /dev/null
+++ b/docutils/parsers/rst/directives/admonitions.py
@@ -0,0 +1,55 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Admonition directives.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+from docutils.parsers.rst import states
+from docutils import nodes
+
+
+def admonition(nodeclass, match, typename, data, state, statemachine,
+ attributes):
+ indented, indent, lineoffset, blankfinish \
+ = statemachine.getfirstknownindented(match.end())
+ text = '\n'.join(indented)
+ admonitionnode = nodeclass(text)
+ if text:
+ state.nestedparse(indented, lineoffset, admonitionnode)
+ return [admonitionnode], blankfinish
+
+def attention(*args, **kwargs):
+ return admonition(nodes.attention, *args, **kwargs)
+
+def caution(*args, **kwargs):
+ return admonition(nodes.caution, *args, **kwargs)
+
+def danger(*args, **kwargs):
+ return admonition(nodes.danger, *args, **kwargs)
+
+def error(*args, **kwargs):
+ return admonition(nodes.error, *args, **kwargs)
+
+def important(*args, **kwargs):
+ return admonition(nodes.important, *args, **kwargs)
+
+def note(*args, **kwargs):
+ return admonition(nodes.note, *args, **kwargs)
+
+def tip(*args, **kwargs):
+ return admonition(nodes.tip, *args, **kwargs)
+
+def hint(*args, **kwargs):
+ return admonition(nodes.hint, *args, **kwargs)
+
+def warning(*args, **kwargs):
+ return admonition(nodes.warning, *args, **kwargs)
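Each of the wrapper functions above simply fixes the node class and forwards everything else to the generic `admonition()` builder. The same currying can be expressed with `functools.partial`; a sketch with hypothetical node classes:

```python
from functools import partial

class Node:
    """Hypothetical base node: records its class name and text."""
    def __init__(self, text=''):
        self.tagname = type(self).__name__
        self.text = text

class note(Node): pass
class warning(Node): pass

def admonition(nodeclass, text):
    """Generic builder; the per-type callables below just fix nodeclass."""
    return nodeclass(text)

# One-liner equivalents of the def note(*args, **kwargs): ... wrappers.
make_note = partial(admonition, note)
make_warning = partial(admonition, warning)

n = make_note('Mind the gap.')
```

The `def`-based wrappers in the diff have the advantage of being importable by name from the module, which is why a directive registry keyed on function names would prefer them over anonymous partials.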
diff --git a/docutils/parsers/rst/directives/components.py b/docutils/parsers/rst/directives/components.py
new file mode 100644
index 000000000..8463f41b0
--- /dev/null
+++ b/docutils/parsers/rst/directives/components.py
@@ -0,0 +1,59 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Document component directives.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+from docutils import nodes
+import docutils.transforms.components
+
+
+contents_attribute_spec = {'depth': int,
+ 'local': (lambda x: x)}
+
+def contents(match, typename, data, state, statemachine, attributes):
+ lineno = statemachine.abslineno()
+ lineoffset = statemachine.lineoffset
+ datablock, indent, offset, blankfinish = \
+ statemachine.getfirstknownindented(match.end(), uptoblank=1)
+ blocktext = '\n'.join(statemachine.inputlines[
+ lineoffset : lineoffset + len(datablock) + 1])
+ for i in range(len(datablock)):
+ if datablock[i][:1] == ':':
+ attlines = datablock[i:]
+ datablock = datablock[:i]
+ break
+ else:
+ attlines = []
+ i = 0
+ titletext = ' '.join([line.strip() for line in datablock])
+ if titletext:
+ textnodes, messages = state.inline_text(titletext, lineno)
+ title = nodes.title(titletext, '', *textnodes)
+ else:
+ messages = []
+ title = None
+ pending = nodes.pending(docutils.transforms.components.Contents,
+ 'last_reader', {'title': title}, blocktext)
+ if attlines:
+ success, data, blankfinish = state.parse_extension_attributes(
+ contents_attribute_spec, attlines, blankfinish)
+ if success: # data is a dict of attributes
+ pending.details.update(data)
+ else: # data is an error string
+ error = statemachine.memo.reporter.error(
+ 'Error in "%s" directive attributes at line %s:\n%s.'
+ % (match.group(1), lineno, data), '',
+ nodes.literal_block(blocktext, blocktext))
+ return [error] + messages, blankfinish
+ statemachine.memo.document.note_pending(pending)
+ return [pending] + messages, blankfinish
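The `for`/`else` scan in `contents()` above splits the directive block at the first line beginning with ':', separating directive data from extension attribute lines. Isolated as a sketch:

```python
def split_attribute_lines(datablock):
    """Split a directive block at the first ':'-prefixed line.

    Mirrors the for/else scan in the contents() directive above:
    everything from the first line starting with ':' onward is treated
    as extension attributes, the lines before it as directive data.
    """
    for i, line in enumerate(datablock):
        if line[:1] == ':':
            return datablock[:i], datablock[i:]
    return datablock, []   # for/else case: no attribute lines at all

data, atts = split_attribute_lines(['Table of Contents', ':depth: 2'])
```

Using `line[:1]` rather than `line[0]` is deliberate in the original as well: it is safe on empty lines, where indexing would raise `IndexError`.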
diff --git a/docutils/parsers/rst/directives/html.py b/docutils/parsers/rst/directives/html.py
new file mode 100644
index 000000000..d971300e0
--- /dev/null
+++ b/docutils/parsers/rst/directives/html.py
@@ -0,0 +1,89 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Directives for typically HTML-specific constructs.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+from docutils import nodes, utils
+from docutils.parsers.rst import states
+
+
+def meta(match, typename, data, state, statemachine, attributes):
+ lineoffset = statemachine.lineoffset
+ block, indent, offset, blankfinish = \
+ statemachine.getfirstknownindented(match.end(), uptoblank=1)
+ node = nodes.Element()
+ if block:
+ newlineoffset, blankfinish = state.nestedlistparse(
+ block, offset, node, initialstate='MetaBody',
+ blankfinish=blankfinish, statemachinekwargs=metaSMkwargs)
+ if (newlineoffset - offset) != len(block): # incomplete parse of block?
+ blocktext = '\n'.join(statemachine.inputlines[
+ lineoffset : statemachine.lineoffset+1])
+ msg = statemachine.memo.reporter.error(
+ 'Invalid meta directive at line %s.'
+ % statemachine.abslineno(), '',
+ nodes.literal_block(blocktext, blocktext))
+ node += msg
+ else:
+ msg = statemachine.memo.reporter.error(
+ 'Empty meta directive at line %s.' % statemachine.abslineno())
+ node += msg
+ return node.getchildren(), blankfinish
+
+def imagemap(match, typename, data, state, statemachine, attributes):
+ return [], 0
+
+
+class MetaBody(states.SpecializedBody):
+
+ class meta(nodes.Special, nodes.PreBibliographic, nodes.Element):
+ """HTML-specific "meta" element."""
+ pass
+
+ def field_marker(self, match, context, nextstate):
+ """Meta element."""
+ node, blankfinish = self.parsemeta(match)
+ self.statemachine.node += node
+ return [], nextstate, []
+
+ def parsemeta(self, match):
+ name, args = self.parse_field_marker(match)
+ indented, indent, lineoffset, blankfinish = \
+ self.statemachine.getfirstknownindented(match.end())
+ node = self.meta()
+ node['content'] = ' '.join(indented)
+ if not indented:
+ line = self.statemachine.line
+ msg = self.statemachine.memo.reporter.info(
+ 'No content for meta tag "%s".' % name, '',
+ nodes.literal_block(line, line))
+ self.statemachine.node += msg
+ try:
+ attname, val = utils.extract_name_value(name)[0]
+ node[attname.lower()] = val
+ except utils.NameValueError:
+ node['name'] = name
+ for arg in args:
+ try:
+ attname, val = utils.extract_name_value(arg)[0]
+ node[attname.lower()] = val
+ except utils.NameValueError, detail:
+ line = self.statemachine.line
+ msg = self.statemachine.memo.reporter.error(
+ 'Error parsing meta tag attribute "%s": %s'
+ % (arg, detail), '', nodes.literal_block(line, line))
+ self.statemachine.node += msg
+ return node, blankfinish
+
+
+metaSMkwargs = {'stateclasses': (MetaBody,)}
diff --git a/docutils/parsers/rst/directives/images.py b/docutils/parsers/rst/directives/images.py
new file mode 100644
index 000000000..7a719333b
--- /dev/null
+++ b/docutils/parsers/rst/directives/images.py
@@ -0,0 +1,97 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Directives for figures and simple images.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+import sys
+from docutils.parsers.rst import states
+from docutils import nodes, utils
+
+def unchanged(arg):
+ return arg # unchanged!
+
+image_attribute_spec = {'alt': unchanged,
+ 'height': int,
+ 'width': int,
+ 'scale': int}
+
+def image(match, typename, data, state, statemachine, attributes):
+ lineno = statemachine.abslineno()
+ lineoffset = statemachine.lineoffset
+ datablock, indent, offset, blankfinish = \
+ statemachine.getfirstknownindented(match.end(), uptoblank=1)
+ blocktext = '\n'.join(statemachine.inputlines[
+ lineoffset : lineoffset + len(datablock) + 1])
+ for i in range(len(datablock)):
+ if datablock[i][:1] == ':':
+ attlines = datablock[i:]
+ datablock = datablock[:i]
+ break
+ else:
+ attlines = []
+ if not datablock:
+ error = statemachine.memo.reporter.error(
+ 'Missing image URI argument at line %s.' % lineno, '',
+ nodes.literal_block(blocktext, blocktext))
+ return [error], blankfinish
+ attoffset = lineoffset + i
+ reference = ''.join([line.strip() for line in datablock])
+ if reference.find(' ') != -1:
+ error = statemachine.memo.reporter.error(
+ 'Image URI at line %s contains whitespace.' % lineno, '',
+ nodes.literal_block(blocktext, blocktext))
+ return [error], blankfinish
+ if attlines:
+ success, data, blankfinish = state.parse_extension_attributes(
+ image_attribute_spec, attlines, blankfinish)
+ if success: # data is a dict of attributes
+ attributes.update(data)
+ else: # data is an error string
+ error = statemachine.memo.reporter.error(
+ 'Error in "%s" directive attributes at line %s:\n%s.'
+ % (match.group(1), lineno, data), '',
+ nodes.literal_block(blocktext, blocktext))
+ return [error], blankfinish
+ attributes['uri'] = reference
+ imagenode = nodes.image(blocktext, **attributes)
+ return [imagenode], blankfinish
+
+def figure(match, typename, data, state, statemachine, attributes):
+ lineoffset = statemachine.lineoffset
+ (imagenode,), blankfinish = image(match, typename, data, state,
+ statemachine, attributes)
+ indented, indent, offset, blankfinish \
+ = statemachine.getfirstknownindented(sys.maxint)
+ blocktext = '\n'.join(statemachine.inputlines[lineoffset:
+ statemachine.lineoffset+1])
+ if isinstance(imagenode, nodes.system_message):
+ if indented:
+ imagenode[-1] = nodes.literal_block(blocktext, blocktext)
+ return [imagenode], blankfinish
+ figurenode = nodes.figure('', imagenode)
+ if indented:
+ node = nodes.Element() # anonymous container for parsing
+ state.nestedparse(indented, lineoffset, node)
+ firstnode = node[0]
+ if isinstance(firstnode, nodes.paragraph):
+ caption = nodes.caption(firstnode.rawsource, '',
+ *firstnode.children)
+ figurenode += caption
+ elif not (isinstance(firstnode, nodes.comment) and len(firstnode) == 0):
+ error = statemachine.memo.reporter.error(
+ 'Figure caption must be a paragraph or empty comment.', '',
+ nodes.literal_block(blocktext, blocktext))
+ return [figurenode, error], blankfinish
+ if len(node) > 1:
+ figurenode += nodes.legend('', *node[1:])
+ return [figurenode], blankfinish
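The attribute specs used by these directives map each attribute name to a conversion function (`int`, or the identity `unchanged`). A sketch of how such a spec could be applied (the real `parse_extension_attributes` also parses the field-list syntax itself; this hypothetical helper only applies the converters):

```python
def unchanged(arg):
    return arg   # identity converter, as in images.py above

# Same shape as image_attribute_spec: attribute name -> converter.
attribute_spec = {'alt': unchanged, 'height': int, 'width': int, 'scale': int}

def parse_attributes(pairs, spec):
    """Apply each converter; return (success, attributes-or-error-string)."""
    attributes = {}
    for name, value in pairs:
        if name not in spec:
            return 0, 'unknown attribute: "%s"' % name
        try:
            attributes[name] = spec[name](value)
        except ValueError:
            return 0, 'invalid value for "%s": %r' % (name, value)
    return 1, attributes

ok, atts = parse_attributes([('alt', 'a figure'), ('scale', '50')],
                            attribute_spec)
```

The success-flag-plus-payload return mirrors the callers above, which branch on `success` and treat the payload as either an attribute dict or an error string.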
diff --git a/docutils/parsers/rst/directives/misc.py b/docutils/parsers/rst/directives/misc.py
new file mode 100644
index 000000000..f8a9d5217
--- /dev/null
+++ b/docutils/parsers/rst/directives/misc.py
@@ -0,0 +1,39 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Miscellaneous directives.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+from docutils import nodes
+
+
+def raw(match, typename, data, state, statemachine, attributes):
+ return [], 1
+
+def directive_test_function(match, typename, data, state, statemachine,
+ attributes):
+ try:
+ statemachine.nextline()
+ indented, indent, offset, blankfinish = statemachine.getindented()
+ text = '\n'.join(indented)
+ except IndexError:
+ text = ''
+ blankfinish = 1
+ if text:
+ info = statemachine.memo.reporter.info(
+ 'Directive processed. Type="%s", data="%s", directive block:'
+ % (typename, data), '', nodes.literal_block(text, text))
+ else:
+ info = statemachine.memo.reporter.info(
+ 'Directive processed. Type="%s", data="%s", directive block: None'
+ % (typename, data))
+ return [info], blankfinish
diff --git a/docutils/parsers/rst/languages/__init__.py b/docutils/parsers/rst/languages/__init__.py
new file mode 100644
index 000000000..ee36d1148
--- /dev/null
+++ b/docutils/parsers/rst/languages/__init__.py
@@ -0,0 +1,23 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+This package contains modules for language-dependent features of
+reStructuredText.
+"""
+
+__docformat__ = 'reStructuredText'
+
+_languages = {}
+
+def getlanguage(languagecode):
+ if _languages.has_key(languagecode):
+ return _languages[languagecode]
+ module = __import__(languagecode, globals(), locals())
+ _languages[languagecode] = module
+ return module
diff --git a/docutils/parsers/rst/languages/en.py b/docutils/parsers/rst/languages/en.py
new file mode 100644
index 000000000..2b1c52649
--- /dev/null
+++ b/docutils/parsers/rst/languages/en.py
@@ -0,0 +1,38 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+English-language mappings for language-dependent features of
+reStructuredText.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+directives = {
+ 'attention': 'attention',
+ 'caution': 'caution',
+ 'danger': 'danger',
+ 'error': 'error',
+ 'hint': 'hint',
+ 'important': 'important',
+ 'note': 'note',
+ 'tip': 'tip',
+ 'warning': 'warning',
+ 'image': 'image',
+ 'figure': 'figure',
+ 'contents': 'contents',
+ 'footnotes': 'footnotes',
+ 'citations': 'citations',
+ 'topic': 'topic',
+ 'meta': 'meta',
+ 'imagemap': 'imagemap',
+ 'raw': 'raw',
+ 'restructuredtext-test-directive': 'restructuredtext-test-directive'}
+"""English name to registered (in directives/__init__.py) directive name
+mapping."""
diff --git a/docutils/parsers/rst/states.py b/docutils/parsers/rst/states.py
new file mode 100644
index 000000000..b2dbf9b3e
--- /dev/null
+++ b/docutils/parsers/rst/states.py
@@ -0,0 +1,2115 @@
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+This is the ``docutils.parsers.restructuredtext.states`` module, the core of
+the reStructuredText parser. It defines the following:
+
+:Classes:
+ - `RSTStateMachine`: reStructuredText parser's entry point.
+ - `NestedStateMachine`: recursive StateMachine.
+ - `RSTState`: reStructuredText State superclass.
+ - `Body`: Generic classifier of the first line of a block.
+ - `BulletList`: Second and subsequent bullet_list list_items
+ - `DefinitionList`: Second and subsequent definition_list_items.
+ - `EnumeratedList`: Second and subsequent enumerated_list list_items.
+ - `FieldList`: Second and subsequent fields.
+ - `OptionList`: Second and subsequent option_list_items.
+ - `Explicit`: Second and subsequent explicit markup constructs.
+ - `SubstitutionDef`: For embedded directives in substitution definitions.
+ - `Text`: Classifier of second line of a text block.
+ - `Definition`: Second line of potential definition_list_item.
+ - `Line`: Second line of overlined section title or transition marker.
+ - `Stuff`: An auxiliary collection class.
+
+:Exception classes:
+ - `MarkupError`
+ - `ParserError`
+ - `TransformationError`
+
+:Functions:
+ - `escape2null()`: Return a string, escape-backslashes converted to nulls.
+ - `unescape()`: Return a string, nulls removed or restored to backslashes.
+ - `normname()`: Return a case- and whitespace-normalized name.
+
+:Attributes:
+ - `stateclasses`: set of State classes used with `RSTStateMachine`.
+
+Parser Overview
+===============
+
+The reStructuredText parser is implemented as a state machine, examining its
+input one line at a time. To understand how the parser works, please first
+become familiar with the `docutils.statemachine` module. In the description
+below, references are made to classes defined in this module; please see the
+individual classes for details.
+
+Parsing proceeds as follows:
+
+1. The state machine examines each line of input, checking each of the
+ transition patterns of the state `Body`, in order, looking for a match. The
+ implicit transitions (blank lines and indentation) are checked before any
+ others. The 'text' transition is a catch-all (matches anything).
+
+2. The method associated with the matched transition pattern is called.
+
+ A. Some transition methods are self-contained, appending elements to the
+ document tree ('doctest' parses a doctest block). The parser's current
+ line index is advanced to the end of the element, and parsing continues
+ with step 1.
+
+ B. Others trigger the creation of a nested state machine, whose job is to
+ parse a compound construct ('indent' does a block quote, 'bullet' does a
+ bullet list, 'overline' does a section [first checking for a valid
+ section header]).
+
+ - In the case of lists and explicit markup, a new state machine is
+ created and run to parse the first item.
+
+ - A new state machine is created and its initial state is set to the
+ appropriate specialized state (`BulletList` in the case of the
+ 'bullet' transition). This state machine is run to parse the compound
+ element (or series of explicit markup elements), and returns as soon
+ as a non-member element is encountered. For example, the `BulletList`
+ state machine aborts as soon as it encounters an element which is not
+ a list item of that bullet list. The optional omission of
+ inter-element blank lines is handled by the nested state machine.
+
+ - The current line index is advanced to the end of the elements parsed,
+ and parsing continues with step 1.
+
+ C. The result of the 'text' transition depends on the next line of text.
+ The current state is changed to `Text`, under which the second line is
+ examined. If the second line is:
+
+ - Indented: The element is a definition list item, and parsing proceeds
+ similarly to step 2.B, using the `DefinitionList` state.
+
+ - A line of uniform punctuation characters: The element is a section
+ header; again, parsing proceeds as in step 2.B, and `Body` is still
+ used.
+
+ - Anything else: The element is a paragraph, which is examined for
+ inline markup and appended to the parent element. Processing continues
+ with step 1.
+"""
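The classification in step 1 above, where each line is tested against an ordered list of transition patterns with a catch-all 'text' transition, can be miniaturized as follows (a toy sketch, far simpler than `docutils.statemachine`, with a hypothetical pattern set):

```python
import re

class LineStateMachine:
    """Toy line-at-a-time classifier echoing the overview above.

    Transition patterns are checked in order; 'text' is the catch-all
    and therefore must come last, exactly as in the Body state.
    """
    transitions = [
        ('blank', re.compile(r'\s*$')),
        ('bullet', re.compile(r'[-*+] ')),
        ('text', re.compile(r'')),       # catch-all: matches anything
    ]

    def run(self, lines):
        results = []
        for line in lines:
            for name, pattern in self.transitions:
                if pattern.match(line):
                    results.append((name, line))
                    break
        return results

sm = LineStateMachine()
out = sm.run(['- item', '', 'plain paragraph'])
```

The real parser additionally hands matched lines to transition methods, which may consume further input or spawn nested state machines, as described in steps 2.A-2.C.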
+
+__docformat__ = 'reStructuredText'
+
+
+import sys, re, string
+from docutils import nodes, statemachine, utils, roman, urischemes
+from docutils.statemachine import StateMachineWS, StateWS
+from docutils.utils import normname
+import directives, languages
+from tableparser import TableParser, TableMarkupError
+
+
+class MarkupError(Exception): pass
+class ParserError(Exception): pass
+
+
+class Stuff:
+
+ """Stores a bunch of stuff for dotted-attribute access."""
+
+ def __init__(self, **keywordargs):
+ self.__dict__.update(keywordargs)
+
+
+class RSTStateMachine(StateMachineWS):
+
+ """
+ reStructuredText's master StateMachine.
+
+ The entry point to reStructuredText parsing is the `run()` method.
+ """
+
+ def run(self, inputlines, docroot, inputoffset=0, matchtitles=1):
+ """
+ Parse `inputlines` and return a `docutils.nodes.document` instance.
+
+ Extend `StateMachineWS.run()`: set up parse-global data, run the
+ StateMachine, and return the resulting
+ document.
+ """
+ self.language = languages.getlanguage(docroot.languagecode)
+ self.matchtitles = matchtitles
+ self.memo = Stuff(document=docroot,
+ reporter=docroot.reporter,
+ language=self.language,
+ titlestyles=[],
+ sectionlevel=0)
+ self.node = docroot
+ results = StateMachineWS.run(self, inputlines, inputoffset)
+ assert results == [], 'RSTStateMachine.run() results should be empty.'
+ self.node = self.memo = None # remove unneeded references
+
+
+class NestedStateMachine(StateMachineWS):
+
+ """
+ StateMachine run from within other StateMachine runs, to parse nested
+ document structures.
+ """
+
+ def run(self, inputlines, inputoffset, memo, node, matchtitles=1):
+ """
+ Parse `inputlines` and populate a `docutils.nodes.document` instance.
+
+ Extend `StateMachineWS.run()`: set up document-wide data.
+ """
+ self.matchtitles = matchtitles
+ self.memo = memo
+ self.node = node
+ results = StateMachineWS.run(self, inputlines, inputoffset)
+ assert results == [], 'NestedStateMachine.run() results should be empty'
+ return results
+
+
+class RSTState(StateWS):
+
+ """
+ reStructuredText State superclass.
+
+ Contains methods used by all State subclasses.
+ """
+
+ nestedSM = NestedStateMachine
+
+ def __init__(self, statemachine, debug=0):
+ self.nestedSMkwargs = {'stateclasses': stateclasses,
+ 'initialstate': 'Body'}
+ StateWS.__init__(self, statemachine, debug)
+
+ def gotoline(self, abslineoffset):
+ """Jump to input line `abslineoffset`, ignoring jumps past the end."""
+ try:
+ self.statemachine.gotoline(abslineoffset)
+ except IndexError:
+ pass
+
+ def bof(self, context):
+ """Called at beginning of file."""
+ return [], []
+
+ def nestedparse(self, block, inputoffset, node, matchtitles=0,
+ statemachineclass=None, statemachinekwargs=None):
+ """
+ Create a new StateMachine rooted at `node` and run it over the input
+ `block`.
+ """
+ if statemachineclass is None:
+ statemachineclass = self.nestedSM
+ if statemachinekwargs is None:
+ statemachinekwargs = self.nestedSMkwargs
+ statemachine = statemachineclass(debug=self.debug, **statemachinekwargs)
+ statemachine.run(block, inputoffset, memo=self.statemachine.memo,
+ node=node, matchtitles=matchtitles)
+ statemachine.unlink()
+ return statemachine.abslineoffset()
+
+ def nestedlistparse(self, block, inputoffset, node, initialstate,
+ blankfinish, blankfinishstate=None, extrasettings={},
+ matchtitles=0, statemachineclass=None,
+ statemachinekwargs=None):
+ """
+ Create a new StateMachine rooted at `node` and run it over the input
+ `block`. Also keep track of optional intermediate blank lines and the
+ required final one.
+ """
+ if statemachineclass is None:
+ statemachineclass = self.nestedSM
+ if statemachinekwargs is None:
+ statemachinekwargs = self.nestedSMkwargs.copy()
+ statemachinekwargs['initialstate'] = initialstate
+ statemachine = statemachineclass(debug=self.debug, **statemachinekwargs)
+ if blankfinishstate is None:
+ blankfinishstate = initialstate
+ statemachine.states[blankfinishstate].blankfinish = blankfinish
+ for key, value in extrasettings.items():
+ setattr(statemachine.states[initialstate], key, value)
+ statemachine.run(block, inputoffset, memo=self.statemachine.memo,
+ node=node, matchtitles=matchtitles)
+ blankfinish = statemachine.states[blankfinishstate].blankfinish
+ statemachine.unlink()
+ return statemachine.abslineoffset(), blankfinish
+
+ def section(self, title, source, style, lineno):
+ """
+ When a new section is reached that isn't a subsection of the current
+ section, back up the line count (use previousline(-x)), then raise
+ EOFError. The current StateMachine will finish, then the calling
+ StateMachine can re-examine the title. This will work its way back up
+ the calling chain until the correct section level is reached.
+
+ Alternative: Evaluate the title, store the title info & level, and
+ back up the chain until that level is reached. Store in memo? Or
+ return in results?
+ """
+ if self.checksubsection(source, style, lineno):
+ self.newsubsection(title, lineno)
+
+ def checksubsection(self, source, style, lineno):
+ """
+ Check for a valid subsection header. Return 1 (true) or None (false).
+
+ :Exception: `EOFError` when a sibling or supersection encountered.
+ """
+ memo = self.statemachine.memo
+ titlestyles = memo.titlestyles
+ mylevel = memo.sectionlevel
+ try: # check for existing title style
+ level = titlestyles.index(style) + 1
+ except ValueError: # new title style
+ if len(titlestyles) == memo.sectionlevel: # new subsection
+ titlestyles.append(style)
+ return 1
+ else: # not at lowest level
+ self.statemachine.node += self.titleinconsistent(source, lineno)
+ return None
+ if level <= mylevel: # sibling or supersection
+ memo.sectionlevel = level # bubble up to parent section
+ # back up 2 lines for underline title, 3 for overline title
+ self.statemachine.previousline(len(style) + 1)
+ raise EOFError # let parent section re-evaluate
+ if level == mylevel + 1: # immediate subsection
+ return 1
+ else: # invalid subsection
+ self.statemachine.node += self.titleinconsistent(source, lineno)
+ return None
+
+ def titleinconsistent(self, sourcetext, lineno):
+ literalblock = nodes.literal_block('', sourcetext)
+ error = self.statemachine.memo.reporter.severe(
+ 'Title level inconsistent at line %s:' % lineno, '', literalblock)
+ return error
+
+ def newsubsection(self, title, lineno):
+ """Append new subsection to document tree. On return, check level."""
+ memo = self.statemachine.memo
+ mylevel = memo.sectionlevel
+ memo.sectionlevel += 1
+ sectionnode = nodes.section()
+ self.statemachine.node += sectionnode
+ textnodes, messages = self.inline_text(title, lineno)
+ titlenode = nodes.title(title, '', *textnodes)
+ name = normname(titlenode.astext())
+ sectionnode['name'] = name
+ sectionnode += titlenode
+ sectionnode += messages
+ memo.document.note_implicit_target(sectionnode, sectionnode)
+ offset = self.statemachine.lineoffset + 1
+ absoffset = self.statemachine.abslineoffset() + 1
+ newabsoffset = self.nestedparse(
+ self.statemachine.inputlines[offset:], inputoffset=absoffset,
+ node=sectionnode, matchtitles=1)
+ self.gotoline(newabsoffset)
+ if memo.sectionlevel <= mylevel: # can't handle next section?
+ raise EOFError # bubble up to supersection
+ # reset sectionlevel; next pass will detect it properly
+ memo.sectionlevel = mylevel
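The decision table in `checksubsection()` above, keying section depth off the order in which title styles are first seen, can be sketched in isolation (the real method raises `EOFError` to bubble back up through the nested state machines; this hypothetical helper just returns a classification):

```python
def check_subsection(titlestyles, sectionlevel, style):
    """Classify a title style against the styles seen so far.

    Returns ('new', level) for a valid new style, ('ok', level) for a
    valid immediate subsection, ('backup', level) for a sibling or
    supersection, or ('error', None) for an inconsistent style.
    """
    try:                                   # existing title style?
        level = titlestyles.index(style) + 1
    except ValueError:                     # new title style
        if len(titlestyles) == sectionlevel:
            titlestyles.append(style)      # new subsection: remember it
            return ('new', sectionlevel + 1)
        return ('error', None)             # new style, but not at the deepest level
    if level <= sectionlevel:
        return ('backup', level)           # sibling or supersection
    if level == sectionlevel + 1:
        return ('ok', level)               # immediate subsection
    return ('error', None)                 # skipped one or more levels

styles = []
r1 = check_subsection(styles, 0, '=')   # first style seen: new level 1
r2 = check_subsection(styles, 1, '-')   # one level deeper: new level 2
r3 = check_subsection(styles, 2, '=')   # '=' again: back up to level 1
```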
+
+ def paragraph(self, lines, lineno):
+ """
+ Return a list (paragraph & messages) and a boolean: literal_block next?
+ """
+ data = '\n'.join(lines).rstrip()
+ if data[-2:] == '::':
+ if len(data) == 2:
+ return [], 1
+ elif data[-3] == ' ':
+ text = data[:-3].rstrip()
+ else:
+ text = data[:-1]
+ literalnext = 1
+ else:
+ text = data
+ literalnext = 0
+ textnodes, messages = self.inline_text(text, lineno)
+ p = nodes.paragraph(data, '', *textnodes)
+ return [p] + messages, literalnext
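The trailing-`::` handling in `paragraph()` above distinguishes three cases: a lone `::` (no paragraph at all), `text ::` (marker removed entirely), and `text::` (one colon kept). Extracted as a sketch:

```python
def check_literal_marker(lines):
    """Strip a trailing '::' the way paragraph() does above.

    Returns (paragraph_text, literal_next): '::' alone yields no text;
    'text ::' drops the marker and the space; 'text::' keeps one colon.
    """
    data = '\n'.join(lines).rstrip()
    if data[-2:] == '::':
        if len(data) == 2:                  # the paragraph is just '::'
            return '', 1
        elif data[-3] == ' ':               # 'text ::' form
            return data[:-3].rstrip(), 1
        else:                               # 'text::' form
            return data[:-1], 1
    return data, 0

text, literalnext = check_literal_marker(['Example follows ::'])
```

This is why a paragraph ending in `text::` renders as "text:" followed by a literal block, while `text ::` renders as plain "text".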
+
+ inline = Stuff()
+ """Patterns and constants used for inline markup recognition."""
+
+ inline.openers = '\'"([{<'
+ inline.closers = '\'")]}>'
+ inline.start_string_prefix = (r'(?:(?<=^)|(?<=[ \n%s]))'
+ % re.escape(inline.openers))
+ inline.end_string_suffix = (r'(?:(?=$)|(?=[- \n.,:;!?%s]))'
+ % re.escape(inline.closers))
+ inline.non_whitespace_before = r'(?<![ \n])'
+ if match.start(whole) > 0:
+ textnodes.append(nodes.Text(unescape(
+ remainder[:match.start(whole)])))
+ if match.group(email):
+ addscheme = 'mailto:'
+ else:
+ addscheme = ''
+ text = match.group(whole)
+ unescaped = unescape(text, 0)
+ textnodes.append(
+ nodes.reference(unescape(text, 1), unescaped,
+ refuri=addscheme + unescaped))
+ remainder = remainder[match.end(whole):]
+ start = 0
+ else: # not a valid scheme
+ start = match.end(whole)
+ else:
+ if remainder:
+ textnodes.append(nodes.Text(unescape(remainder)))
+ break
+ return textnodes
+
+ inline.dispatch = {'*': emphasis,
+ '**': strong,
+ '`': interpreted_or_phrase_ref,
+ '``': literal,
+ '_`': inline_target,
+ ']_': footnote_reference,
+ '|': substitution_reference,
+ '_': reference,
+ '__': anonymous_reference}
+
+ def inline_text(self, text, lineno):
+ """
+ Return 2 lists: nodes (text and inline elements), and system_messages.
+
+ Using a `pattern` matching start-strings (for emphasis, strong,
+ interpreted, phrase reference, literal, substitution reference, and
+ inline target) or complete constructs (simple reference, footnote
+ reference) we search for a candidate. When one is found, we check for
+ validity (e.g., not a quoted '*' character). If valid, search for the
+ corresponding end string if applicable, and check for validity. If not
+ found or invalid, generate a warning and ignore the start-string.
+ Standalone hyperlinks are found last.
+ """
+ pattern = self.inline.patterns.initial
+ dispatch = self.inline.dispatch
+ start = self.inline.groups.initial.start - 1
+ backquote = self.inline.groups.initial.backquote - 1
+ refend = self.inline.groups.initial.refend - 1
+ fnend = self.inline.groups.initial.fnend - 1
+ remaining = escape2null(text)
+ processed = []
+ unprocessed = []
+ messages = []
+ while remaining:
+ match = pattern.search(remaining)
+ if match:
+ groups = match.groups()
+ before, inlines, remaining, sysmessages = \
+ dispatch[groups[start] or groups[backquote]
+ or groups[refend]
+ or groups[fnend]](self, match, lineno)
+ unprocessed.append(before)
+ messages += sysmessages
+ if inlines:
+ processed += self.standalone_uri(''.join(unprocessed),
+ lineno)
+ processed += inlines
+ unprocessed = []
+ else:
+ break
+ remaining = ''.join(unprocessed) + remaining
+ if remaining:
+ processed += self.standalone_uri(remaining, lineno)
+ return processed, messages
+
+ def unindentwarning(self):
+ return self.statemachine.memo.reporter.warning(
+ ('Unindent without blank line at line %s.'
+ % (self.statemachine.abslineno() + 1)))
+
+
+class Body(RSTState):
+
+ """
+ Generic classifier of the first line of a block.
+ """
+
+ enum = Stuff()
+ """Enumerated list parsing information."""
+
+ enum.formatinfo = {
+ 'parens': Stuff(prefix='(', suffix=')', start=1, end=-1),
+ 'rparen': Stuff(prefix='', suffix=')', start=0, end=-1),
+ 'period': Stuff(prefix='', suffix='.', start=0, end=-1)}
+ enum.formats = enum.formatinfo.keys()
+ enum.sequences = ['arabic', 'loweralpha', 'upperalpha',
+ 'lowerroman', 'upperroman'] # ORDERED!
+ enum.sequencepats = {'arabic': '[0-9]+',
+ 'loweralpha': '[a-z]',
+ 'upperalpha': '[A-Z]',
+ 'lowerroman': '[ivxlcdm]+',
+ 'upperroman': '[IVXLCDM]+',}
+ enum.converters = {'arabic': int,
+ 'loweralpha':
+ lambda s, zero=(ord('a')-1): ord(s) - zero,
+ 'upperalpha':
+ lambda s, zero=(ord('A')-1): ord(s) - zero,
+ 'lowerroman':
+ lambda s: roman.fromRoman(s.upper()),
+ 'upperroman': roman.fromRoman}
+
+ enum.sequenceregexps = {}
+ for sequence in enum.sequences:
+ enum.sequenceregexps[sequence] = re.compile(enum.sequencepats[sequence]
+ + '$')
+
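For illustration only (not part of the patch), here is what the `enum.converters` table above computes: each sequence type maps enumerator text to a 1-based ordinal. The Roman-numeral converter below is a simplified stand-in for the third-party `roman` module used by the patch; unlike `roman.fromRoman`, it does not validate malformed numerals.

```python
def from_roman(s):
    """Convert an upper-case Roman numeral to an int (no validation)."""
    values = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}
    total = 0
    for ch, nxt in zip(s, s[1:] + ' '):
        v = values[ch]
        # subtractive notation: a smaller value before a larger one negates
        total += -v if nxt in values and values[nxt] > v else v
    return total

converters = {
    'arabic': int,
    'loweralpha': lambda s: ord(s) - (ord('a') - 1),
    'upperalpha': lambda s: ord(s) - (ord('A') - 1),
    'lowerroman': lambda s: from_roman(s.upper()),
    'upperroman': from_roman,
}
```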
+ tabletoppat = re.compile(r'\+-[-+]+-\+ *$')
+    """Matches the top (& bottom) of a table."""
+
+ tableparser = TableParser()
+
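A quick illustration (outside the patch) of what `tabletoppat` accepts: a grid-table border made of `-` runs joined at column boundaries by `+`, with optional trailing spaces. The helper name `is_table_border` is hypothetical:

```python
import re

# Same pattern as Body.tabletoppat above.
tabletoppat = re.compile(r'\+-[-+]+-\+ *$')

def is_table_border(line):
    """True if `line` looks like the top/bottom border of a grid table."""
    return bool(tabletoppat.match(line))
```

Note the pattern requires at least one column of two or more dashes, so degenerate borders like `+-+` are rejected.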
+ pats = {}
+ """Fragments of patterns used by transitions."""
+
+ pats['nonalphanum7bit'] = '[!-/:-@[-`{-~]'
+ pats['alpha'] = '[a-zA-Z]'
+ pats['alphanum'] = '[a-zA-Z0-9]'
+ pats['alphanumplus'] = '[a-zA-Z0-9_-]'
+ pats['enum'] = ('(%(arabic)s|%(loweralpha)s|%(upperalpha)s|%(lowerroman)s'
+ '|%(upperroman)s)' % enum.sequencepats)
+ pats['optname'] = '%(alphanum)s%(alphanumplus)s*' % pats
+ pats['optarg'] = '%(alpha)s%(alphanumplus)s*' % pats
+ pats['option'] = r'(--?|\+|/)%(optname)s([ =]%(optarg)s)?' % pats
+
+ for format in enum.formats:
+ pats[format] = '(?P<%s>%s%s%s)' % (
+ format, re.escape(enum.formatinfo[format].prefix),
+ pats['enum'], re.escape(enum.formatinfo[format].suffix))
+
+ patterns = {'bullet': r'[-+*]( +|$)',
+ 'enumerator': r'(%(parens)s|%(rparen)s|%(period)s)( +|$)'
+ % pats,
+ 'field_marker': r':[^: ]([^:]*[^: ])?:( +|$)',
+ 'option_marker': r'%(option)s(, %(option)s)*( +| ?$)' % pats,
+ 'doctest': r'>>>( +|$)',
+ 'tabletop': tabletoppat,
+ 'explicit_markup': r'\.\.( +|$)',
+ 'anonymous': r'__( +|$)',
+ 'line': r'(%(nonalphanum7bit)s)\1\1\1+ *$' % pats,
+ #'rfc822': r'[!-9;-~]+:( +|$)',
+ 'text': r''}
+ initialtransitions = ['bullet',
+ 'enumerator',
+ 'field_marker',
+ 'option_marker',
+ 'doctest',
+ 'tabletop',
+ 'explicit_markup',
+ 'anonymous',
+ 'line',
+ 'text']
+
+ def indent(self, match, context, nextstate):
+ """Block quote."""
+ indented, indent, lineoffset, blankfinish = \
+ self.statemachine.getindented()
+ blockquote = self.block_quote(indented, lineoffset)
+ self.statemachine.node += blockquote
+ if not blankfinish:
+ self.statemachine.node += self.unindentwarning()
+ return context, nextstate, []
+
+ def block_quote(self, indented, lineoffset):
+ blockquote = nodes.block_quote()
+ self.nestedparse(indented, lineoffset, blockquote)
+ return blockquote
+
+ def bullet(self, match, context, nextstate):
+ """Bullet list item."""
+ bulletlist = nodes.bullet_list()
+ self.statemachine.node += bulletlist
+ bulletlist['bullet'] = match.string[0]
+ i, blankfinish = self.list_item(match.end())
+ bulletlist += i
+ offset = self.statemachine.lineoffset + 1 # next line
+ newlineoffset, blankfinish = self.nestedlistparse(
+ self.statemachine.inputlines[offset:],
+ inputoffset=self.statemachine.abslineoffset() + 1,
+ node=bulletlist, initialstate='BulletList',
+ blankfinish=blankfinish)
+ if not blankfinish:
+ self.statemachine.node += self.unindentwarning()
+ self.gotoline(newlineoffset)
+ return [], nextstate, []
+
+ def list_item(self, indent):
+ indented, lineoffset, blankfinish = \
+ self.statemachine.getknownindented(indent)
+ listitem = nodes.list_item('\n'.join(indented))
+ if indented:
+ self.nestedparse(indented, inputoffset=lineoffset, node=listitem)
+ return listitem, blankfinish
+
+ def enumerator(self, match, context, nextstate):
+ """Enumerated List Item"""
+ format, sequence, text, ordinal = self.parse_enumerator(match)
+ if ordinal is None:
+ msg = self.statemachine.memo.reporter.error(
+ ('Enumerated list start value invalid at line %s: '
+ '%r (sequence %r)' % (self.statemachine.abslineno(),
+ text, sequence)))
+ self.statemachine.node += msg
+ indented, lineoffset, blankfinish = \
+ self.statemachine.getknownindented(match.end())
+ bq = self.block_quote(indented, lineoffset)
+ self.statemachine.node += bq
+ if not blankfinish:
+ self.statemachine.node += self.unindentwarning()
+ return [], nextstate, []
+ if ordinal != 1:
+ msg = self.statemachine.memo.reporter.info(
+ ('Enumerated list start value not ordinal-1 at line %s: '
+ '%r (ordinal %s)' % (self.statemachine.abslineno(),
+ text, ordinal)))
+ self.statemachine.node += msg
+ enumlist = nodes.enumerated_list()
+ self.statemachine.node += enumlist
+ enumlist['enumtype'] = sequence
+ if ordinal != 1:
+ enumlist['start'] = ordinal
+ enumlist['prefix'] = self.enum.formatinfo[format].prefix
+ enumlist['suffix'] = self.enum.formatinfo[format].suffix
+ listitem, blankfinish = self.list_item(match.end())
+ enumlist += listitem
+ offset = self.statemachine.lineoffset + 1 # next line
+ newlineoffset, blankfinish = self.nestedlistparse(
+ self.statemachine.inputlines[offset:],
+ inputoffset=self.statemachine.abslineoffset() + 1,
+ node=enumlist, initialstate='EnumeratedList',
+ blankfinish=blankfinish,
+ extrasettings={'lastordinal': ordinal, 'format': format})
+ if not blankfinish:
+ self.statemachine.node += self.unindentwarning()
+ self.gotoline(newlineoffset)
+ return [], nextstate, []
+
+ def parse_enumerator(self, match, expectedsequence=None):
+ """
+ Analyze an enumerator and return the results.
+
+ :Return:
+ - the enumerator format ('period', 'parens', or 'rparen'),
+ - the sequence used ('arabic', 'loweralpha', 'upperroman', etc.),
+ - the text of the enumerator, stripped of formatting, and
+ - the ordinal value of the enumerator ('a' -> 1, 'ii' -> 2, etc.;
+ ``None`` is returned for invalid enumerator text).
+
+ The enumerator format has already been determined by the regular
+ expression match. If `expectedsequence` is given, that sequence is
+ tried first. If not, we check for Roman numeral 1. This way,
+ single-character Roman numerals (which are also alphabetical) can be
+ matched. If no sequence has been matched, all sequences are checked in
+ order.
+ """
+ groupdict = match.groupdict()
+ sequence = ''
+ for format in self.enum.formats:
+ if groupdict[format]: # was this the format matched?
+ break # yes; keep `format`
+ else: # shouldn't happen
+ raise ParserError, 'enumerator format not matched'
+ text = groupdict[format][self.enum.formatinfo[format].start
+ :self.enum.formatinfo[format].end]
+ if expectedsequence:
+ try:
+ if self.enum.sequenceregexps[expectedsequence].match(text):
+ sequence = expectedsequence
+ except KeyError: # shouldn't happen
+ raise ParserError, 'unknown sequence: %s' % sequence
+ else:
+ if text == 'i':
+ sequence = 'lowerroman'
+ elif text == 'I':
+ sequence = 'upperroman'
+ if not sequence:
+ for sequence in self.enum.sequences:
+ if self.enum.sequenceregexps[sequence].match(text):
+ break
+ else: # shouldn't happen
+ raise ParserError, 'enumerator sequence not matched'
+ try:
+ ordinal = self.enum.converters[sequence](text)
+ except roman.InvalidRomanNumeralError:
+ ordinal = None
+ return format, sequence, text, ordinal
+
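The sequence-detection order described in the `parse_enumerator` docstring can be sketched in isolation (this is an illustrative aside, not patch code; `detect_sequence` is a hypothetical name). It shows why the order of `enum.sequences` matters: without the special-casing of `'i'`/`'I'`, single-letter Roman numerals would be claimed by the alphabetical sequences.

```python
import re

sequencepats = {'arabic': '[0-9]+', 'loweralpha': '[a-z]',
                'upperalpha': '[A-Z]', 'lowerroman': '[ivxlcdm]+',
                'upperroman': '[IVXLCDM]+'}
order = ['arabic', 'loweralpha', 'upperalpha', 'lowerroman', 'upperroman']

def detect_sequence(text):
    """Mirror parse_enumerator's fallback: 'i'/'I' are assumed Roman,
    otherwise the first matching sequence in `order` wins."""
    if text == 'i':
        return 'lowerroman'
    if text == 'I':
        return 'upperroman'
    for sequence in order:
        if re.match(sequencepats[sequence] + '$', text):
            return sequence
    raise ValueError('enumerator sequence not matched: %r' % text)
```

Note that an ambiguous single letter like `'v'` is still classified as `loweralpha`; only a following enumerator (via `expectedsequence`) can disambiguate it.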
+ def field_marker(self, match, context, nextstate):
+ """Field list item."""
+ fieldlist = nodes.field_list()
+ self.statemachine.node += fieldlist
+ field, blankfinish = self.field(match)
+ fieldlist += field
+ offset = self.statemachine.lineoffset + 1 # next line
+ newlineoffset, blankfinish = self.nestedlistparse(
+ self.statemachine.inputlines[offset:],
+ inputoffset=self.statemachine.abslineoffset() + 1,
+ node=fieldlist, initialstate='FieldList',
+ blankfinish=blankfinish)
+ if not blankfinish:
+ self.statemachine.node += self.unindentwarning()
+ self.gotoline(newlineoffset)
+ return [], nextstate, []
+
+ def field(self, match):
+ name, args = self.parse_field_marker(match)
+ indented, indent, lineoffset, blankfinish = \
+ self.statemachine.getfirstknownindented(match.end())
+ fieldnode = nodes.field()
+ fieldnode += nodes.field_name(name, name)
+ for arg in args:
+ fieldnode += nodes.field_argument(arg, arg)
+ fieldbody = nodes.field_body('\n'.join(indented))
+ fieldnode += fieldbody
+ if indented:
+ self.nestedparse(indented, inputoffset=lineoffset, node=fieldbody)
+ return fieldnode, blankfinish
+
+ def parse_field_marker(self, match):
+ """Extract & return name & argument list from a field marker match."""
+ field = match.string[1:] # strip off leading ':'
+ field = field[:field.find(':')] # strip off trailing ':' etc.
+ tokens = field.split()
+ return tokens[0], tokens[1:] # first == name, others == args
+
+ def option_marker(self, match, context, nextstate):
+ """Option list item."""
+ optionlist = nodes.option_list()
+ try:
+ listitem, blankfinish = self.option_list_item(match)
+ except MarkupError, detail: # shouldn't happen; won't match pattern
+ msg = self.statemachine.memo.reporter.error(
+ ('Invalid option list marker at line %s: %s'
+ % (self.statemachine.abslineno(), detail)))
+ self.statemachine.node += msg
+ indented, indent, lineoffset, blankfinish = \
+ self.statemachine.getfirstknownindented(match.end())
+ blockquote = self.block_quote(indented, lineoffset)
+ self.statemachine.node += blockquote
+ if not blankfinish:
+ self.statemachine.node += self.unindentwarning()
+ return [], nextstate, []
+ self.statemachine.node += optionlist
+ optionlist += listitem
+ offset = self.statemachine.lineoffset + 1 # next line
+ newlineoffset, blankfinish = self.nestedlistparse(
+ self.statemachine.inputlines[offset:],
+ inputoffset=self.statemachine.abslineoffset() + 1,
+ node=optionlist, initialstate='OptionList',
+ blankfinish=blankfinish)
+ if not blankfinish:
+ self.statemachine.node += self.unindentwarning()
+ self.gotoline(newlineoffset)
+ return [], nextstate, []
+
+ def option_list_item(self, match):
+ options = self.parse_option_marker(match)
+ indented, indent, lineoffset, blankfinish = \
+ self.statemachine.getfirstknownindented(match.end())
+ if not indented: # not an option list item
+ raise statemachine.TransitionCorrection('text')
+ option_group = nodes.option_group('', *options)
+ description = nodes.description('\n'.join(indented))
+ option_list_item = nodes.option_list_item('', option_group, description)
+ if indented:
+ self.nestedparse(indented, inputoffset=lineoffset, node=description)
+ return option_list_item, blankfinish
+
+ def parse_option_marker(self, match):
+ """
+        Return a list of `nodes.option` and `nodes.option_argument` objects,
+        parsed from an option marker match.
+
+ :Exception: `MarkupError` for invalid option markers.
+ """
+ optlist = []
+ optionstrings = match.group().rstrip().split(', ')
+ for optionstring in optionstrings:
+ tokens = optionstring.split()
+ delimiter = ' '
+ firstopt = tokens[0].split('=')
+ if len(firstopt) > 1:
+ tokens[:1] = firstopt
+ delimiter = '='
+ if 0 < len(tokens) <= 2:
+ option = nodes.option(optionstring)
+ option += nodes.option_string(tokens[0], tokens[0])
+ if len(tokens) > 1:
+ option += nodes.option_argument(tokens[1], tokens[1],
+ delimiter=delimiter)
+ optlist.append(option)
+ else:
+                raise MarkupError('wrong number of option tokens (=%s), '
+ 'should be 1 or 2: %r' % (len(tokens),
+ optionstring))
+ return optlist
+
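The tokenizing done by `parse_option_marker` above can be shown without the node machinery (an illustrative aside; `split_option_marker` is a hypothetical name). Each comma-separated option is split into option string, optional argument, and the delimiter that joined them:

```python
def split_option_marker(marker):
    """Split an option marker like '-a FILE, --all=FILE' into
    (option, argument_or_None, delimiter) triples."""
    result = []
    for optionstring in marker.rstrip().split(', '):
        tokens = optionstring.split()
        delimiter = ' '
        firstopt = tokens[0].split('=')
        if len(firstopt) > 1:           # '--opt=arg' form
            tokens[:1] = firstopt
            delimiter = '='
        if not 1 <= len(tokens) <= 2:
            raise ValueError('wrong number of option tokens: %r'
                             % optionstring)
        result.append((tokens[0],
                       tokens[1] if len(tokens) > 1 else None,
                       delimiter))
    return result
```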
+ def doctest(self, match, context, nextstate):
+ data = '\n'.join(self.statemachine.gettextblock())
+ self.statemachine.node += nodes.doctest_block(data, data)
+ return [], nextstate, []
+
+ def tabletop(self, match, context, nextstate):
+ """Top border of a table."""
+ nodelist, blankfinish = self.table()
+ self.statemachine.node += nodelist
+ if not blankfinish:
+ msg = self.statemachine.memo.reporter.warning(
+ 'Blank line required after table at line %s.'
+ % (self.statemachine.abslineno() + 1))
+ self.statemachine.node += msg
+ return [], nextstate, []
+
+ def table(self):
+ """Parse a table."""
+ block, messages, blankfinish = self.isolatetable()
+ if block:
+ try:
+ tabledata = self.tableparser.parse(block)
+ tableline = self.statemachine.abslineno() - len(block) + 1
+ table = self.buildtable(tabledata, tableline)
+ nodelist = [table] + messages
+ except TableMarkupError, detail:
+ nodelist = self.malformedtable(block, str(detail)) + messages
+ else:
+ nodelist = messages
+ return nodelist, blankfinish
+
+ def isolatetable(self):
+ messages = []
+ blankfinish = 1
+ try:
+ block = self.statemachine.getunindented()
+ except statemachine.UnexpectedIndentationError, instance:
+ block, lineno = instance.args
+ messages.append(self.statemachine.memo.reporter.error(
+ 'Unexpected indentation at line %s.' % lineno))
+ blankfinish = 0
+ width = len(block[0].strip())
+ for i in range(len(block)):
+ block[i] = block[i].strip()
+ if block[i][0] not in '+|': # check left edge
+ blankfinish = 0
+ self.statemachine.previousline(len(block) - i)
+ del block[i:]
+ break
+ if not self.tabletoppat.match(block[-1]): # find bottom
+ blankfinish = 0
+ # from second-last to third line of table:
+ for i in range(len(block) - 2, 1, -1):
+ if self.tabletoppat.match(block[i]):
+ self.statemachine.previousline(len(block) - i + 1)
+ del block[i+1:]
+ break
+ else:
+ messages.extend(self.malformedtable(block))
+ return [], messages, blankfinish
+ for i in range(len(block)): # check right edge
+ if len(block[i]) != width or block[i][-1] not in '+|':
+ messages.extend(self.malformedtable(block))
+ return [], messages, blankfinish
+ return block, messages, blankfinish
+
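The edge checks in `isolatetable` above amount to the following invariant (a minimal sketch, not patch code; `check_edges` is a hypothetical name and assumes non-empty lines): every line of a grid table must start and end with `+` or `|`, and all lines must have the same stripped width.

```python
def check_edges(block):
    """True if every line of `block` has matching width and
    '+' or '|' at both edges, as isolatetable requires."""
    width = len(block[0].strip())
    for line in [line.strip() for line in block]:
        if len(line) != width or line[0] not in '+|' or line[-1] not in '+|':
            return False
    return True
```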
+ def malformedtable(self, block, detail=''):
+ data = '\n'.join(block)
+ message = 'Malformed table at line %s; formatting as a ' \
+ 'literal block.' % (self.statemachine.abslineno()
+ - len(block) + 1)
+ if detail:
+ message += '\n' + detail
+ nodelist = [self.statemachine.memo.reporter.error(message),
+ nodes.literal_block(data, data)]
+ return nodelist
+
+ def buildtable(self, tabledata, tableline):
+ colspecs, headrows, bodyrows = tabledata
+ table = nodes.table()
+ tgroup = nodes.tgroup(cols=len(colspecs))
+ table += tgroup
+ for colspec in colspecs:
+ tgroup += nodes.colspec(colwidth=colspec)
+ if headrows:
+ thead = nodes.thead()
+ tgroup += thead
+ for row in headrows:
+ thead += self.buildtablerow(row, tableline)
+ tbody = nodes.tbody()
+ tgroup += tbody
+ for row in bodyrows:
+ tbody += self.buildtablerow(row, tableline)
+ return table
+
+ def buildtablerow(self, rowdata, tableline):
+ row = nodes.row()
+ for cell in rowdata:
+ if cell is None:
+ continue
+ morerows, morecols, offset, cellblock = cell
+ attributes = {}
+ if morerows:
+ attributes['morerows'] = morerows
+ if morecols:
+ attributes['morecols'] = morecols
+ entry = nodes.entry(**attributes)
+ row += entry
+ if ''.join(cellblock):
+ self.nestedparse(cellblock, inputoffset=tableline+offset,
+ node=entry)
+ return row
+
+
+ explicit = Stuff()
+ """Patterns and constants used for explicit markup recognition."""
+
+ explicit.patterns = Stuff(
+ target=re.compile(r"""
+ (?:
+ _ # anonymous target
+ | # *OR*
+ (`?) # optional open quote
+ (?![ `]) # first char. not space or backquote
+ ( # reference name
+ .+?
+ )
+ %s # not whitespace or escape
+ \1 # close quote if open quote used
+ )
+ %s # not whitespace or escape
+ : # end of reference name
+ (?:[ ]+|$) # followed by whitespace
+ """
+ % (RSTState.inline.non_whitespace_escape_before,
+ RSTState.inline.non_whitespace_escape_before),
+ re.VERBOSE),
+ reference=re.compile(r"""
+ (?:
+ (%s)_ # simple reference name
+ | # *OR*
+ ` # open backquote
+ (?![ ]) # not space
+ (.+?) # hyperlink phrase
+ %s # not whitespace or escape
+ `_ # close backquote & reference mark
+ )
+ $ # end of string
+ """ %
+ (RSTState.inline.simplename,
+ RSTState.inline.non_whitespace_escape_before,),
+ re.VERBOSE),
+ substitution=re.compile(r"""
+ (?:
+ (?![ ]) # first char. not space
+ (.+?) # substitution text
+ %s # not whitespace or escape
+ \| # close delimiter
+ )
+ (?:[ ]+|$) # followed by whitespace
+ """ %
+ RSTState.inline.non_whitespace_escape_before,
+ re.VERBOSE),)
+ explicit.groups = Stuff(
+ target=Stuff(quote=1, name=2),
+ reference=Stuff(simple=1, phrase=2),
+ substitution=Stuff(name=1))
+
+ def footnote(self, match):
+ indented, indent, offset, blankfinish = \
+ self.statemachine.getfirstknownindented(match.end())
+ label = match.group(1)
+ name = normname(label)
+ footnote = nodes.footnote('\n'.join(indented))
+ if name[0] == '#': # auto-numbered
+ name = name[1:] # autonumber label
+ footnote['auto'] = 1
+ if name:
+ footnote['name'] = name
+ self.statemachine.memo.document.note_autofootnote(footnote)
+ elif name == '*': # auto-symbol
+ name = ''
+ footnote['auto'] = '*'
+ self.statemachine.memo.document.note_symbol_footnote(footnote)
+ else: # manually numbered
+ footnote += nodes.label('', label)
+ footnote['name'] = name
+ self.statemachine.memo.document.note_footnote(footnote)
+ if name:
+ self.statemachine.memo.document.note_explicit_target(footnote,
+ footnote)
+ if indented:
+ self.nestedparse(indented, inputoffset=offset, node=footnote)
+ return [footnote], blankfinish
+
+ def citation(self, match):
+ indented, indent, offset, blankfinish = \
+ self.statemachine.getfirstknownindented(match.end())
+ label = match.group(1)
+ name = normname(label)
+ citation = nodes.citation('\n'.join(indented))
+ citation += nodes.label('', label)
+ citation['name'] = name
+ self.statemachine.memo.document.note_citation(citation)
+ self.statemachine.memo.document.note_explicit_target(citation, citation)
+ if indented:
+ self.nestedparse(indented, inputoffset=offset, node=citation)
+ return [citation], blankfinish
+
+ def hyperlink_target(self, match):
+ pattern = self.explicit.patterns.target
+ namegroup = self.explicit.groups.target.name
+ lineno = self.statemachine.abslineno()
+ block, indent, offset, blankfinish = \
+ self.statemachine.getfirstknownindented(match.end(), uptoblank=1,
+ stripindent=0)
+ blocktext = match.string[:match.end()] + '\n'.join(block)
+ block = [escape2null(line) for line in block]
+ escaped = block[0]
+ blockindex = 0
+ while 1:
+ targetmatch = pattern.match(escaped)
+ if targetmatch:
+ break
+ blockindex += 1
+ try:
+ escaped += block[blockindex]
+ except (IndexError, MarkupError):
+ raise MarkupError('malformed hyperlink target at line %s.'
+ % lineno)
+ del block[:blockindex]
+ block[0] = (block[0] + ' ')[targetmatch.end()-len(escaped)-1:].strip()
+ if block and block[-1].strip()[-1:] == '_': # possible indirect target
+ reference = ' '.join([line.strip() for line in block])
+ refname = self.isreference(reference)
+ if refname:
+ target = nodes.target(blocktext, '', refname=refname)
+ self.addtarget(targetmatch.group(namegroup), '', target)
+ self.statemachine.memo.document.note_indirect_target(target)
+ return [target], blankfinish
+ nodelist = []
+ reference = ''.join([line.strip() for line in block])
+ if reference.find(' ') != -1:
+ warning = self.statemachine.memo.reporter.warning(
+ 'Hyperlink target at line %s contains whitespace. '
+ 'Perhaps a footnote was intended?'
+ % (self.statemachine.abslineno() - len(block) + 1), '',
+ nodes.literal_block(blocktext, blocktext))
+ nodelist.append(warning)
+ else:
+ unescaped = unescape(reference)
+ target = nodes.target(blocktext, '')
+ self.addtarget(targetmatch.group(namegroup), unescaped, target)
+ nodelist.append(target)
+ return nodelist, blankfinish
+
+ def isreference(self, reference):
+ match = self.explicit.patterns.reference.match(normname(reference))
+ if not match:
+ return None
+ return unescape(match.group(self.explicit.groups.reference.simple)
+ or match.group(self.explicit.groups.reference.phrase))
+
+ def addtarget(self, targetname, refuri, target):
+ if targetname:
+ name = normname(unescape(targetname))
+ target['name'] = name
+ if refuri:
+ target['refuri'] = refuri
+ self.statemachine.memo.document.note_external_target(target)
+ else:
+ self.statemachine.memo.document.note_internal_target(target)
+ self.statemachine.memo.document.note_explicit_target(
+ target, self.statemachine.node)
+ else: # anonymous target
+ if refuri:
+ target['refuri'] = refuri
+ target['anonymous'] = 1
+ self.statemachine.memo.document.note_anonymous_target(target)
+
+ def substitutiondef(self, match):
+ pattern = self.explicit.patterns.substitution
+ lineno = self.statemachine.abslineno()
+ block, indent, offset, blankfinish = \
+ self.statemachine.getfirstknownindented(match.end(),
+ stripindent=0)
+ blocktext = (match.string[:match.end()] + '\n'.join(block))
+ block = [escape2null(line) for line in block]
+ escaped = block[0].rstrip()
+ blockindex = 0
+ while 1:
+ subdefmatch = pattern.match(escaped)
+ if subdefmatch:
+ break
+ blockindex += 1
+ try:
+ escaped = escaped + ' ' + block[blockindex].strip()
+ except (IndexError, MarkupError):
+ raise MarkupError('malformed substitution definition '
+ 'at line %s.' % lineno)
+ del block[:blockindex] # strip out the substitution marker
+ block[0] = (block[0] + ' ')[subdefmatch.end()-len(escaped)-1:].strip()
+ if not block[0]:
+ del block[0]
+ offset += 1
+ subname = subdefmatch.group(self.explicit.groups.substitution.name)
+ name = normname(subname)
+ substitutionnode = nodes.substitution_definition(
+ blocktext, name=name, alt=subname)
+ if block:
+ block[0] = block[0].strip()
+ newabsoffset, blankfinish = self.nestedlistparse(
+ block, inputoffset=offset, node=substitutionnode,
+ initialstate='SubstitutionDef', blankfinish=blankfinish)
+ self.statemachine.previousline(
+ len(block) + offset - newabsoffset - 1)
+ i = 0
+ for node in substitutionnode[:]:
+ if not (isinstance(node, nodes.Inline) or
+ isinstance(node, nodes.Text)):
+ self.statemachine.node += substitutionnode[i]
+ del substitutionnode[i]
+ else:
+ i += 1
+ if len(substitutionnode) == 0:
+ msg = self.statemachine.memo.reporter.warning(
+ 'Substitution definition "%s" empty or invalid at line '
+ '%s.' % (subname, self.statemachine.abslineno()), '',
+ nodes.literal_block(blocktext, blocktext))
+ self.statemachine.node += msg
+ else:
+ del substitutionnode['alt']
+ self.statemachine.memo.document.note_substitution_def(
+ substitutionnode, self.statemachine.node)
+ return [substitutionnode], blankfinish
+ else:
+ msg = self.statemachine.memo.reporter.warning(
+ 'Substitution definition "%s" missing contents at line %s.'
+ % (subname, self.statemachine.abslineno()), '',
+ nodes.literal_block(blocktext, blocktext))
+ self.statemachine.node += msg
+ return [], blankfinish
+
+ def directive(self, match, **attributes):
+ typename = match.group(1)
+ directivefunction = directives.directive(
+ typename, self.statemachine.memo.language)
+ data = match.string[match.end():].strip()
+ if directivefunction:
+ return directivefunction(match, typename, data, self,
+ self.statemachine, attributes)
+ else:
+ return self.unknowndirective(typename, data)
+
+ def unknowndirective(self, typename, data):
+ lineno = self.statemachine.abslineno()
+ indented, indent, offset, blankfinish = \
+ self.statemachine.getfirstknownindented(0, stripindent=0)
+ text = '\n'.join(indented)
+ error = self.statemachine.memo.reporter.error(
+ 'Unknown directive type "%s" at line %s.' % (typename, lineno),
+ '', nodes.literal_block(text, text))
+ return [error], blankfinish
+
+ def parse_extension_attributes(self, attribute_spec, datalines, blankfinish):
+ """
+ Parse `datalines` for a field list containing extension attributes
+ matching `attribute_spec`.
+
+ :Parameters:
+ - `attribute_spec`: a mapping of attribute name to conversion
+ function, which should raise an exception on bad input.
+ - `datalines`: a list of input strings.
+ - `blankfinish`:
+
+ :Return:
+ - Success value, 1 or 0.
+ - An attribute dictionary on success, an error string on failure.
+ - Updated `blankfinish` flag.
+ """
+ node = nodes.field_list()
+ newlineoffset, blankfinish = self.nestedlistparse(
+ datalines, 0, node, initialstate='FieldList',
+ blankfinish=blankfinish)
+ if newlineoffset != len(datalines): # incomplete parse of block
+ return 0, 'invalid attribute block', blankfinish
+ try:
+ attributes = utils.extract_extension_attributes(node, attribute_spec)
+ except KeyError, detail:
+ return 0, ('unknown attribute: "%s"' % detail), blankfinish
+ except (ValueError, TypeError), detail:
+ return 0, ('invalid attribute value:\n%s' % detail), blankfinish
+ except utils.ExtensionAttributeError, detail:
+ return 0, ('invalid attribute data: %s' % detail), blankfinish
+ return 1, attributes, blankfinish
+
+ def comment(self, match):
+ if not match.string[match.end():].strip() \
+ and self.statemachine.nextlineblank(): # an empty comment?
+ return [nodes.comment()], 1 # "A tiny but practical wart."
+ indented, indent, offset, blankfinish = \
+ self.statemachine.getfirstknownindented(match.end())
+ text = '\n'.join(indented)
+ return [nodes.comment(text, text)], blankfinish
+
+ explicit.constructs = [
+ (footnote,
+ re.compile(r"""
+ \.\.[ ]+ # explicit markup start
+ \[
+ ( # footnote label:
+ [0-9]+ # manually numbered footnote
+ | # *OR*
+ \# # anonymous auto-numbered footnote
+ | # *OR*
+              \#%s            # auto-numbered footnote label
+ | # *OR*
+ \* # auto-symbol footnote
+ )
+ \]
+ (?:[ ]+|$) # whitespace or end of line
+ """ % RSTState.inline.simplename, re.VERBOSE)),
+ (citation,
+ re.compile(r"""
+ \.\.[ ]+ # explicit markup start
+ \[(%s)\] # citation label
+ (?:[ ]+|$) # whitespace or end of line
+ """ % RSTState.inline.simplename, re.VERBOSE)),
+ (hyperlink_target,
+ re.compile(r"""
+ \.\.[ ]+ # explicit markup start
+ _ # target indicator
+ (?![ ]) # first char. not space
+ """, re.VERBOSE)),
+ (substitutiondef,
+ re.compile(r"""
+ \.\.[ ]+ # explicit markup start
+ \| # substitution indicator
+ (?![ ]) # first char. not space
+ """, re.VERBOSE)),
+ (directive,
+ re.compile(r"""
+ \.\.[ ]+ # explicit markup start
+ (%s) # directive name
+ :: # directive delimiter
+ (?:[ ]+|$) # whitespace or end of line
+ """ % RSTState.inline.simplename, re.VERBOSE))]
+
+ def explicit_markup(self, match, context, nextstate):
+ """Footnotes, hyperlink targets, directives, comments."""
+ nodelist, blankfinish = self.explicit_construct(match)
+ self.statemachine.node += nodelist
+ self.explicitlist(blankfinish)
+ return [], nextstate, []
+
+ def explicit_construct(self, match):
+ """Determine which explicit construct this is, parse & return it."""
+ errors = []
+ for method, pattern in self.explicit.constructs:
+ expmatch = pattern.match(match.string)
+ if expmatch:
+ try:
+ return method(self, expmatch)
+ except MarkupError, detail: # never reached?
+ errors.append(
+ self.statemachine.memo.reporter.warning('%s: %s'
+ % (detail.__class__.__name__, detail)))
+ break
+ nodelist, blankfinish = self.comment(match)
+ return nodelist + errors, blankfinish
+
+ def explicitlist(self, blankfinish):
+ """
+ Create a nested state machine for a series of explicit markup constructs
+ (including anonymous hyperlink targets).
+ """
+ offset = self.statemachine.lineoffset + 1 # next line
+ newlineoffset, blankfinish = self.nestedlistparse(
+ self.statemachine.inputlines[offset:],
+ inputoffset=self.statemachine.abslineoffset() + 1,
+ node=self.statemachine.node, initialstate='Explicit',
+ blankfinish=blankfinish)
+ self.gotoline(newlineoffset)
+ if not blankfinish:
+ self.statemachine.node += self.unindentwarning()
+
+ def anonymous(self, match, context, nextstate):
+ """Anonymous hyperlink targets."""
+ nodelist, blankfinish = self.anonymous_target(match)
+ self.statemachine.node += nodelist
+ self.explicitlist(blankfinish)
+ return [], nextstate, []
+
+ def anonymous_target(self, match):
+ block, indent, offset, blankfinish \
+ = self.statemachine.getfirstknownindented(match.end(),
+ uptoblank=1)
+ blocktext = match.string[:match.end()] + '\n'.join(block)
+ if block and block[-1].strip()[-1:] == '_': # possible indirect target
+ reference = escape2null(' '.join([line.strip() for line in block]))
+ refname = self.isreference(reference)
+ if refname:
+ target = nodes.target(blocktext, '', refname=refname,
+ anonymous=1)
+ self.statemachine.memo.document.note_anonymous_target(target)
+ self.statemachine.memo.document.note_indirect_target(target)
+ return [target], blankfinish
+ nodelist = []
+ reference = escape2null(''.join([line.strip() for line in block]))
+ if reference.find(' ') != -1:
+ warning = self.statemachine.memo.reporter.warning(
+ 'Anonymous hyperlink target at line %s contains whitespace. '
+ 'Perhaps a footnote was intended?'
+ % (self.statemachine.abslineno() - len(block) + 1), '',
+ nodes.literal_block(blocktext, blocktext))
+ nodelist.append(warning)
+ else:
+ target = nodes.target(blocktext, '', anonymous=1)
+ if reference:
+ unescaped = unescape(reference)
+ target['refuri'] = unescaped
+ self.statemachine.memo.document.note_anonymous_target(target)
+ nodelist.append(target)
+ return nodelist, blankfinish
+
+ def line(self, match, context, nextstate):
+ """Section title overline or transition marker."""
+ if self.statemachine.matchtitles:
+ return [match.string], 'Line', []
+ else:
+ blocktext = self.statemachine.line
+ msg = self.statemachine.memo.reporter.severe(
+ 'Unexpected section title or transition at line %s.'
+ % self.statemachine.abslineno(), '',
+ nodes.literal_block(blocktext, blocktext))
+ self.statemachine.node += msg
+ return [], nextstate, []
+
+ def text(self, match, context, nextstate):
+ """Titles, definition lists, paragraphs."""
+ return [match.string], 'Text', []
+
+
+class SpecializedBody(Body):
+
+ """
+ Superclass for second and subsequent compound element members.
+
+ All transition methods are disabled. Override individual methods in
+ subclasses to re-enable.
+ """
+
+ def invalid_input(self, match=None, context=None, nextstate=None):
+ """Not a compound element member. Abort this state machine."""
+ self.statemachine.previousline() # back up so parent SM can reassess
+ raise EOFError
+
+ indent = invalid_input
+ bullet = invalid_input
+ enumerator = invalid_input
+ field_marker = invalid_input
+ option_marker = invalid_input
+ doctest = invalid_input
+ tabletop = invalid_input
+ explicit_markup = invalid_input
+ anonymous = invalid_input
+ line = invalid_input
+ text = invalid_input
+
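The SpecializedBody idiom above — bind every transition name to `invalid_input`, then re-enable individual transitions by overriding them in subclasses — can be sketched in isolation. This is a standalone illustration with hypothetical class names, not the Docutils classes themselves:

```python
# Minimal sketch of the alias-then-override idiom used by SpecializedBody:
# the base class handles a transition; the specialized class disables it
# by aliasing it to an aborting method; a subclass re-enables it by
# overriding.  (Hypothetical names, for illustration only.)

class Base:
    def handle(self, name):
        # dispatch a transition by name
        return getattr(self, name)()

    def bullet(self):
        return 'bullet handled'

class Specialized(Base):
    def invalid_input(self):
        """Not a compound element member; abort."""
        raise EOFError

    bullet = invalid_input      # transition disabled by aliasing

class BulletOnly(Specialized):
    def bullet(self):           # transition re-enabled by overriding
        return 'bullet handled'
```

`Specialized().handle('bullet')` raises `EOFError`, while `BulletOnly().handle('bullet')` succeeds — the same mechanism `BulletList` uses to accept only further bullet items.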
+
+class BulletList(SpecializedBody):
+
+ """Second and subsequent bullet_list list_items."""
+
+ def bullet(self, match, context, nextstate):
+ """Bullet list item."""
+ if match.string[0] != self.statemachine.node['bullet']:
+ # different bullet: new list
+ self.invalid_input()
+ listitem, blankfinish = self.list_item(match.end())
+ self.statemachine.node += listitem
+ self.blankfinish = blankfinish
+ return [], 'BulletList', []
+
+
+class DefinitionList(SpecializedBody):
+
+ """Second and subsequent definition_list_items."""
+
+ def text(self, match, context, nextstate):
+ """Definition lists."""
+ return [match.string], 'Definition', []
+
+
+class EnumeratedList(SpecializedBody):
+
+ """Second and subsequent enumerated_list list_items."""
+
+ def enumerator(self, match, context, nextstate):
+ """Enumerated list item."""
+ format, sequence, text, ordinal = self.parse_enumerator(
+ match, self.statemachine.node['enumtype'])
+ if (sequence != self.statemachine.node['enumtype'] or
+ format != self.format or
+ ordinal != self.lastordinal + 1):
+ # different enumeration: new list
+ self.invalid_input()
+ listitem, blankfinish = self.list_item(match.end())
+ self.statemachine.node += listitem
+ self.blankfinish = blankfinish
+ self.lastordinal = ordinal
+ return [], 'EnumeratedList', []
+
+
+class FieldList(SpecializedBody):
+
+ """Second and subsequent field_list fields."""
+
+ def field_marker(self, match, context, nextstate):
+ """Field list field."""
+ field, blankfinish = self.field(match)
+ self.statemachine.node += field
+ self.blankfinish = blankfinish
+ return [], 'FieldList', []
+
+
+class OptionList(SpecializedBody):
+
+ """Second and subsequent option_list option_list_items."""
+
+ def option_marker(self, match, context, nextstate):
+ """Option list item."""
+ try:
+ option_list_item, blankfinish = self.option_list_item(match)
+ except MarkupError, detail:
+ self.invalid_input()
+ self.statemachine.node += option_list_item
+ self.blankfinish = blankfinish
+ return [], 'OptionList', []
+
+
+class RFC822List(SpecializedBody):
+
+ """Second and subsequent RFC822 field_list fields."""
+
+ pass
+
+
+class Explicit(SpecializedBody):
+
+ """Second and subsequent explicit markup construct."""
+
+ def explicit_markup(self, match, context, nextstate):
+ """Footnotes, hyperlink targets, directives, comments."""
+ nodelist, blankfinish = self.explicit_construct(match)
+ self.statemachine.node += nodelist
+ self.blankfinish = blankfinish
+ return [], nextstate, []
+
+ def anonymous(self, match, context, nextstate):
+ """Anonymous hyperlink targets."""
+ nodelist, blankfinish = self.anonymous_target(match)
+ self.statemachine.node += nodelist
+ self.blankfinish = blankfinish
+ return [], nextstate, []
+
+
+class SubstitutionDef(Body):
+
+ """
+ Parser for the contents of a substitution_definition element.
+ """
+
+ patterns = {
+ 'embedded_directive': r'(%s)::( +|$)' % RSTState.inline.simplename,
+ 'text': r''}
+ initialtransitions = ['embedded_directive', 'text']
+
+ def embedded_directive(self, match, context, nextstate):
+ if self.statemachine.node.has_key('alt'):
+ attributes = {'alt': self.statemachine.node['alt']}
+ else:
+ attributes = {}
+ nodelist, blankfinish = self.directive(match, **attributes)
+ self.statemachine.node += nodelist
+ if not self.statemachine.ateof():
+ self.blankfinish = blankfinish
+ raise EOFError
+
+ def text(self, match, context, nextstate):
+ if not self.statemachine.ateof():
+ self.blankfinish = self.statemachine.nextlineblank()
+ raise EOFError
+
+
+class Text(RSTState):
+
+ """
+ Classifier of second line of a text block.
+
+ Could be a paragraph, a definition list item, or a title.
+ """
+
+ patterns = {'underline': Body.patterns['line'],
+ 'text': r''}
+ initialtransitions = [('underline', 'Body'), ('text', 'Body')]
+
+ def blank(self, match, context, nextstate):
+ """End of paragraph."""
+ paragraph, literalnext = self.paragraph(
+ context, self.statemachine.abslineno() - 1)
+ self.statemachine.node += paragraph
+ if literalnext:
+ self.statemachine.node += self.literal_block()
+ return [], 'Body', []
+
+ def eof(self, context):
+ if context:
+ paragraph, literalnext = self.paragraph(
+ context, self.statemachine.abslineno() - 1)
+ self.statemachine.node += paragraph
+ if literalnext:
+ self.statemachine.node += self.literal_block()
+ return []
+
+ def indent(self, match, context, nextstate):
+ """Definition list item."""
+ definitionlist = nodes.definition_list()
+ definitionlistitem, blankfinish = self.definition_list_item(context)
+ definitionlist += definitionlistitem
+ self.statemachine.node += definitionlist
+ offset = self.statemachine.lineoffset + 1 # next line
+ newlineoffset, blankfinish = self.nestedlistparse(
+ self.statemachine.inputlines[offset:],
+ inputoffset=self.statemachine.abslineoffset() + 1,
+ node=definitionlist, initialstate='DefinitionList',
+ blankfinish=blankfinish, blankfinishstate='Definition')
+ if not blankfinish:
+ self.statemachine.node += self.unindentwarning()
+ self.gotoline(newlineoffset)
+ return [], 'Body', []
+
+ def underline(self, match, context, nextstate):
+ """Section title."""
+ lineno = self.statemachine.abslineno()
+ if not self.statemachine.matchtitles:
+ blocktext = context[0] + '\n' + self.statemachine.line
+ msg = self.statemachine.memo.reporter.severe(
+ 'Unexpected section title at line %s.' % lineno, '',
+ nodes.literal_block(blocktext, blocktext))
+ self.statemachine.node += msg
+ return [], nextstate, []
+ title = context[0].rstrip()
+ underline = match.string.rstrip()
+ source = title + '\n' + underline
+ if len(title) > len(underline):
+ blocktext = context[0] + '\n' + self.statemachine.line
+ msg = self.statemachine.memo.reporter.info(
+ 'Title underline too short at line %s.' % lineno, '',
+ nodes.literal_block(blocktext, blocktext))
+ self.statemachine.node += msg
+ style = underline[0]
+ context[:] = []
+ self.section(title, source, style, lineno - 1)
+ return [], nextstate, []
+
+ def text(self, match, context, nextstate):
+ """Paragraph."""
+ startline = self.statemachine.abslineno() - 1
+ msg = None
+ try:
+ block = self.statemachine.getunindented()
+ except statemachine.UnexpectedIndentationError, instance:
+ block, lineno = instance.args
+ msg = self.statemachine.memo.reporter.error(
+ 'Unexpected indentation at line %s.' % lineno)
+ lines = context + block
+ paragraph, literalnext = self.paragraph(lines, startline)
+ self.statemachine.node += paragraph
+ self.statemachine.node += msg
+ if literalnext:
+ try:
+ self.statemachine.nextline()
+ except IndexError:
+ pass
+ self.statemachine.node += self.literal_block()
+ return [], nextstate, []
+
+ def literal_block(self):
+ """Return a list of nodes."""
+ indented, indent, offset, blankfinish = \
+ self.statemachine.getindented()
+ nodelist = []
+ while indented and not indented[-1].strip():
+ indented.pop()
+ if indented:
+ data = '\n'.join(indented)
+ nodelist.append(nodes.literal_block(data, data))
+ if not blankfinish:
+ nodelist.append(self.unindentwarning())
+ else:
+ nodelist.append(self.statemachine.memo.reporter.warning(
+ 'Literal block expected at line %s; none found.'
+ % self.statemachine.abslineno()))
+ return nodelist
+
+ def definition_list_item(self, termline):
+ indented, indent, lineoffset, blankfinish = \
+ self.statemachine.getindented()
+ definitionlistitem = nodes.definition_list_item(
+ '\n'.join(termline + indented))
+ termlist, messages = self.term(termline,
+ self.statemachine.abslineno() - 1)
+ definitionlistitem += termlist
+ definition = nodes.definition('', *messages)
+ definitionlistitem += definition
+ if termline[0][-2:] == '::':
+ definition += self.statemachine.memo.reporter.info(
+ 'Blank line missing before literal block? Interpreted as a '
+ 'definition list item. At line %s.' % (lineoffset + 1))
+ self.nestedparse(indented, inputoffset=lineoffset, node=definition)
+ return definitionlistitem, blankfinish
+
+ def term(self, lines, lineno):
+ """Return a definition_list's term and optional classifier."""
+ assert len(lines) == 1
+ nodelist = []
+ parts = lines[0].split(' : ', 1) # split into 1 or 2 parts
+ termpart = parts[0].rstrip()
+ textnodes, messages = self.inline_text(termpart, lineno)
+ nodelist = [nodes.term(termpart, '', *textnodes)]
+ if len(parts) == 2:
+ classifierpart = parts[1].lstrip()
+ textnodes, cpmessages = self.inline_text(classifierpart, lineno)
+ nodelist.append(nodes.classifier(classifierpart, '', *textnodes))
+ messages += cpmessages
+ return nodelist, messages
+
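The `term()` method above splits a definition-list term line on the first ' : ' delimiter into a term and an optional classifier. A simplified standalone sketch of just the string handling (no inline-markup parsing):

```python
# Sketch of the "term : classifier" split used by term() above: split on
# the first ' : ' delimiter only, right-strip the term, left-strip the
# classifier.  (Simplified; the real method also parses inline markup.)

def split_term(line):
    parts = line.split(' : ', 1)  # one or two parts
    term = parts[0].rstrip()
    classifier = parts[1].lstrip() if len(parts) == 2 else None
    return term, classifier
```

Note that only the first ' : ' splits, so a classifier may itself contain the delimiter: `split_term('a : b : c')` yields `('a', 'b : c')`.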
+
+class SpecializedText(Text):
+
+ """
+ Superclass for second and subsequent lines of Text-variants.
+
+ All transition methods are disabled. Override individual methods in
+ subclasses to re-enable.
+ """
+
+ def eof(self, context):
+ """Incomplete construct."""
+ return []
+
+ def invalid_input(self, match=None, context=None, nextstate=None):
+ """Not a compound element member. Abort this state machine."""
+ raise EOFError
+
+ blank = invalid_input
+ indent = invalid_input
+ underline = invalid_input
+ text = invalid_input
+
+
+class Definition(SpecializedText):
+
+ """Second line of potential definition_list_item."""
+
+ def eof(self, context):
+ """Not a definition."""
+ self.statemachine.previousline(2) # back up so parent SM can reassess
+ return []
+
+ def indent(self, match, context, nextstate):
+ """Definition list item."""
+ definitionlistitem, blankfinish = self.definition_list_item(context)
+ self.statemachine.node += definitionlistitem
+ self.blankfinish = blankfinish
+ return [], 'DefinitionList', []
+
+
+class Line(SpecializedText):
+
+ """Second line of over- & underlined section title or transition marker."""
+
+ eofcheck = 1 # @@@ ???
+ """Set to 0 while parsing sections, so that we don't catch the EOF."""
+
+ def eof(self, context):
+ """Transition marker at end of section or document."""
+ if self.eofcheck: # ignore EOFError with sections
+ transition = nodes.transition(context[0])
+ self.statemachine.node += transition
+ msg = self.statemachine.memo.reporter.error(
+ 'Document or section may not end with a transition '
+ '(line %s).' % (self.statemachine.abslineno() - 1))
+ self.statemachine.node += msg
+ self.eofcheck = 1
+ return []
+
+ def blank(self, match, context, nextstate):
+ """Transition marker."""
+ transition = nodes.transition(context[0])
+ if len(self.statemachine.node) == 0:
+ msg = self.statemachine.memo.reporter.error(
+ 'Document or section may not begin with a transition '
+ '(line %s).' % (self.statemachine.abslineno() - 1))
+ self.statemachine.node += msg
+ elif isinstance(self.statemachine.node[-1], nodes.transition):
+ msg = self.statemachine.memo.reporter.error(
+ 'At least one body element must separate transitions; '
+ 'adjacent transitions at line %s.'
+ % (self.statemachine.abslineno() - 1))
+ self.statemachine.node += msg
+ self.statemachine.node += transition
+ return [], 'Body', []
+
+ def text(self, match, context, nextstate):
+ """Potential over- & underlined title."""
+ lineno = self.statemachine.abslineno() - 1
+ overline = context[0]
+ title = match.string
+ underline = ''
+ try:
+ underline = self.statemachine.nextline()
+ except IndexError:
+ blocktext = overline + '\n' + title
+ msg = self.statemachine.memo.reporter.severe(
+ 'Incomplete section title at line %s.' % lineno, '',
+ nodes.literal_block(blocktext, blocktext))
+ self.statemachine.node += msg
+ return [], 'Body', []
+ source = '%s\n%s\n%s' % (overline, title, underline)
+ overline = overline.rstrip()
+ underline = underline.rstrip()
+ if not self.transitions['underline'][0].match(underline):
+ msg = self.statemachine.memo.reporter.severe(
+ 'Missing underline for overline at line %s.' % lineno, '',
+ nodes.literal_block(source, source))
+ self.statemachine.node += msg
+ return [], 'Body', []
+ elif overline != underline:
+ msg = self.statemachine.memo.reporter.severe(
+ 'Title overline & underline mismatch at line %s.' % lineno,
+ '', nodes.literal_block(source, source))
+ self.statemachine.node += msg
+ return [], 'Body', []
+ title = title.rstrip()
+ if len(title) > len(overline):
+ msg = self.statemachine.memo.reporter.info(
+ 'Title overline too short at line %s.' % lineno, '',
+ nodes.literal_block(source, source))
+ self.statemachine.node += msg
+ style = (overline[0], underline[0])
+ self.eofcheck = 0 # @@@ not sure this is correct
+ self.section(title.lstrip(), source, style, lineno + 1)
+ self.eofcheck = 1
+ return [], 'Body', []
+
+ indent = text # indented title
+
+ def underline(self, match=None, context=None, nextstate=None):
+ blocktext = context[0] + '\n' + self.statemachine.line
+ msg = self.statemachine.memo.reporter.error(
+ 'Invalid section title or transition marker at line %s.'
+ % (self.statemachine.abslineno() - 1), '',
+ nodes.literal_block(blocktext, blocktext))
+ self.statemachine.node += msg
+ return [], 'Body', []
+
+
+stateclasses = [Body, BulletList, DefinitionList, EnumeratedList, FieldList,
+ OptionList, RFC822List, Explicit, Text, Definition, Line,
+ SubstitutionDef]
+"""Standard set of State classes used to start `RSTStateMachine`."""
+
+
+def escape2null(text):
+ """Return a string with escape-backslashes converted to nulls."""
+ parts = []
+ start = 0
+ while 1:
+ found = text.find('\\', start)
+ if found == -1:
+ parts.append(text[start:])
+ return ''.join(parts)
+ parts.append(text[start:found])
+ parts.append('\x00' + text[found+1:found+2])
+ start = found + 2 # skip character after escape
+
+def unescape(text, restorebackslashes=0):
+ """Return a string with nulls removed or restored to backslashes."""
+ if restorebackslashes:
+ return text.translate(RSTState.inline.null2backslash)
+ else:
+ return text.translate(RSTState.inline.identity, '\x00')
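The two helpers above implement the escape/null round trip: backslash-escapes become NUL markers so later regex passes skip escaped characters, and the markers are then stripped or restored. A current-Python sketch (the diff targets Python 2; `str.replace` stands in for the `translate` tables used by `unescape()`):

```python
# Standalone sketch of the escape/null round trip: '\x' becomes '\x00x',
# and unescape() later strips the markers or restores the backslashes.
# (Current Python; str.replace stands in for the Python 2 translate
# tables referenced in the original.)

def escape2null(text):
    """Convert escape-backslashes to null markers."""
    parts = []
    start = 0
    while True:
        found = text.find('\\', start)
        if found == -1:
            parts.append(text[start:])
            return ''.join(parts)
        parts.append(text[start:found])
        parts.append('\x00' + text[found + 1:found + 2])
        start = found + 2  # skip the escaped character

def unescape(text, restore_backslashes=False):
    """Strip null markers, or turn them back into backslashes."""
    if restore_backslashes:
        return text.replace('\x00', '\\')
    return text.replace('\x00', '')

# An escaped '*' survives as '\x00*', invisible to inline-markup
# patterns that look for a plain '*'.
marked = escape2null(r'a \* b')
assert marked == 'a \x00* b'
assert unescape(marked) == 'a * b'
assert unescape(marked, restore_backslashes=True) == r'a \* b'
```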
diff --git a/docutils/parsers/rst/tableparser.py b/docutils/parsers/rst/tableparser.py
new file mode 100644
index 000000000..7bacf99cd
--- /dev/null
+++ b/docutils/parsers/rst/tableparser.py
@@ -0,0 +1,313 @@
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+This module defines the `TableParser` class, which parses a plaintext-graphic
+table and produces a well-formed data structure suitable for building a CALS
+table.
+
+:Exception class: `TableMarkupError`
+
+:Function:
+ `update_dictoflists()`: Merge two dictionaries containing list values.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+import re
+
+
+class TableMarkupError(Exception): pass
+
+
+class TableParser:
+
+ """
+ Parse a plaintext graphic table using `parse()`.
+
+ Here's an example of a plaintext graphic table::
+
+ +------------------------+------------+----------+----------+
+ | Header row, column 1 | Header 2 | Header 3 | Header 4 |
+ +========================+============+==========+==========+
+ | body row 1, column 1 | column 2 | column 3 | column 4 |
+ +------------------------+------------+----------+----------+
+ | body row 2 | Cells may span columns. |
+ +------------------------+------------+---------------------+
+ | body row 3 | Cells may | - Table cells |
+ +------------------------+ span rows. | - contain |
+ | body row 4 | | - body elements. |
+ +------------------------+------------+---------------------+
+
+ Intersections use '+', row separators use '-' (except for one optional
+ head/body row separator, which uses '='), and column separators use '|'.
+
+ Passing the above table to the `parse()` method will result in the
+ following data structure::
+
+ ([24, 12, 10, 10],
+ [[(0, 0, 1, ['Header row, column 1']),
+ (0, 0, 1, ['Header 2']),
+ (0, 0, 1, ['Header 3']),
+ (0, 0, 1, ['Header 4'])]],
+ [[(0, 0, 3, ['body row 1, column 1']),
+ (0, 0, 3, ['column 2']),
+ (0, 0, 3, ['column 3']),
+ (0, 0, 3, ['column 4'])],
+ [(0, 0, 5, ['body row 2']),
+ (0, 2, 5, ['Cells may span columns.']),
+ None,
+ None],
+ [(0, 0, 7, ['body row 3']),
+ (1, 0, 7, ['Cells may', 'span rows.', '']),
+ (1, 1, 7, ['- Table cells', '- contain', '- body elements.']),
+ None],
+ [(0, 0, 9, ['body row 4']), None, None, None]])
+
+ The first item is a list containing column widths (colspecs). The second
+ item is a list of head rows, and the third is a list of body rows. Each
+ row contains a list of cells. Each cell is either None (for a cell unused
+ because of another cell's span), or a tuple. A cell tuple contains four
+ items: the number of extra rows used by the cell in a vertical span
+ (morerows); the number of extra columns used by the cell in a horizontal
+ span (morecols); the line offset of the first line of the cell contents;
+ and the cell contents, a list of lines of text.
+ """
+
+ headbodyseparatorpat = re.compile(r'\+=[=+]+=\+$')
+ """Matches the row separator between head rows and body rows."""
+
+ def parse(self, block):
+ """
+ Analyze the text `block` and return a table data structure.
+
+ Given a plaintext-graphic table in `block` (list of lines of text; no
+ whitespace padding), parse the table, construct and return the data
+ necessary to construct a CALS table or equivalent.
+
+ Raise `TableMarkupError` if there is any problem with the markup.
+ """
+ self.setup(block)
+ self.findheadbodysep()
+ self.parsegrid()
+ structure = self.structurefromcells()
+ return structure
+
+ def setup(self, block):
+ self.block = block[:] # make a copy; it may be modified
+ self.bottom = len(block) - 1
+ self.right = len(block[0]) - 1
+ self.headbodysep = None
+ self.done = [-1] * len(block[0])
+ self.cells = []
+ self.rowseps = {0: [0]}
+ self.colseps = {0: [0]}
+
+ def findheadbodysep(self):
+ """Look for a head/body row separator line; store the line index."""
+ for i in range(len(self.block)):
+ line = self.block[i]
+ if self.headbodyseparatorpat.match(line):
+ if self.headbodysep:
+ raise TableMarkupError, (
+ 'Multiple head/body row separators in table (at line '
+ 'offset %s and %s); only one allowed.'
+ % (self.headbodysep, i))
+ else:
+ self.headbodysep = i
+ self.block[i] = line.replace('=', '-')
+ if self.headbodysep == 0 or self.headbodysep == len(self.block) - 1:
+ raise TableMarkupError, (
+ 'The head/body row separator may not be the first or '
+ 'last line of the table (at line offset %s).'
+ % self.headbodysep)
+
+ def parsegrid(self):
+ """
+ Start with a queue of upper-left corners, containing the upper-left
+ corner of the table itself. Trace out one rectangular cell, remember
+ it, and add its upper-right and lower-left corners to the queue of
+ potential upper-left corners of further cells. Process the queue in
+ top-to-bottom order, keeping track of how much of each text column has
+ been seen.
+
+ We'll end up knowing all the row and column boundaries, cell positions
+ and their dimensions.
+ """
+ corners = [(0, 0)]
+ while corners:
+ top, left = corners.pop(0)
+ if top == self.bottom or left == self.right \
+ or top <= self.done[left]:
+ continue
+ result = self.scancell(top, left)
+ if not result:
+ continue
+ bottom, right, rowseps, colseps = result
+ update_dictoflists(self.rowseps, rowseps)
+ update_dictoflists(self.colseps, colseps)
+ self.markdone(top, left, bottom, right)
+ cellblock = self.getcellblock(top, left, bottom, right)
+ self.cells.append((top, left, bottom, right, cellblock))
+ corners.extend([(top, right), (bottom, left)])
+ corners.sort()
+ if not self.checkparsecomplete():
+ raise TableMarkupError, 'Malformed table; parse incomplete.'
+
+ def markdone(self, top, left, bottom, right):
+ """For keeping track of how much of each text column has been seen."""
+ before = top - 1
+ after = bottom - 1
+ for col in range(left, right):
+ assert self.done[col] == before
+ self.done[col] = after
+
+ def checkparsecomplete(self):
+ """Each text column should have been completely seen."""
+ last = self.bottom - 1
+ for col in range(self.right):
+ if self.done[col] != last:
+ return None
+ return 1
+
+ def getcellblock(self, top, left, bottom, right):
+ """Given the corners, extract the text of a cell."""
+ cellblock = []
+ margin = right
+ for lineno in range(top + 1, bottom):
+ line = self.block[lineno][left + 1 : right].rstrip()
+ cellblock.append(line)
+ if line:
+ margin = margin and min(margin, len(line) - len(line.lstrip()))
+ if 0 < margin < right:
+ cellblock = [line[margin:] for line in cellblock]
+ return cellblock
+
+ def scancell(self, top, left):
+ """Starting at the top-left corner, start tracing out a cell."""
+ assert self.block[top][left] == '+'
+ result = self.scanright(top, left)
+ return result
+
+ def scanright(self, top, left):
+ """
+ Look for the top-right corner of the cell, and make note of all column
+ boundaries ('+').
+ """
+ colseps = {}
+ line = self.block[top]
+ for i in range(left + 1, self.right + 1):
+ if line[i] == '+':
+ colseps[i] = [top]
+ result = self.scandown(top, left, i)
+ if result:
+ bottom, rowseps, newcolseps = result
+ update_dictoflists(colseps, newcolseps)
+ return bottom, i, rowseps, colseps
+ elif line[i] != '-':
+ return None
+ return None
+
+ def scandown(self, top, left, right):
+ """
+ Look for the bottom-right corner of the cell, making note of all row
+ boundaries.
+ """
+ rowseps = {}
+ for i in range(top + 1, self.bottom + 1):
+ if self.block[i][right] == '+':
+ rowseps[i] = [right]
+ result = self.scanleft(top, left, i, right)
+ if result:
+ newrowseps, colseps = result
+ update_dictoflists(rowseps, newrowseps)
+ return i, rowseps, colseps
+ elif self.block[i][right] != '|':
+ return None
+ return None
+
+ def scanleft(self, top, left, bottom, right):
+ """
+ Noting column boundaries, look for the bottom-left corner of the cell.
+ It must line up with the starting point.
+ """
+ colseps = {}
+ line = self.block[bottom]
+ for i in range(right - 1, left, -1):
+ if line[i] == '+':
+ colseps[i] = [bottom]
+ elif line[i] != '-':
+ return None
+ if line[left] != '+':
+ return None
+ result = self.scanup(top, left, bottom, right)
+ if result is not None:
+ rowseps = result
+ return rowseps, colseps
+ return None
+
+ def scanup(self, top, left, bottom, right):
+ """Noting row boundaries, see if we can return to the starting point."""
+ rowseps = {}
+ for i in range(bottom - 1, top, -1):
+ if self.block[i][left] == '+':
+ rowseps[i] = [left]
+ elif self.block[i][left] != '|':
+ return None
+ return rowseps
+
+ def structurefromcells(self):
+ """
+ From the data collected by `scancell()`, convert to the final data
+ structure.
+ """
+ rowseps = self.rowseps.keys() # list of row boundaries
+ rowseps.sort()
+ rowindex = {}
+ for i in range(len(rowseps)):
+ rowindex[rowseps[i]] = i # row boundary -> row number mapping
+ colseps = self.colseps.keys() # list of column boundaries
+ colseps.sort()
+ colindex = {}
+ for i in range(len(colseps)):
+ colindex[colseps[i]] = i # column boundary -> col number mapping
+ colspecs = [(colseps[i] - colseps[i - 1] - 1)
+ for i in range(1, len(colseps))] # list of column widths
+ # prepare an empty table with the correct number of rows & columns
+ onerow = [None for i in range(len(colseps) - 1)]
+ rows = [onerow[:] for i in range(len(rowseps) - 1)]
+ # keep track of # of cells remaining; should reduce to zero
+ remaining = (len(rowseps) - 1) * (len(colseps) - 1)
+ for top, left, bottom, right, block in self.cells:
+ rownum = rowindex[top]
+ colnum = colindex[left]
+ assert rows[rownum][colnum] is None, (
+ 'Cell (row %s, column %s) already used.'
+ % (rownum + 1, colnum + 1))
+ morerows = rowindex[bottom] - rownum - 1
+ morecols = colindex[right] - colnum - 1
+ remaining -= (morerows + 1) * (morecols + 1)
+ # write the cell into the table
+ rows[rownum][colnum] = (morerows, morecols, top + 1, block)
+ assert remaining == 0, 'Unused cells remaining.'
+ if self.headbodysep: # separate head rows from body rows
+ numheadrows = rowindex[self.headbodysep]
+ headrows = rows[:numheadrows]
+ bodyrows = rows[numheadrows:]
+ else:
+ headrows = []
+ bodyrows = rows
+ return (colspecs, headrows, bodyrows)
+
+
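The head/body separator is recognized purely by `headbodyseparatorpat`: a '+' at each end, an '=' adjacent to each '+', and any run of '=' and '+' in between. A quick standalone check of that pattern:

```python
# Quick check of the head/body separator pattern used by TableParser:
# it matches '='-style separator rows but not the ordinary '-' rows.
import re

headbodyseparatorpat = re.compile(r'\+=[=+]+=\+$')

assert headbodyseparatorpat.match('+====+====+')
assert not headbodyseparatorpat.match('+----+----+')  # ordinary separator
```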
+def update_dictoflists(master, newdata):
+ """
+ Extend the list values of `master` with those from `newdata`.
+
+ Both parameters must be dictionaries containing list values.
+ """
+ for key, values in newdata.items():
+ master.setdefault(key, []).extend(values)
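`parsegrid()` calls `update_dictoflists()` to merge the row/column boundary dictionaries returned by each cell scan into the running `self.rowseps` / `self.colseps` mappings. A standalone usage sketch (the function body is copied from above):

```python
# Merging boundary dictionaries: keys are boundary offsets, values are
# lists of positions where that boundary was seen.  Missing keys are
# created; existing lists are extended.

def update_dictoflists(master, newdata):
    for key, values in newdata.items():
        master.setdefault(key, []).extend(values)

rowseps = {0: [0]}
update_dictoflists(rowseps, {0: [12], 4: [0, 12]})
assert rowseps == {0: [0, 12], 4: [0, 12]}
```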
diff --git a/docutils/readers/__init__.py b/docutils/readers/__init__.py
new file mode 100644
index 000000000..9b8d38654
--- /dev/null
+++ b/docutils/readers/__init__.py
@@ -0,0 +1,118 @@
+#! /usr/bin/env python
+
+"""
+:Authors: David Goodger; Ueli Schlaepfer
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+This package contains Docutils Reader modules.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+import sys
+from docutils import nodes, utils
+from docutils.transforms import universal
+
+
+class Reader:
+
+ """
+ Abstract base class for docutils Readers.
+
+ Each reader module or package must export a subclass also called 'Reader'.
+
+ The three steps of a Reader's responsibility are defined: `scan()`,
+ `parse()`, and `transform()`. Call `read()` to process a document.
+ """
+
+ transforms = ()
+ """Ordered tuple of transform classes (each with a ``transform()`` method).
+ Populated by subclasses. `Reader.transform()` instantiates & runs them."""
+
+ def __init__(self, reporter, languagecode):
+ """
+ Initialize the Reader instance.
+
+ Several instance attributes are defined with dummy initial values.
+ Subclasses may use these attributes as they wish.
+ """
+
+ self.languagecode = languagecode
+ """Default language for new documents."""
+
+ self.reporter = reporter
+ """A `utils.Reporter` instance shared by all doctrees."""
+
+ self.source = None
+ """Path to the source of raw input."""
+
+ self.input = None
+ """Raw text input; either a single string or, for more complex cases,
+ a collection of strings."""
+
+ self.transforms = tuple(self.transforms)
+ """Instance copy of `Reader.transforms`; may be modified by client."""
+
+ def read(self, source, parser):
+ self.source = source
+ self.parser = parser
+ self.scan() # may modify self.parser, depending on input
+ self.parse()
+ self.transform()
+ return self.document
+
+ def scan(self):
+ """Override to read `self.input` from `self.source`."""
+ raise NotImplementedError('subclass must override this method')
+
+ def scanfile(self, source):
+ """
+ Scan a single file and return the raw data.
+
+ Parameter `source` may be:
+
+ (a) a file-like object, which is read directly;
+ (b) a path to a file, which is opened and then read; or
+ (c) `None`, which implies `sys.stdin`.
+ """
+ if hasattr(source, 'read'):
+ return source.read()
+ if self.source:
+ return open(source).read()
+ return sys.stdin.read()
+
+ def parse(self):
+ """Parse `self.input` into a document tree."""
+ self.document = self.newdocument()
+ self.parser.parse(self.input, self.document)
+
+ def transform(self):
+ """Run all of the transforms defined for this Reader."""
+ for xclass in (universal.first_reader_transforms
+ + tuple(self.transforms)
+ + universal.last_reader_transforms):
+ xclass(self.document).transform()
+
+ def newdocument(self, languagecode=None):
+ """Create and return a new empty document tree (root node)."""
+ document = nodes.document(
+ languagecode=(languagecode or self.languagecode),
+ reporter=self.reporter)
+ document['source'] = self.source
+ return document
+
+
+_reader_aliases = {'rtxt': 'standalone',
+ 'restructuredtext': 'standalone'}
+
+def get_reader_class(readername):
+ """Return the Reader class from the `readername` module."""
+ readername = readername.lower()
+ if _reader_aliases.has_key(readername):
+ readername = _reader_aliases[readername]
+ module = __import__(readername, globals(), locals())
+ return module.Reader
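`scanfile()` above dispatches on its argument: a file-like object is read directly, a path is opened and read, and `None` falls back to standard input. A standalone sketch of that dispatch (note this version tests `source is not None`, whereas the original consults the instance attribute `self.source`):

```python
# Sketch of the three-way scanfile() dispatch: (a) file-like object,
# (b) filesystem path, (c) None meaning standard input.
import io
import sys

def scanfile(source):
    if hasattr(source, 'read'):       # (a) file-like: read directly
        return source.read()
    if source is not None:            # (b) path: open, then read
        with open(source) as f:
            return f.read()
    return sys.stdin.read()           # (c) fall back to stdin

assert scanfile(io.StringIO('raw input')) == 'raw input'
```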
diff --git a/docutils/readers/standalone.py b/docutils/readers/standalone.py
new file mode 100644
index 000000000..27c0ded6b
--- /dev/null
+++ b/docutils/readers/standalone.py
@@ -0,0 +1,34 @@
+#! /usr/bin/env python
+
+"""
+:Authors: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Standalone file Reader for the reStructuredText markup syntax.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+import sys
+from docutils import readers
+from docutils.transforms import frontmatter, references
+from docutils.parsers.rst import Parser
+
+
+class Reader(readers.Reader):
+
+ document = None
+ """A single document tree."""
+
+ transforms = (references.Substitutions,
+ frontmatter.DocTitle,
+ frontmatter.DocInfo,
+ references.Footnotes,
+ references.Hyperlinks,)
+
+ def scan(self):
+ self.input = self.scanfile(self.source)
diff --git a/docutils/roman.py b/docutils/roman.py
new file mode 100644
index 000000000..5972c3cef
--- /dev/null
+++ b/docutils/roman.py
@@ -0,0 +1,81 @@
+"""Convert to and from Roman numerals"""
+
+__author__ = "Mark Pilgrim (f8dy@diveintopython.org)"
+__version__ = "1.4"
+__date__ = "8 August 2001"
+__copyright__ = """Copyright (c) 2001 Mark Pilgrim
+
+This program is part of "Dive Into Python", a free Python tutorial for
+experienced programmers. Visit http://diveintopython.org/ for the
+latest version.
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the Python 2.1.1 license, available at
+http://www.python.org/2.1.1/license.html
+"""
+
+import re
+
+#Define exceptions
+class RomanError(Exception): pass
+class OutOfRangeError(RomanError): pass
+class NotIntegerError(RomanError): pass
+class InvalidRomanNumeralError(RomanError): pass
+
+#Define digit mapping
+romanNumeralMap = (('M', 1000),
+ ('CM', 900),
+ ('D', 500),
+ ('CD', 400),
+ ('C', 100),
+ ('XC', 90),
+ ('L', 50),
+ ('XL', 40),
+ ('X', 10),
+ ('IX', 9),
+ ('V', 5),
+ ('IV', 4),
+ ('I', 1))
+
+def toRoman(n):
+ """convert integer to Roman numeral"""
+ if not (0 < n < 5000):
+ raise OutOfRangeError, "number out of range (must be 1..4999)"
+ if int(n) <> n:
+ raise NotIntegerError, "decimals can not be converted"
+
+ result = ""
+ for numeral, integer in romanNumeralMap:
+ while n >= integer:
+ result += numeral
+ n -= integer
+ return result
+
+#Define pattern to detect valid Roman numerals
+romanNumeralPattern = re.compile('''
+ ^ # beginning of string
+ M{0,4} # thousands - 0 to 4 M's
+ (CM|CD|D?C{0,3}) # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
+ # or 500-800 (D, followed by 0 to 3 C's)
+ (XC|XL|L?X{0,3}) # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
+ # or 50-80 (L, followed by 0 to 3 X's)
+ (IX|IV|V?I{0,3}) # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
+ # or 5-8 (V, followed by 0 to 3 I's)
+ $ # end of string
+ ''', re.VERBOSE)
+
+def fromRoman(s):
+ """convert Roman numeral to integer"""
+ if not s:
+ raise InvalidRomanNumeralError, 'Input can not be blank'
+ if not romanNumeralPattern.search(s):
+ raise InvalidRomanNumeralError, 'Invalid Roman numeral: %s' % s
+
+ result = 0
+ index = 0
+ for numeral, integer in romanNumeralMap:
+ while s[index:index+len(numeral)] == numeral:
+ result += integer
+ index += len(numeral)
+ return result
+
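The greedy digit-mapping approach in `roman.py` round-trips cleanly over its whole supported range. A standalone reimplementation in current Python (the module above is Python 2) using the same mapping:

```python
# Round-trip sketch of the greedy Roman-numeral algorithm above:
# encoding subtracts the largest mapped value repeatedly; decoding
# greedily consumes numerals from the same table.

ROMAN = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
         ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
         ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

def to_roman(n):
    if not 0 < n < 5000:
        raise ValueError('number out of range (must be 1..4999)')
    result = []
    for numeral, value in ROMAN:
        while n >= value:
            result.append(numeral)
            n -= value
    return ''.join(result)

def from_roman(s):
    result = index = 0
    for numeral, value in ROMAN:
        while s[index:index + len(numeral)] == numeral:
            result += value
            index += len(numeral)
    return result

assert to_roman(1998) == 'MCMXCVIII'
assert all(from_roman(to_roman(n)) == n for n in range(1, 5000))
```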
diff --git a/docutils/statemachine.py b/docutils/statemachine.py
new file mode 100644
index 000000000..9410cb956
--- /dev/null
+++ b/docutils/statemachine.py
@@ -0,0 +1,1076 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Version: 1.3
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+A finite state machine specialized for regular-expression-based text filters,
+this module defines the following classes:
+
+- `StateMachine`, a state machine
+- `State`, a state superclass
+- `StateMachineWS`, a whitespace-sensitive version of `StateMachine`
+- `StateWS`, a state superclass for use with `StateMachineWS`
+- `SearchStateMachine`, uses `re.search()` instead of `re.match()`
+- `SearchStateMachineWS`, a whitespace-sensitive version of `SearchStateMachine`
+
+Exception classes:
+
+- `UnknownStateError`
+- `DuplicateStateError`
+- `UnknownTransitionError`
+- `DuplicateTransitionError`
+- `TransitionPatternNotFound`
+- `TransitionMethodNotFound`
+- `UnexpectedIndentationError`
+- `TransitionCorrection`: Raised to switch to another transition.
+
+Functions:
+
+- `string2lines()`: split a multi-line string into a list of one-line strings
+- `extractindented()`: return indented lines with minimum indentation removed
+
+How To Use This Module
+======================
+
+(See the individual classes, methods, and attributes for details.)
+
+1. Import it: ``import statemachine`` or ``from statemachine import ...``.
+ You will also need to ``import re``.
+
+2. Derive a subclass of `State` (or `StateWS`) for each state in your state
+ machine::
+
+ class MyState(statemachine.State):
+
+ Within the state's class definition:
+
+ a) Include a pattern for each transition, in `State.patterns`::
+
+ patterns = {'atransition': r'pattern', ...}
+
+ b) Include a list of initial transitions to be set up automatically, in
+ `State.initialtransitions`::
+
+ initialtransitions = ['atransition', ...]
+
+ c) Define a method for each transition, with the same name as the
+ transition pattern::
+
+ def atransition(self, match, context, nextstate):
+ # do something
+ result = [...] # a list
+ return context, nextstate, result
+ # context, nextstate may be altered
+
+ Transition methods may raise an `EOFError` to cut processing short.
+
+ d) You may wish to override the `State.bof()` and/or `State.eof()` implicit
+ transition methods, which handle the beginning- and end-of-file.
+
+ e) In order to handle nested processing, you may wish to override the
+ attributes `State.nestedSM` and/or `State.nestedSMkwargs`.
+
+ If you are using `StateWS` as a base class, in order to handle nested
+ indented blocks, you may wish to:
+
+ - override the attributes `StateWS.indentSM`, `StateWS.indentSMkwargs`,
+ `StateWS.knownindentSM`, and/or `StateWS.knownindentSMkwargs`;
+ - override the `StateWS.blank()` method; and/or
+ - override or extend the `StateWS.indent()`, `StateWS.knownindent()`,
+ and/or `StateWS.firstknownindent()` methods.
+
+3. Create a state machine object::
+
+ sm = StateMachine(stateclasses=[MyState, ...], initialstate='MyState')
+
+4. Obtain the input text, which needs to be converted into a tab-free list of
+ one-line strings. For example, to read text from a file called
+ 'inputfile'::
+
+ inputstring = open('inputfile').read()
+ inputlines = statemachine.string2lines(inputstring)
+
+5. Run the state machine on the input text and collect the results, a list::
+
+ results = sm.run(inputlines)
+
+6. Remove any lingering circular references::
+
+ sm.unlink()
+"""
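As a rough illustration of the numbered steps above, here is a drastically simplified standalone analogue (not using this module): a single implicit state whose ordered (pattern, method) transitions are tried against each input line, accumulating a result list.

```python
import re

class MiniMachine:
    """A drastically simplified analogue of StateMachine: one state,
    ordered (pattern, method) transitions tried against each line."""

    def __init__(self, transitions):
        self.transitions = [(re.compile(p), m) for p, m in transitions]

    def run(self, lines):
        results = []
        for line in lines:
            for pattern, method in self.transitions:
                match = pattern.match(line)
                if match:
                    results.extend(method(match, line))
                    break
        return results

def bullet(match, line):
    # Transition method: emit the text following the "* " marker.
    return ['bullet: ' + line[match.end():]]

def text(match, line):
    # Catch-all transition method (empty pattern matches everything).
    return ['text: ' + line]

sm = MiniMachine([(r'\* ', bullet), (r'', text)])
print(sm.run(['* item one', 'plain line']))
```

The real `StateMachine` adds what this sketch omits: multiple named states, per-transition next-state switching, context passed between transitions, and the implicit `bof()`/`eof()` transitions.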
+
+__docformat__ = 'restructuredtext'
+
+import sys, re, string
+
+
+class StateMachine:
+
+ """
+ A finite state machine for text filters using regular expressions.
+
+ The input is provided in the form of a list of one-line strings (no
+ newlines). States are subclasses of the `State` class. Transitions consist
+ of regular expression patterns and transition methods, and are defined in
+ each state.
+
+ The state machine is started with the `run()` method, which returns the
+ results of processing in a list.
+ """
+
+ def __init__(self, stateclasses, initialstate, debug=0):
+ """
+ Initialize a `StateMachine` object; add state objects.
+
+ Parameters:
+
+ - `stateclasses`: a list of `State` (sub)classes.
+ - `initialstate`: a string, the class name of the initial state.
+ - `debug`: a boolean; produce verbose output if true (nonzero).
+ """
+
+ self.inputlines = None
+ """List of strings (without newlines). Filled by `self.run()`."""
+
+ self.inputoffset = 0
+ """Offset of `self.inputlines` from the beginning of the file."""
+
+ self.line = None
+ """Current input line."""
+
+ self.lineoffset = None
+ """Current input line offset from beginning of `self.inputlines`."""
+
+ self.debug = debug
+ """Debugging mode on/off."""
+
+ self.initialstate = initialstate
+ """The name of the initial state (key to `self.states`)."""
+
+ self.currentstate = initialstate
+ """The name of the current state (key to `self.states`)."""
+
+ self.states = {}
+ """Mapping of {state_name: State_object}."""
+
+ self.addstates(stateclasses)
+
+ def unlink(self):
+ """Remove circular references to objects no longer required."""
+ for state in self.states.values():
+ state.unlink()
+ self.states = None
+
+ def run(self, inputlines, inputoffset=0):
+ """
+ Run the state machine on `inputlines`. Return results (a list).
+
+ Reset `self.lineoffset` and `self.currentstate`. Run the
+ beginning-of-file transition. Input one line at a time and check for a
+ matching transition. If a match is found, call the transition method
+ and possibly change the state. Store the context returned by the
+ transition method to be passed on to the next transition matched.
+ Accumulate the results returned by the transition methods in a list.
+ Run the end-of-file transition. Finally, return the accumulated
+ results.
+
+ Parameters:
+
+ - `inputlines`: a list of strings without newlines.
+ - `inputoffset`: the line offset of `inputlines` from the beginning of
+ the file.
+ """
+ self.inputlines = inputlines
+ self.inputoffset = inputoffset
+ self.lineoffset = -1
+ self.currentstate = self.initialstate
+ if self.debug:
+ print >>sys.stderr, ('\nStateMachine.run: inputlines:\n| %s' %
+ '\n| '.join(self.inputlines))
+ context = None
+ results = []
+ state = self.getstate()
+ try:
+ if self.debug:
+ print >>sys.stderr, ('\nStateMachine.run: bof transition')
+ context, result = state.bof(context)
+ results.extend(result)
+ while 1:
+ try:
+ self.nextline()
+ if self.debug:
+ print >>sys.stderr, ('\nStateMachine.run: line:\n| %s'
+ % self.line)
+ except IndexError:
+ break
+ try:
+ context, nextstate, result = self.checkline(context, state)
+ except EOFError:
+ break
+ state = self.getstate(nextstate)
+ results.extend(result)
+ if self.debug:
+ print >>sys.stderr, ('\nStateMachine.run: eof transition')
+ result = state.eof(context)
+ results.extend(result)
+ except:
+ self.error()
+ raise
+ return results
+
+ def getstate(self, nextstate=None):
+ """
+ Return current state object; set it first if `nextstate` given.
+
+ Parameter `nextstate`: a string, the name of the next state.
+
+ Exception: `UnknownStateError` raised if `nextstate` unknown.
+ """
+ if nextstate:
+ if self.debug and nextstate != self.currentstate:
+ print >>sys.stderr, \
+ ('\nStateMachine.getstate: Changing state from '
+ '"%s" to "%s" (input line %s).'
+ % (self.currentstate, nextstate, self.abslineno()))
+ self.currentstate = nextstate
+ try:
+ return self.states[self.currentstate]
+ except KeyError:
+ raise UnknownStateError(self.currentstate)
+
+ def nextline(self, n=1):
+ """Load `self.line` with the `n`'th next line and return it."""
+ self.lineoffset += n
+ self.line = self.inputlines[self.lineoffset]
+ return self.line
+
+ def nextlineblank(self):
+ """Return 1 if the next line is blank or non-existent."""
+ try:
+ return not self.inputlines[self.lineoffset + 1].strip()
+ except IndexError:
+ return 1
+
+ def ateof(self):
+ """Return 1 if the input is at or past end-of-file."""
+ return self.lineoffset >= len(self.inputlines) - 1
+
+ def atbof(self):
+ """Return 1 if the input is at or before beginning-of-file."""
+ return self.lineoffset <= 0
+
+ def previousline(self, n=1):
+ """Load `self.line` with the `n`'th previous line and return it."""
+ self.lineoffset -= n
+ self.line = self.inputlines[self.lineoffset]
+ return self.line
+
+ def gotoline(self, lineoffset):
+ """Jump to absolute line offset `lineoffset`, load and return it."""
+ self.lineoffset = lineoffset - self.inputoffset
+ self.line = self.inputlines[self.lineoffset]
+ return self.line
+
+ def abslineoffset(self):
+ """Return line offset of current line, from beginning of file."""
+ return self.lineoffset + self.inputoffset
+
+ def abslineno(self):
+ """Return line number of current line (counting from 1)."""
+ return self.lineoffset + self.inputoffset + 1
+
+ def gettextblock(self):
+ """Return a contiguous block of text."""
+ block = []
+ for line in self.inputlines[self.lineoffset:]:
+ if not line.strip():
+ break
+ block.append(line)
+ self.nextline(len(block) - 1) # advance to last line of block
+ return block
+
+ def getunindented(self):
+ """
+ Return a contiguous, flush-left block of text.
+
+ Raise `UnexpectedIndentationError` if an indented line is encountered
+ before the text block ends (with a blank line).
+ """
+ block = [self.line]
+ for line in self.inputlines[self.lineoffset + 1:]:
+ if not line.strip():
+ break
+ if line[0] == ' ':
+ self.nextline(len(block) - 1) # advance to last line of block
+ raise UnexpectedIndentationError(block, self.abslineno() + 1)
+ block.append(line)
+ self.nextline(len(block) - 1) # advance to last line of block
+ return block
+
+ def checkline(self, context, state):
+ """
+ Examine one line of input for a transition match.
+
+ Parameters:
+
+ - `context`: application-dependent storage.
+ - `state`: a `State` object, the current state.
+
+ Return the values returned by the transition method:
+
+ - context: possibly modified from the parameter `context`;
+ - next state name (`State` subclass name), or ``None`` if no match;
+ - the result output of the transition, a list.
+ """
+ if self.debug:
+ print >>sys.stderr, ('\nStateMachine.checkline: '
+ 'context "%s", state "%s"' %
+ (context, state.__class__.__name__))
+ context, nextstate, result = self.matchtransition(context, state)
+ return context, nextstate, result
+
+ def matchtransition(self, context, state):
+ """
+ Try to match the current line to a transition & execute its method.
+
+ Parameters:
+
+ - `context`: application-dependent storage.
+ - `state`: a `State` object, the current state.
+
+ Return the values returned by the transition method:
+
+ - context: possibly modified from the parameter `context`, unchanged
+ if no match;
+ - next state name (`State` subclass name), or ``None`` if no match;
+ - the result output of the transition, a list (empty if no match).
+ """
+ if self.debug:
+ print >>sys.stderr, (
+ '\nStateMachine.matchtransition: state="%s", transitions=%r.'
+ % (state.__class__.__name__, state.transitionorder))
+ for name in state.transitionorder:
+ while 1:
+ pattern, method, nextstate = state.transitions[name]
+ if self.debug:
+ print >>sys.stderr, (
+ '\nStateMachine.matchtransition: Trying transition '
+ '"%s" in state "%s".'
+ % (name, state.__class__.__name__))
+ match = self.match(pattern)
+ if match:
+ if self.debug:
+ print >>sys.stderr, (
+ '\nStateMachine.matchtransition: Matched '
+ 'transition "%s" in state "%s".'
+ % (name, state.__class__.__name__))
+ try:
+ return method(match, context, nextstate)
+ except TransitionCorrection, detail:
+ name = str(detail)
+ continue # try again with new transition name
+ break
+ else:
+ return context, None, [] # no match
+
+ def match(self, pattern):
+ """
+ Return the result of a regular expression match.
+
+ Parameter `pattern`: an `re` compiled regular expression.
+ """
+ return pattern.match(self.line)
+
+ def addstate(self, stateclass):
+ """
+ Initialize & add a `stateclass` (`State` subclass) object.
+
+ Exception: `DuplicateStateError` raised if `stateclass` already added.
+ """
+ statename = stateclass.__name__
+ if self.states.has_key(statename):
+ raise DuplicateStateError(statename)
+ self.states[statename] = stateclass(self, self.debug)
+
+ def addstates(self, stateclasses):
+ """
+ Add `stateclasses` (a list of `State` subclasses).
+ """
+ for stateclass in stateclasses:
+ self.addstate(stateclass)
+
+ def error(self):
+ """Report error details."""
+ type, value, filename, line, function = _exceptiondata()
+ print >>sys.stderr, '%s: %s' % (type, value)
+ print >>sys.stderr, 'input line %s' % (self.abslineno())
+ print >>sys.stderr, ('file %s, line %s, function %s'
+ % (filename, line, function))
+
+
+class State:
+
+ """
+ State superclass. Contains a list of transitions, and transition methods.
+
+ Transition methods all have the same signature. They take 3 parameters:
+
+ - An `re` match object. ``match.string`` contains the matched input line,
+ ``match.start()`` gives the start index of the match, and
+ ``match.end()`` gives the end index.
+ - A context object, whose meaning is application-defined (initial value
+ ``None``). It can be used to store any information required by the state
+ machine, and the returned context is passed on to the next transition
+ method unchanged.
+ - The name of the next state, a string, taken from the transitions list;
+ normally it is returned unchanged, but it may be altered by the
+ transition method if necessary.
+
+ Transition methods all return a 3-tuple:
+
+ - A context object, as (potentially) modified by the transition method.
+ - The next state name (a return value of ``None`` means no state change).
+ - The processing result, a list, which is accumulated by the state
+ machine.
+
+ Transition methods may raise an `EOFError` to cut processing short.
+
+ There are two implicit transitions, and corresponding transition methods
+ are defined: `bof()` handles the beginning-of-file, and `eof()` handles
+ the end-of-file. These methods have non-standard signatures and return
+ values. `bof()` returns the initial context and results, and may be used
+ to return a header string, or do any other processing needed. `eof()`
+ should handle any remaining context and wrap things up; it returns the
+ final processing result.
+
+ Typical applications need only subclass `State` (or a subclass), set the
+ `patterns` and `initialtransitions` class attributes, and provide
+ corresponding transition methods. The default object initialization will
+ take care of constructing the list of transitions.
+ """
+
+ patterns = None
+ """
+ {Name: pattern} mapping, used by `maketransition()`. Each pattern may
+ be a string or a compiled `re` pattern. Override in subclasses.
+ """
+
+ initialtransitions = None
+ """
+ A list of transitions to initialize when a `State` is instantiated.
+ Each entry is either a transition name string, or a (transition name, next
+ state name) pair. See `maketransitions()`. Override in subclasses.
+ """
+
+ nestedSM = None
+ """
+ The `StateMachine` class for handling nested processing.
+
+ If left as ``None``, `nestedSM` defaults to the class of the state's
+ controlling state machine. Override it in subclasses to avoid the default.
+ """
+
+ nestedSMkwargs = None
+ """
+ Keyword arguments dictionary, passed to the `nestedSM` constructor.
+
+ Two keys must have entries in the dictionary:
+
+ - Key 'stateclasses' must be set to a list of `State` classes.
+ - Key 'initialstate' must be set to the name of the initial state class.
+
+ If `nestedSMkwargs` is left as ``None``, 'stateclasses' defaults to a list
+ containing the class of the current state, and 'initialstate' defaults to
+ the name of the class of the current state. Override in subclasses to
+ avoid the defaults.
+ """
+
+ def __init__(self, statemachine, debug=0):
+ """
+ Initialize a `State` object; make & add initial transitions.
+
+ Parameters:
+
+ - `statemachine`: the controlling `StateMachine` object.
+ - `debug`: a boolean; produce verbose output if true (nonzero).
+ """
+
+ self.transitionorder = []
+ """A list of transition names in search order."""
+
+ self.transitions = {}
+ """
+ A mapping of transition names to 3-tuples containing
+ (compiled_pattern, transition_method, next_state_name). Initialized as
+ an instance attribute dynamically (instead of as a class attribute)
+ because it may make forward references to patterns and methods in this
+ or other classes.
+ """
+
+ if self.initialtransitions:
+ names, transitions = self.maketransitions(self.initialtransitions)
+ self.addtransitions(names, transitions)
+
+ self.statemachine = statemachine
+ """A reference to the controlling `StateMachine` object."""
+
+ self.debug = debug
+ """Debugging mode on/off."""
+
+ if self.nestedSM is None:
+ self.nestedSM = self.statemachine.__class__
+ if self.nestedSMkwargs is None:
+ self.nestedSMkwargs = {'stateclasses': [self.__class__],
+ 'initialstate': self.__class__.__name__}
+
+ def unlink(self):
+ """Remove circular references to objects no longer required."""
+ self.statemachine = None
+
+ def addtransitions(self, names, transitions):
+ """
+ Add a list of transitions to the start of the transition list.
+
+ Parameters:
+
+ - `names`: a list of transition names.
+ - `transitions`: a mapping of names to transition tuples.
+
+ Exceptions: `DuplicateTransitionError`, `UnknownTransitionError`.
+ """
+ for name in names:
+ if self.transitions.has_key(name):
+ raise DuplicateTransitionError(name)
+ if not transitions.has_key(name):
+ raise UnknownTransitionError(name)
+ self.transitionorder[:0] = names
+ self.transitions.update(transitions)
+
+ def addtransition(self, name, transition):
+ """
+ Add a transition to the start of the transition list.
+
+ Parameter `transition`: a ready-made transition 3-tuple.
+
+ Exception: `DuplicateTransitionError`.
+ """
+ if self.transitions.has_key(name):
+ raise DuplicateTransitionError(name)
+ self.transitionorder[:0] = [name]
+ self.transitions[name] = transition
+
+ def removetransition(self, name):
+ """
+ Remove a transition by `name`.
+
+ Exception: `UnknownTransitionError`.
+ """
+ try:
+ del self.transitions[name]
+ self.transitionorder.remove(name)
+ except:
+ raise UnknownTransitionError(name)
+
+ def maketransition(self, name, nextstate=None):
+ """
+ Make & return a transition tuple based on `name`.
+
+ This is a convenience function to simplify transition creation.
+
+ Parameters:
+
+ - `name`: a string, the name of the transition pattern & method. This
+ `State` object must have a method called '`name`', and a dictionary
+ `self.patterns` containing a key '`name`'.
+ - `nextstate`: a string, the name of the next `State` object for this
+ transition. A value of ``None`` (or absent) implies no state change
+ (i.e., continue with the same state).
+
+ Exceptions: `TransitionPatternNotFound`, `TransitionMethodNotFound`.
+ """
+ if nextstate is None:
+ nextstate = self.__class__.__name__
+ try:
+ pattern = self.patterns[name]
+ if not hasattr(pattern, 'match'):
+ pattern = re.compile(pattern)
+ except KeyError:
+ raise TransitionPatternNotFound(
+ '%s.patterns[%r]' % (self.__class__.__name__, name))
+ try:
+ method = getattr(self, name)
+ except AttributeError:
+ raise TransitionMethodNotFound(
+ '%s.%s' % (self.__class__.__name__, name))
+ return (pattern, method, nextstate)
+
+ def maketransitions(self, namelist):
+ """
+ Return a list of transition names and a transition mapping.
+
+ Parameter `namelist`: a list, where each entry is either a
+ transition name string, or a 1- or 2-tuple (transition name, optional
+ next state name).
+ """
+ stringtype = type('')
+ names = []
+ transitions = {}
+ for namestate in namelist:
+ if type(namestate) is stringtype:
+ transitions[namestate] = self.maketransition(namestate)
+ names.append(namestate)
+ else:
+ transitions[namestate[0]] = self.maketransition(*namestate)
+ names.append(namestate[0])
+ return names, transitions
+
+ def bof(self, context):
+ """
+ Handle beginning-of-file. Return unchanged `context`, empty result.
+
+ Override in subclasses.
+
+ Parameter `context`: application-defined storage.
+ """
+ return context, []
+
+ def eof(self, context):
+ """
+ Handle end-of-file. Return empty result.
+
+ Override in subclasses.
+
+ Parameter `context`: application-defined storage.
+ """
+ return []
+
+ def nop(self, match, context, nextstate):
+ """
+ A "do nothing" transition method.
+
+ Return unchanged `context` & `nextstate`, empty result. Useful for
+ simple state changes (actionless transitions).
+ """
+ return context, nextstate, []
+
+
+class StateMachineWS(StateMachine):
+
+ """
+ `StateMachine` subclass specialized for whitespace recognition.
+
+ The transitions 'blank' (for blank lines) and 'indent' (for indented text
+ blocks) are defined implicitly, and are checked before any other
+ transitions. The companion `StateWS` class defines default transition
+ methods. There are three methods provided for extracting indented text
+ blocks:
+
+ - `getindented()`: use when the indent is unknown.
+ - `getknownindented()`: use when the indent is known for all lines.
+ - `getfirstknownindented()`: use when only the first line's indent is
+ known.
+ """
+
+ spaces = re.compile(' *')
+ """Indentation recognition pattern."""
+
+ def checkline(self, context, state):
+ """
+ Examine one line of input for whitespace first, then transitions.
+
+ Extends `StateMachine.checkline()`.
+ """
+ if self.debug:
+ print >>sys.stderr, ('\nStateMachineWS.checkline: '
+ 'context "%s", state "%s"' %
+ (context, state.__class__.__name__))
+ context, nextstate, result = self.checkwhitespace(context, state)
+ if nextstate == '': # no whitespace match
+ return StateMachine.checkline(self, context, state)
+ else:
+ return context, nextstate, result
+
+ def checkwhitespace(self, context, state):
+ """
+ Check for a blank line or increased indent. Call the state's
+ transition method if a match is found.
+
+ Parameters:
+
+ - `context`: application-dependent storage.
+ - `state`: a `State` object, the current state.
+
+ Return the values returned by the transition method:
+
+ - context, possibly modified from the parameter `context`;
+ - next state name (`State` subclass name), or '' (empty string) if no
+ match;
+ - the result output of the transition, a list (empty if no match).
+ """
+ if self.debug:
+ print >>sys.stderr, ('\nStateMachineWS.checkwhitespace: '
+ 'context "%s", state "%s"' %
+ (context, state.__class__.__name__))
+ match = self.spaces.match(self.line)
+ indent = match.end()
+ if indent == len(self.line):
+ if self.debug:
+ print >>sys.stderr, ('\nStateMachineWS.checkwhitespace: '
+ 'implicit transition "blank" matched')
+ return state.blank(match, context, self.currentstate)
+ elif indent:
+ if self.debug:
+ print >>sys.stderr, ('\nStateMachineWS.checkwhitespace: '
+ 'implicit transition "indent" matched')
+ return state.indent(match, context, self.currentstate)
+ else:
+ return context, '', [] # neither blank line nor indented
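The blank/indent dispatch in `checkwhitespace()` reduces to measuring the run of leading spaces. A standalone sketch of the classification:

```python
import re

spaces = re.compile(' *')

def classify(line):
    # Mirrors checkwhitespace(): an all-spaces line is "blank", a leading
    # indent triggers "indent", and anything else falls through ('').
    indent = spaces.match(line).end()
    if indent == len(line):
        return 'blank'
    elif indent:
        return 'indent'
    return ''

print([classify(line) for line in ['', '   ', '  text', 'text']])
```

Note that `' *'` always matches (possibly zero characters), so `spaces.match()` never returns ``None`` and a fully blank line, including the empty string, is caught by the first branch.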
+
+ def getindented(self, uptoblank=0, stripindent=1):
+ """
+ Return a indented lines of text and info.
+
+ Extract an indented block where the indent is unknown for all lines.
+
+ :Parameters:
+ - `uptoblank`: Stop collecting at the first blank line if true (1).
+ - `stripindent`: Strip common leading indent if true (1, default).
+
+ :Return:
+ - the indented block (a list of lines of text),
+ - its indent,
+ - its first line offset from BOF, and
+ - whether or not it finished with a blank line.
+ """
+ offset = self.abslineoffset()
+ indented, indent, blankfinish = extractindented(
+ self.inputlines[self.lineoffset:], uptoblank, stripindent)
+ if indented:
+ self.nextline(len(indented) - 1) # advance to last indented line
+ while indented and not indented[0].strip():
+ indented.pop(0)
+ offset += 1
+ return indented, indent, offset, blankfinish
+
+ def getknownindented(self, indent, uptoblank=0, stripindent=1):
+ """
+ Return an indented block and info.
+
+ Extract an indented block where the indent is known for all lines.
+ Starting with the current line, extract the entire text block with at
+ least `indent` indentation (which must be whitespace, except for the
+ first line).
+
+ :Parameters:
+ - `indent`: The number of indent columns/characters.
+ - `uptoblank`: Stop collecting at the first blank line if true (1).
+ - `stripindent`: Strip `indent` characters of indentation if true
+ (1, default).
+
+ :Return:
+ - the indented block,
+ - its first line offset from BOF, and
+ - whether or not it finished with a blank line.
+ """
+ offset = self.abslineoffset()
+ indented = [self.line[indent:]]
+ for line in self.inputlines[self.lineoffset + 1:]:
+ if line[:indent].strip():
+ blankfinish = not indented[-1].strip() and len(indented) > 1
+ break
+ if uptoblank and not line.strip():
+ blankfinish = 1
+ break
+ if stripindent:
+ indented.append(line[indent:])
+ else:
+ indented.append(line)
+ else:
+ blankfinish = 1
+ if indented:
+ self.nextline(len(indented) - 1) # advance to last indented line
+ while indented and not indented[0].strip():
+ indented.pop(0)
+ offset += 1
+ return indented, offset, blankfinish
+
+ def getfirstknownindented(self, indent, uptoblank=0, stripindent=1):
+ """
+ Return an indented block and info.
+
+ Extract an indented block where the indent is known for the first line
+ and unknown for all other lines.
+
+ :Parameters:
+ - `indent`: The first line's indent (# of columns/characters).
+ - `uptoblank`: Stop collecting at the first blank line if true (1).
+ - `stripindent`: Strip `indent` characters of indentation if true
+ (1, default).
+
+ :Return:
+ - the indented block,
+ - its indent,
+ - its first line offset from BOF, and
+ - whether or not it finished with a blank line.
+ """
+ offset = self.abslineoffset()
+ indented = [self.line[indent:]]
+ indented[1:], indent, blankfinish = extractindented(
+ self.inputlines[self.lineoffset + 1:], uptoblank, stripindent)
+ self.nextline(len(indented) - 1) # advance to last indented line
+ while indented and not indented[0].strip():
+ indented.pop(0)
+ offset += 1
+ return indented, indent, offset, blankfinish
+
+
+class StateWS(State):
+
+ """
+ State superclass specialized for whitespace (blank lines & indents).
+
+ Use this class with `StateMachineWS`. The transition method `blank()`
+ handles blank lines and `indent()` handles nested indented blocks.
+ Indented blocks trigger a new state machine to be created by `indent()`
+ and run. The class of the state machine to be created is in `indentSM`,
+ and the constructor keyword arguments are in the dictionary
+ `indentSMkwargs`.
+
+ The methods `knownindent()` and `firstknownindent()` are provided for
+ indented blocks where the indent (all lines' and first line's only,
+ respectively) is known to the transition method, along with the attributes
+ `knownindentSM` and `knownindentSMkwargs`. Neither transition method is
+ triggered automatically.
+ """
+
+ indentSM = None
+ """
+ The `StateMachine` class handling indented text blocks.
+
+ If left as ``None``, `indentSM` defaults to the value of `State.nestedSM`.
+ Override it in subclasses to avoid the default.
+ """
+
+ indentSMkwargs = None
+ """
+ Keyword arguments dictionary, passed to the `indentSM` constructor.
+
+ If left as ``None``, `indentSMkwargs` defaults to the value of
+ `State.nestedSMkwargs`. Override it in subclasses to avoid the default.
+ """
+
+ knownindentSM = None
+ """
+ The `StateMachine` class handling known-indented text blocks.
+
+ If left as ``None``, `knownindentSM` defaults to the value of `indentSM`.
+ Override it in subclasses to avoid the default.
+ """
+
+ knownindentSMkwargs = None
+ """
+ Keyword arguments dictionary, passed to the `knownindentSM` constructor.
+
+ If left as ``None``, `knownindentSMkwargs` defaults to the value of
+ `indentSMkwargs`. Override it in subclasses to avoid the default.
+ """
+
+ def __init__(self, statemachine, debug=0):
+ """
+ Initialize a `StateWS` object; extends `State.__init__()`.
+
+ Check for indent state machine attributes, set defaults if not set.
+ """
+ State.__init__(self, statemachine, debug)
+ if self.indentSM is None:
+ self.indentSM = self.nestedSM
+ if self.indentSMkwargs is None:
+ self.indentSMkwargs = self.nestedSMkwargs
+ if self.knownindentSM is None:
+ self.knownindentSM = self.indentSM
+ if self.knownindentSMkwargs is None:
+ self.knownindentSMkwargs = self.indentSMkwargs
+
+ def blank(self, match, context, nextstate):
+ """Handle blank lines. Does nothing. Override in subclasses."""
+ return self.nop(match, context, nextstate)
+
+ def indent(self, match, context, nextstate):
+ """
+ Handle an indented text block. Extend or override in subclasses.
+
+ Recursively run the registered state machine for indented blocks
+ (`self.indentSM`).
+ """
+ indented, indent, lineoffset, blankfinish = \
+ self.statemachine.getindented()
+ sm = self.indentSM(debug=self.debug, **self.indentSMkwargs)
+ results = sm.run(indented, inputoffset=lineoffset)
+ return context, nextstate, results
+
+ def knownindent(self, match, context, nextstate):
+ """
+ Handle a known-indent text block. Extend or override in subclasses.
+
+ Recursively run the registered state machine for known-indent indented
+ blocks (`self.knownindentSM`). The indent is the length of the match,
+ ``match.end()``.
+ """
+ indented, lineoffset, blankfinish = \
+ self.statemachine.getknownindented(match.end())
+ sm = self.knownindentSM(debug=self.debug, **self.knownindentSMkwargs)
+ results = sm.run(indented, inputoffset=lineoffset)
+ return context, nextstate, results
+
+ def firstknownindent(self, match, context, nextstate):
+ """
+ Handle an indented text block (first line's indent known).
+
+ Extend or override in subclasses.
+
+ Recursively run the registered state machine for known-indent indented
+ blocks (`self.knownindentSM`). The indent is the length of the match,
+ ``match.end()``.
+ """
+ indented, lineoffset, blankfinish = \
+ self.statemachine.getfirstknownindented(match.end())
+ sm = self.knownindentSM(debug=self.debug, **self.knownindentSMkwargs)
+ results = sm.run(indented, inputoffset=lineoffset)
+ return context, nextstate, results
+
+
+class _SearchOverride:
+
+ """
+ Mix-in class to override `StateMachine` regular expression behavior.
+
+ Changes regular expression matching, from the default `re.match()`
+ (succeeds only if the pattern matches at the start of `self.line`) to
+ `re.search()` (succeeds if the pattern matches anywhere in `self.line`).
+ When subclassing a `StateMachine`, list this class **first** in the
+ inheritance list of the class definition.
+ """
+
+ def match(self, pattern):
+ """
+ Return the result of a regular expression search.
+
+ Overrides `StateMachine.match()`.
+
+ Parameter `pattern`: `re` compiled regular expression.
+ """
+ return pattern.search(self.line)
+
+
+class SearchStateMachine(_SearchOverride, StateMachine):
+ """`StateMachine` which uses `re.search()` instead of `re.match()`."""
+ pass
+
+
+class SearchStateMachineWS(_SearchOverride, StateMachineWS):
+ """`StateMachineWS` which uses `re.search()` instead of `re.match()`."""
+ pass
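The behavioral difference the mix-in introduces is exactly `re.search()` versus `re.match()`:

```python
import re

pattern = re.compile('spam')
line = 'lovely spam'

# StateMachine.match: the pattern must match at the start of the line.
print(pattern.match(line))    # None
# _SearchOverride.match: the pattern may match anywhere in the line.
print(pattern.search(line))   # a match object covering 'spam'
```

This is why `_SearchOverride` must come first in the inheritance list: Python's method resolution order then finds its `match()` before `StateMachine.match()`.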
+
+
+class UnknownStateError(Exception): pass
+class DuplicateStateError(Exception): pass
+class UnknownTransitionError(Exception): pass
+class DuplicateTransitionError(Exception): pass
+class TransitionPatternNotFound(Exception): pass
+class TransitionMethodNotFound(Exception): pass
+class UnexpectedIndentationError(Exception): pass
+
+
+class TransitionCorrection(Exception):
+
+ """
+ Raise from within a transition method to switch to another transition.
+ """
+
+
+_whitespace_conversion_table = string.maketrans('\v\f', '  ')
+
+def string2lines(astring, tabwidth=8, convertwhitespace=0):
+ """
+ Return a list of one-line strings with tabs expanded and no newlines.
+
+ Each tab is expanded with between 1 and `tabwidth` spaces, so that the
+ next character's index becomes a multiple of `tabwidth` (8 by default).
+
+ Parameters:
+
+ - `astring`: a multi-line string.
+ - `tabwidth`: the number of columns between tab stops.
+ - `convertwhitespace`: convert form feeds and vertical tabs to spaces?
+ """
+ if convertwhitespace:
+ astring = astring.translate(_whitespace_conversion_table)
+ return [s.expandtabs(tabwidth) for s in astring.splitlines()]
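Setting aside the optional whitespace conversion, `string2lines()` is the `splitlines()`/`expandtabs()` idiom; a simplified restatement:

```python
def string2lines(astring, tabwidth=8):
    # As above, minus the form-feed/vertical-tab conversion: newline-free
    # lines with each tab expanded to the next multiple-of-tabwidth column.
    return [s.expandtabs(tabwidth) for s in astring.splitlines()]

print(string2lines('col1\tcol2\n\tindented'))
```

`str.expandtabs()` pads to the next tab stop rather than inserting a fixed number of spaces, which is what keeps tabbed columns aligned after conversion.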
+
+def extractindented(lines, uptoblank=0, stripindent=1):
+ """
+ Extract and return a list of indented lines of text.
+
+ Collect all lines with indentation, determine the minimum indentation,
+ remove the minimum indentation from all indented lines (unless
+ `stripindent` is false), and return them. All lines up to but not
+ including the first unindented line will be returned.
+
+ :Parameters:
+ - `lines`: a list of one-line strings without newlines.
+ - `uptoblank`: Stop collecting at the first blank line if true (1).
+ - `stripindent`: Strip common leading indent if true (1, default).
+
+ :Return:
+ - a list of indented lines with minimum indent removed;
+ - the amount of the indent;
+ - whether or not the block finished with a blank line or at the end of
+ `lines`.
+ """
+ source = []
+ indent = None
+ for line in lines:
+ if line and line[0] != ' ': # line not indented
+ # block finished properly iff the last indented line was blank
+ blankfinish = len(source) and not source[-1].strip()
+ break
+ stripped = line.lstrip()
+ if uptoblank and not stripped: # blank line
+ blankfinish = 1
+ break
+ source.append(line)
+ if not stripped: # blank line
+ continue
+ lineindent = len(line) - len(stripped)
+ if indent is None:
+ indent = lineindent
+ else:
+ indent = min(indent, lineindent)
+ else:
+ blankfinish = 1 # block ends at end of lines
+ if indent:
+ if stripindent:
+ source = [s[indent:] for s in source]
+ return source, indent, blankfinish
+ else:
+ return [], 0, blankfinish
+
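The core of the scan above (collect indented lines, track the minimum indent, strip it) can be sketched with plain strings; the input lines here are hypothetical:

```python
# Sketch of extractindented()'s core: collect indented lines up to the
# first unindented one, track the minimum indent, then strip it.
def extract_indented(lines):
    source = []
    indent = None
    for line in lines:
        if line and line[0] != ' ':    # first unindented line: stop
            break
        stripped = line.lstrip()
        source.append(line)
        if not stripped:               # blank line: keep, but no indent info
            continue
        width = len(line) - len(stripped)
        indent = width if indent is None else min(indent, width)
    if indent:
        return [s[indent:] for s in source], indent
    return [], 0

block, indent = extract_indented(
    ['    alpha', '      beta', '', '    gamma', 'end'])
print(block, indent)  # ['alpha', '  beta', '', 'gamma'] 4
```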
+def _exceptiondata():
+ """
+ Return exception information:
+
+ - the exception's class name;
+ - the exception object;
+ - the name of the file containing the offending code;
+ - the line number of the offending code;
+ - the function name of the offending code.
+ """
+ type, value, traceback = sys.exc_info()
+ while traceback.tb_next:
+ traceback = traceback.tb_next
+ code = traceback.tb_frame.f_code
+ return (type.__name__, value, code.co_filename, traceback.tb_lineno,
+ code.co_name)
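The traceback walk in `_exceptiondata()` can be checked in isolation; the failing function below is a hypothetical stand-in for "offending code":

```python
import sys

def innermost_exception_data():
    # Walk to the innermost traceback frame, as _exceptiondata() does.
    exc_type, value, tb = sys.exc_info()
    while tb.tb_next:
        tb = tb.tb_next
    code = tb.tb_frame.f_code
    return (exc_type.__name__, value, code.co_filename, tb.tb_lineno,
            code.co_name)

def failing():
    raise ValueError('boom')

try:
    failing()
except ValueError:
    name, value, filename, lineno, funcname = innermost_exception_data()

print(name, funcname)  # ValueError failing
```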
diff --git a/docutils/transforms/__init__.py b/docutils/transforms/__init__.py
new file mode 100644
index 000000000..6c2ae279f
--- /dev/null
+++ b/docutils/transforms/__init__.py
@@ -0,0 +1,62 @@
+#! /usr/bin/env python
+"""
+:Authors: David Goodger, Ueli Schlaepfer
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+This package contains modules for standard tree transforms available
+to Docutils components. Tree transforms serve a variety of purposes:
+
+- To tie up certain syntax-specific "loose ends" that remain after the
+ initial parsing of the input plaintext. These transforms are used to
+ supplement a limited syntax.
+
+- To automate the internal linking of the document tree (hyperlink
+ references, footnote references, etc.).
+
+- To extract useful information from the document tree. These
+ transforms may be used to construct (for example) indexes and tables
+ of contents.
+
+Each transform is an optional step that a Docutils Reader may choose to
+perform on the parsed document, depending on the input context. A Docutils
+Reader may also perform Reader-specific transforms before or after performing
+these standard transforms.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+from docutils import languages
+
+
+class TransformError(Exception): pass
+
+
+class Transform:
+
+ """
+ Docutils transform component abstract base class.
+ """
+
+ def __init__(self, doctree, startnode=None):
+ """
+ Initial setup for in-place document transforms.
+ """
+
+ self.doctree = doctree
+ """The document tree to transform."""
+
+ self.startnode = startnode
+ """Node from which to begin the transform. For many transforms which
+ apply to the document as a whole, `startnode` is not set (i.e. its
+ value is `None`)."""
+
+ self.language = languages.getlanguage(doctree.languagecode)
+ """Language module local to this document."""
+
+ def transform(self):
+ """Override to transform the document tree."""
+ raise NotImplementedError('subclass must override this method')
diff --git a/docutils/transforms/components.py b/docutils/transforms/components.py
new file mode 100644
index 000000000..2cfe4d2a8
--- /dev/null
+++ b/docutils/transforms/components.py
@@ -0,0 +1,85 @@
+#! /usr/bin/env python
+"""
+:Authors: David Goodger, Ueli Schlaepfer
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Transforms related to document components.
+
+- `Contents`: Used to build a table of contents.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+import re
+from docutils import nodes, utils
+from docutils.transforms import TransformError, Transform
+
+
+class Contents(Transform):
+
+ """
+ This transform generates a table of contents from the entire document tree
+ or from a single branch. It locates "section" elements and builds them
+ into a nested bullet list, which is placed within a "topic". A title is
+ either explicitly specified, taken from the appropriate language module,
+ or omitted (local table of contents). The depth may be specified.
+ Two-way references between the table of contents and section titles are
+ generated (requires Writer support).
+
+ This transform requires a startnode, which contains generation options
+ and provides the location for the generated table of contents (the
+ startnode is replaced by the table of contents "topic").
+ """
+
+ def transform(self):
+ topic = nodes.topic(CLASS='contents')
+ title = self.startnode.details['title']
+ if self.startnode.details.has_key('local'):
+ startnode = self.startnode.parent
+ # @@@ generate an error if the startnode (directive) not at
+ # section/document top-level? Drag it up until it is?
+ while not isinstance(startnode, nodes.Structural):
+ startnode = startnode.parent
+ if not title:
+ title = []
+ else:
+ startnode = self.doctree
+ if not title:
+ title = nodes.title('', self.language.labels['contents'])
+ contents = self.build_contents(startnode)
+ if len(contents):
+ topic += title
+ topic += contents
+ self.startnode.parent.replace(self.startnode, topic)
+ else:
+ self.startnode.parent.remove(self.startnode)
+
+ def build_contents(self, node, level=0):
+ level += 1
+ sections = []
+ i = len(node) - 1
+ while i >= 0 and isinstance(node[i], nodes.section):
+ sections.append(node[i])
+ i -= 1
+ sections.reverse()
+ entries = []
+ for section in sections:
+ title = section[0]
+ reference = nodes.reference('', '', refid=section['id'],
+ *title.getchildren())
+ entry = nodes.paragraph('', '', reference)
+ item = nodes.list_item('', entry)
+ itemid = self.doctree.set_id(item)
+ title['refid'] = itemid
+ if (not self.startnode.details.has_key('depth')) \
+ or level < self.startnode.details['depth']:
+ subsects = self.build_contents(section, level)
+ item += subsects
+ entries.append(item)
+ if entries:
+ entries = nodes.bullet_list('', *entries)
+ return entries
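The backward scan in `build_contents()` collects only the *trailing* run of section children; with plain lists (hypothetical child names standing in for node instances) it behaves like this:

```python
# Hypothetical children: only the trailing run of "sections" is collected,
# scanning backward from the end and then restoring document order.
node = ['title', 'paragraph', 'section-1', 'section-2']
is_section = lambda child: child.startswith('section')

sections = []
i = len(node) - 1
while i >= 0 and is_section(node[i]):
    sections.append(node[i])
    i -= 1
sections.reverse()           # restore document order
print(sections)  # ['section-1', 'section-2']
```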
diff --git a/docutils/transforms/frontmatter.py b/docutils/transforms/frontmatter.py
new file mode 100644
index 000000000..0a8068fad
--- /dev/null
+++ b/docutils/transforms/frontmatter.py
@@ -0,0 +1,375 @@
+#! /usr/bin/env python
+"""
+:Authors: David Goodger, Ueli Schlaepfer
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Transforms related to the front matter of a document (information
+found before the main text):
+
+- `DocTitle`: Used to transform a lone top level section's title to
+ the document title, and promote a remaining lone top-level section's
+ title to the document subtitle.
+
+- `DocInfo`: Used to transform a bibliographic field list into docinfo
+ elements.
+"""
+
+__docformat__ = 'reStructuredText'
+
+import re
+from docutils import nodes, utils
+from docutils.transforms import TransformError, Transform
+
+
+class DocTitle(Transform):
+
+ """
+ In reStructuredText_, there is no way to specify a document title
+ and subtitle explicitly. Instead, we can supply the document title
+ (and possibly the subtitle as well) implicitly, and use this
+ two-step transform to "raise" or "promote" the title(s) (and their
+ corresponding section contents) to the document level.
+
+ 1. If the document contains a single top-level section as its
+ first non-comment element, the top-level section's title
+ becomes the document's title, and the top-level section's
+ contents become the document's immediate contents. The lone
+ top-level section header must be the first non-comment element
+ in the document.
+
+ For example, take this input text::
+
+ =================
+ Top-Level Title
+ =================
+
+ A paragraph.
+
+ Once parsed, it looks like this::
+
+         <document>
+             <section name="top-level title">
+                 <title>
+                     Top-Level Title
+                 <paragraph>
+                     A paragraph.
+
+ After running the DocTitle transform, we have::
+
+         <document name="top-level title">
+             <title>
+                 Top-Level Title
+             <paragraph>
+                 A paragraph.
+
+ 2. If step 1 successfully determines the document title, we
+ continue by checking for a subtitle.
+
+ If the lone top-level section itself contains a single
+ second-level section as its first non-comment element, that
+ section's title is promoted to the document's subtitle, and
+ that section's contents become the document's immediate
+ contents. Given this input text::
+
+ =================
+ Top-Level Title
+ =================
+
+ Second-Level Title
+ ~~~~~~~~~~~~~~~~~~
+
+ A paragraph.
+
+ After parsing and running the Section Promotion transform, the
+ result is::
+
+         <document name="top-level title">
+             <title>
+                 Top-Level Title
+             <subtitle name="second-level title">
+                 Second-Level Title
+             <paragraph>
+                 A paragraph.
+
+ (Note that the implicit hyperlink target generated by the
+ "Second-Level Title" is preserved on the "subtitle" element
+ itself.)
+
+ Any comment elements occurring before the document title or
+ subtitle are accumulated and inserted as the first body elements
+ after the title(s).
+ """
+
+ def transform(self):
+ if self.promote_document_title():
+ self.promote_document_subtitle()
+
+ def promote_document_title(self):
+ section, index = self.candidate_index()
+ if index is None:
+ return None
+ doctree = self.doctree
+ # Transfer the section's attributes to the document element (at root):
+ doctree.attributes.update(section.attributes)
+ doctree[:] = (section[:1] # section title
+ + doctree[:index] # everything that was in the document
+ # before the section
+ + section[1:]) # everything that was in the section
+ return 1
+
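The slice assignment in `promote_document_title()` is plain list splicing; with stand-in children (hypothetical names replacing node objects):

```python
# Stand-in children: the section's title is hoisted to the front, material
# that preceded the section stays next, and the section body follows.
doctree = ['comment', 'section']            # 'section' sits at index 1
section = ['title', 'para-1', 'para-2']     # the lone top-level section
index = 1
doctree[:] = (section[:1]       # section title -> document title
              + doctree[:index] # everything before the section
              + section[1:])    # the section's contents
print(doctree)  # ['title', 'comment', 'para-1', 'para-2']
```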
+ def promote_document_subtitle(self):
+ subsection, index = self.candidate_index()
+ if index is None:
+ return None
+ subtitle = nodes.subtitle()
+ # Transfer the subsection's attributes to the new subtitle:
+ subtitle.attributes.update(subsection.attributes)
+ # Transfer the contents of the subsection's title to the subtitle:
+ subtitle[:] = subsection[0][:]
+ doctree = self.doctree
+ doctree[:] = (doctree[:1] # document title
+ + [subtitle]
+ + doctree[1:index] # everything that was in the document
+ # before the section
+ + subsection[1:]) # everything that was in the subsection
+ return 1
+
+ def candidate_index(self):
+ """
+ Find and return the promotion candidate and its index.
+
+ Return (None, None) if no valid candidate was found.
+ """
+ doctree = self.doctree
+ index = doctree.findnonclass(nodes.PreBibliographic)
+ if index is None or len(doctree) > (index + 1) or \
+ not isinstance(doctree[index], nodes.section):
+ return None, None
+ else:
+ return doctree[index], index
+
+
+class DocInfo(Transform):
+
+ """
+ This transform is specific to the reStructuredText_ markup syntax;
+ see "Bibliographic Fields" in the `reStructuredText Markup
+ Specification`_ for a high-level description. This transform
+ should be run *after* the `DocTitle` transform.
+
+ Given a field list as the first non-comment element after the
+ document title and subtitle (if present), registered bibliographic
+ field names are transformed to the corresponding DTD elements,
+ becoming child elements of the "docinfo" element (except for the
+ abstract, which becomes a "topic" element after "docinfo").
+
+ For example, given this document fragment after parsing::
+
+         <document>
+             <title>
+                 Document Title
+             <field_list>
+                 <field>
+                     <field_name>
+                         Author
+                     <field_body>
+                         <paragraph>
+                             A. Name
+                 <field>
+                     <field_name>
+                         Status
+                     <field_body>
+                         <paragraph>
+                             $RCSfile$
+             ...
+
+ After running the bibliographic field list transform, the
+ resulting document tree would look like this::
+
+         <document>
+             <title>
+                 Document Title
+             <docinfo>
+                 <author>
+                     A. Name
+                 <status>
+                     frontmatter.py
+             ...
+
+ The "Status" field contained an expanded RCS keyword, which is
+ normally (but optionally) cleaned up by the transform. The sole
+ contents of the field body must be a paragraph containing an
+ expanded RCS keyword of the form "$keyword: expansion text $". Any
+ RCS keyword can be processed in any bibliographic field. The
+ dollar signs and leading RCS keyword name are removed. Extra
+ processing is done for the following RCS keywords:
+
+ - "RCSfile" expands to the name of the file in the RCS or CVS
+ repository, which is the name of the source file with a ",v"
+ suffix appended. The transform will remove the ",v" suffix.
+
+ - "Date" expands to the format "YYYY/MM/DD hh:mm:ss" (in the UTC
+ time zone). The RCS Keywords transform will extract just the
+ date itself and transform it to an ISO 8601 format date, as in
+ "2000-12-31".
+
+ (Since the source file for this text is itself stored under CVS,
+ we can't show an example of the "Date" RCS keyword because we
+ can't prevent any RCS keywords used in this explanation from
+ being expanded. Only the "RCSfile" keyword is stable; its
+ expansion text changes only if the file name changes.)
+ """
+
+ def transform(self):
+ doctree = self.doctree
+ index = doctree.findnonclass(nodes.PreBibliographic)
+ if index is None:
+ return
+ candidate = doctree[index]
+ if isinstance(candidate, nodes.field_list):
+ biblioindex = doctree.findnonclass(nodes.Titular)
+ nodelist, remainder = self.extract_bibliographic(candidate)
+ if remainder:
+ doctree[index] = remainder
+ else:
+ del doctree[index]
+ doctree[biblioindex:biblioindex] = nodelist
+ return
+
+ def extract_bibliographic(self, field_list):
+ docinfo = nodes.docinfo()
+ remainder = []
+ bibliofields = self.language.bibliographic_fields
+ abstract = None
+ for field in field_list:
+ try:
+ name = field[0][0].astext()
+ normedname = utils.normname(name)
+ if not (len(field) == 2 and bibliofields.has_key(normedname)
+ and self.check_empty_biblio_field(field, name)):
+ raise TransformError
+ biblioclass = bibliofields[normedname]
+ if issubclass(biblioclass, nodes.TextElement):
+ if not self.check_compound_biblio_field(field, name):
+ raise TransformError
+ self.filter_rcs_keywords(field[1][0])
+ docinfo.append(biblioclass('', '', *field[1][0]))
+ else: # multiple body elements possible
+ if issubclass(biblioclass, nodes.authors):
+ self.extract_authors(field, name, docinfo)
+ elif issubclass(biblioclass, nodes.topic):
+ if abstract:
+ field[-1] += self.doctree.reporter.warning(
+ 'There can only be one abstract.')
+ raise TransformError
+ title = nodes.title(
+ name, self.language.labels['abstract'])
+ abstract = nodes.topic('', title, CLASS='abstract',
+ *field[1].children)
+ else:
+ docinfo.append(biblioclass('', *field[1].children))
+ except TransformError:
+ remainder.append(field)
+ continue
+ nodelist = []
+ if len(docinfo) != 0:
+ nodelist.append(docinfo)
+ if abstract:
+ nodelist.append(abstract)
+ if remainder:
+ field_list[:] = remainder
+ else:
+ field_list = None
+ return nodelist, field_list
+
+ def check_empty_biblio_field(self, field, name):
+ if len(field[1]) < 1:
+ field[-1] += self.doctree.reporter.warning(
+ 'Cannot extract empty bibliographic field "%s".' % name)
+ return None
+ return 1
+
+ def check_compound_biblio_field(self, field, name):
+ if len(field[1]) > 1:
+ field[-1] += self.doctree.reporter.warning(
+ 'Cannot extract compound bibliographic field "%s".' % name)
+ return None
+ if not isinstance(field[1][0], nodes.paragraph):
+ field[-1] += self.doctree.reporter.warning(
+ 'Cannot extract bibliographic field "%s" containing anything '
+ 'other than a single paragraph.'
+ % name)
+ return None
+ return 1
+
+ rcs_keyword_substitutions = [
+ (re.compile(r'\$' r'Date: (\d\d\d\d)/(\d\d)/(\d\d) [\d:]+ \$$',
+ re.IGNORECASE), r'\1-\2-\3'),
+ (re.compile(r'\$' r'RCSfile: (.+),v \$$',
+ re.IGNORECASE), r'\1'),
+ (re.compile(r'\$[a-zA-Z]+: (.+) \$$'), r'\1'),]
+
+ def filter_rcs_keywords(self, paragraph):
+ if len(paragraph) == 1 and isinstance(paragraph[0], nodes.Text):
+ textnode = paragraph[0]
+ for pattern, substitution in self.rcs_keyword_substitutions:
+ match = pattern.match(textnode.data)
+ if match:
+ textnode.data = pattern.sub(substitution, textnode.data)
+ return
+
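The substitution table above can be exercised directly; the keyword strings below are split in two so that RCS/CVS cannot expand them here:

```python
import re

# Same three patterns as rcs_keyword_substitutions above.
rcs_keyword_substitutions = [
    (re.compile(r'\$' r'Date: (\d\d\d\d)/(\d\d)/(\d\d) [\d:]+ \$$',
                re.IGNORECASE), r'\1-\2-\3'),
    (re.compile(r'\$' r'RCSfile: (.+),v \$$', re.IGNORECASE), r'\1'),
    (re.compile(r'\$[a-zA-Z]+: (.+) \$$'), r'\1')]

def filter_rcs(text):
    # Apply the first matching substitution, as filter_rcs_keywords() does.
    for pattern, substitution in rcs_keyword_substitutions:
        if pattern.match(text):
            return pattern.sub(substitution, text)
    return text

print(filter_rcs('$' 'Date: 2001/12/31 23:59:59 $'))  # 2001-12-31
print(filter_rcs('$' 'RCSfile: frontmatter.py,v $'))  # frontmatter.py
print(filter_rcs('$' 'Revision: 1.5 $'))              # 1.5
```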
+ def extract_authors(self, field, name, docinfo):
+ try:
+ if len(field[1]) == 1:
+ if isinstance(field[1][0], nodes.paragraph):
+ authors = self.authors_from_one_paragraph(field)
+ elif isinstance(field[1][0], nodes.bullet_list):
+ authors = self.authors_from_bullet_list(field)
+ else:
+ raise TransformError
+ else:
+ authors = self.authors_from_paragraphs(field)
+ authornodes = [nodes.author('', '', *author)
+ for author in authors if author]
+ docinfo.append(nodes.authors('', *authornodes))
+ except TransformError:
+ field[-1] += self.doctree.reporter.warning(
+ 'Bibliographic field "%s" incompatible with extraction: '
+ 'it must contain either a single paragraph (with authors '
+ 'separated by one of "%s"), multiple paragraphs (one per '
+ 'author), or a bullet list with one paragraph (one author) '
+ 'per item.'
+ % (name, ''.join(self.language.author_separators)))
+ raise
+
+ def authors_from_one_paragraph(self, field):
+ text = field[1][0].astext().strip()
+ if not text:
+ raise TransformError
+ for authorsep in self.language.author_separators:
+ authornames = text.split(authorsep)
+ if len(authornames) > 1:
+ break
+ authornames = [author.strip() for author in authornames]
+ authors = [[nodes.Text(author)] for author in authornames]
+ return authors
+
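The separator search in `authors_from_one_paragraph()` tries each candidate in turn and keeps the first that yields more than one name. A standalone sketch, with a hypothetical separator list standing in for `self.language.author_separators`:

```python
# Hypothetical separators; ';' is tried before ',' so that names which
# themselves contain commas ("B. Name, Jr.") survive intact.
author_separators = [';', ',']

def split_authors(text):
    for sep in author_separators:
        names = text.split(sep)
        if len(names) > 1:
            break
    return [name.strip() for name in names]

print(split_authors('A. Name; B. Name, Jr.; C. Name'))
```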
+ def authors_from_bullet_list(self, field):
+ authors = []
+ for item in field[1][0]:
+ if len(item) != 1 or not isinstance(item[0], nodes.paragraph):
+ raise TransformError
+ authors.append(item[0].children)
+ if not authors:
+ raise TransformError
+ return authors
+
+ def authors_from_paragraphs(self, field):
+ for item in field[1]:
+ if not isinstance(item, nodes.paragraph):
+ raise TransformError
+ authors = [item.children for item in field[1]]
+ return authors
diff --git a/docutils/transforms/references.py b/docutils/transforms/references.py
new file mode 100644
index 000000000..c2ff9189b
--- /dev/null
+++ b/docutils/transforms/references.py
@@ -0,0 +1,670 @@
+#! /usr/bin/env python
+"""
+:Authors: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Transforms for resolving references:
+
+- `Hyperlinks`: Used to resolve hyperlink targets and references.
+- `Footnotes`: Resolve footnote numbering and references.
+- `Substitutions`: Resolve substitutions.
+"""
+
+__docformat__ = 'reStructuredText'
+
+import re
+from docutils import nodes, utils
+from docutils.transforms import TransformError, Transform
+
+
+class Hyperlinks(Transform):
+
+ """Resolve the various types of hyperlink targets and references."""
+
+ def transform(self):
+ stages = []
+ #stages.append('Beginning of references.Hyperlinks.transform()\n' + self.doctree.pformat())
+ self.resolve_chained_targets()
+ #stages.append('After references.Hyperlinks.resolve_chained_targets()\n' + self.doctree.pformat())
+ self.resolve_anonymous()
+ #stages.append('After references.Hyperlinks.resolve_anonymous()\n' + self.doctree.pformat())
+ self.resolve_indirect()
+ #stages.append('After references.Hyperlinks.resolve_indirect()\n' + self.doctree.pformat())
+ self.resolve_external_targets()
+ #stages.append('After references.Hyperlinks.resolve_external_references()\n' + self.doctree.pformat())
+ self.resolve_internal_targets()
+ #stages.append('After references.Hyperlinks.resolve_internal_references()\n' + self.doctree.pformat())
+ #import difflib
+ #compare = difflib.Differ().compare
+ #for i in range(len(stages) - 1):
+ # print ''.join(compare(stages[i].splitlines(1), stages[i+1].splitlines(1)))
+
+ def resolve_chained_targets(self):
+ """
+ Attributes "refuri" and "refname" are migrated from the final direct
+ target up the chain of contiguous adjacent internal targets, using
+ `ChainedTargetResolver`.
+ """
+ visitor = ChainedTargetResolver(self.doctree)
+ self.doctree.walk(visitor)
+
+ def resolve_anonymous(self):
+ """
+ Link anonymous references to targets. Given::
+
+             <paragraph>
+                 <reference anonymous="1">
+                     internal
+                 <reference anonymous="1">
+                     external
+             <target anonymous="1" id="id1">
+             <target anonymous="1" id="id2" refuri="http://external">
+
+ Corresponding references are linked via "refid" or resolved via
+ "refuri"::
+
+             <paragraph>
+                 <reference anonymous="1" refid="id1">
+                     text
+                 <reference anonymous="1" refuri="http://external">
+                     external
+             <target anonymous="1" id="id1">
+             <target anonymous="1" id="id2" refuri="http://external">
+ """
+ if len(self.doctree.anonymous_refs) \
+ != len(self.doctree.anonymous_targets):
+ msg = self.doctree.reporter.error(
+ 'Anonymous hyperlink mismatch: %s references but %s targets.'
+ % (len(self.doctree.anonymous_refs),
+ len(self.doctree.anonymous_targets)))
+ self.doctree.messages += msg
+ msgid = self.doctree.set_id(msg)
+ for ref in self.doctree.anonymous_refs:
+ prb = nodes.problematic(
+ ref.rawsource, ref.rawsource, refid=msgid)
+ prbid = self.doctree.set_id(prb)
+ msg.add_backref(prbid)
+ ref.parent.replace(ref, prb)
+ return
+ for i in range(len(self.doctree.anonymous_refs)):
+ ref = self.doctree.anonymous_refs[i]
+ target = self.doctree.anonymous_targets[i]
+ if target.hasattr('refuri'):
+ ref['refuri'] = target['refuri']
+ ref.resolved = 1
+ else:
+ ref['refid'] = target['id']
+ self.doctree.note_refid(ref)
+ target.referenced = 1
+
+ def resolve_indirect(self):
+ """
+ a) Indirect external references::
+
+             <paragraph>
+                 <reference refname="indirect external">
+                     indirect external
+             <target id="id1" name="direct external" refuri="http://direct">
+             <target id="id2" name="indirect external" refname="direct external">
+
+ The "refuri" attribute is migrated back to all indirect targets from
+ the final direct target (i.e. a target not referring to another
+ indirect target)::
+
+             <paragraph>
+                 <reference refname="indirect external">
+                     indirect external
+             <target id="id1" name="direct external" refuri="http://direct">
+             <target id="id2" name="indirect external" refuri="http://direct">
+
+ Once the attribute is migrated, the preexisting "refname" attribute
+ is dropped.
+
+ b) Indirect internal references::
+
+             <target id="id1" name="direct internal">
+             <paragraph>
+                 <reference refname="indirect internal">
+                     indirect internal
+             <target id="id2" name="indirect internal 2" refname="direct internal">
+             <target id="id3" name="indirect internal" refname="indirect internal 2">
+
+ Targets which indirectly refer to an internal target become one-hop
+ indirect (their "refid" attributes are directly set to the internal
+ target's "id"). References which indirectly refer to an internal
+ target become direct internal references::
+
+             <target id="id1" name="direct internal">
+             <paragraph>
+                 <reference refid="id1">
+                     indirect internal
+             <target id="id2" name="indirect internal 2" refid="id1">
+             <target id="id3" name="indirect internal" refid="id1">
+ """
+ #import mypdb as pdb
+ #pdb.set_trace()
+ for target in self.doctree.indirect_targets:
+ if not target.resolved:
+ self.resolve_indirect_target(target)
+ self.resolve_indirect_references(target)
+
+ def resolve_indirect_target(self, target):
+ refname = target['refname']
+ reftarget = None
+ if self.doctree.explicit_targets.has_key(refname):
+ reftarget = self.doctree.explicit_targets[refname]
+ elif self.doctree.implicit_targets.has_key(refname):
+ reftarget = self.doctree.implicit_targets[refname]
+ if not reftarget:
+ self.nonexistent_indirect_target(target)
+ return
+ if isinstance(reftarget, nodes.target) \
+ and not reftarget.resolved and reftarget.hasattr('refname'):
+ self.one_indirect_target(reftarget) # multiply indirect
+ if reftarget.hasattr('refuri'):
+ target['refuri'] = reftarget['refuri']
+ if target.hasattr('name'):
+ self.doctree.note_external_target(target)
+ elif reftarget.hasattr('refid'):
+ target['refid'] = reftarget['refid']
+ self.doctree.note_refid(target)
+ else:
+ try:
+ target['refid'] = reftarget['id']
+ self.doctree.note_refid(target)
+ except KeyError:
+ self.nonexistent_indirect_target(target)
+ return
+ del target['refname']
+ target.resolved = 1
+ reftarget.referenced = 1
+
+ def nonexistent_indirect_target(self, target):
+ naming = ''
+ if target.hasattr('name'):
+ naming = '"%s" ' % target['name']
+ reflist = self.doctree.refnames[target['name']]
+ else:
+ reflist = self.doctree.refnames[target['id']]
+ naming += '(id="%s")' % target['id']
+ msg = self.doctree.reporter.warning(
+ 'Indirect hyperlink target %s refers to target "%s", '
+ 'which does not exist.' % (naming, target['refname']))
+ self.doctree.messages += msg
+ msgid = self.doctree.set_id(msg)
+ for ref in reflist:
+ prb = nodes.problematic(
+ ref.rawsource, ref.rawsource, refid=msgid)
+ prbid = self.doctree.set_id(prb)
+ msg.add_backref(prbid)
+ ref.parent.replace(ref, prb)
+ target.resolved = 1
+
+ def resolve_indirect_references(self, target):
+ if target.hasattr('refid'):
+ attname = 'refid'
+ call_if_named = 0
+ call_method = self.doctree.note_refid
+ elif target.hasattr('refuri'):
+ attname = 'refuri'
+ call_if_named = 1
+ call_method = self.doctree.note_external_target
+ else:
+ return
+ attval = target[attname]
+ if target.hasattr('name'):
+ name = target['name']
+ try:
+ reflist = self.doctree.refnames[name]
+ except KeyError, instance:
+ if target.referenced:
+ return
+ msg = self.doctree.reporter.info(
+ 'Indirect hyperlink target "%s" is not referenced.'
+ % name)
+ self.doctree.messages += msg
+ target.referenced = 1
+ return
+ delatt = 'refname'
+ else:
+ id = target['id']
+ try:
+ reflist = self.doctree.refids[id]
+ except KeyError, instance:
+ if target.referenced:
+ return
+ msg = self.doctree.reporter.info(
+ 'Indirect hyperlink target id="%s" is not referenced.'
+ % id)
+ self.doctree.messages += msg
+ target.referenced = 1
+ return
+ delatt = 'refid'
+ for ref in reflist:
+ if ref.resolved:
+ continue
+ del ref[delatt]
+ ref[attname] = attval
+ if not call_if_named or ref.hasattr('name'):
+ call_method(ref)
+ ref.resolved = 1
+ if isinstance(ref, nodes.target):
+ self.resolve_indirect_references(ref)
+ target.referenced = 1
+
+ def resolve_external_targets(self):
+ """
+ Given::
+
+
+             <paragraph>
+                 <reference refname="direct external">
+                     direct external
+             <target id="id1" name="direct external" refuri="http://direct">
+
+ The "refname" attribute is replaced by the direct "refuri" attribute::
+
+             <paragraph>
+                 <reference refuri="http://direct">
+                     direct external
+             <target id="id1" name="direct external" refuri="http://direct">
+ """
+ for target in self.doctree.external_targets:
+ if target.hasattr('refuri') and target.hasattr('name'):
+ name = target['name']
+ refuri = target['refuri']
+ try:
+ reflist = self.doctree.refnames[name]
+ except KeyError, instance:
+ if target.referenced:
+ continue
+ msg = self.doctree.reporter.info(
+ 'External hyperlink target "%s" is not referenced.'
+ % name)
+ self.doctree.messages += msg
+ target.referenced = 1
+ continue
+ for ref in reflist:
+ if ref.resolved:
+ continue
+ del ref['refname']
+ ref['refuri'] = refuri
+ ref.resolved = 1
+ target.referenced = 1
+
+ def resolve_internal_targets(self):
+ """
+ Given::
+
+             <paragraph>
+                 <reference refname="direct internal">
+                     direct internal
+             <target id="id1" name="direct internal">
+
+ The "refname" attribute is replaced by "refid" linking to the target's
+ "id"::
+
+             <paragraph>
+                 <reference refid="id1">
+                     direct internal
+             <target id="id1" name="direct internal">
+ """
+ for target in self.doctree.internal_targets:
+ if target.hasattr('refuri') or target.hasattr('refid') \
+ or not target.hasattr('name'):
+ continue
+ name = target['name']
+ refid = target['id']
+ try:
+ reflist = self.doctree.refnames[name]
+ except KeyError, instance:
+ if target.referenced:
+ continue
+ msg = self.doctree.reporter.info(
+ 'Internal hyperlink target "%s" is not referenced.'
+ % name)
+ self.doctree.messages += msg
+ target.referenced = 1
+ continue
+ for ref in reflist:
+ if ref.resolved:
+ continue
+ del ref['refname']
+ ref['refid'] = refid
+ ref.resolved = 1
+ target.referenced = 1
+
+
+class ChainedTargetResolver(nodes.NodeVisitor):
+
+ """
+ Copy reference attributes up the length of a hyperlink target chain.
+
+ "Chained targets" are multiple adjacent internal hyperlink targets which
+ "point to" an external or indirect target. After the transform, all
+ chained targets will effectively point to the same place.
+
+ Given the following ``doctree`` as input::
+
+         <document>
+             <target id="a" name="a">
+             <target id="b" name="b">
+             <target id="c" name="c" refuri="http://chained.link">
+             <target id="d" name="d">
+             <paragraph>
+                 I'm known as "d".
+             <target id="e" name="e">
+             <target id="id1"
+                 refuri="http://unchained.link">
+
+ ``ChainedTargetResolver(doctree).walk()`` will transform the above into::
+
+         <document>
+             <target id="a" name="a" refuri="http://chained.link">
+             <target id="b" name="b" refuri="http://chained.link">
+             <target id="c" name="c" refuri="http://chained.link">
+             <target id="d" name="d">
+             <paragraph>
+                 I'm known as "d".
+             <target id="e" name="e" refuri="http://unchained.link">
+             <target id="id1"
+                 refuri="http://unchained.link">
+ """
+
+ def unknown_visit(self, node):
+ pass
+
+ def visit_target(self, node):
+ if node.hasattr('refuri'):
+ attname = 'refuri'
+ call_if_named = self.doctree.note_external_target
+ elif node.hasattr('refname'):
+ attname = 'refname'
+ call_if_named = self.doctree.note_indirect_target
+ elif node.hasattr('refid'):
+ attname = 'refid'
+ call_if_named = None
+ else:
+ return
+ attval = node[attname]
+ index = node.parent.index(node)
+ for i in range(index - 1, -1, -1):
+ sibling = node.parent[i]
+ if not isinstance(sibling, nodes.target) \
+ or sibling.hasattr('refuri') \
+ or sibling.hasattr('refname') \
+ or sibling.hasattr('refid'):
+ break
+ sibling[attname] = attval
+ if sibling.hasattr('name') and call_if_named:
+ call_if_named(sibling)
+
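The backward walk in `visit_target()` can be sketched with dicts standing in for sibling target nodes (attribute values below are hypothetical):

```python
# Dicts stand in for sibling target nodes; the last one is the direct
# target whose "refuri" is copied back up the chain of attribute-less
# internal targets that precede it.
siblings = [{'name': 'a'},
            {'name': 'b'},
            {'name': 'c', 'refuri': 'http://chained.link'}]
index = 2
attval = siblings[index]['refuri']
for i in range(index - 1, -1, -1):
    sibling = siblings[i]
    if 'refuri' in sibling or 'refname' in sibling or 'refid' in sibling:
        break  # not a chained (attribute-less) internal target
    sibling['refuri'] = attval
print([t['refuri'] for t in siblings])
```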
+
+class Footnotes(Transform):
+
+ """
+ Assign numbers to autonumbered footnotes, and resolve links to footnotes,
+ citations, and their references.
+
+ Given the following ``doctree`` as input::
+
+         <document>
+             <paragraph>
+                 A labeled autonumbered footnote reference:
+                 <footnote_reference auto="1" id="id1" refname="footnote">
+             <paragraph>
+                 An unlabeled autonumbered footnote reference:
+                 <footnote_reference auto="1" id="id2">
+             <footnote auto="1" id="id3">
+                 <paragraph>
+                     Unlabeled autonumbered footnote.
+             <footnote auto="1" id="footnote" name="footnote">
+                 <paragraph>
+                     Labeled autonumbered footnote.
+
+ Auto-numbered footnotes have attribute ``auto="1"`` and no label.
+ Auto-numbered footnote_references have no reference text (they're
+ empty elements). When resolving the numbering, a ``label`` element
+ is added to the beginning of the ``footnote``, and reference text
+ to the ``footnote_reference``.
+
+ The transformed result will be::
+
+         <document>
+             <paragraph>
+                 A labeled autonumbered footnote reference:
+                 <footnote_reference auto="1" id="id1" refid="footnote">
+                     2
+             <paragraph>
+                 An unlabeled autonumbered footnote reference:
+                 <footnote_reference auto="1" id="id2" refid="id3">
+                     1
+             <footnote auto="1" id="id3" backrefs="id2">
+                 <label>
+ self.body.append('<p class="system-message-title">')
+ if node.hasattr('backrefs'):
+ backrefs = node['backrefs']
+ if len(backrefs) == 1:
+ self.body.append('<a href="#%s">%s</a> '
+ '(level %s system message)</p>\n'
+ % (backrefs[0], node['type'], node['level']))
+ else:
+ i = 1
+ backlinks = []
+ for backref in backrefs:
+ backlinks.append('<a href="#%s">%s</a>' % (backref, i))
+ i += 1
+ self.body.append('%s (%s; level %s system message)</p>\n'
+ % (node['type'], '|'.join(backlinks),
+ node['level']))
+ else:
+ self.body.append('%s (level %s system message)</p>\n'
+ % (node['type'], node['level']))
+
+ def depart_system_message(self, node):
+ self.body.append('</div>\n')
+
+ def visit_table(self, node):
+ self.body.append(
+ self.starttag(node, 'table', frame='border', rules='all'))
+
+ def depart_table(self, node):
+ self.body.append('</table>\n')
+
+ def visit_target(self, node):
+ if not (node.has_key('refuri') or node.has_key('refid')
+ or node.has_key('refname')):
+ self.body.append(self.starttag(node, 'a', '', CLASS='target'))
+ self.context.append('</a>')
+ else:
+ self.context.append('')
+
+ def depart_target(self, node):
+ self.body.append(self.context.pop())
+
+ def visit_tbody(self, node):
+ self.body.append(self.context.pop()) # '</colgroup>\n' or ''
+ self.body.append(self.starttag(node, 'tbody', valign='top'))
+
+ def depart_tbody(self, node):
+ self.body.append('</tbody>\n')
+
+ def visit_term(self, node):
+ self.body.append(self.starttag(node, 'dt', ''))
+
+ def depart_term(self, node):
+ """
+ Leave the end tag to `self.visit_definition()`, in case there's a
+ classifier.
+ """
+ pass
+
+ def visit_tgroup(self, node):
+ self.body.append(self.starttag(node, 'colgroup'))
+ self.context.append('</colgroup>\n')
+
+ def depart_tgroup(self, node):
+ pass
+
+ def visit_thead(self, node):
+ self.body.append(self.context.pop()) # '</colgroup>\n'
+ self.context.append('')
+ self.body.append(self.starttag(node, 'thead', valign='bottom'))
+
+ def depart_thead(self, node):
+ self.body.append('</thead>\n')
+
+ def visit_tip(self, node):
+ self.visit_admonition(node, 'tip')
+
+ def depart_tip(self, node):
+ self.depart_admonition()
+
+ def visit_title(self, node):
+ """Only 6 section levels are supported by HTML."""
+ if isinstance(node.parent, nodes.topic):
+ self.body.append(
+ self.starttag(node, 'P', '', CLASS='topic-title'))
+ self.context.append('</p>\n')
+ elif self.sectionlevel == 0:
+ self.head.append('<title>%s</title>\n'
+ % self.encode(node.astext()))
+ self.body.append(self.starttag(node, 'H1', '', CLASS='title'))
+ self.context.append('</h1>\n')
+ else:
+ self.body.append(
+ self.starttag(node, 'H%s' % self.sectionlevel, ''))
+ context = ''
+ if node.hasattr('refid'):
+ self.body.append('<a href="#%s">' % node['refid'])
+ context = '</a>'
+ self.context.append('%s</h%s>\n' % (context, self.sectionlevel))
+
+ def depart_title(self, node):
+ self.body.append(self.context.pop())
+
+ def visit_topic(self, node):
+ self.body.append(self.starttag(node, 'div', CLASS='topic'))
+ self.topic_class = node.get('class')
+
+ def depart_topic(self, node):
+ self.body.append('</div>\n')
+ self.topic_class = ''
+
+ def visit_transition(self, node):
+ self.body.append(self.starttag(node, 'hr'))
+
+ def depart_transition(self, node):
+ pass
+
+ def visit_version(self, node):
+ self.visit_docinfo_item(node, 'version')
+
+ def depart_version(self, node):
+ self.depart_docinfo_item()
+
+ def visit_warning(self, node):
+ self.visit_admonition(node, 'warning')
+
+ def depart_warning(self, node):
+ self.depart_admonition()
+
+ def unimplemented_visit(self, node):
+ raise NotImplementedError('visiting unimplemented node type: %s'
+ % node.__class__.__name__)
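The HTML translator above follows the visitor pattern used throughout this patch: each node type gets `visit_…`/`depart_…` methods dispatched by node class name, output fragments accumulate in `self.body`, and pending closing tags wait in `self.context`. A minimal standalone sketch of that dispatch, in modern Python with hypothetical `Node`/`HTMLSketch` names (not the docutils API):

```python
class Node:
    """Minimal tree node; children are visited depth-first."""
    def __init__(self, *children):
        self.children = children

    def walk(self, visitor):
        # Dispatch on the node's class name, as docutils visitors do.
        name = self.__class__.__name__
        getattr(visitor, 'visit_' + name)(self)
        for child in self.children:
            child.walk(visitor)
        getattr(visitor, 'depart_' + name)(self)

class section(Node): pass
class title(Node): pass

class HTMLSketch:
    """Accumulates output fragments in self.body, like the writer above."""
    def __init__(self):
        self.body = []
        self.sectionlevel = 0

    def visit_section(self, node):
        self.sectionlevel += 1

    def depart_section(self, node):
        self.sectionlevel -= 1

    def visit_title(self, node):
        # HTML defines only H1..H6, hence the cap noted in visit_title.
        self.body.append('<h%d>' % min(self.sectionlevel, 6))

    def depart_title(self, node):
        self.body.append('</h%d>' % min(self.sectionlevel, 6))

visitor = HTMLSketch()
section(title()).walk(visitor)
print(''.join(visitor.body))  # → <h1></h1>
```

Tracking the section level in the visitor, rather than in the tree, is what lets a flat stream of `visit_`/`depart_` calls produce correctly nested heading tags.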
diff --git a/docutils/writers/pprint.py b/docutils/writers/pprint.py
new file mode 100644
index 000000000..a34c2a920
--- /dev/null
+++ b/docutils/writers/pprint.py
@@ -0,0 +1,28 @@
+#! /usr/bin/env python
+
+"""
+:Authors: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Simple internal document tree Writer, writes indented pseudo-XML.
+"""
+
+__docformat__ = 'reStructuredText'
+
+
+from docutils import writers
+
+
+class Writer(writers.Writer):
+
+ output = None
+ """Final translated form of `document`."""
+
+ def translate(self):
+ self.output = self.document.pformat()
+
+ def record(self):
+ self.recordfile(self.output, self.destination)
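The pprint writer delegates all the work to the document tree's `pformat()`, which renders the tree as indented pseudo-XML, one element per line. A toy illustration of the idea (hypothetical `Node` class, not docutils' own implementation):

```python
class Node:
    """Toy tree node with a docutils-style pformat()."""
    def __init__(self, tagname, *children):
        self.tagname = tagname
        self.children = list(children)

    def pformat(self, indent='    ', level=0):
        # One line per element: this node's tag, then each child one
        # indentation level deeper.
        result = '%s<%s>\n' % (indent * level, self.tagname)
        for child in self.children:
            result += child.pformat(indent, level + 1)
        return result

doc = Node('document', Node('section', Node('title')))
print(doc.pformat(), end='')
```

This prints `<document>`, `<section>`, and `<title>` on successive lines, each child indented four spaces deeper, which is exactly the form the test suites below compare against as "expected output".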
diff --git a/install.py b/install.py
new file mode 100755
index 000000000..be9ed238b
--- /dev/null
+++ b/install.py
@@ -0,0 +1,20 @@
+#!/usr/bin/env python
+# $Id$
+
+"""
+This is a quick & dirty installation shortcut. It is equivalent to the
+command::
+
+ python setup.py install
+
+However, the shortcut lacks error checking!
+"""
+
+from distutils import core
+from setup import do_setup
+
+if __name__ == '__main__':
+ core._setup_stop_after = 'config'
+ dist = do_setup()
+ dist.commands = ['install']
+ dist.run_commands()
diff --git a/setup.py b/setup.py
new file mode 100755
index 000000000..23aa0ce4a
--- /dev/null
+++ b/setup.py
@@ -0,0 +1,24 @@
+#!/usr/bin/env python
+# $Id$
+
+from distutils.core import setup
+
+def do_setup():
+ dist = setup(
+ name = 'Docutils',
+ description = 'Python Documentation Utilities',
+ #long_description = '',
+ url = 'http://docutils.sourceforge.net/',
+ version = 'pre-0.1',
+ author = 'David Goodger',
+ author_email = 'goodger@users.sourceforge.net',
+ license = 'public domain, Python (see COPYING.txt)',
+ packages = ['docutils', 'docutils.readers', 'docutils.writers',
+ 'docutils.transforms', 'docutils.languages',
+ 'docutils.parsers', 'docutils.parsers.restructuredtext',
+ 'docutils.parsers.restructuredtext.directives',
+ 'docutils.parsers.restructuredtext.languages'])
+ return dist
+
+if __name__ == '__main__':
+ do_setup()
diff --git a/test/DocutilsTestSupport.py b/test/DocutilsTestSupport.py
new file mode 100644
index 000000000..766eafde6
--- /dev/null
+++ b/test/DocutilsTestSupport.py
@@ -0,0 +1,379 @@
+#! /usr/bin/env python
+
+"""
+:Authors: David Goodger; Garth Kidd
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Exports the following:
+
+:Modules:
+ - `statemachine` is 'docutils.statemachine'
+ - `nodes` is 'docutils.nodes'
+ - `urischemes` is 'docutils.urischemes'
+ - `utils` is 'docutils.utils'
+ - `transforms` is 'docutils.transforms'
+ - `states` is 'docutils.parsers.rst.states'
+ - `tableparser` is 'docutils.parsers.rst.tableparser'
+
+:Classes:
+ - `CustomTestSuite`
+ - `CustomTestCase`
+ - `ParserTestSuite`
+ - `ParserTestCase`
+ - `TableParserTestSuite`
+ - `TableParserTestCase`
+"""
+__docformat__ = 'reStructuredText'
+
+import UnitTestFolder
+import sys, os, unittest, difflib, inspect
+from pprint import pformat
+import docutils
+from docutils import statemachine, nodes, urischemes, utils, transforms
+from docutils.transforms import universal
+from docutils.parsers import rst
+from docutils.parsers.rst import states, tableparser, directives, languages
+from docutils.statemachine import string2lines
+
+try:
+ import mypdb as pdb
+except ImportError:
+ import pdb
+
+
+class CustomTestSuite(unittest.TestSuite):
+
+ """
+ A collection of custom TestCases.
+
+ """
+
+ id = ''
+ """Identifier for the TestSuite. Prepended to the
+ TestCase identifiers to make identification easier."""
+
+ nextTestCaseId = 0
+ """The next identifier to use for non-identified test cases."""
+
+ def __init__(self, tests=(), id=None):
+ """
+ Initialize the CustomTestSuite.
+
+ Arguments:
+
+ id -- identifier for the suite, prepended to test cases.
+ """
+ unittest.TestSuite.__init__(self, tests)
+ if id is None:
+ outerframes = inspect.getouterframes(inspect.currentframe())
+ mypath = outerframes[0][1]
+ for outerframe in outerframes[1:]:
+ if outerframe[3] != '__init__':
+ callerpath = outerframe[1]
+ break
+ mydir, myname = os.path.split(mypath)
+ if not mydir:
+ mydir = os.curdir
+ if callerpath.startswith(mydir):
+ self.id = callerpath[len(mydir) + 1:] # caller's module
+ else:
+ self.id = callerpath
+ else:
+ self.id = id
+
+ def addTestCase(self, testCaseClass, methodName, input, expected,
+ id=None, runInDebugger=0, shortDescription=None,
+ **kwargs):
+ """
+ Create a custom TestCase in the CustomTestSuite.
+ Also return it, just in case.
+
+ Arguments:
+
+ testCaseClass -- CustomTestCase subclass to instantiate.
+ methodName -- name of the test method to run.
+ input -- input to the parser.
+ expected -- expected output from the parser.
+ id -- unique test identifier, used by the test framework.
+ runInDebugger -- if true, run this test under the pdb debugger.
+ shortDescription -- override to default test description.
+ """
+ if id is None: # generate id if required
+ id = self.nextTestCaseId
+ self.nextTestCaseId += 1
+ # test identifier will become suiteid.testid
+ tcid = '%s: %s' % (self.id, id)
+ # generate and add test case
+ tc = testCaseClass(methodName, input, expected, tcid,
+ runInDebugger=runInDebugger,
+ shortDescription=shortDescription,
+ **kwargs)
+ self.addTest(tc)
+ return tc
+
+
+class CustomTestCase(unittest.TestCase):
+
+ compare = difflib.Differ().compare
+ """Comparison method shared by all subclasses."""
+
+ def __init__(self, methodName, input, expected, id,
+ runInDebugger=0, shortDescription=None):
+ """
+ Initialise the CustomTestCase.
+
+ Arguments:
+
+ methodName -- name of test method to run.
+ input -- input to the parser.
+ expected -- expected output from the parser.
+ id -- unique test identifier, used by the test framework.
+ runInDebugger -- if true, run this test under the pdb debugger.
+ shortDescription -- override to default test description.
+ """
+ self.id = id
+ self.input = input
+ self.expected = expected
+ self.runInDebugger = runInDebugger
+ # Ring your mother.
+ unittest.TestCase.__init__(self, methodName)
+
+ def __str__(self):
+ """
+ Return string conversion. Overridden to give test id, in addition to
+ method name.
+ """
+ return '%s; %s' % (self.id, unittest.TestCase.__str__(self))
+
+ def compareOutput(self, input, output, expected):
+ """`input`, `output`, and `expected` should all be strings."""
+ try:
+ self.assertEquals('\n' + output, '\n' + expected)
+ except AssertionError:
+ print >>sys.stderr, '\n%s\ninput:' % (self,)
+ print >>sys.stderr, input
+ print >>sys.stderr, '-: expected\n+: output'
+ print >>sys.stderr, ''.join(self.compare(expected.splitlines(1),
+ output.splitlines(1)))
+ raise
+
+
+class TransformTestSuite(CustomTestSuite):
+
+ """
+ A collection of TransformTestCases.
+
+ A TransformTestSuite instance manufactures TransformTestCases,
+ keeps track of them, and provides a shared test fixture (a-la
+ setUp and tearDown).
+ """
+
+ def __init__(self, parser):
+ self.parser = parser
+ """Parser shared by all test cases."""
+
+ CustomTestSuite.__init__(self)
+
+ def generateTests(self, dict, dictname='totest',
+ testmethod='test_transforms'):
+ """
+ Stock the suite with test cases generated from a test data dictionary.
+
+ Each dictionary key (test type's name) maps to a list of transform
+ classes and list of tests. Each test is a list: input, expected
+ output, optional modifier. The optional third entry, a behavior
+ modifier, can be 0 (temporarily disable this test) or 1 (run this test
+ under the pdb debugger). Tests should be self-documenting and not
+ require external comments.
+ """
+ for name, (transforms, cases) in dict.items():
+ for casenum in range(len(cases)):
+ case = cases[casenum]
+ runInDebugger = 0
+ if len(case) == 3:
+ if case[2]:
+ runInDebugger = 1
+ else:
+ continue
+ self.addTestCase(
+ TransformTestCase, testmethod,
+ transforms=transforms, parser=self.parser,
+ input=case[0], expected=case[1],
+ id='%s[%r][%s]' % (dictname, name, casenum),
+ runInDebugger=runInDebugger)
+
+
+class TransformTestCase(CustomTestCase):
+
+ """
+ Output checker for the transform.
+
+ Should probably be called TransformOutputChecker, but I can deal with
+ that later when/if someone comes up with a category of transform test
+ cases that have nothing to do with the input and output of the transform.
+ """
+
+ def __init__(self, *args, **kwargs):
+ self.transforms = kwargs['transforms']
+ """List of transforms to perform for this test case."""
+
+ self.parser = kwargs['parser']
+ """Input parser for this test case."""
+
+ del kwargs['transforms'], kwargs['parser'] # only wanted here
+ CustomTestCase.__init__(self, *args, **kwargs)
+
+ def test_transforms(self):
+ if self.runInDebugger:
+ pdb.set_trace()
+ doctree = utils.newdocument(warninglevel=5, errorlevel=5,
+ debug=UnitTestFolder.debug)
+ self.parser.parse(self.input, doctree)
+ for transformClass in (self.transforms + universal.test_transforms):
+ transformClass(doctree).transform()
+ output = doctree.pformat()
+ self.compareOutput(self.input, output, self.expected)
+
+ def test_transforms_verbosely(self):
+ if self.runInDebugger:
+ pdb.set_trace()
+ print '\n', self.id
+ print '-' * 70
+ print self.input
+ doctree = utils.newdocument(warninglevel=5, errorlevel=5,
+ debug=UnitTestFolder.debug)
+ self.parser.parse(self.input, doctree)
+ print '-' * 70
+ print doctree.pformat()
+ for transformClass in self.transforms:
+ transformClass(doctree).transform()
+ output = doctree.pformat()
+ print '-' * 70
+ print output
+ self.compareOutput(self.input, output, self.expected)
+
+
+class ParserTestSuite(CustomTestSuite):
+
+ """
+ A collection of ParserTestCases.
+
+ A ParserTestSuite instance manufactures ParserTestCases,
+ keeps track of them, and provides a shared test fixture (a-la
+ setUp and tearDown).
+ """
+
+ def generateTests(self, dict, dictname='totest'):
+ """
+ Stock the suite with test cases generated from a test data dictionary.
+
+ Each dictionary key (test type name) maps to a list of tests. Each
+ test is a list: input, expected output, optional modifier. The
+ optional third entry, a behavior modifier, can be 0 (temporarily
+ disable this test) or 1 (run this test under the pdb debugger). Tests
+ should be self-documenting and not require external comments.
+ """
+ for name, cases in dict.items():
+ for casenum in range(len(cases)):
+ case = cases[casenum]
+ runInDebugger = 0
+ if len(case) == 3:
+ if case[2]:
+ runInDebugger = 1
+ else:
+ continue
+ self.addTestCase(
+ ParserTestCase, 'test_parser',
+ input=case[0], expected=case[1],
+ id='%s[%r][%s]' % (dictname, name, casenum),
+ runInDebugger=runInDebugger)
+
+
+class ParserTestCase(CustomTestCase):
+
+ """
+ Output checker for the parser.
+
+ Should probably be called ParserOutputChecker, but I can deal with
+ that later when/if someone comes up with a category of parser test
+ cases that have nothing to do with the input and output of the parser.
+ """
+
+ parser = rst.Parser()
+ """Parser shared by all ParserTestCases."""
+
+ def test_parser(self):
+ if self.runInDebugger:
+ pdb.set_trace()
+ document = utils.newdocument(warninglevel=5, errorlevel=5,
+ debug=UnitTestFolder.debug)
+ self.parser.parse(self.input, document)
+ output = document.pformat()
+ self.compareOutput(self.input, output, self.expected)
+
+
+class TableParserTestSuite(CustomTestSuite):
+
+ """
+ A collection of TableParserTestCases.
+
+ A TableParserTestSuite instance manufactures TableParserTestCases,
+ keeps track of them, and provides a shared test fixture (a-la
+ setUp and tearDown).
+ """
+
+ def generateTests(self, dict, dictname='totest'):
+ """
+ Stock the suite with test cases generated from a test data dictionary.
+
+ Each dictionary key (test type name) maps to a list of tests. Each
+ test is a list: an input table, expected output from parsegrid(),
+ expected output from parse(), optional modifier. The optional fourth
+ entry, a behavior modifier, can be 0 (temporarily disable this test)
+ or 1 (run this test under the pdb debugger). Tests should be
+ self-documenting and not require external comments.
+ """
+ for name, cases in dict.items():
+ for casenum in range(len(cases)):
+ case = cases[casenum]
+ runInDebugger = 0
+ if len(case) == 4:
+ if case[3]:
+ runInDebugger = 1
+ else:
+ continue
+ self.addTestCase(TableParserTestCase, 'test_parsegrid',
+ input=case[0], expected=case[1],
+ id='%s[%r][%s]' % (dictname, name, casenum),
+ runInDebugger=runInDebugger)
+ self.addTestCase(TableParserTestCase, 'test_parse',
+ input=case[0], expected=case[2],
+ id='%s[%r][%s]' % (dictname, name, casenum),
+ runInDebugger=runInDebugger)
+
+
+class TableParserTestCase(CustomTestCase):
+
+ parser = tableparser.TableParser()
+
+ def test_parsegrid(self):
+ self.parser.setup(string2lines(self.input))
+ try:
+ self.parser.findheadbodysep()
+ self.parser.parsegrid()
+ output = self.parser.cells
+ except Exception, details:
+ output = '%s: %s' % (details.__class__.__name__, details)
+ self.compareOutput(self.input, pformat(output) + '\n',
+ pformat(self.expected) + '\n')
+
+ def test_parse(self):
+ try:
+ output = self.parser.parse(string2lines(self.input))
+ except Exception, details:
+ output = '%s: %s' % (details.__class__.__name__, details)
+ self.compareOutput(self.input, pformat(output) + '\n',
+ pformat(self.expected) + '\n')
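The `generateTests` methods above all follow one pattern: a `totest` dictionary maps a test-type name to a list of `[input, expected, optional-modifier]` cases, and a third entry of 0 disables the case. The same pattern can be sketched with plain `unittest` (the `identity` function is a hypothetical stand-in for the parser under test):

```python
import unittest

totest = {
    'identity': [
        ['spam', 'spam'],
        ['eggs', 'EGGS', 0],   # third entry 0: case temporarily disabled
    ],
}

def identity(text):
    """Stand-in for the parser/transform under test."""
    return text

def check(inp, expected):
    assert identity(inp) == expected

def generate_suite(data, dictname='totest'):
    """Build a TestSuite from the data dict, honoring the disable modifier."""
    suite = unittest.TestSuite()
    for name, cases in data.items():
        for casenum, case in enumerate(cases):
            if len(case) == 3 and not case[2]:
                continue                    # modifier 0 disables the case
            suite.addTest(unittest.FunctionTestCase(
                lambda i=case[0], e=case[1]: check(i, e),
                description='%s[%r][%s]' % (dictname, name, casenum)))
    return suite

suite = generate_suite(totest)
print(suite.countTestCases())  # → 1 (the disabled case is skipped)
```

Generating test cases from data this way keeps each case self-documenting, as the docstrings above require, and gives every case a stable `totest[name][casenum]` identifier for failure reports.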
diff --git a/test/UnitTestFolder.py b/test/UnitTestFolder.py
new file mode 100644
index 000000000..529d8c7e8
--- /dev/null
+++ b/test/UnitTestFolder.py
@@ -0,0 +1,135 @@
+#! /usr/bin/env python
+
+"""
+:Author: Garth Kidd
+:Contact: garth@deadlybloodyserious.com
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+"""
+
+import sys, os, getopt, types, unittest, re
+
+
+# So that individual test modules can share a bit of state,
+# `UnitTestFolder` acts as an intermediary for the following
+# variables:
+debug = 0
+verbosity = 1
+
+USAGE = """\
+Usage: test_whatever [options]
+
+Options:
+ -h, --help Show this message
+ -v, --verbose Verbose output
+ -q, --quiet Minimal output
+ -d, --debug Debug mode
+"""
+
+def usageExit(msg=None):
+ """Print usage and exit."""
+ if msg:
+ print msg
+ print USAGE
+ sys.exit(2)
+
+def parseArgs(argv=sys.argv):
+ """Parse command line arguments and set TestFramework state.
+
+ State is to be acquired by test_* modules by a grotty hack:
+ ``from UnitTestFolder import *``. For this stylistic
+ transgression, I expect to be first up against the wall
+ when the revolution comes. --Garth"""
+ global verbosity, debug
+ try:
+ options, args = getopt.getopt(argv[1:], 'hHvqd',
+ ['help', 'verbose', 'quiet', 'debug'])
+ for opt, value in options:
+ if opt in ('-h', '-H', '--help'):
+ usageExit()
+ if opt in ('-q', '--quiet'):
+ verbosity = 0
+ if opt in ('-v', '--verbose'):
+ verbosity = 2
+ if opt in ('-d', '--debug'):
+ debug = 1
+ if len(args) != 0:
+ usageExit("No command-line arguments supported yet.")
+ except getopt.error, msg:
+ usageExit(msg)
+
+def loadModulesFromFolder(path, name='', subfolders=None):
+ """
+ Return a test suite composed of all the tests from modules in a folder.
+
+ Search for modules in directory `path`, beginning with `name`. If
+ `subfolders` is true, search subdirectories (also beginning with `name`)
+ recursively.
+ """
+ testLoader = unittest.defaultTestLoader
+ testSuite = unittest.TestSuite()
+ testModules = []
+ paths = [path]
+ while paths:
+ p = paths.pop(0)
+ if not p:
+ p = os.curdir
+ files = os.listdir(p)
+ for filename in files:
+ if filename.startswith(name):
+ fullpath = os.path.join(p, filename)
+ if filename.endswith('.py'):
+ testModules.append(fullpath)
+ elif subfolders and os.path.isdir(fullpath):
+ paths.append(fullpath)
+ sys.path.insert(0, '')
+ # Import modules and add their tests to the suite.
+ for modpath in testModules:
+ if debug:
+ print >>sys.stderr, "importing %s" % modpath
+ sys.path[0], filename = os.path.split(modpath)
+ modname = filename[:-3] # strip off the '.py'
+ module = __import__(modname)
+ # if there's a suite defined, incorporate its contents
+ try:
+ suite = getattr(module, 'suite')
+ except AttributeError:
+ # Look for individual tests
+ moduleTests = testLoader.loadTestsFromModule(module)
+ # unittest.TestSuite.addTests() doesn't work as advertised,
+ # as it can't load tests from another TestSuite, so we have
+ # to cheat:
+ testSuite.addTest(moduleTests)
+ continue
+ if type(suite) == types.FunctionType:
+ testSuite.addTest(suite())
+ elif type(suite) == types.InstanceType \
+ and isinstance(suite, unittest.TestSuite):
+ testSuite.addTest(suite)
+ else:
+ raise AssertionError, "don't understand suite (%s)" % modpath
+ return testSuite
+
+
+def main(suite=None):
+ """
+ Shared `main` for any individual test_* file.
+
+ suite -- TestSuite to run. If not specified, look for any globally defined
+ tests and run them.
+ """
+ parseArgs()
+ if suite is None:
+ # Load any globally defined tests.
+ suite = unittest.defaultTestLoader.loadTestsFromModule(
+ __import__('__main__'))
+ if debug:
+ print >>sys.stderr, "Debug: Suite=%s" % suite
+ testRunner = unittest.TextTestRunner(verbosity=verbosity)
+ # run suites (if we were called from test_all) or suite...
+ if type(suite) == type([]):
+ for s in suite:
+ testRunner.run(s)
+ else:
+ testRunner.run(suite)
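`loadModulesFromFolder` above does a breadth-first scan for files starting with a given prefix, optionally descending into subdirectories whose names also match. The discovery half of that logic can be exercised on its own; this is a sketch in modern Python with a hypothetical helper name, not the module's code:

```python
import os
import tempfile

def find_test_modules(path, prefix='test_', subfolders=False):
    """Breadth-first scan for prefix*.py files, as loadModulesFromFolder does."""
    modules = []
    paths = [path]
    while paths:
        p = paths.pop(0)
        for filename in sorted(os.listdir(p)):
            if not filename.startswith(prefix):
                continue
            fullpath = os.path.join(p, filename)
            if filename.endswith('.py'):
                modules.append(fullpath)
            elif subfolders and os.path.isdir(fullpath):
                # Only subdirectories that also match the prefix are searched.
                paths.append(fullpath)
    return modules

with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, 'test_a.py'), 'w').close()
    open(os.path.join(d, 'notes.txt'), 'w').close()
    os.mkdir(os.path.join(d, 'test_sub'))
    open(os.path.join(d, 'test_sub', 'test_b.py'), 'w').close()
    found = find_test_modules(d, subfolders=True)

print(len(found))  # → 2 (test_a.py and test_sub/test_b.py)
```

Note that non-matching files (`notes.txt`) and non-matching directories are skipped entirely, which is why `alltests.py` can point this at the whole `test/` tree with prefix `test_`.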
diff --git a/test/alltests.py b/test/alltests.py
new file mode 100755
index 000000000..930cbdc1b
--- /dev/null
+++ b/test/alltests.py
@@ -0,0 +1,41 @@
+#!/usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+"""
+
+import time
+start = time.time()
+
+import sys, os
+
+
+class Tee:
+
+ """Write to a file and a stream (default: stdout) simultaneously."""
+
+ def __init__(self, filename, stream=sys.__stdout__):
+ self.file = open(filename, 'w')
+ self.stream = stream
+
+ def write(self, string):
+ string = string.encode('raw-unicode-escape')
+ self.stream.write(string)
+ self.file.write(string)
+
+# must redirect stderr *before* first import of unittest
+sys.stdout = sys.stderr = Tee('alltests.out')
+
+import UnitTestFolder
+
+
+if __name__ == '__main__':
+ path, script = os.path.split(sys.argv[0])
+ suite = UnitTestFolder.loadModulesFromFolder(path, 'test_', subfolders=1)
+ UnitTestFolder.main(suite)
+ finish = time.time()
+ print 'Elapsed time: %.3f seconds' % (finish - start)
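The `Tee` class above duplicates every `write()` to both a log file and a stream, so test output is captured and displayed at once. The same idea generalized to any number of in-memory streams (a sketch, not the module's code):

```python
import io

class Tee:
    """Duplicate every write() to all of the given streams."""
    def __init__(self, *streams):
        self.streams = streams

    def write(self, string):
        for stream in self.streams:
            stream.write(string)

log, screen = io.StringIO(), io.StringIO()
tee = Tee(log, screen)
print('all tests passed', file=tee)   # print() only needs a write() method
assert log.getvalue() == screen.getvalue() == 'all tests passed\n'
```

Because `print` (and `unittest`'s runner) only call `write()` on their output object, such a minimal duck-typed class is enough to stand in for `sys.stdout`/`sys.stderr`, exactly as `alltests.py` does before importing `unittest`.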
diff --git a/test/difflib.py b/test/difflib.py
new file mode 100644
index 000000000..a41d4d5ba
--- /dev/null
+++ b/test/difflib.py
@@ -0,0 +1,1089 @@
+#! /usr/bin/env python
+
+"""
+Module difflib -- helpers for computing deltas between objects.
+
+Function get_close_matches(word, possibilities, n=3, cutoff=0.6):
+ Use SequenceMatcher to return list of the best "good enough" matches.
+
+Function ndiff(a, b):
+ Return a delta: the difference between `a` and `b` (lists of strings).
+
+Function restore(delta, which):
+ Return one of the two sequences that generated an ndiff delta.
+
+Class SequenceMatcher:
+ A flexible class for comparing pairs of sequences of any type.
+
+Class Differ:
+ For producing human-readable deltas from sequences of lines of text.
+"""
+
+__all__ = ['get_close_matches', 'ndiff', 'restore', 'SequenceMatcher',
+ 'Differ']
+
+TRACE = 0
+
+class SequenceMatcher:
+
+ """
+ SequenceMatcher is a flexible class for comparing pairs of sequences of
+ any type, so long as the sequence elements are hashable. The basic
+ algorithm predates, and is a little fancier than, an algorithm
+ published in the late 1980's by Ratcliff and Obershelp under the
+ hyperbolic name "gestalt pattern matching". The basic idea is to find
+ the longest contiguous matching subsequence that contains no "junk"
+ elements (R-O doesn't address junk). The same idea is then applied
+ recursively to the pieces of the sequences to the left and to the right
+ of the matching subsequence. This does not yield minimal edit
+ sequences, but does tend to yield matches that "look right" to people.
+
+ SequenceMatcher tries to compute a "human-friendly diff" between two
+ sequences. Unlike e.g. UNIX(tm) diff, the fundamental notion is the
+ longest *contiguous* & junk-free matching subsequence. That's what
+ catches peoples' eyes. The Windows(tm) windiff has another interesting
+ notion, pairing up elements that appear uniquely in each sequence.
+ That, and the method here, appear to yield more intuitive difference
+ reports than does diff. This method appears to be the least vulnerable
+ to synching up on blocks of "junk lines", though (like blank lines in
+ ordinary text files, or maybe "
" lines in HTML files). That may be
+ because this is the only method of the 3 that has a *concept* of
+ "junk" .
+
+ Example, comparing two strings, and considering blanks to be "junk":
+
+ >>> s = SequenceMatcher(lambda x: x == " ",
+ ... "private Thread currentThread;",
+ ... "private volatile Thread currentThread;")
+ >>>
+
+ .ratio() returns a float in [0, 1], measuring the "similarity" of the
+ sequences. As a rule of thumb, a .ratio() value over 0.6 means the
+ sequences are close matches:
+
+ >>> print round(s.ratio(), 3)
+ 0.866
+ >>>
+
+ If you're only interested in where the sequences match,
+ .get_matching_blocks() is handy:
+
+ >>> for block in s.get_matching_blocks():
+ ... print "a[%d] and b[%d] match for %d elements" % block
+ a[0] and b[0] match for 8 elements
+ a[8] and b[17] match for 6 elements
+ a[14] and b[23] match for 15 elements
+ a[29] and b[38] match for 0 elements
+
+ Note that the last tuple returned by .get_matching_blocks() is always a
+ dummy, (len(a), len(b), 0), and this is the only case in which the last
+ tuple element (number of elements matched) is 0.
+
+ If you want to know how to change the first sequence into the second,
+ use .get_opcodes():
+
+ >>> for opcode in s.get_opcodes():
+ ... print "%6s a[%d:%d] b[%d:%d]" % opcode
+ equal a[0:8] b[0:8]
+ insert a[8:8] b[8:17]
+ equal a[8:14] b[17:23]
+ equal a[14:29] b[23:38]
+
+ See the Differ class for a fancy human-friendly file differencer, which
+ uses SequenceMatcher both to compare sequences of lines, and to compare
+ sequences of characters within similar (near-matching) lines.
+
+ See also function get_close_matches() in this module, which shows how
+ simple code building on SequenceMatcher can be used to do useful work.
+
+ Timing: Basic R-O is cubic time worst case and quadratic time expected
+ case. SequenceMatcher is quadratic time for the worst case and has
+ expected-case behavior dependent in a complicated way on how many
+ elements the sequences have in common; best case time is linear.
+
+ Methods:
+
+ __init__(isjunk=None, a='', b='')
+ Construct a SequenceMatcher.
+
+ set_seqs(a, b)
+ Set the two sequences to be compared.
+
+ set_seq1(a)
+ Set the first sequence to be compared.
+
+ set_seq2(b)
+ Set the second sequence to be compared.
+
+ find_longest_match(alo, ahi, blo, bhi)
+ Find longest matching block in a[alo:ahi] and b[blo:bhi].
+
+ get_matching_blocks()
+ Return list of triples describing matching subsequences.
+
+ get_opcodes()
+ Return list of 5-tuples describing how to turn a into b.
+
+ ratio()
+ Return a measure of the sequences' similarity (float in [0,1]).
+
+ quick_ratio()
+ Return an upper bound on .ratio() relatively quickly.
+
+ real_quick_ratio()
+ Return an upper bound on ratio() very quickly.
+ """
+
+ def __init__(self, isjunk=None, a='', b=''):
+ """Construct a SequenceMatcher.
+
+ Optional arg isjunk is None (the default), or a one-argument
+ function that takes a sequence element and returns true iff the
+ element is junk. None is equivalent to passing "lambda x: 0", i.e.
+ no elements are considered to be junk. For example, pass
+ lambda x: x in " \\t"
+ if you're comparing lines as sequences of characters, and don't
+ want to synch up on blanks or hard tabs.
+
+ Optional arg a is the first of two sequences to be compared. By
+ default, an empty string. The elements of a must be hashable. See
+ also .set_seqs() and .set_seq1().
+
+ Optional arg b is the second of two sequences to be compared. By
+ default, an empty string. The elements of b must be hashable. See
+ also .set_seqs() and .set_seq2().
+ """
+
+ # Members:
+ # a
+ # first sequence
+ # b
+ # second sequence; differences are computed as "what do
+ # we need to do to 'a' to change it into 'b'?"
+ # b2j
+ # for x in b, b2j[x] is a list of the indices (into b)
+ # at which x appears; junk elements do not appear
+ # b2jhas
+ # b2j.has_key
+ # fullbcount
+ # for x in b, fullbcount[x] == the number of times x
+ # appears in b; only materialized if really needed (used
+ # only for computing quick_ratio())
+ # matching_blocks
+ # a list of (i, j, k) triples, where a[i:i+k] == b[j:j+k];
+ # ascending & non-overlapping in i and in j; terminated by
+ # a dummy (len(a), len(b), 0) sentinel
+ # opcodes
+ # a list of (tag, i1, i2, j1, j2) tuples, where tag is
+ # one of
+ # 'replace' a[i1:i2] should be replaced by b[j1:j2]
+ # 'delete' a[i1:i2] should be deleted
+ # 'insert' b[j1:j2] should be inserted
+ # 'equal' a[i1:i2] == b[j1:j2]
+ # isjunk
+ # a user-supplied function taking a sequence element and
+ # returning true iff the element is "junk" -- this has
+ # subtle but helpful effects on the algorithm, which I'll
+ # get around to writing up someday <0.9 wink>.
+ # DON'T USE! Only __chain_b uses this. Use isbjunk.
+ # isbjunk
+ # for x in b, isbjunk(x) == isjunk(x) but much faster;
+ # it's really the has_key method of a hidden dict.
+ # DOES NOT WORK for x in a!
+
+ self.isjunk = isjunk
+ self.a = self.b = None
+ self.set_seqs(a, b)
+
+ def set_seqs(self, a, b):
+ """Set the two sequences to be compared.
+
+ >>> s = SequenceMatcher()
+ >>> s.set_seqs("abcd", "bcde")
+ >>> s.ratio()
+ 0.75
+ """
+
+ self.set_seq1(a)
+ self.set_seq2(b)
+
+ def set_seq1(self, a):
+ """Set the first sequence to be compared.
+
+ The second sequence to be compared is not changed.
+
+ >>> s = SequenceMatcher(None, "abcd", "bcde")
+ >>> s.ratio()
+ 0.75
+ >>> s.set_seq1("bcde")
+ >>> s.ratio()
+ 1.0
+ >>>
+
+ SequenceMatcher computes and caches detailed information about the
+ second sequence, so if you want to compare one sequence S against
+ many sequences, use .set_seq2(S) once and call .set_seq1(x)
+ repeatedly for each of the other sequences.
+
+ See also set_seqs() and set_seq2().
+ """
+
+ if a is self.a:
+ return
+ self.a = a
+ self.matching_blocks = self.opcodes = None
+
+ def set_seq2(self, b):
+ """Set the second sequence to be compared.
+
+ The first sequence to be compared is not changed.
+
+ >>> s = SequenceMatcher(None, "abcd", "bcde")
+ >>> s.ratio()
+ 0.75
+ >>> s.set_seq2("abcd")
+ >>> s.ratio()
+ 1.0
+ >>>
+
+ SequenceMatcher computes and caches detailed information about the
+ second sequence, so if you want to compare one sequence S against
+ many sequences, use .set_seq2(S) once and call .set_seq1(x)
+ repeatedly for each of the other sequences.
+
+ See also set_seqs() and set_seq1().
+ """
+
+ if b is self.b:
+ return
+ self.b = b
+ self.matching_blocks = self.opcodes = None
+ self.fullbcount = None
+ self.__chain_b()
+
+ # For each element x in b, set b2j[x] to a list of the indices in
+ # b where x appears; the indices are in increasing order; note that
+ # the number of times x appears in b is len(b2j[x]) ...
+ # when self.isjunk is defined, junk elements don't show up in this
+ # map at all, which stops the central find_longest_match method
+ # from starting any matching block at a junk element ...
+ # also creates the fast isbjunk function ...
+ # note that this is only called when b changes; so for cross-product
+ # kinds of matches, it's best to call set_seq2 once, then set_seq1
+ # repeatedly
+
+ def __chain_b(self):
+ # Because isjunk is a user-defined (not C) function, and we test
+ # for junk a LOT, it's important to minimize the number of calls.
+ # Before the tricks described here, __chain_b was by far the most
+ # time-consuming routine in the whole module! If anyone sees
+ # Jim Roskind, thank him again for profile.py -- I never would
+ # have guessed that.
+ # The first trick is to build b2j ignoring the possibility
+ # of junk. I.e., we don't call isjunk at all yet. Throwing
+ # out the junk later is much cheaper than building b2j "right"
+ # from the start.
+ b = self.b
+ self.b2j = b2j = {}
+ self.b2jhas = b2jhas = b2j.has_key
+ for i in xrange(len(b)):
+ elt = b[i]
+ if b2jhas(elt):
+ b2j[elt].append(i)
+ else:
+ b2j[elt] = [i]
+
+ # Now b2j.keys() contains elements uniquely, and especially when
+ # the sequence is a string, that's usually a good deal smaller
+ # than len(string). The difference is the number of isjunk calls
+ # saved.
+ isjunk, junkdict = self.isjunk, {}
+ if isjunk:
+ for elt in b2j.keys():
+ if isjunk(elt):
+ junkdict[elt] = 1 # value irrelevant; it's a set
+ del b2j[elt]
+
+ # Now for x in b, isjunk(x) == junkdict.has_key(x), but the
+ # latter is much faster. Note too that while there may be a
+ # lot of junk in the sequence, the number of *unique* junk
+ # elements is probably small. So the memory burden of keeping
+ # this dict alive is likely trivial compared to the size of b2j.
+ self.isbjunk = junkdict.has_key
+
+ def find_longest_match(self, alo, ahi, blo, bhi):
+ """Find longest matching block in a[alo:ahi] and b[blo:bhi].
+
+ If isjunk is not defined:
+
+ Return (i,j,k) such that a[i:i+k] is equal to b[j:j+k], where
+ alo <= i <= i+k <= ahi
+ blo <= j <= j+k <= bhi
+ and for all (i',j',k') meeting those conditions,
+ k >= k'
+ i <= i'
+ and if i == i', j <= j'
+
+ In other words, of all maximal matching blocks, return one that
+ starts earliest in a, and of all those maximal matching blocks that
+ start earliest in a, return the one that starts earliest in b.
+
+ >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
+ >>> s.find_longest_match(0, 5, 0, 9)
+ (0, 4, 5)
+
+ If isjunk is defined, first the longest matching block is
+ determined as above, but with the additional restriction that no
+ junk element appears in the block. Then that block is extended as
+ far as possible by matching (only) junk elements on both sides. So
+ the resulting block never matches on junk except as identical junk
+ happens to be adjacent to an "interesting" match.
+
+ Here's the same example as before, but considering blanks to be
+ junk. That prevents " abcd" from matching the " abcd" at the tail
+ end of the second sequence directly. Instead only the "abcd" can
+ match, and matches the leftmost "abcd" in the second sequence:
+
+ >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
+ >>> s.find_longest_match(0, 5, 0, 9)
+ (1, 0, 4)
+
+ If no blocks match, return (alo, blo, 0).
+
+ >>> s = SequenceMatcher(None, "ab", "c")
+ >>> s.find_longest_match(0, 2, 0, 1)
+ (0, 0, 0)
+ """
+
+ # CAUTION: stripping common prefix or suffix would be incorrect.
+ # E.g.,
+ # ab
+ # acab
+ # Longest matching block is "ab", but if common prefix is
+ # stripped, it's "a" (tied with "b"). UNIX(tm) diff does so
+ # strip, so ends up claiming that ab is changed to acab by
+ # inserting "ca" in the middle. That's minimal but unintuitive:
+ # "it's obvious" that someone inserted "ac" at the front.
+ # Windiff ends up at the same place as diff, but by pairing up
+ # the unique 'b's and then matching the first two 'a's.
+
+ a, b, b2j, isbjunk = self.a, self.b, self.b2j, self.isbjunk
+ besti, bestj, bestsize = alo, blo, 0
+ # find longest junk-free match
+ # during an iteration of the loop, j2len[j] = length of longest
+ # junk-free match ending with a[i-1] and b[j]
+ j2len = {}
+ nothing = []
+ for i in xrange(alo, ahi):
+ # look at all instances of a[i] in b; note that because
+ # b2j has no junk keys, the loop is skipped if a[i] is junk
+ j2lenget = j2len.get
+ newj2len = {}
+ for j in b2j.get(a[i], nothing):
+ # a[i] matches b[j]
+ if j < blo:
+ continue
+ if j >= bhi:
+ break
+ k = newj2len[j] = j2lenget(j-1, 0) + 1
+ if k > bestsize:
+ besti, bestj, bestsize = i-k+1, j-k+1, k
+ j2len = newj2len
+
+ # Now that we have a wholly interesting match (albeit possibly
+ # empty!), we may as well suck up the matching junk on each
+ # side of it too. Can't think of a good reason not to, and it
+ # saves post-processing the (possibly considerable) expense of
+ # figuring out what to do with it. In the case of an empty
+ # interesting match, this is clearly the right thing to do,
+ # because no other kind of match is possible in the regions.
+ while besti > alo and bestj > blo and \
+ isbjunk(b[bestj-1]) and \
+ a[besti-1] == b[bestj-1]:
+ besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
+ while besti+bestsize < ahi and bestj+bestsize < bhi and \
+ isbjunk(b[bestj+bestsize]) and \
+ a[besti+bestsize] == b[bestj+bestsize]:
+ bestsize = bestsize + 1
+
+ if TRACE:
+ print "find_longest_match", alo, ahi, blo, bhi
+ print " returns", besti, bestj, bestsize
+ return besti, bestj, bestsize
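The caution above about prefix-stripping can be checked directly. This is a sketch against the stdlib difflib (modern Python syntax, not part of the patched module itself):

```python
from difflib import SequenceMatcher

# "ab" vs "acab": the whole of "ab" matches at the tail of "acab".
# Stripping the common prefix "a" first would hide that match.
s = SequenceMatcher(None, "ab", "acab")
match = s.find_longest_match(0, 2, 0, 4)
print(tuple(match))  # (0, 2, 2), i.e. a[0:2] == b[2:4]
```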
+
+ def get_matching_blocks(self):
+ """Return list of triples describing matching subsequences.
+
+ Each triple is of the form (i, j, n), and means that
+ a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in
+ i and in j.
+
+ The last triple is a dummy, (len(a), len(b), 0), and is the only
+ triple with n==0.
+
+ >>> s = SequenceMatcher(None, "abxcd", "abcd")
+ >>> s.get_matching_blocks()
+ [(0, 0, 2), (3, 2, 2), (5, 4, 0)]
+ """
+
+ if self.matching_blocks is not None:
+ return self.matching_blocks
+ self.matching_blocks = []
+ la, lb = len(self.a), len(self.b)
+ self.__helper(0, la, 0, lb, self.matching_blocks)
+ self.matching_blocks.append( (la, lb, 0) )
+ if TRACE:
+ print '*** matching blocks', self.matching_blocks
+ return self.matching_blocks
+
+ # builds list of matching blocks covering a[alo:ahi] and
+ # b[blo:bhi], appending them in increasing order to answer
+
+ def __helper(self, alo, ahi, blo, bhi, answer):
+ i, j, k = x = self.find_longest_match(alo, ahi, blo, bhi)
+ # a[alo:i] vs b[blo:j] unknown
+ # a[i:i+k] same as b[j:j+k]
+ # a[i+k:ahi] vs b[j+k:bhi] unknown
+ if k:
+ if alo < i and blo < j:
+ self.__helper(alo, i, blo, j, answer)
+ answer.append(x)
+ if i+k < ahi and j+k < bhi:
+ self.__helper(i+k, ahi, j+k, bhi, answer)
+
+ def get_opcodes(self):
+ """Return list of 5-tuples describing how to turn a into b.
+
+ Each tuple is of the form (tag, i1, i2, j1, j2). The first tuple
+ has i1 == j1 == 0, and remaining tuples have i1 == the i2 from the
+ tuple preceding it, and likewise for j1 == the previous j2.
+
+ The tags are strings, with these meanings:
+
+ 'replace': a[i1:i2] should be replaced by b[j1:j2]
+ 'delete': a[i1:i2] should be deleted.
+ Note that j1==j2 in this case.
+ 'insert': b[j1:j2] should be inserted at a[i1:i1].
+ Note that i1==i2 in this case.
+ 'equal': a[i1:i2] == b[j1:j2]
+
+ >>> a = "qabxcd"
+ >>> b = "abycdf"
+ >>> s = SequenceMatcher(None, a, b)
+ >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
+ ... print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
+ ... (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
+ delete a[0:1] (q) b[0:0] ()
+ equal a[1:3] (ab) b[0:2] (ab)
+ replace a[3:4] (x) b[2:3] (y)
+ equal a[4:6] (cd) b[3:5] (cd)
+ insert a[6:6] () b[5:6] (f)
+ """
+
+ if self.opcodes is not None:
+ return self.opcodes
+ i = j = 0
+ self.opcodes = answer = []
+ for ai, bj, size in self.get_matching_blocks():
+ # invariant: we've pumped out correct diffs to change
+ # a[:i] into b[:j], and the next matching block is
+ # a[ai:ai+size] == b[bj:bj+size]. So we need to pump
+ # out a diff to change a[i:ai] into b[j:bj], pump out
+ # the matching block, and move (i,j) beyond the match
+ tag = ''
+ if i < ai and j < bj:
+ tag = 'replace'
+ elif i < ai:
+ tag = 'delete'
+ elif j < bj:
+ tag = 'insert'
+ if tag:
+ answer.append( (tag, i, ai, j, bj) )
+ i, j = ai+size, bj+size
+ # the list of matching blocks is terminated by a
+ # sentinel with size 0
+ if size:
+ answer.append( ('equal', ai, i, bj, j) )
+ return answer
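Since the opcodes fully tile both sequences, they suffice to rebuild b from a. A sketch using the stdlib difflib; `apply_opcodes` is an illustrative helper, not part of this module:

```python
from difflib import SequenceMatcher

def apply_opcodes(a, b):
    # Rebuild b from a: take b's slice for 'replace'/'insert',
    # keep a's slice for 'equal', and drop 'delete' ranges.
    out = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag in ('replace', 'insert'):
            out.append(b[j1:j2])
        elif tag == 'equal':
            out.append(a[i1:i2])
    return ''.join(out)

print(apply_opcodes("qabxcd", "abycdf"))  # abycdf
```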
+
+ def ratio(self):
+ """Return a measure of the sequences' similarity (float in [0,1]).
+
+ Where T is the total number of elements in both sequences, and
+ M is the number of matches, this is 2.0*M / T.
+ Note that this is 1 if the sequences are identical, and 0 if
+ they have nothing in common.
+
+ .ratio() is expensive to compute if you haven't already computed
+ .get_matching_blocks() or .get_opcodes(), in which case you may
+ want to try .quick_ratio() or .real_quick_ratio() first to get an
+ upper bound.
+
+ >>> s = SequenceMatcher(None, "abcd", "bcde")
+ >>> s.ratio()
+ 0.75
+ >>> s.quick_ratio()
+ 0.75
+ >>> s.real_quick_ratio()
+ 1.0
+ """
+
+ matches = reduce(lambda sum, triple: sum + triple[-1],
+ self.get_matching_blocks(), 0)
+ return 2.0 * matches / (len(self.a) + len(self.b))
+
+ def quick_ratio(self):
+ """Return an upper bound on ratio() relatively quickly.
+
+ This isn't defined beyond that it is an upper bound on .ratio(), and
+ is faster to compute.
+ """
+
+ # viewing a and b as multisets, set matches to the cardinality
+ # of their intersection; this counts the number of matches
+ # without regard to order, so is clearly an upper bound
+ if self.fullbcount is None:
+ self.fullbcount = fullbcount = {}
+ for elt in self.b:
+ fullbcount[elt] = fullbcount.get(elt, 0) + 1
+ fullbcount = self.fullbcount
+ # avail[x] is the number of times x appears in 'b' less the
+ # number of times we've seen it in 'a' so far ... kinda
+ avail = {}
+ availhas, matches = avail.has_key, 0
+ for elt in self.a:
+ if availhas(elt):
+ numb = avail[elt]
+ else:
+ numb = fullbcount.get(elt, 0)
+ avail[elt] = numb - 1
+ if numb > 0:
+ matches = matches + 1
+ return 2.0 * matches / (len(self.a) + len(self.b))
+
+ def real_quick_ratio(self):
+ """Return an upper bound on ratio() very quickly.
+
+ This isn't defined beyond that it is an upper bound on .ratio(), and
+ is faster to compute than either .ratio() or .quick_ratio().
+ """
+
+ la, lb = len(self.a), len(self.b)
+ # can't have more matches than the number of elements in the
+ # shorter sequence
+ return 2.0 * min(la, lb) / (la + lb)
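The three ratios form a chain of cheap-to-expensive upper bounds, real_quick_ratio() >= quick_ratio() >= ratio(), which is what makes the quick variants useful as filters. A quick check against the stdlib difflib:

```python
from difflib import SequenceMatcher

s = SequenceMatcher(None, "abcd", "bcde")
r, q, rq = s.ratio(), s.quick_ratio(), s.real_quick_ratio()
# Each cheaper ratio bounds the more expensive one from above.
assert rq >= q >= r
print(r, q, rq)  # 0.75 0.75 1.0
```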
+
+def get_close_matches(word, possibilities, n=3, cutoff=0.6):
+ """Use SequenceMatcher to return list of the best "good enough" matches.
+
+ word is a sequence for which close matches are desired (typically a
+ string).
+
+ possibilities is a list of sequences against which to match word
+ (typically a list of strings).
+
+ Optional arg n (default 3) is the maximum number of close matches to
+ return. n must be > 0.
+
+ Optional arg cutoff (default 0.6) is a float in [0, 1]. Possibilities
+ that don't score at least that similar to word are ignored.
+
+ The best (no more than n) matches among the possibilities are returned
+ in a list, sorted by similarity score, most similar first.
+
+ >>> get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
+ ['apple', 'ape']
+ >>> import keyword as _keyword
+ >>> get_close_matches("wheel", _keyword.kwlist)
+ ['while']
+ >>> get_close_matches("apple", _keyword.kwlist)
+ []
+ >>> get_close_matches("accept", _keyword.kwlist)
+ ['except']
+ """
+
+ if not n > 0:
+ raise ValueError("n must be > 0: " + `n`)
+ if not 0.0 <= cutoff <= 1.0:
+ raise ValueError("cutoff must be in [0.0, 1.0]: " + `cutoff`)
+ result = []
+ s = SequenceMatcher()
+ s.set_seq2(word)
+ for x in possibilities:
+ s.set_seq1(x)
+ if s.real_quick_ratio() >= cutoff and \
+ s.quick_ratio() >= cutoff and \
+ s.ratio() >= cutoff:
+ result.append((s.ratio(), x))
+ # Sort by score.
+ result.sort()
+ # Retain only the best n.
+ result = result[-n:]
+ # Move best-scorer to head of list.
+ result.reverse()
+ # Strip scores.
+ return [x for score, x in result]
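The interaction of the n and cutoff parameters described above can be seen with the stdlib difflib (`words` is an illustrative list):

```python
from difflib import get_close_matches

words = ["ape", "apple", "peach", "puppy"]
# Keep only the single best-scoring match.
print(get_close_matches("appel", words, n=1))         # ['apple']
# A strict cutoff can reject every candidate.
print(get_close_matches("appel", words, cutoff=0.9))  # []
```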
+
+
+def _count_leading(line, ch):
+ """
+ Return number of `ch` characters at the start of `line`.
+
+ Example:
+
+ >>> _count_leading(' abc', ' ')
+ 3
+ """
+
+ i, n = 0, len(line)
+ while i < n and line[i] == ch:
+ i += 1
+ return i
+
+class Differ:
+ r"""
+ Differ is a class for comparing sequences of lines of text, and
+ producing human-readable differences or deltas. Differ uses
+ SequenceMatcher both to compare sequences of lines, and to compare
+ sequences of characters within similar (near-matching) lines.
+
+ Each line of a Differ delta begins with a two-letter code:
+
+ '- ' line unique to sequence 1
+ '+ ' line unique to sequence 2
+ ' ' line common to both sequences
+ '? ' line not present in either input sequence
+
+ Lines beginning with '? ' attempt to guide the eye to intraline
+ differences, and were not present in either input sequence. These lines
+ can be confusing if the sequences contain tab characters.
+
+ Note that Differ makes no claim to produce a *minimal* diff. To the
+ contrary, minimal diffs are often counter-intuitive, because they synch
+ up anywhere possible, sometimes accidental matches 100 pages apart.
+ Restricting synch points to contiguous matches preserves some notion of
+ locality, at the occasional cost of producing a longer diff.
+
+ Example: Comparing two texts.
+
+ First we set up the texts, sequences of individual single-line strings
+ ending with newlines (such sequences can also be obtained from the
+ `readlines()` method of file-like objects):
+
+ >>> text1 = ''' 1. Beautiful is better than ugly.
+ ... 2. Explicit is better than implicit.
+ ... 3. Simple is better than complex.
+ ... 4. Complex is better than complicated.
+ ... '''.splitlines(1)
+ >>> len(text1)
+ 4
+ >>> text1[0][-1]
+ '\n'
+ >>> text2 = ''' 1. Beautiful is better than ugly.
+ ... 3. Simple is better than complex.
+ ... 4. Complicated is better than complex.
+ ... 5. Flat is better than nested.
+ ... '''.splitlines(1)
+
+ Next we instantiate a Differ object:
+
+ >>> d = Differ()
+
+ Note that when instantiating a Differ object we may pass functions to
+ filter out line and character 'junk'. See Differ.__init__ for details.
+
+ Finally, we compare the two:
+
+ >>> result = d.compare(text1, text2)
+
+ 'result' is a list of strings, so let's pretty-print it:
+
+ >>> from pprint import pprint as _pprint
+ >>> _pprint(result)
+ [' 1. Beautiful is better than ugly.\n',
+ '- 2. Explicit is better than implicit.\n',
+ '- 3. Simple is better than complex.\n',
+ '+ 3. Simple is better than complex.\n',
+ '? ++\n',
+ '- 4. Complex is better than complicated.\n',
+ '? ^ ---- ^\n',
+ '+ 4. Complicated is better than complex.\n',
+ '? ++++ ^ ^\n',
+ '+ 5. Flat is better than nested.\n']
+
+ As a single multi-line string it looks like this:
+
+ >>> print ''.join(result),
+ 1. Beautiful is better than ugly.
+ - 2. Explicit is better than implicit.
+ - 3. Simple is better than complex.
+ + 3. Simple is better than complex.
+ ? ++
+ - 4. Complex is better than complicated.
+ ? ^ ---- ^
+ + 4. Complicated is better than complex.
+ ? ++++ ^ ^
+ + 5. Flat is better than nested.
+
+ Methods:
+
+ __init__(linejunk=None, charjunk=None)
+ Construct a text differencer, with optional filters.
+
+ compare(a, b)
+ Compare two sequences of lines; return the resulting delta (list).
+ """
+
+ def __init__(self, linejunk=None, charjunk=None):
+ """
+ Construct a text differencer, with optional filters.
+
+ The two optional keyword parameters are for filter functions:
+
+ - `linejunk`: A function that should accept a single string argument,
+ and return true iff the string is junk. The module-level function
+ `IS_LINE_JUNK` may be used to filter out lines without visible
+ characters, except for at most one splat ('#').
+
+ - `charjunk`: A function that should accept a string of length 1. The
+ module-level function `IS_CHARACTER_JUNK` may be used to filter out
+ whitespace characters (a blank or tab; **note**: bad idea to include
+ newline in this!).
+ """
+
+ self.linejunk = linejunk
+ self.charjunk = charjunk
+ self.results = []
+
+ def compare(self, a, b):
+ r"""
+ Compare two sequences of lines; return the resulting delta (list).
+
+ Each sequence must contain individual single-line strings ending with
+ newlines. Such sequences can be obtained from the `readlines()` method
+ of file-like objects. The list returned is also made up of
+ newline-terminated strings, ready to be used with the `writelines()`
+ method of a file-like object.
+
+ Example:
+
+ >>> print ''.join(Differ().compare('one\ntwo\nthree\n'.splitlines(1),
+ ... 'ore\ntree\nemu\n'.splitlines(1))),
+ - one
+ ? ^
+ + ore
+ ? ^
+ - two
+ - three
+ ? -
+ + tree
+ + emu
+ """
+
+ cruncher = SequenceMatcher(self.linejunk, a, b)
+ for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
+ if tag == 'replace':
+ self._fancy_replace(a, alo, ahi, b, blo, bhi)
+ elif tag == 'delete':
+ self._dump('-', a, alo, ahi)
+ elif tag == 'insert':
+ self._dump('+', b, blo, bhi)
+ elif tag == 'equal':
+ self._dump(' ', a, alo, ahi)
+ else:
+ raise ValueError, 'unknown tag ' + `tag`
+ results = self.results
+ self.results = []
+ return results
+
+ def _dump(self, tag, x, lo, hi):
+ """Store comparison results for a same-tagged range."""
+ for i in xrange(lo, hi):
+ self.results.append('%s %s' % (tag, x[i]))
+
+ def _plain_replace(self, a, alo, ahi, b, blo, bhi):
+ assert alo < ahi and blo < bhi
+ # dump the shorter block first -- reduces the burden on short-term
+ # memory if the blocks are of very different sizes
+ if bhi - blo < ahi - alo:
+ self._dump('+', b, blo, bhi)
+ self._dump('-', a, alo, ahi)
+ else:
+ self._dump('-', a, alo, ahi)
+ self._dump('+', b, blo, bhi)
+
+ def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
+ r"""
+ When replacing one block of lines with another, search the blocks
+ for *similar* lines; the best-matching pair (if any) is used as a
+ synch point, and intraline difference marking is done on the
+ similar pair. Lots of work, but often worth it.
+
+ Example:
+
+ >>> d = Differ()
+ >>> d._fancy_replace(['abcDefghiJkl\n'], 0, 1, ['abcdefGhijkl\n'], 0, 1)
+ >>> print ''.join(d.results),
+ - abcDefghiJkl
+ ? ^ ^ ^
+ + abcdefGhijkl
+ ? ^ ^ ^
+ """
+
+ if TRACE:
+ self.results.append('*** _fancy_replace %s %s %s %s\n'
+ % (alo, ahi, blo, bhi))
+ self._dump('>', a, alo, ahi)
+ self._dump('<', b, blo, bhi)
+
+ # don't synch up unless the lines have a similarity score of at
+ # least cutoff; best_ratio tracks the best score seen so far
+ best_ratio, cutoff = 0.74, 0.75
+ cruncher = SequenceMatcher(self.charjunk)
+ eqi, eqj = None, None # 1st indices of equal lines (if any)
+
+ # search for the pair that matches best without being identical
+ # (identical lines must be junk lines, & we don't want to synch up
+ # on junk -- unless we have to)
+ for j in xrange(blo, bhi):
+ bj = b[j]
+ cruncher.set_seq2(bj)
+ for i in xrange(alo, ahi):
+ ai = a[i]
+ if ai == bj:
+ if eqi is None:
+ eqi, eqj = i, j
+ continue
+ cruncher.set_seq1(ai)
+ # computing similarity is expensive, so use the quick
+ # upper bounds first -- have seen this speed up messy
+ # compares by a factor of 3.
+ # note that ratio() is only expensive to compute the first
+ # time it's called on a sequence pair; the expensive part
+ # of the computation is cached by cruncher
+ if cruncher.real_quick_ratio() > best_ratio and \
+ cruncher.quick_ratio() > best_ratio and \
+ cruncher.ratio() > best_ratio:
+ best_ratio, best_i, best_j = cruncher.ratio(), i, j
+ if best_ratio < cutoff:
+ # no non-identical "pretty close" pair
+ if eqi is None:
+ # no identical pair either -- treat it as a straight replace
+ self._plain_replace(a, alo, ahi, b, blo, bhi)
+ return
+ # no close pair, but an identical pair -- synch up on that
+ best_i, best_j, best_ratio = eqi, eqj, 1.0
+ else:
+ # there's a close pair, so forget the identical pair (if any)
+ eqi = None
+
+ # a[best_i] very similar to b[best_j]; eqi is None iff they're not
+ # identical
+ if TRACE:
+ self.results.append('*** best_ratio %s %s %s\n'
+ % (best_ratio, best_i, best_j))
+ self._dump('>', a, best_i, best_i+1)
+ self._dump('<', b, best_j, best_j+1)
+
+ # pump out diffs from before the synch point
+ self._fancy_helper(a, alo, best_i, b, blo, best_j)
+
+ # do intraline marking on the synch pair
+ aelt, belt = a[best_i], b[best_j]
+ if eqi is None:
+ # pump out a '-', '?', '+', '?' quad for the synched lines
+ atags = btags = ""
+ cruncher.set_seqs(aelt, belt)
+ for tag, ai1, ai2, bj1, bj2 in cruncher.get_opcodes():
+ la, lb = ai2 - ai1, bj2 - bj1
+ if tag == 'replace':
+ atags += '^' * la
+ btags += '^' * lb
+ elif tag == 'delete':
+ atags += '-' * la
+ elif tag == 'insert':
+ btags += '+' * lb
+ elif tag == 'equal':
+ atags += ' ' * la
+ btags += ' ' * lb
+ else:
+ raise ValueError, 'unknown tag ' + `tag`
+ self._qformat(aelt, belt, atags, btags)
+ else:
+ # the synch pair is identical
+ self.results.append(' ' + aelt)
+
+ # pump out diffs from after the synch point
+ self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
+
+ def _fancy_helper(self, a, alo, ahi, b, blo, bhi):
+ if alo < ahi:
+ if blo < bhi:
+ self._fancy_replace(a, alo, ahi, b, blo, bhi)
+ else:
+ self._dump('-', a, alo, ahi)
+ elif blo < bhi:
+ self._dump('+', b, blo, bhi)
+
+ def _qformat(self, aline, bline, atags, btags):
+ r"""
+ Format "?" output and deal with leading tabs.
+
+ Example:
+
+ >>> d = Differ()
+ >>> d._qformat('\tabcDefghiJkl\n', '\t\tabcdefGhijkl\n',
+ ... ' ^ ^ ^ ', '+ ^ ^ ^ ')
+ >>> for line in d.results: print repr(line)
+ ...
+ '- \tabcDefghiJkl\n'
+ '? \t ^ ^ ^\n'
+ '+ \t\tabcdefGhijkl\n'
+ '? \t ^ ^ ^\n'
+ """
+
+ # Can hurt, but will probably help most of the time.
+ common = min(_count_leading(aline, "\t"),
+ _count_leading(bline, "\t"))
+ common = min(common, _count_leading(atags[:common], " "))
+ atags = atags[common:].rstrip()
+ btags = btags[common:].rstrip()
+
+ self.results.append("- " + aline)
+ if atags:
+ self.results.append("? %s%s\n" % ("\t" * common, atags))
+
+ self.results.append("+ " + bline)
+ if btags:
+ self.results.append("? %s%s\n" % ("\t" * common, btags))
+
+# With respect to junk, an earlier version of ndiff simply refused to
+# *start* a match with a junk element. The result was cases like this:
+# before: private Thread currentThread;
+# after: private volatile Thread currentThread;
+# If you consider whitespace to be junk, the longest contiguous match
+# not starting with junk is "e Thread currentThread". So ndiff reported
+# that "e volatil" was inserted between the 't' and the 'e' in "private".
+# While an accurate view, to people that's absurd. The current version
+# looks for matching blocks that are entirely junk-free, then extends the
+# longest one of those as far as possible but only with matching junk.
+# So now "currentThread" is matched, then extended to suck up the
+# preceding blank; then "private" is matched, and extended to suck up the
+# following blank; then "Thread" is matched; and finally ndiff reports
+# that "volatile " was inserted before "Thread". The only quibble
+# remaining is that perhaps it was really the case that " volatile"
+# was inserted after "private". I can live with that <wink>.
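The Thread example from the comment above can be reproduced with the stdlib ndiff; the exact spacing of the '?' guide line is not asserted here, only the shape of the delta:

```python
from difflib import ndiff

before = ['private Thread currentThread;\n']
after = ['private volatile Thread currentThread;\n']
delta = list(ndiff(before, after))
# The whole word "volatile" is reported as inserted, instead of the
# old junk-at-start behavior of inserting "e volatil" mid-word.
print(''.join(delta))
```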
+
+import re
+
+def IS_LINE_JUNK(line, pat=re.compile(r"\s*#?\s*$").match):
+ r"""
+ Return 1 for ignorable line: iff `line` is blank or contains a single '#'.
+
+ Examples:
+
+ >>> IS_LINE_JUNK('\n')
+ 1
+ >>> IS_LINE_JUNK(' # \n')
+ 1
+ >>> IS_LINE_JUNK('hello\n')
+ 0
+ """
+
+ return pat(line) is not None
+
+def IS_CHARACTER_JUNK(ch, ws=" \t"):
+ r"""
+ Return 1 for ignorable character: iff `ch` is a space or tab.
+
+ Examples:
+
+ >>> IS_CHARACTER_JUNK(' ')
+ 1
+ >>> IS_CHARACTER_JUNK('\t')
+ 1
+ >>> IS_CHARACTER_JUNK('\n')
+ 0
+ >>> IS_CHARACTER_JUNK('x')
+ 0
+ """
+
+ return ch in ws
+
+del re
+
+def ndiff(a, b, linejunk=IS_LINE_JUNK, charjunk=IS_CHARACTER_JUNK):
+ r"""
+ Compare `a` and `b` (lists of strings); return a `Differ`-style delta.
+
+ Optional keyword parameters `linejunk` and `charjunk` are for filter
+ functions (or None):
+
+ - linejunk: A function that should accept a single string argument, and
+ return true iff the string is junk. The default is module-level function
+ IS_LINE_JUNK, which filters out lines without visible characters, except
+ for at most one splat ('#').
+
+ - charjunk: A function that should accept a string of length 1. The
+ default is module-level function IS_CHARACTER_JUNK, which filters out
+ whitespace characters (a blank or tab; note: bad idea to include newline
+ in this!).
+
+ Tools/scripts/ndiff.py is a command-line front-end to this function.
+
+ Example:
+
+ >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
+ ... 'ore\ntree\nemu\n'.splitlines(1))
+ >>> print ''.join(diff),
+ - one
+ ? ^
+ + ore
+ ? ^
+ - two
+ - three
+ ? -
+ + tree
+ + emu
+ """
+ return Differ(linejunk, charjunk).compare(a, b)
+
+def restore(delta, which):
+ r"""
+ Return one of the two sequences that generated a delta.
+
+ Given a `delta` produced by `Differ.compare()` or `ndiff()`, extract
+ lines originating from file 1 or 2 (parameter `which`), stripping off line
+ prefixes.
+
+ Examples:
+
+ >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
+ ... 'ore\ntree\nemu\n'.splitlines(1))
+ >>> print ''.join(restore(diff, 1)),
+ one
+ two
+ three
+ >>> print ''.join(restore(diff, 2)),
+ ore
+ tree
+ emu
+ """
+ try:
+ tag = {1: "- ", 2: "+ "}[int(which)]
+ except KeyError:
+ raise ValueError, ('unknown delta choice (must be 1 or 2): %r'
+ % which)
+ prefixes = (" ", tag)
+ results = []
+ for line in delta:
+ if line[:2] in prefixes:
+ results.append(line[2:])
+ return results
+
+def _test():
+ import doctest, difflib
+ return doctest.testmod(difflib)
+
+if __name__ == "__main__":
+ _test()
diff --git a/test/test_nodes.py b/test/test_nodes.py
new file mode 100755
index 000000000..15e633357
--- /dev/null
+++ b/test/test_nodes.py
@@ -0,0 +1,83 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Test module for nodes.py.
+"""
+
+import unittest
+from DocutilsTestSupport import nodes
+
+debug = 0
+
+
+class TextTests(unittest.TestCase):
+
+ def setUp(self):
+ self.text = nodes.Text('Line 1.\nLine 2.')
+
+ def test_repr(self):
+ self.assertEquals(repr(self.text), r"<#text: 'Line 1.\nLine 2.'>")
+
+ def test_str(self):
+ self.assertEquals(str(self.text), 'Line 1.\nLine 2.')
+
+ def test_asdom(self):
+ dom = self.text.asdom()
+ self.assertEquals(dom.toxml(), 'Line 1.\nLine 2.')
+ dom.unlink()
+
+ def test_astext(self):
+ self.assertEquals(self.text.astext(), 'Line 1.\nLine 2.')
+
+ def test_pformat(self):
+ self.assertEquals(self.text.pformat(), 'Line 1.\nLine 2.\n')
+
+
+class ElementTests(unittest.TestCase):
+
+ def test_empty(self):
+ element = nodes.Element()
+ self.assertEquals(repr(element), '')
+ self.assertEquals(str(element), '')
+ dom = element.asdom()
+ self.assertEquals(dom.toxml(), '')
+ dom.unlink()
+ element['attr'] = '1'
+ self.assertEquals(repr(element), '')
+ self.assertEquals(str(element), '')
+ dom = element.asdom()
+ self.assertEquals(dom.toxml(), '')
+ dom.unlink()
+ self.assertEquals(element.pformat(), '\n')
+
+ def test_withtext(self):
+ element = nodes.Element('text\nmore', nodes.Text('text\nmore'))
+ self.assertEquals(repr(element), r">")
+ self.assertEquals(str(element), 'text\nmore')
+ dom = element.asdom()
+ self.assertEquals(dom.toxml(), 'text\nmore')
+ dom.unlink()
+ element['attr'] = '1'
+ self.assertEquals(repr(element), r">")
+ self.assertEquals(str(element),
+ 'text\nmore')
+ dom = element.asdom()
+ self.assertEquals(dom.toxml(),
+ 'text\nmore')
+ dom.unlink()
+ self.assertEquals(element.pformat(),
+"""\
+
+ text
+ more
+""")
+
+
+if __name__ == '__main__':
+ unittest.main()
diff --git a/test/test_parsers/test_rst/test_TableParser.py b/test/test_parsers/test_rst/test_TableParser.py
new file mode 100755
index 000000000..ed6083d50
--- /dev/null
+++ b/test/test_parsers/test_rst/test_TableParser.py
@@ -0,0 +1,197 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Tests for states.py.
+"""
+
+import DocutilsTestSupport
+
+def suite():
+ s = DocutilsTestSupport.TableParserTestSuite()
+ s.generateTests(totest)
+ return s
+
+totest = {}
+
+totest['tables'] = [
+["""\
++-------------------------------------+
+| A table with one cell and one line. |
++-------------------------------------+
+""",
+[(0, 0, 2, 38, ['A table with one cell and one line.'])],
+([37],
+ [],
+ [[(0, 0, 1, ['A table with one cell and one line.'])]])],
+["""\
++--------------+--------------+
+| A table with | two columns. |
++--------------+--------------+
+""",
+[(0, 0, 2, 15, ['A table with']),
+ (0, 15, 2, 30, ['two columns.'])],
+([14, 14],
+ [],
+ [[(0, 0, 1, ['A table with']),
+ (0, 0, 1, ['two columns.'])]])],
+["""\
++--------------+-------------+
+| A table with | two columns |
++--------------+-------------+
+| and | two rows. |
++--------------+-------------+
+""",
+[(0, 0, 2, 15, ['A table with']),
+ (0, 15, 2, 29, ['two columns']),
+ (2, 0, 4, 15, ['and']),
+ (2, 15, 4, 29, ['two rows.'])],
+([14, 13],
+ [],
+ [[(0, 0, 1, ['A table with']),
+ (0, 0, 1, ['two columns'])],
+ [(0, 0, 3, ['and']),
+ (0, 0, 3, ['two rows.'])]])],
+["""\
++--------------------------+
+| A table with three rows, |
++------------+-------------+
+| and two | columns. |
++------------+-------------+
+| First and last rows |
+| contain column spans. |
++--------------------------+
+""",
+[(0, 0, 2, 27, ['A table with three rows,']),
+ (2, 0, 4, 13, ['and two']),
+ (2, 13, 4, 27, ['columns.']),
+ (4, 0, 7, 27, ['First and last rows', 'contain column spans.'])],
+([12, 13],
+ [],
+ [[(0, 1, 1, ['A table with three rows,']),
+ None],
+ [(0, 0, 3, ['and two']),
+ (0, 0, 3, ['columns.'])],
+ [(0, 1, 5, ['First and last rows', 'contain column spans.']),
+ None]])],
+["""\
++------------+-------------+---------------+
+| A table | two rows in | and row spans |
+| with three +-------------+ to left and |
+| columns, | the middle, | right. |
++------------+-------------+---------------+
+""",
+[(0, 0, 4, 13, ['A table', 'with three', 'columns,']),
+ (0, 13, 2, 27, ['two rows in']),
+ (0, 27, 4, 43, ['and row spans', 'to left and', 'right.']),
+ (2, 13, 4, 27, ['the middle,'])],
+([12, 13, 15],
+ [],
+ [[(1, 0, 1, ['A table', 'with three', 'columns,']),
+ (0, 0, 1, ['two rows in']),
+ (1, 0, 1, ['and row spans', 'to left and', 'right.'])],
+ [None,
+ (0, 0, 3, ['the middle,']),
+ None]])],
+["""\
++------------+-------------+---------------+
+| A table | | two rows in | and funny |
+| with 3 +--+-------------+-+ stuff. |
+| columns, | the middle, | | |
++------------+-------------+---------------+
+""",
+[(0, 0, 4, 13, ['A table |', 'with 3 +--', 'columns,']),
+ (0, 13, 2, 27, ['two rows in']),
+ (0, 27, 4, 43, [' and funny', '-+ stuff.', ' |']),
+ (2, 13, 4, 27, ['the middle,'])],
+([12, 13, 15],
+ [],
+ [[(1, 0, 1, ['A table |', 'with 3 +--', 'columns,']),
+ (0, 0, 1, ['two rows in']),
+ (1, 0, 1, [' and funny', '-+ stuff.', ' |'])],
+ [None,
+ (0, 0, 3, ['the middle,']),
+ None]])],
+["""\
++-----------+-------------------------+
+| W/NW cell | N/NE cell |
+| +-------------+-----------+
+| | Middle cell | E/SE cell |
++-----------+-------------+ |
+| S/SE cell | |
++-------------------------+-----------+
+""",
+[(0, 0, 4, 12, ['W/NW cell', '', '']),
+ (0, 12, 2, 38, ['N/NE cell']),
+ (2, 12, 4, 26, ['Middle cell']),
+ (2, 26, 6, 38, ['E/SE cell', '', '']),
+ (4, 0, 6, 26, ['S/SE cell'])],
+([11, 13, 11],
+ [],
+ [[(1, 0, 1, ['W/NW cell', '', '']),
+ (0, 1, 1, ['N/NE cell']),
+ None],
+ [None,
+ (0, 0, 3, ['Middle cell']),
+ (1, 0, 3, ['E/SE cell', '', ''])],
+ [(0, 1, 5, ['S/SE cell']),
+ None,
+ None]])],
+["""\
++--------------+-------------+
+| A bad table. | |
++--------------+ |
+| Cells must be rectangles. |
++----------------------------+
+""",
+'TableMarkupError: Malformed table; parse incomplete.',
+'TableMarkupError: Malformed table; parse incomplete.'],
+["""\
++-------------------------------+
+| A table with two header rows, |
++------------+------------------+
+| the first | with a span. |
++============+==================+
+| Two body | rows, |
++------------+------------------+
+| the second with a span. |
++-------------------------------+
+""",
+[(0, 0, 2, 32, ['A table with two header rows,']),
+ (2, 0, 4, 13, ['the first']),
+ (2, 13, 4, 32, ['with a span.']),
+ (4, 0, 6, 13, ['Two body']),
+ (4, 13, 6, 32, ['rows,']),
+ (6, 0, 8, 32, ['the second with a span.'])],
+([12, 18],
+ [[(0, 1, 1, ['A table with two header rows,']),
+ None],
+ [(0, 0, 3, ['the first']),
+ (0, 0, 3, ['with a span.'])]],
+ [[(0, 0, 5, ['Two body']),
+ (0, 0, 5, ['rows,'])],
+ [(0, 1, 7, ['the second with a span.']),
+ None]])],
+["""\
++-------------------------------+
+| A table with two head/body |
++=============+=================+
+| row | separators. |
++=============+=================+
+| That's bad. | |
++-------------+-----------------+
+""",
+'TableMarkupError: Multiple head/body row separators in table '
+'(at line offset 2 and 4); only one allowed.',
+'TableMarkupError: Multiple head/body row separators in table '
+'(at line offset 2 and 4); only one allowed.'],
+]
+
+if __name__ == '__main__':
+ import unittest
+ unittest.main(defaultTest='suite')
diff --git a/test/test_parsers/test_rst/test_block_quotes.py b/test/test_parsers/test_rst/test_block_quotes.py
new file mode 100755
index 000000000..e047d0e92
--- /dev/null
+++ b/test/test_parsers/test_rst/test_block_quotes.py
@@ -0,0 +1,124 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Tests for states.py.
+"""
+
+import DocutilsTestSupport
+
+def suite():
+ s = DocutilsTestSupport.ParserTestSuite()
+ s.generateTests(totest)
+ return s
+
+totest = {}
+
+totest['block_quotes'] = [
+["""\
+Line 1.
+Line 2.
+
+ Indented.
+""",
+"""\
+
+
+ Line 1.
+ Line 2.
+
+
+ Indented.
+"""],
+["""\
+Line 1.
+Line 2.
+
+ Indented 1.
+
+ Indented 2.
+""",
+"""\
+
+
+ Line 1.
+ Line 2.
+
+
+ Indented 1.
+
+
+ Indented 2.
+"""],
+["""\
+Line 1.
+Line 2.
+ Unexpectedly indented.
+""",
+"""\
+
+
+ Line 1.
+ Line 2.
+
+
+ Unexpected indentation at line 3.
+
+
+ Unexpectedly indented.
+"""],
+["""\
+Line 1.
+Line 2.
+
+ Indented.
+no blank line
+""",
+"""\
+
+
+ Line 1.
+ Line 2.
+
+
+ Indented.
+
+
+ Unindent without blank line at line 5.
+
+ no blank line
+"""],
+["""\
+Here is a paragraph.
+
+ Indent 8 spaces.
+
+ Indent 4 spaces.
+
+Is this correct? Should it generate a warning?
+Yes, it is correct, no warning necessary.
+""",
+"""\
+
+
+ Here is a paragraph.
+
+
+
+ Indent 8 spaces.
+
+ Indent 4 spaces.
+
+ Is this correct? Should it generate a warning?
+ Yes, it is correct, no warning necessary.
+"""],
+]
+
+if __name__ == '__main__':
+ import unittest
+ unittest.main(defaultTest='suite')
diff --git a/test/test_parsers/test_rst/test_bullet_lists.py b/test/test_parsers/test_rst/test_bullet_lists.py
new file mode 100755
index 000000000..b9552042e
--- /dev/null
+++ b/test/test_parsers/test_rst/test_bullet_lists.py
@@ -0,0 +1,181 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Tests for states.py.
+"""
+
+import DocutilsTestSupport
+
+def suite():
+ s = DocutilsTestSupport.ParserTestSuite()
+ s.generateTests(totest)
+ return s
+
+totest = {}
+
+totest['bullet_lists'] = [
+["""\
+- item
+""",
+"""\
+
+
+
+
+ item
+"""],
+["""\
+* item 1
+
+* item 2
+""",
+"""\
+
+
+
+
+ item 1
+
+
+ item 2
+"""],
+["""\
+No blank line between:
+
++ item 1
++ item 2
+""",
+"""\
+<document>
+    <paragraph>
+        No blank line between:
+    <bullet_list bullet="+">
+        <list_item>
+            <paragraph>
+                item 1
+        <list_item>
+            <paragraph>
+                item 2
+"""],
+["""\
+- item 1, para 1.
+
+  item 1, para 2.
+
+- item 2
+""",
+"""\
+<document>
+    <bullet_list bullet="-">
+        <list_item>
+            <paragraph>
+                item 1, para 1.
+            <paragraph>
+                item 1, para 2.
+        <list_item>
+            <paragraph>
+                item 2
+"""],
+["""\
+- item 1, line 1
+  item 1, line 2
+- item 2
+""",
+"""\
+<document>
+    <bullet_list bullet="-">
+        <list_item>
+            <paragraph>
+                item 1, line 1
+                item 1, line 2
+        <list_item>
+            <paragraph>
+                item 2
+"""],
+["""\
+Different bullets:
+
+- item 1
+
++ item 2
+
+* item 3
+- item 4
+""",
+"""\
+<document>
+    <paragraph>
+        Different bullets:
+    <bullet_list bullet="-">
+        <list_item>
+            <paragraph>
+                item 1
+    <bullet_list bullet="+">
+        <list_item>
+            <paragraph>
+                item 2
+    <bullet_list bullet="*">
+        <list_item>
+            <paragraph>
+                item 3
+    <system_message level="2" type="WARNING">
+        <paragraph>
+            Unindent without blank line at line 8.
+    <bullet_list bullet="-">
+        <list_item>
+            <paragraph>
+                item 4
+"""],
+["""\
+- item
+no blank line
+""",
+"""\
+<document>
+    <bullet_list bullet="-">
+        <list_item>
+            <paragraph>
+                item
+    <system_message level="2" type="WARNING">
+        <paragraph>
+            Unindent without blank line at line 2.
+    <paragraph>
+        no blank line
+"""],
+["""\
+-
+
+empty item above
+""",
+"""\
+<document>
+    <bullet_list bullet="-">
+        <list_item>
+    <paragraph>
+        empty item above
+"""],
+["""\
+-
+empty item above, no blank line
+""",
+"""\
+<document>
+    <bullet_list bullet="-">
+        <list_item>
+    <system_message level="2" type="WARNING">
+        <paragraph>
+            Unindent without blank line at line 2.
+    <paragraph>
+        empty item above, no blank line
+"""],
+]
+
+if __name__ == '__main__':
+    import unittest
+    unittest.main(defaultTest='suite')
diff --git a/test/test_parsers/test_rst/test_citations.py b/test/test_parsers/test_rst/test_citations.py
new file mode 100755
index 000000000..15568c1fd
--- /dev/null
+++ b/test/test_parsers/test_rst/test_citations.py
@@ -0,0 +1,139 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Tests for states.py.
+"""
+
+import DocutilsTestSupport
+
+def suite():
+    s = DocutilsTestSupport.ParserTestSuite()
+    s.generateTests(totest)
+    return s
+
+totest = {}
+
+totest['citations'] = [
+["""\
+.. [citation] This is a citation.
+""",
+"""\
+<document>
+    <citation>
+        <label>
+            citation
+        <paragraph>
+            This is a citation.
+"""],
+["""\
+.. [citation1234] This is a citation with year.
+""",
+"""\
+<document>
+    <citation>
+        <label>
+            citation1234
+        <paragraph>
+            This is a citation with year.
+"""],
+["""\
+.. [citation] This is a citation
+   on multiple lines.
+""",
+"""\
+<document>
+    <citation>
+        <label>
+            citation
+        <paragraph>
+            This is a citation
+            on multiple lines.
+"""],
+["""\
+.. [citation1] This is a citation
+   on multiple lines with more space.
+
+.. [citation2] This is a citation
+ on multiple lines with less space.
+""",
+"""\
+<document>
+    <citation>
+        <label>
+            citation1
+        <paragraph>
+            This is a citation
+            on multiple lines with more space.
+    <citation>
+        <label>
+            citation2
+        <paragraph>
+            This is a citation
+            on multiple lines with less space.
+"""],
+["""\
+.. [citation]
+   This is a citation on multiple lines
+   whose block starts on line 2.
+""",
+"""\
+<document>
+    <citation>
+        <label>
+            citation
+        <paragraph>
+            This is a citation on multiple lines
+            whose block starts on line 2.
+"""],
+["""\
+.. [citation]
+
+That was an empty citation.
+""",
+"""\
+<document>
+    <citation>
+        <label>
+            citation
+    <paragraph>
+        That was an empty citation.
+"""],
+["""\
+.. [citation]
+No blank line.
+""",
+"""\
+<document>
+    <citation>
+        <label>
+            citation
+    <system_message level="2" type="WARNING">
+        <paragraph>
+            Unindent without blank line at line 2.
+    <paragraph>
+        No blank line.
+"""],
+["""\
+.. [citation label with spaces] this isn't a citation
+
+.. [*citationlabelwithmarkup*] this isn't a citation
+""",
+"""\
+<document>
+    <comment>
+        [citation label with spaces] this isn't a citation
+    <comment>
+        [*citationlabelwithmarkup*] this isn't a citation
+"""],
+]
+
+
+if __name__ == '__main__':
+    import unittest
+    unittest.main(defaultTest='suite')
diff --git a/test/test_parsers/test_rst/test_comments.py b/test/test_parsers/test_rst/test_comments.py
new file mode 100755
index 000000000..4e2e9db0d
--- /dev/null
+++ b/test/test_parsers/test_rst/test_comments.py
@@ -0,0 +1,238 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Tests for states.py.
+"""
+
+import DocutilsTestSupport
+
+def suite():
+    s = DocutilsTestSupport.ParserTestSuite()
+    s.generateTests(totest)
+    return s
+
+totest = {}
+
+totest['comments'] = [
+["""\
+.. A comment
+
+Paragraph.
+""",
+"""\
+<document>
+    <comment>
+        A comment
+    <paragraph>
+        Paragraph.
+"""],
+["""\
+.. A comment
+   block.
+
+Paragraph.
+""",
+"""\
+<document>
+    <comment>
+        A comment
+        block.
+    <paragraph>
+        Paragraph.
+"""],
+["""\
+..
+   A comment consisting of multiple lines
+   starting on the line after the
+   explicit markup start.
+""",
+"""\
+<document>
+    <comment>
+        A comment consisting of multiple lines
+        starting on the line after the
+        explicit markup start.
+"""],
+["""\
+.. A comment.
+.. Another.
+
+Paragraph.
+""",
+"""\
+<document>
+    <comment>
+        A comment.
+    <comment>
+        Another.
+    <paragraph>
+        Paragraph.
+"""],
+["""\
+.. A comment
+no blank line
+
+Paragraph.
+""",
+"""\
+<document>
+    <comment>
+        A comment
+    <system_message level="2" type="WARNING">
+        <paragraph>
+            Unindent without blank line at line 2.
+    <paragraph>
+        no blank line
+    <paragraph>
+        Paragraph.
+"""],
+["""\
+.. A comment::
+
+Paragraph.
+""",
+"""\
+<document>
+    <comment>
+        A comment::
+    <paragraph>
+        Paragraph.
+"""],
+["""\
+.. Next is an empty comment, which serves to end this comment and
+   prevents the following block quote being swallowed up.
+
+..
+
+    A block quote.
+""",
+"""\
+<document>
+    <comment>
+        Next is an empty comment, which serves to end this comment and
+        prevents the following block quote being swallowed up.
+    <comment>
+    <block_quote>
+        <paragraph>
+            A block quote.
+"""],
+["""\
+term 1
+    definition 1
+
+    .. a comment
+
+term 2
+    definition 2
+""",
+"""\
+<document>
+    <definition_list>
+        <definition_list_item>
+            <term>
+                term 1
+            <definition>
+                <paragraph>
+                    definition 1
+                <comment>
+                    a comment
+        <definition_list_item>
+            <term>
+                term 2
+            <definition>
+                <paragraph>
+                    definition 2
+"""],
+["""\
+term 1
+    definition 1
+
+.. a comment
+
+term 2
+    definition 2
+""",
+"""\
+<document>
+    <definition_list>
+        <definition_list_item>
+            <term>
+                term 1
+            <definition>
+                <paragraph>
+                    definition 1
+    <comment>
+        a comment
+    <definition_list>
+        <definition_list_item>
+            <term>
+                term 2
+            <definition>
+                <paragraph>
+                    definition 2
+"""],
+["""\
++ bullet paragraph 1
+
+  bullet paragraph 2
+
+  .. comment between bullet paragraphs 2 and 3
+
+  bullet paragraph 3
+""",
+"""\
+<document>
+    <bullet_list bullet="+">
+        <list_item>
+            <paragraph>
+                bullet paragraph 1
+            <paragraph>
+                bullet paragraph 2
+            <comment>
+                comment between bullet paragraphs 2 and 3
+            <paragraph>
+                bullet paragraph 3
+"""],
+["""\
++ bullet paragraph 1
+
+  .. comment between bullet paragraphs 1 (leader) and 2
+
+  bullet paragraph 2
+""",
+"""\
+<document>
+    <bullet_list bullet="+">
+        <list_item>
+            <paragraph>
+                bullet paragraph 1
+            <comment>
+                comment between bullet paragraphs 1 (leader) and 2
+            <paragraph>
+                bullet paragraph 2
+"""],
+["""\
++ bullet
+
+  .. trailing comment
+""",
+"""\
+<document>
+    <bullet_list bullet="+">
+        <list_item>
+            <paragraph>
+                bullet
+            <comment>
+                trailing comment
+"""],
+]
+
+if __name__ == '__main__':
+    import unittest
+    unittest.main(defaultTest='suite')
diff --git a/test/test_parsers/test_rst/test_definition_lists.py b/test/test_parsers/test_rst/test_definition_lists.py
new file mode 100755
index 000000000..daafd0f92
--- /dev/null
+++ b/test/test_parsers/test_rst/test_definition_lists.py
@@ -0,0 +1,317 @@
+#! /usr/bin/env python
+
+"""
+:Author: David Goodger
+:Contact: goodger@users.sourceforge.net
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This module has been placed in the public domain.
+
+Tests for states.py.
+"""
+
+import DocutilsTestSupport
+
+def suite():
+    s = DocutilsTestSupport.ParserTestSuite()
+    s.generateTests(totest)
+    return s
+
+totest = {}
+
+totest['definition_lists'] = [
+["""\
+term
+    definition
+""",
+"""\
+<document>
+    <definition_list>
+        <definition_list_item>
+            <term>
+                term
+            <definition>
+                <paragraph>
+                    definition
+"""],
+["""\
+term
+    definition
+
+paragraph
+""",
+"""\
+<document>
+    <definition_list>
+        <definition_list_item>
+            <term>
+                term
+            <definition>
+                <paragraph>
+                    definition
+    <paragraph>
+        paragraph
+"""],
+["""\
+term
+    definition
+no blank line
+""",
+"""\
+<document>
+    <definition_list>
+        <definition_list_item>
+            <term>
+                term
+            <definition>
+                <paragraph>
+                    definition
+    <system_message level="2" type="WARNING">
+        <paragraph>
+            Unindent without blank line at line 3.
+    <paragraph>
+        no blank line
+"""],
+["""\
+A paragraph::
+    A literal block without a blank line first?
+""",
+"""\
+<document>
+    <definition_list>
+        <definition_list_item>
+            <term>
+                A paragraph::
+            <definition>
+                <system_message level="1" type="INFO">
+                    <paragraph>
+                        Blank line missing before literal block? Interpreted as a definition list item. At line 2.
+                <literal_block>
+                    A literal block without a blank line first?
+"""],
+["""\
+term 1
+    definition 1
+
+term 2
+    definition 2
+""",
+"""\
+<document>
+    <definition_list>
+        <definition_list_item>
+            <term>
+                term 1
+            <definition>
+                <paragraph>
+                    definition 1
+        <definition_list_item>
+            <term>
+                term 2
+            <definition>
+                <paragraph>
+                    definition 2
+"""],
+["""\
+term 1
+    definition 1 (no blank line below)
+term 2
+    definition 2
+""",
+"""\
+<document>
+    <definition_list>
+        <definition_list_item>
+            <term>
+                term 1
+            <definition>
+                <paragraph>
+                    definition 1 (no blank line below)
+        <definition_list_item>
+            <term>
+                term 2
+            <definition>
+                <paragraph>
+                    definition 2
+"""],
+["""\
+term 1
+    definition 1
+
+    term 1a
+        definition 1a
+
+    term 1b
+        definition 1b
+
+term 2
+    definition 2
+
+paragraph
+""",
+"""\
+<document>