Diffstat (limited to 'docs/dev')
15 files changed, 8735 insertions, 0 deletions
diff --git a/docs/dev/distributing.txt b/docs/dev/distributing.txt
new file mode 100644
index 000000000..c81807279
--- /dev/null
+++ b/docs/dev/distributing.txt
@@ -0,0 +1,146 @@
+ Docutils_ Distributor's Guide
+:Author: Felix Wiemann
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This document has been placed in the public domain.
+.. _Docutils:
+.. contents::
+This document describes how to create packages of Docutils (e.g. for
+shipping with a Linux distribution). If you have any questions,
+please direct them to the Docutils-develop_ mailing list.
+First, please download the most current `release tarball`_ and unpack
+.. _Docutils-develop: ../user/mailing-lists.html#docutils-develop
+.. _release tarball:
+Docutils has the following dependencies:
+* Python 2.1 or later is required. While the compiler package from
+ the Tools/ directory of Python's source distribution must be
+ installed for the test suite to pass with Python 2.1, the
+ functionality available to end users should be available without the
+ compiler package as well. So just use ">= Python 2.1" in the
+ dependencies.
+* Docutils may optionally make use of the PIL (`Python Imaging
+ Library`_). If PIL is present, it is automatically detected by
+ Docutils.
+* There are three files in the ``extras/`` directory of the Docutils
+ distribution, ````, ````, and ````.
+ For Python 2.1/2.2, all of them must be installed (into the
+ ``site-packages/`` directory). Python 2.3 and later versions have
+ ``textwrap`` and ``optparse`` included in the standard library, so
+ only ```` is required here; installing the other files won't
+ hurt, though.
+ These files are automatically installed by the setup script (when
+ calling "python install").
+.. _Python Imaging Library:
+Python Files
+The Docutils Python files must be installed into the
+``site-packages/`` directory of Python. Running ``python
+install`` should do the trick, but if you want to place the files
+yourself, you can just install the ``docutils/`` directory of the
+Docutils tarball to ``/usr/lib/python/site-packages/docutils/``. In
+this case you should also compile the Python files to ``.pyc`` and/or
+``.pyo`` files so that Docutils doesn't need to be recompiled every
+time it's executed.
+The executable front-end tools are located in the ``tools/`` directory
+of the Docutils tarball.
+The ``rst2*.py`` tools (except ````) are intended for
+end-users. You should install them to ``/usr/bin/``. You do not need
+to change the names (e.g. to ````) because the
+``rst2`` prefix is unique.
+The documentation should be generated using ````. To
+generate HTML for all documentation files, go to the ``tools/``
+directory and run::
+ # Place html4css1.css in base directory.
+ cp ../docutils/writers/html4css1/html4css1.css ..
+ ./ --stylesheet-path=../html4css1.css ..
+Then install the following files to ``/usr/share/doc/docutils/`` (or
+wherever you install documentation):
+* All ``.html`` and ``.txt`` files in the base directory.
+* The ``docs/`` directory.
+ Do not install the contents of the ``docs/`` directory directly to
+ ``/usr/share/doc/docutils/``; it's incomplete and would contain
+ invalid references!
+* The ``licenses/`` directory.
+* ``html4css1.css`` in the base directory.
+Removing the ``.txt`` Files
+If you are tight with disk space, you can remove all ``.txt`` files in
+the tree except for:
+* those in the ``licenses/`` directory because they have not been
+ processed to HTML and
+* ``user/rst/cheatsheet.txt`` and ``user/rst/demo.txt``, which should
+ be readable in source form.
+Before you remove the ``.txt`` files you should rerun ````
+with the ``--no-source-link`` switch to avoid broken references to the
+source files.
+Other Files
+You may want to install the Emacs-Lisp files
+``tools/editors/emacs/*.el`` into the appropriate directory.
+Configuration File
+It is possible to have a system-wide configuration file at
+``/etc/docutils.conf``. However, this is usually not necessary. You
+should *not* install ``tools/docutils.conf`` into ``/etc/``.
+While you probably do not need to ship the tests with your
+distribution, you can test your package by installing it and then
+running ```` from the ``tests/`` directory of the Docutils
diff --git a/docs/dev/enthought-plan.txt b/docs/dev/enthought-plan.txt
new file mode 100644
index 000000000..0ab0d3c83
--- /dev/null
+++ b/docs/dev/enthought-plan.txt
@@ -0,0 +1,480 @@
+ Plan for Enthought API Documentation Tool
+:Author: David Goodger
+:Date: $Date$
+:Revision: $Revision$
+:Copyright: 2004 by `Enthought, Inc. <>`_
+:License: `Enthought License`_ (BSD-style)
+.. _Enthought License:
+This document should be read in conjunction with the `Enthought API
+Documentation Tool RFP`__ prepared by Janet Swisher.
+__ enthought-rfp.html
+.. contents::
+.. sectnum::
+In March 2004 I met Eric Jones, president and CTO of `Enthought,
+Inc.`_, at `PyCon 2004`_ in Washington DC. He told me that Enthought
+was using reStructuredText_ for source code documentation, but they
+had some issues. He asked if I'd be interested in doing some work on
+a customized API documentation tool. Shortly after PyCon, Janet
+Swisher, Enthought's senior technical writer, contacted me to work out
+details. Some email, a trip to Austin in May, and plenty of Texas
+hospitality later, we had a project. This document will record the
+details, milestones, and evolution of the project.
+In a nutshell, Enthought is sponsoring the implementation of an open
+source API documentation tool that meets their needs. Fortuitously,
+their needs coincide well with the "Python Source Reader" description
+in `PEP 258`_. In other words, Enthought is funding some significant
+improvements to Docutils, improvements that were planned but never
+implemented due to time and other constraints. The implementation
+will take place gradually over several months, on a part-time basis.
+This is an ideal example of cooperation between a corporation and an
+open-source project. The corporation, the project, I personally, and
+the community all benefit. Enthought, whose commitment to open source
+is also evidenced by their sponsorship of SciPy_, benefits by
+obtaining a useful piece of software, much more quickly than would
+have been possible without their support. Docutils benefits directly
+from the implementation of one of its core subsystems. I benefit from
+the funding, which allows me to justify the long hours to my wife and
+family. All the corporations, projects, and individuals that make up
+the community will benefit from the end result, which will be great.
+All that's left now is to actually do the work!
+.. _PyCon 2004:
+.. _reStructuredText:
+.. _SciPy:
+Development Plan
+1. Analyze prior art, most notably Epydoc_ and HappyDoc_, to see how
+ they do what they do. I have no desire to reinvent wheels
+ unnecessarily. I want to take the best ideas from each tool,
+ combined with the outline in `PEP 258`_ (which will evolve), and
+ build at least the foundation of the definitive Python
+ auto-documentation tool.
+ .. _Epydoc:
+ .. _HappyDoc:
+ .. _PEP 258:
+2. Decide on a base platform. The best way to achieve Enthought's
+ goals in a reasonable time frame may be to extend Epydoc or
+ HappyDoc. Or it may be necessary to start fresh.
+3. Extend the reStructuredText parser. See `Proposed Changes to
+ reStructuredText`_ below.
+4. Depending on the base platform chosen, build or extend the
+ docstring & doc comment extraction tool. This may be the biggest
+ part of the project, but I won't be able to break it down into
+ details until more is known.
+If possible, all software and documentation files will be stored in
+the Subversion repository of Docutils and/or the base project, which
+are all publicly-available via anonymous pserver access.
+The Docutils project is very open about granting Subversion write
+access; so far, everyone who asked has been given access. Any
+Enthought staff member who would like Subversion write access will get
+If either Epydoc or HappyDoc is chosen as the base platform, I will
+ask the project's administrator for CVS access for myself and any
+Enthought staff member who wants it. If sufficient access is not
+granted -- although I doubt that there would be any problem -- we may
+have to begin a fork, which could be hosted on SourceForge, on
+Enthought's Subversion server, or anywhere else deemed appropriate.
+Copyright & License
+Most existing Docutils files have been placed in the public domain, as
+ :Copyright: This document has been placed in the public domain.
+This is in conjunction with the "Public Domain Dedication" section of
+The code and documentation originating from Enthought funding will
+have Enthought's copyright and license declaration. While I will try
+to keep Enthought-specific code and documentation separate from the
+existing files, there will inevitably be cases where it makes the most
+sense to extend existing files.
+I propose the following:
+1. New files related to this Enthought-funded work will be identified
+ with the following field-list headers::
+ :Copyright: 2004 by Enthought, Inc.
+ :License: Enthought License (BSD Style)
+ The license field text will be linked to the license file itself.
+2. For significant or major changes to an existing file (more than 10%
+ change), the headers shall change as follows (for example)::
+ :Copyright: 2001-2004 by David Goodger
+ :Copyright: 2004 by Enthought, Inc.
+ :License: BSD-style
+ If the Enthought-funded portion becomes greater than the previously
+ existing portion, Enthought's copyright line will be shown first.
+3. In cases of insignificant or minor changes to an existing file
+ (less than 10% change), the public domain status shall remain
+ unchanged.
+A section describing all of this will be added to the Docutils
+`COPYING`__ instructions file.
+If another project is chosen as the base project, similar changes
+would be made to their files, subject to negotiation.
+Proposed Changes to reStructuredText
+Doc Comment Syntax
+The "traits" construct is implemented as dictionaries, where
+standalone strings would be Python syntax errors. Therefore traits
+require documentation in comments. We also need a way to
+differentiate between ordinary "internal" comments and documentation
+comments (doc comments).
+Javadoc uses the following syntax for doc comments::
+ /**
+ * The first line of a multi-line doc comment begins with a slash
+ * and *two* asterisks. The doc comment ends normally.
+ */
+Python doesn't have multi-line comments; only single-line. A similar
+convention in Python might look like this::
+ ##
+ # The first line of a doc comment begins with *two* hash marks.
+ # The doc comment ends with the first non-comment line.
+ 'data' : AnyValue,
+ ## The double-hash-marks could occur on the first line of text,
+ # saving a line in the source.
+ 'data' : AnyValue,
+How to indicate the end of the doc comment? ::
+ ##
+ # The first line of a doc comment begins with *two* hash marks.
+ # The doc comment ends with the first non-comment line, or another
+ # double-hash-mark.
+ ##
+ # This is an ordinary, internal, non-doc comment.
+ 'data' : AnyValue,
+ ## First line of a doc comment, terse syntax.
+ # Second (and last) line. Ends here: ##
+ # This is an ordinary, internal, non-doc comment.
+ 'data' : AnyValue,
+Or do we even need to worry about this case? A simple blank line
+could be used::
+ ## First line of a doc comment, terse syntax.
+ # Second (and last) line. Ends with a blank line.
+ # This is an ordinary, internal, non-doc comment.
+ 'data' : AnyValue,
+Other possibilities::
+ #" Instead of double-hash-marks, we could use a hash mark and a
+ # quotation mark to begin the doc comment.
+ 'data' : AnyValue,
+ ## We could require double-hash-marks on every line. This has the
+ ## added benefit of delimiting the *end* of the doc comment, as
+ ## well as working well with line wrapping in Emacs
+ ## ("fill-paragraph" command).
+ # Ordinary non-doc comment.
+ 'data' : AnyValue,
+ #" A hash mark and a quotation mark on each line looks funny, and
+ #" it doesn't work well with line wrapping in Emacs.
+ 'data' : AnyValue,
+These styles (repeated on each line) work well with line wrapping in
+ ## #> #| #- #% #! #*
+These styles do *not* work well with line wrapping in Emacs::
+ #" #' #: #) #. #/ #@ #$ #^ #= #+ #_ #~
+The style of doc comment indicator used could be a runtime, global
+and/or per-module setting. That may add more complexity than it's
+worth though.
+I recommend adopting "#*" on every line::
+ # This is an ordinary non-doc comment.
+ #* This is a documentation comment, with an asterisk after the
+ #* hash marks on every line.
+ 'data' : AnyValue,
+I initially recommended adopting double-hash-marks::
+ # This is an ordinary non-doc comment.
+ ## This is a documentation comment, with double-hash-marks on
+ ## every line.
+ 'data' : AnyValue,
+But Janet Swisher rightly pointed out that this could collide with
+ordinary comments that are then block-commented. This applies to
+double-hash-marks on the first line only as well. So they're out.
+On the other hand, the JavaDoc-comment style ("##" on the first line
+only, "#" after that) is used in Fredrik Lundh's PythonDoc_. It may
+be worthwhile to conform to this syntax, reinforcing it as a standard.
+PythonDoc does not support terse doc comments (text after "##" on the
+first line).
+.. _PythonDoc:
+Enthought's Traits system has switched to a metaclass base, and traits
+are now defined via ordinary attributes. Therefore doc comments are
+no longer absolutely necessary; attribute docstrings will suffice.
+Doc comments may still be desirable though, since they allow
+documentation to precede the thing being documented.
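+For illustration, here is a minimal sketch of how the every-line "#*"
+style recommended above might be collected from source text. The
+function name ``extract_doc_comments`` and the simple line-based scan
+are assumptions made for this example and are not part of any existing
+tool. A real extractor would tokenize the source so that "#*" inside
+string literals is ignored.

```python
def extract_doc_comments(source, marker="#*"):
    """Return a list of doc comment blocks; each block is a list of lines.

    Hypothetical helper: a block is a run of consecutive lines that
    start with ``marker``; any other line ends the current block.
    """
    blocks, current = [], []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            # Drop the marker and surrounding whitespace, keep the text.
            current.append(stripped[len(marker):].strip())
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


SAMPLE = """\
# This is an ordinary non-doc comment.
#* This is a documentation comment, with an asterisk after the
#* hash marks on every line.
'data' : AnyValue,
"""

doc_blocks = extract_doc_comments(SAMPLE)
print(doc_blocks)
```

+Because every doc comment line carries the marker, the first line
+without it ends the block. No separate end delimiter is needed.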
+Docstring Density & Whitespace Minimization
+One problem with extensively documented classes & functions is that
+there is a lot of screen space wasted on whitespace. Here's some
+current Enthought code (from lib/cp/fluids/
+ def max_gas(temperature, pressure, api, specific_gravity=.56):
+ """
+ Computes the maximum dissolved gas in oil using Batzle and
+ Wang (1992).
+ Parameters
+ ----------
+ temperature : sequence
+ Temperature in degrees Celsius
+ pressure : sequence
+ Pressure in MPa
+ api : sequence
+ Stock tank oil API
+ specific_gravity : sequence
+ Specific gravity of gas at STP, default is .56
+ Returns
+ -------
+ max_gor : sequence
+ Maximum dissolved gas in liters/liter
+ Description
+ -----------
+ This estimate is based on equations given by Mavko, Mukerji,
+ and Dvorkin, (1998, pp. 218-219, or 2003, p. 236) obtained
+ originally from Batzle and Wang (1992).
+ """
+ code...
+The docstring is 24 lines long.
+Rather than using subsections, field lists (which exist now) can save
+6 lines::
+ def max_gas(temperature, pressure, api, specific_gravity=.56):
+ """
+ Computes the maximum dissolved gas in oil using Batzle and
+ Wang (1992).
+ :Parameters:
+ temperature : sequence
+ Temperature in degrees Celsius
+ pressure : sequence
+ Pressure in MPa
+ api : sequence
+ Stock tank oil API
+ specific_gravity : sequence
+ Specific gravity of gas at STP, default is .56
+ :Returns:
+ max_gor : sequence
+ Maximum dissolved gas in liters/liter
+ :Description: This estimate is based on equations given by
+ Mavko, Mukerji, and Dvorkin, (1998, pp. 218-219, or 2003,
+ p. 236) obtained originally from Batzle and Wang (1992).
+ """
+ code...
+As with the "Description" field above, field bodies may begin on the
+same line as the field name, which also saves space.
+The output for field lists is typically a table structure. For
+ :Parameters:
+ temperature : sequence
+ Temperature in degrees Celsius
+ pressure : sequence
+ Pressure in MPa
+ api : sequence
+ Stock tank oil API
+ specific_gravity : sequence
+ Specific gravity of gas at STP, default is .56
+ :Returns:
+ max_gor : sequence
+ Maximum dissolved gas in liters/liter
+ :Description:
+ This estimate is based on equations given by Mavko,
+ Mukerji, and Dvorkin, (1998, pp. 218-219, or 2003, p. 236)
+ obtained originally from Batzle and Wang (1992).
+But the definition lists describing the parameters and return values
+are still wasteful of space. There are a lot of half-filled lines.
+Definition lists are currently defined as::
+ term : classifier
+ definition
+Where the classifier part is optional. Ideas for improvements:
+1. We could allow multiple classifiers::
+ term : classifier one : two : three ...
+ definition
+2. We could allow the definition on the same line as the term, using
+ some embedded/inline markup:
+ * "--" could be used, but only in limited and well-known contexts::
+ term -- definition
+ This is the syntax used by StructuredText (one of
+ reStructuredText's predecessors). It was not adopted for
+ reStructuredText because it is ambiguous -- people often use "--"
+ in their text, as I just did. But given a constrained context,
+ the ambiguity would be acceptable (or would it?). That context
+ would be: in docstrings, within a field list, perhaps only with
+ certain well-defined field names (parameters, returns).
+ * The "constrained context" above isn't really enough to make the
+ ambiguity acceptable. Instead, a slightly more verbose but far
+ less ambiguous syntax is possible::
+ term === definition
+ This syntax has advantages. Equals signs lend themselves to the
+ connotation of "definition". And whereas one or two equals signs
+ are commonly used in program code, three equals signs in a row
+ have no conflicting meanings that I know of. (Update: there
+ *are* uses out there.)
+ The problem with this approach is that using inline markup for
+ structure is inherently ambiguous in reStructuredText. For
+ example, writing *about* definition lists would be difficult::
+ ``term === definition`` is an example of a compact definition list item
+ The parser checks for structural markup before it does inline
+ markup processing. But the "===" should be protected by its inline
+ literal context.
+3. We could allow the definition on the same line as the term, using
+ structural markup. A variation on bullet lists would work well::
+ : term :: definition
+ : another term :: and a definition that
+ wraps across lines
+ Some ambiguity remains::
+ : term ``containing :: double colons`` :: definition
+ But the likelihood of such cases is negligible, and they can be
+ covered in the documentation.
+ Other possibilities for the definition delimiter include::
+ : term : classifier -- definition
+ : term : classifier --- definition
+ : term : classifier : : definition
+ : term : classifier === definition
+The third idea currently has the best chance of being adopted and
+Combining these ideas, the function definition becomes::
+ def max_gas(temperature, pressure, api, specific_gravity=.56):
+ """
+ Computes the maximum dissolved gas in oil using Batzle and
+ Wang (1992).
+ :Parameters:
+ : temperature : sequence :: Temperature in degrees Celsius
+ : pressure : sequence :: Pressure in MPa
+ : api : sequence :: Stock tank oil API
+ : specific_gravity : sequence :: Specific gravity of gas at
+ STP, default is .56
+ :Returns:
+ : max_gor : sequence :: Maximum dissolved gas in liters/liter
+ :Description: This estimate is based on equations given by
+ Mavko, Mukerji, and Dvorkin, (1998, pp. 218-219, or 2003,
+ p. 236) obtained originally from Batzle and Wang (1992).
+ """
+ code...
+The docstring is reduced to 14 lines, from the original 24. For
+longer docstrings with many parameters and return values, the
+difference would be more significant.
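+As a rough sketch of what parsing the third idea could involve, the
+following reads ``: term : classifier :: definition`` items, including
+definitions that wrap onto continuation lines. The function name
+``parse_compact_definitions`` and the returned dictionary layout are
+assumptions made for this example. The sketch also ignores the
+inline-literal ambiguity mentioned above.

```python
def parse_compact_definitions(text):
    """Parse ': term : classifier :: definition' items into dicts.

    Hypothetical sketch: a line starting with ': ' and containing
    ' :: ' opens a new item; any other non-blank line continues the
    definition of the previous item.
    """
    items = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(": ") and " :: " in stripped:
            head, definition = stripped[2:].split(" :: ", 1)
            # The head holds the term followed by zero or more classifiers.
            term, *classifiers = [part.strip() for part in head.split(" : ")]
            items.append({
                "term": term,
                "classifiers": classifiers,
                "definition": definition.strip(),
            })
        elif stripped and items:
            items[-1]["definition"] += " " + stripped
    return items


SAMPLE = """\
: temperature : sequence :: Temperature in degrees Celsius
: specific_gravity : sequence :: Specific gravity of gas at
  STP, default is .56
"""

parsed = parse_compact_definitions(SAMPLE)
for item in parsed:
    print(item)
```

+Splitting on the first " :: " keeps the sketch short, at the cost of
+misreading a term that itself contains double colons.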
diff --git a/docs/dev/enthought-rfp.txt b/docs/dev/enthought-rfp.txt
new file mode 100644
index 000000000..986f5604f
--- /dev/null
+++ b/docs/dev/enthought-rfp.txt
@@ -0,0 +1,146 @@
+ Enthought API Documentation Tool
+ Request for Proposals
+:Author: Janet Swisher, Senior Technical Writer
+:Organization: `Enthought, Inc. <>`_
+:Copyright: 2004 by Enthought, Inc.
+:License: `Enthought License`_ (BSD Style)
+.. _Enthought License:
+The following is excerpted from the full RFP, and is published here
+with permission from `Enthought, Inc.`_ See the `Plan for Enthought
+API Documentation Tool`__.
+__ enthought-plan.html
+.. contents::
+.. sectnum::
+The documentation tool will address the following high-level goals:
+Documentation Extraction
+1. Documentation will be generated directly from Python source code,
+ drawing from the code structure, docstrings, and possibly other
+ comments.
+2. The tool will extract logical constructs as appropriate, minimizing
+ the need for comments that are redundant with the code structure.
+ The output should reflect both documented and undocumented
+ elements.
+Source Format
+1. The docstrings will be formatted in as terse a syntax as possible.
+ Required tags, syntax, and white space should be minimized.
+2. The tool must support the use of Traits. Special comment syntax
+ for Traits may be necessary. Information about the Traits package
+ is available at In the
+ following example, each trait definition is prefaced by a plain
+ comment::
+ __traits__ = {
+ # The current selection within the frame.
+ 'selection' : Trait([], TraitInstance(list)),
+ # The frame has been activated or deactivated.
+ 'activated' : TraitEvent(),
+ 'closing' : TraitEvent(),
+ # The frame is closed.
+ 'closed' : TraitEvent(),
+ }
+3. Support for ReStructuredText (ReST) format is desirable, because
+ many of the existing docstrings use ReST. However, the complete
+ ReST specification need not be supported, if a subset can achieve
+ the project goals. If the tool does not support ReST, the
+ contractor should also provide a tool or path to convert existing
+ docstrings.
+Output Format
+1. Documentation will be output as a navigable suite of HTML
+ files.
+2. The style of the HTML files will be customizable by a cascading
+ style sheet and/or a customizable template.
+3. Page elements such as headers and footers should be customizable, to
+ support differing requirements from one documentation project to
+ the next.
+Output Structure and Navigation
+1. The navigation scheme for the HTML files should not rely on frames,
+ and should harmonize with conversion to Microsoft HTML Help (.chm)
+ format.
+2. The output should be structured to make navigable the architecture
+ of the Python code. Packages, modules, classes, traits, and
+ functions should be presented in clear, logical hierarchies.
+ Diagrams or trees for inheritance, collaboration, sub-packaging,
+ etc. are desirable but not required.
+3. The output must include indexes that provide a comprehensive view
+ of all packages, modules, and classes. These indexes will provide
+ readers with a clear and exhaustive view of the code base. These
+ indexes should be presented in a way that is easily accessible and
+ allows easy navigation.
+4. Cross-references to other documented elements will be used
+ throughout the documentation, to enable the reader to move quickly
+ to relevant information. For example, where type information for an
+ element is available, the type definition should be
+ cross-referenced.
+5. The HTML suite should provide consistent navigation back to the
+ home page, which will include the following information:
+ * Bibliographic information
+ - Author
+ - Copyright
+ - Release date
+ - Version number
+ * Abstract
+ * References
+ - Links to related internal docs (i.e., other docs for the same
+ product)
+ - Links to related external docs (e.g., supporting development
+ docs, Python support docs, docs for included packages)
+ It should be possible to specify similar information at the top
+ level of each package, so that packages can be included as
+ appropriate for a given application.
+Enthought intends to release the software under an open-source
+("BSD-style") license.
diff --git a/docs/dev/hacking.txt b/docs/dev/hacking.txt
new file mode 100644
index 000000000..d0ec9a3fb
--- /dev/null
+++ b/docs/dev/hacking.txt
@@ -0,0 +1,264 @@
+ Docutils_ Hacker's Guide
+:Author: Felix Wiemann
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This document has been placed in the public domain.
+:Abstract: This is the introduction to Docutils for all persons who
+ want to extend Docutils in some way.
+:Prerequisites: You have used reStructuredText_ and played around with
+ the `Docutils front-end tools`_ before. Some (basic) Python
+ knowledge is certainly helpful (though not necessary, strictly
+ speaking).
+.. _Docutils:
+.. _reStructuredText:
+.. _Docutils front-end tools: ../user/tools.html
+.. contents::
+Overview of the Docutils Architecture
+To give you an understanding of the Docutils architecture, we'll dive
+right into the internals using a practical example.
+Consider the following reStructuredText file::
+ My *favorite* language is Python_.
+ .. _Python:
+Using the ```` front-end tool, you would get an HTML output
+which looks like this::
+ [uninteresting HTML code removed]
+ <body>
+ <div class="document">
+ <p>My <em>favorite</em> language is <a class="reference" href="">Python</a>.</p>
+ </div>
+ </body>
+ </html>
+While this looks very simple, it's enough to illustrate all internal
+processing stages of Docutils. Let's see how this document is
+processed from the reStructuredText source to the final HTML output:
+Reading the Document
+The **Reader** reads the document from the source file and passes it
+to the parser (see below). The default reader is the standalone
+reader (``docutils/readers/``) which just reads the input
+data from a single text file. Unless you want to do really fancy
+things, there is no need to change that.
+Since you probably won't need to touch readers, we will just move on
+to the next stage:
+Parsing the Document
+The **Parser** analyzes the input document and creates a **node
+tree** representation. In this case we are using the
+**reStructuredText parser** (``docutils/parsers/rst/``).
+To see what that node tree looks like, we call ```` (which
+can be found in the ``tools/`` directory of the Docutils distribution)
+with our example file (``test.txt``) as first parameter (Windows users
+might need to type ``python test.txt``)::
+ $ test.txt
+ <document source="test.txt">
+ <paragraph>
+ My
+ <emphasis>
+ favorite
+ language is
+ <reference name="Python" refname="python">
+ Python
+ .
+ <target ids="python" names="python" refuri="">
+Let us now examine the node tree:
+The top-level node is ``document``. It has a ``source`` attribute
+whose value is ``test.txt``. There are two children: A ``paragraph``
+node and a ``target`` node. The ``paragraph`` in turn has children: A
+text node ("My "), an ``emphasis`` node, a text node (" language is "),
+a ``reference`` node, and again a ``Text`` node (".").
+These node types (``document``, ``paragraph``, ``emphasis``, etc.) are
+all defined in ``docutils/``. The node types are internally
+arranged as a class hierarchy (for example, both ``emphasis`` and
+``reference`` have the common superclass ``Inline``). To get an
+overview of the node class hierarchy, use epydoc (type ``epydoc``) and look at the class hierarchy tree.
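+To make the shape of such a tree concrete, here is a toy model of the
+example document. The ``Node`` class below is a hypothetical stand-in
+written for this illustration. It is not one of the real Docutils node
+classes, and the target URL is a placeholder.

```python
class Node:
    """Toy stand-in for an element node (not the Docutils implementation)."""

    def __init__(self, tagname, children=(), **attributes):
        self.tagname = tagname
        self.children = list(children)   # child nodes or plain text strings
        self.attributes = attributes

    def pformat(self, indent="    ", level=0):
        """Return an indented pseudo-XML rendering of this subtree."""
        attrs = "".join(f' {k}="{v}"' for k, v in self.attributes.items())
        lines = [f"{indent * level}<{self.tagname}{attrs}>"]
        for child in self.children:
            if isinstance(child, str):
                lines.append(f"{indent * (level + 1)}{child}")
            else:
                lines.append(child.pformat(indent, level + 1))
        return "\n".join(lines)


tree = Node("document", source="test.txt", children=[
    Node("paragraph", children=[
        "My ",
        Node("emphasis", children=["favorite"]),
        " language is ",
        Node("reference", children=["Python"], name="Python", refname="python"),
        ".",
    ]),
    # Placeholder URL, chosen only for this example.
    Node("target", ids="python", names="python",
         refuri="https://example.org/python"),
])

print(tree.pformat())
```

+The printed result has the same nesting as the pseudo-XML shown
+earlier: a ``document`` with a ``paragraph`` child and a ``target``
+child.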
+Transforming the Document
+In the node tree above, the ``reference`` node does not contain the
+target URI (````) yet.
+Assigning the target URI (from the ``target`` node) to the
+``reference`` node is *not* done by the parser (the parser only
+translates the input document into a node tree).
+Instead, it's done by a **Transform**. In this case (resolving a
+reference), it's done by the ``ExternalTargets`` transform in
+In fact, there are quite a lot of Transforms, which do various useful
+things like creating the table of contents, applying substitution
+references or resolving auto-numbered footnotes.
+The Transforms are applied after parsing. To see how the node tree
+has changed after applying the Transforms, we use the
+```` tool:
+.. parsed-literal::
+ $ test.txt
+ <document source="test.txt">
+ <paragraph>
+ My
+ <emphasis>
+ favorite
+ language is
+ <reference name="Python" **refuri=""**>
+ Python
+ .
+ <target ids="python" names="python" ``refuri=""``>
+For our small test document, the only change is that the ``refname``
+attribute of the reference has been replaced by a ``refuri``
+attribute |---| the reference has been resolved.
+While this does not look very exciting, transforms are a powerful tool
+to apply any kind of transformation on the node tree.
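+The reference resolution described above can be sketched on a toy
+tree. Everything in this block is an assumption made for illustration:
+the ``Node`` dataclass, the ``walk`` helper, the
+``resolve_external_targets`` function and the placeholder URL. It is
+not the code of the real ``ExternalTargets`` transform.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy element node: a tag name, attributes, and children."""
    tagname: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def walk(node):
    """Yield ``node`` and all element descendants, depth first."""
    yield node
    for child in node.children:
        if isinstance(child, Node):      # skip plain text strings
            yield from walk(child)


def resolve_external_targets(document):
    """Replace each reference's refname with the matching target's refuri."""
    targets = {
        node.attributes["names"]: node.attributes["refuri"]
        for node in walk(document)
        if node.tagname == "target" and "refuri" in node.attributes
    }
    for node in walk(document):
        if node.tagname == "reference" and "refname" in node.attributes:
            refname = node.attributes["refname"]
            if refname in targets:
                del node.attributes["refname"]
                node.attributes["refuri"] = targets[refname]


reference = Node("reference", {"name": "Python", "refname": "python"}, ["Python"])
document = Node("document", {"source": "test.txt"}, [
    Node("paragraph", {}, [
        "My ", Node("emphasis", {}, ["favorite"]), " language is ", reference, ".",
    ]),
    # Placeholder URL, chosen only for this example.
    Node("target", {"ids": "python", "names": "python",
                    "refuri": "https://example.org/python"}),
])

resolve_external_targets(document)
print(reference.attributes)
```

+The transform only rewrites attributes; the structure of the tree is
+left untouched, which matches the before/after listings above.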
+By the way, you can also get a "real" XML representation of the node
+tree by using ```` instead of ````.
+Writing the Document
+To get an HTML document out of the node tree, we use a **Writer**, the
+HTML writer in this case (``docutils/writers/``).
+The writer receives the node tree and returns the output document.
+For HTML output, we can test this using the ```` tool::
+ $ --link-stylesheet test.txt
+ <?xml version="1.0" encoding="utf-8" ?>
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "">
+ <html xmlns="" xml:lang="en" lang="en">
+ <head>
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+ <meta name="generator" content="Docutils 0.3.10:" />
+ <title></title>
+ <link rel="stylesheet" href="../docutils/writers/html4css1/html4css1.css" type="text/css" />
+ </head>
+ <body>
+ <div class="document">
+ <p>My <em>favorite</em> language is <a class="reference" href="">Python</a>.</p>
+ </div>
+ </body>
+ </html>
+So here we finally have our HTML output. The actual document contents
+are in the fourth-last line. Note, by the way, that the HTML writer
+did not render the (invisible) ``target`` node |---| only the
+``paragraph`` node and its children appear in the HTML output.
+Extending Docutils
+Now you'll ask, "how do I actually extend Docutils?"
+First of all, once you are clear about *what* you want to achieve, you
+have to decide *where* to implement it |---| in the Parser (e.g. by
+adding a directive or role to the reStructuredText parser), as a
+Transform, or in the Writer. There is often one obvious choice among
+those three (Parser, Transform, Writer). If you are unsure, ask on
+the Docutils-develop_ mailing list.
+In order to find out how to start, it is often helpful to look at
+similar features which are already implemented. For example, if you
+want to add a new directive to the reStructuredText parser, look at
+the implementation of a similar directive in
+Modifying the Document Tree Before It Is Written
+You can modify the document tree right before the writer is called.
+One possibility is to use the publish_doctree_ and
+publish_from_doctree_ functions.
+To retrieve the document tree, call::
+ document = docutils.core.publish_doctree(...)
+Please see the docstring of publish_doctree for a list of parameters.
+.. XXX Need to write a well-readable list of (commonly used) options
+ of the publish_* functions. Probably in api/publisher.txt.
+``document`` is the root node of the document tree. You can now
+change the document by accessing the ``document`` node and its
+children |---| see `The Node Interface`_ below.
+When you're done with modifying the document tree, you can write it
+out by calling::
+ output = docutils.core.publish_from_doctree(document, ...)
+.. _publish_doctree: ../api/publisher.html#publish_doctree
+.. _publish_from_doctree: ../api/publisher.html#publish_from_doctree
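+Putting the two calls together, a sketch that assumes Docutils is
+installed and importable. The source text, the placeholder URL, the
+``walk`` helper and the "#intro" fragment are made up for this
+example; only ``publish_doctree``, ``publish_from_doctree`` and the
+``html4css1`` writer come from Docutils itself.

```python
import docutils.core
from docutils import nodes
from docutils.writers import html4css1

SOURCE = """\
My *favorite* language is Python_.

.. _Python: https://example.org/python
"""


def walk(node):
    """Yield ``node`` and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


# Step 1: read, parse, and transform the source into a document tree.
document = docutils.core.publish_doctree(SOURCE)

# Step 2: modify the tree; here, append a fragment to every resolved
# external reference (an arbitrary change chosen for illustration).
for node in walk(document):
    if isinstance(node, nodes.reference) and node.get("refuri"):
        node["refuri"] = node["refuri"] + "#intro"

# Step 3: hand the modified tree to a writer.
output = docutils.core.publish_from_doctree(document, writer=html4css1.Writer())

# Depending on the Docutils version and settings, the result may be
# bytes or str; normalize to str for display.
html = output.decode("utf-8") if isinstance(output, bytes) else output
print(html)
```

+The modification step sees the tree after the Transforms have run, so
+references already carry their ``refuri`` attribute.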
+The Node Interface
+As described in the overview above, Docutils' internal representation
+of a document is a tree of nodes. We'll now have a look at the
+interface of these nodes.
+(To be completed.)
+What Now?
+This document is not complete. Many topics could (and should) be
+covered here. To find out which topics we should write about
+first, we are awaiting *your* feedback. So please ask your questions
+on the Docutils-develop_ mailing list.
+.. _Docutils-develop: ../user/mailing-lists.html#docutils-develop
+.. |---| unicode:: 8212 .. em-dash
+ :trim:
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End:
diff --git a/docs/dev/policies.txt b/docs/dev/policies.txt
new file mode 100644
index 000000000..25fb4f2e9
--- /dev/null
+++ b/docs/dev/policies.txt
@@ -0,0 +1,549 @@
+ Docutils Project Policies
+:Author: David Goodger; open to all Docutils developers
+:Date: $Date$
+:Revision: $Revision$
+:Copyright: This document has been placed in the public domain.
+.. contents::
+The Docutils project group is a meritocracy based on code contribution
+and lots of discussion [#bcs]_. A few quotes sum up the policies of
+the Docutils project. The IETF's classic credo (by MIT professor Dave
+Clark) is an ideal we can aspire to:
+ We reject: kings, presidents, and voting. We believe in: rough
+ consensus and running code.
+As architect, chief cook and bottle-washer, David Goodger currently
+functions as BDFN (Benevolent Dictator For Now). (But he would
+happily abdicate the throne given a suitable candidate. Any takers?)
+Eric S. Raymond, anthropologist of the hacker subculture, writes in
+his essay `The Magic Cauldron`_:
+ The number of contributors [to] projects is strongly and inversely
+ correlated with the number of hoops each project makes a user go
+ through to contribute.
+We will endeavour to keep the barrier to entry as low as possible.
+The policies below should not be thought of as barriers, but merely as
+a codification of experience to date. These are "best practices";
+guidelines, not absolutes. Exceptions are expected, tolerated, and
+used as a source of improvement. Feedback and criticism are welcome.
+As for control issues, Emmett Plant (CEO of the Foundation,
+originators of Ogg Vorbis) put it well when he said:
+ Open source dictates that you lose a certain amount of control
+ over your codebase, and that's okay with us.
+.. [#bcs] Phrase borrowed from `Ben Collins-Sussman of the Subversion
+ project <>`__.
+.. _The Magic Cauldron:
+Python Coding Conventions
+Contributed code will not be refused merely because it does not
+strictly adhere to these conventions; as long as it's internally
+consistent, clean, and correct, it probably will be accepted. But
+don't be surprised if the "offending" code gets fiddled over time to
+conform to these conventions.
+The Docutils project shall follow the generic coding conventions as
+specified in the `Style Guide for Python Code`_ and `Docstring
+Conventions`_ PEPs, summarized, clarified, and extended as follows:
+* 4 spaces per indentation level. No hard tabs.
+* Use only 7-bit ASCII, no 8-bit strings. See `Docutils
+ Internationalization`_.
+* No one-liner compound statements (i.e., no ``if x: return``: use two
+ lines & indentation), except for degenerate class or method
+ definitions (i.e., ``class X: pass`` is OK.).
+* Lines should be no more than 78 characters long.
+* Use "StudlyCaps" for class names (except for element classes in
+ docutils.nodes).
+* Use "lowercase" or "lowercase_with_underscores" for function,
+ method, and variable names. For short names, maximum two words,
+ joined lowercase may be used (e.g. "tagname"). For long names with
+ three or more words, or where it's hard to parse the split between
+ two words, use lowercase_with_underscores (e.g.,
+ "note_explicit_target", "explicit_target"). If in doubt, use
+ underscores.
+* Avoid lambda expressions, which are inherently difficult to
+ understand. Named functions are preferable and superior: they're
+ faster (no run-time compilation), and well-chosen names serve to
+ document and aid understanding.
+* Avoid functional constructs (filter, map, etc.). Use list
+ comprehensions instead.
+* Avoid ``from __future__ import`` constructs. They are inappropriate
+ for production code.
+* Use 'single quotes' for string literals, and """triple double
+ quotes""" for docstrings.
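+A short sketch of code written to the conventions listed above. The
+function and class names are hypothetical and exist only to
+illustrate the style.

```python
def is_even(number):
    """Return True if `number` is divisible by two."""
    return number % 2 == 0


def even_squares(numbers):
    """Return the squares of the even items in `numbers`."""
    # A named predicate and a list comprehension, rather than
    # filter()/map() with lambda expressions.
    return [number * number for number in numbers if is_even(number)]


class Placeholder: pass      # degenerate class definition: one line is OK


def describe(tagname):
    """Return a short description of `tagname`."""
    if not tagname:
        # Two lines and indentation, not a one-liner ``if``.
        return 'unnamed element'
    return 'element "%s"' % tagname
```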
+.. _Style Guide for Python Code:
+.. _Docstring Conventions:
+.. _Docutils Internationalization: ../howto/i18n.html#python-code
+Documentation Conventions
+* Docutils documentation is written using reStructuredText, of course.
+* Use 7-bit ASCII if at all possible, and Unicode substitutions when
+ necessary.
+* Use the following section title adornment styles::
+ ================
+ Document Title
+ ================
+ --------------------------------------------
+ Document Subtitle, or Major Division Title
+ --------------------------------------------
+ Section
+ =======
+ Subsection
+ ----------
+ Sub-Subsection
+ ``````````````
+ Sub-Sub-Subsection
+ ..................
+* Use two blank lines before each section/subsection/etc. title. One
+ blank line is sufficient between immediately adjacent titles.
+* Add a bibliographic field list immediately after the document
+ title/subtitle. See the beginning of this document for an example.
+* Add an Emacs "local variables" block in a comment at the end of the
+ document. See the end of this document for an example.
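+The title adornment styles listed above can be produced
+mechanically. A minimal sketch, assuming two hypothetical helper
+functions (they are not part of Docutils): document titles get an
+overline and underline two characters longer than the text, which is
+inset by one space; section titles get an underline of the same
+length as the text.

```python
def document_title(text, char='='):
    """Return an over- and underlined title, inset by one space."""
    line = char * (len(text) + 2)
    return '%s\n %s\n%s\n' % (line, text, line)


def section_title(text, char='='):
    """Return `text` underlined with `char` for its full length."""
    return '%s\n%s\n' % (text, char * len(text))


print(document_title('Document Title'))
print(section_title('Subsection', '-'))
```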
+Copyrights and Licensing
+The majority of the Docutils project code and documentation has been
+placed in the public domain. Unless clearly and explicitly indicated
+otherwise, any patches (modifications to existing files) submitted to
+the project for inclusion (via Subversion, SourceForge trackers,
+mailing lists, or private email) are assumed to be in the public
+domain as well.
+Any new files contributed to the project should clearly state their
+intentions regarding copyright, in one of the following ways:
+* Public domain (preferred): include the statement "This
+ module/document has been placed in the public domain."
+* Copyright & open source license: include a copyright notice, along
+ with either an embedded license statement, a reference to an
+ accompanying license file, or a license URL.
+One of the goals of the Docutils project, once complete, is to be
+incorporated into the Python standard library. At that time copyright
+of the Docutils code will be assumed by or transferred to the Python
+Software Foundation (PSF), and will be released under Python's
+license. If the copyright/license option is chosen for new files, the
+license should be compatible with Python's current license, and the
+author(s) of the files should be willing to assign copyright to the
+PSF. The PSF accepts the `Academic Free License v. 2.1
+<>`_ and the `Apache
+License, Version 2.0 <>`_.
+Subversion Repository
+Please see the `repository documentation`_ for details on how to
+access Docutils' Subversion repository. Anyone can access the
+repository anonymously. Only project developers can make changes.
+(If you would like to become a project developer, just ask!) Also see
+`Setting Up For Docutils Development`_ below for some useful info.
+Unless you really *really* know what you're doing, please do *not* use
+``svn import``. It's quite easy to mess up the repository with an
+import.
+.. _repository documentation: repository.html
+(These branch policies go into effect with Docutils 0.4.)
+The "docutils" directory of the **trunk** (a.k.a. the **Docutils
+core**) is used for active -- but stable, fully tested, and reviewed
+-- development.
+There will be at least one active **maintenance branch** at a time,
+based on at least the latest feature release. For example, when
+Docutils 0.5 is released, its maintenance branch will take over, and
+the 0.4.x maintenance branch may be retired. Maintenance branches
+will receive bug fixes only; no new features will be allowed here.
+Obvious and uncontroversial bug fixes *with tests* can be checked in
+directly to the core and to the maintenance branches. Don't forget to
+add test cases! Many (but not all) bug fixes will be applicable both
+to the core and to the maintenance branches; these should be applied
+to both. No patches or dedicated branches are required for bug fixes,
+but they may be used. It is up to the discretion of project
+developers to decide which mechanism to use for each case.
+Feature additions and API changes will be done in **feature
+branches**. Feature branches will not be managed in any way.
+Frequent small checkins are encouraged here. Feature branches must be
+discussed on the docutils-develop mailing list and reviewed before
+being merged into the core.
+Review Criteria
+Before a new feature, an API change, or a complex, disruptive, or
+controversial bug fix can be checked in to the core or into a
+maintenance branch, it must undergo review. These are the criteria:
+* The branch must be complete, and include full documentation and
+ tests.
+* There should ideally be one branch merge commit per feature or
+ change. In other words, each branch merge should represent a
+ coherent change set.
+* The code must be stable and uncontroversial. Moving targets and
+ features under debate are not ready to be merged.
+* The code must work. The test suite must complete with no failures.
+ See `Docutils Testing`_.
+The review process will ensure that at least one other set of eyeballs
+& brains sees the code before it enters the core. In addition to the
+above, the general `Check-ins`_ policy (below) also applies.
+.. _Docutils Testing: testing.html
+Changes or additions to the Docutils core and maintenance branches
+carry a commitment to the Docutils user community. Developers must be
+prepared to fix and maintain any code they have committed.
+The Docutils core (``trunk/docutils`` directory) and maintenance
+branches should always be kept in a stable state (usable and as
+problem-free as possible). All changes to the Docutils core or
+maintenance branches must be in `good shape`_, usable_, documented_,
+tested_, and `reasonably complete`_.
+* _`Good shape` means that the code is clean, readable, and free of
+ junk code (unused legacy code; by analogy to "junk DNA").
+* _`Usable` means that the code does what it claims to do. An "XYZ
+ Writer" should produce reasonable XYZ output.
+* _`Documented`: The more complete the documentation the better.
+ Modules & files must be at least minimally documented internally.
+ `Docutils Front-End Tools`_ should have a new section for any
+ front-end tool that is added. `Docutils Configuration Files`_
+ should be modified with any settings/options defined. For any
+ non-trivial change, the HISTORY.txt_ file should be updated.
+* _`Tested` means that unit and/or functional tests, that catch all
+ bugs fixed and/or cover all new functionality, have been added to
+ the test suite. These tests must be checked by running the test
+ suite under all supported Python versions, and the entire test suite
+ must pass. See `Docutils Testing`_.
+* _`Reasonably complete` means that the code must handle all input.
+ Here "handle" means that no input can cause the code to fail (cause
+ an exception, or silently and incorrectly produce nothing).
+ "Reasonably complete" does not mean "finished" (no work left to be
+ done). For example, a writer must handle every standard element
+ from the Docutils document model; for unimplemented elements, it
+ must *at the very least* warn that "Output for element X is not yet
+ implemented in writer Y".
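+The "reasonably complete" requirement can be met with a catch-all
+visitor method. The following is a sketch only: ``SketchTranslator``
+and the writer name "sketch" are hypothetical, and a real writer
+would implement far more ``visit_*`` methods. It relies on
+``docutils.nodes.NodeVisitor`` calling ``unknown_visit`` for node
+classes that have no dedicated method.

```python
from docutils import core, nodes


class SketchTranslator(nodes.NodeVisitor):

    """Hypothetical translator: handles text, warns about the rest."""

    def __init__(self, document):
        nodes.NodeVisitor.__init__(self, document)
        self.output = []
        self.unimplemented = []

    def visit_Text(self, node):
        self.output.append(node.astext())

    def depart_Text(self, node):
        pass

    def unknown_visit(self, node):
        # Called for every node class without a visit_* method.
        name = node.__class__.__name__
        if name not in self.unimplemented:
            self.unimplemented.append(name)
            self.document.reporter.warning(
                'Output for element "%s" is not yet implemented in '
                'writer "sketch".' % name)

    def unknown_departure(self, node):
        pass


document = core.publish_doctree('Some *emphasized* text.\n')
translator = SketchTranslator(document)
document.walkabout(translator)
print(''.join(translator.output))
```

+Each unimplemented element is reported once, and processing
+continues instead of failing with an exception.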
+If you really want to check code directly into the Docutils core,
+you can, but you must ensure that it fulfills the above criteria
+first. People will start to use it and they will expect it to work!
+If there are any issues with your code, or if you only have time for
+gradual development, you should put it on a branch or in the sandbox
+first. It's easy to move code over to the Docutils core once it's
+complete.
+It is the responsibility and obligation of all developers to keep the
+Docutils core and maintenance branches stable. If a commit is made to
+the core or maintenance branch which breaks any test, the solution is
+simply to revert the change. This is not vindictive; it's practical.
+We revert first, and discuss later.
+Docutils will pursue an open and trusting policy for as long as
+possible, and deal with any aberrations if (and hopefully not when)
+they happen. We'd rather see a torrent of loose contributions than
+just a trickle of perfect-as-they-stand changes. The occasional
+mistake is easy to fix. That's what Subversion is for!
+.. _Docutils Front-End Tools: ../user/tools.html
+.. _Docutils Configuration Files: ../user/config.html
+.. _HISTORY.txt: ../../HISTORY.txt
+Version Numbering
+Docutils version numbering uses a ``major.minor.micro`` scheme (x.y.z;
+for example, 0.4.1).
+**Major releases** (x.0, e.g. 1.0) will be rare, and will represent
+major changes in API, functionality, or commitment. For example, as
+long as the major version of Docutils is 0, it is to be considered
+*experimental code*. When Docutils reaches version 1.0, the major
+APIs will be considered frozen and backward compatibility will become
+of paramount importance.
+Releases that change the minor number (x.y, e.g. 0.5) will be
+**feature releases**; new features from the `Docutils core`_ will be
+included.
+Releases that change the micro number (x.y.z, e.g. 0.4.1) will be
+**bug-fix releases**. No new features will be introduced in these
+releases; only bug fixes from `maintenance branches`_ will be
+included.
+This policy was adopted in October 2005, and will take effect with
+Docutils version 0.4. Prior to version 0.4, Docutils didn't have an
+official version numbering policy, and micro releases contained both
+bug fixes and new features.
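+The numbering scheme can be expressed as a small classification
+function. This is an illustrative sketch; ``parse_version`` and
+``release_kind`` are hypothetical helpers, not part of Docutils.

```python
def parse_version(version):
    """Return 'x.y' or 'x.y.z' as a (major, minor, micro) tuple."""
    parts = [int(part) for part in version.split('.')]
    while len(parts) < 3:
        parts.append(0)         # '0.4' is treated as '0.4.0'
    return tuple(parts)


def release_kind(previous, new):
    """Classify the step from `previous` to `new` per the policy."""
    old_parts = parse_version(previous)
    new_parts = parse_version(new)
    if new_parts <= old_parts:
        raise ValueError('%s is not newer than %s' % (new, previous))
    if new_parts[0] != old_parts[0]:
        return 'major release'
    if new_parts[1] != old_parts[1]:
        return 'feature release'
    return 'bug-fix release'


print(release_kind('0.4', '0.4.1'))
print(release_kind('0.4.1', '0.5'))
```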
+.. _Docutils core:
+.. _maintenance branches:
+Snapshot tarballs will be generated regularly from
+* the Docutils core, representing the current cutting-edge state of
+ development;
+* each active maintenance branch, for bug fixes;
+* each development branch, representing the unstable
+ seat-of-your-pants bleeding edge.
+The ``sandbox/infrastructure/docutils-update`` shell script, run as an
+hourly cron job on the BerliOS server, is responsible for
+automatically generating the snapshots and updating the web site. See
+the `web site docs <website.html>`__.
+Setting Up For Docutils Development
+When making changes to the code, testing is a must. The code should
+be run to verify that it produces the expected results, and the entire
+test suite should be run too. The modified Docutils code has to be
+accessible to Python for the tests to have any meaning. There are two
+ways to keep the Docutils code accessible during development:
+1. Update your ``PYTHONPATH`` environment variable so that Python
+ picks up your local working copy of the code. This is the
+ recommended method.
+ We'll assume that the Docutils trunk is checked out under your
+ ~/projects/ directory as follows::
+ svn co svn+ssh://<user> \
+ docutils
+ For the bash shell, add this to your ``~/.profile``::
+ PYTHONPATH=$HOME/projects/docutils/docutils
+ PYTHONPATH=$PYTHONPATH:$HOME/projects/docutils/docutils/extras
+ export PYTHONPATH
+ The first line points to the directory containing the ``docutils``
+ package. The second line adds the directory containing the
+ third-party modules Docutils depends on. The third line exports
+ this environment variable. You may also wish to add the ``tools``
+ directory to your ``PATH``::
+ PATH=$PATH:$HOME/projects/docutils/docutils/tools
+ export PATH
+2. Before you run anything, every time you make a change, reinstall
+ Docutils::
+ python install
+ .. CAUTION::
+ This method is **not** recommended for day-to-day development;
+ it's too easy to forget. Confusion inevitably ensues.
+ If you install Docutils this way, Python will always pick up the
+ last-installed copy of the code. If you ever forget to
+ reinstall the "docutils" package, Python won't see your latest
+ changes.
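+Whichever method is used, it helps to confirm which copy of the code
+Python actually picks up. A minimal sketch using only the standard
+library; ``locate_package`` is a hypothetical helper name.

```python
import importlib.util


def locate_package(name):
    """Return the file Python would import for `name`, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Prints the location of the "docutils" package that would be
# imported, or None if no copy is on the import path.
print(locate_package('docutils'))
```

+If the printed location is under ``site-packages/`` rather than the
+working copy, the ``PYTHONPATH`` setting is not in effect.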
+A useful addition to the ``docutils`` top-level directory in branches
+and alternate copies of the code is a ``set-PATHS`` file
+containing the following lines::
+ # source this file
+ export PYTHONPATH=$PWD:$PWD/extras
+ export PATH=$PWD/tools:$PATH
+Open a shell for this branch, ``cd`` to the ``docutils`` top-level
+directory, and "source" this file. For example, using the bash
+shell::
+ $ cd some-branch/docutils
+ $ . set-PATHS
+Mailing Lists
+Developers are encouraged to subscribe to all `Docutils mailing
+lists`_.
+.. _Docutils mailing lists: ../user/mailing-lists.html
+The Wiki
+There is a development wiki at as
+a scratchpad for transient notes. Please use the repository for
+permanent document storage.
+The Sandbox
+The `sandbox directory`_ is a place to play around, to try out and
+share ideas. It's a part of the Subversion repository but it isn't
+distributed as part of Docutils releases. Feel free to check in code
+to the sandbox; that way people can try it out but you won't have to
+worry about it working 100% error-free, as is the goal of the Docutils
+core. Each developer who wants to play in the sandbox should create
+either a project-specific subdirectory or personal subdirectory
+(suggested name: SourceForge ID, nickname, or given name + family
+initial). It's OK to make a mess in your personal space! But please,
+play nice.
+Please update the `sandbox README`_ file with links and a brief
+description of your work.
+In order to minimize the work necessary for others to install and try
+out new, experimental components, the following sandbox directory
+structure is recommended::
+ sandbox/
+ project_name/ # For a collaborative project.
+ # Structure as in userid/component_name below.
+ userid/ # For personal space.
+ component_name/ # A verbose name is best.
+ README.txt # Please explain the requirements,
+ # purpose/goals, and usage.
+ docs/
+ ...
+ # The component is a single module.
+ # *OR* (but *not* both)
+ component/ # The component is a package.
+ # Contains the Reader/Writer class.
+ # Other modules and data files used
+ data.txt # by this component.
+ ...
+ test/ # Test suite.
+ ...
+ tools/ # For front ends etc.
+ ...
+ # Use Distutils to install the component
+ # code and tools/ files into the right
+ # places in Docutils.
+Some sandbox projects are destined to become Docutils components once
+completed. Others, such as add-ons to Docutils or applications of
+Docutils, graduate to become `parallel projects`_.
+.. _sandbox README:
+.. _sandbox directory:
+.. _parallel project:
+Parallel Projects
+Parallel projects contain useful code that is not central to the
+functioning of Docutils. Examples are specialized add-ons or
+plug-ins, and applications of Docutils. They use Docutils, but
+Docutils does not require their presence to function.
+An official parallel project will have its own directory beside (or
+parallel to) the main ``docutils`` directory in the Subversion
+repository. It can have its own web page in the domain, its own file releases and
+downloadable snapshots, and even a mailing list if that proves useful.
+However, an official parallel project has implications: it is expected
+to be maintained and continue to work with changes to the core
+Docutils.
+A parallel project requires a project leader, who must commit to
+coordinate and maintain the implementation:
+* Answer questions from users and developers.
+* Review suggestions, bug reports, and patches.
+* Monitor changes and ensure the quality of the code and
+ documentation.
+* Coordinate with Docutils to ensure interoperability.
+* Put together official project releases.
+Of course, related projects may be created independently of Docutils.
+The advantage of a parallel project is that the SourceForge
+environment and the developer and user communities are already
+established. Core Docutils developers are available for consultation
+and may contribute to the parallel project. It's easier to keep the
+projects in sync when there are changes made to the core Docutils
+code.
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End:
diff --git a/docs/dev/pysource.dtd b/docs/dev/pysource.dtd
new file mode 100644
index 000000000..fb8af4091
--- /dev/null
+++ b/docs/dev/pysource.dtd
@@ -0,0 +1,259 @@
+ Docutils Python Source DTD
+:Author: David Goodger
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This DTD has been placed in the public domain.
+:Filename: pysource.dtd
+This DTD (document type definition) extends the Generic DTD (see
+More information about this DTD and the Docutils project can be found
+at The latest version of this DTD
+is available from
+The formal public identifier for this DTD is::
+ +//IDN Docutils Python Source//EN//XML
+ Parameter Entity Overrides
+<!ENTITY % additional.section.elements
+ " | package_section | module_section | class_section
+ | method_section | function_section
+ | module_attribute_section | function_attribute_section
+ | class_attribute_section | instance_attribute_section ">
+<!ENTITY % additional.inline.elements
+ " | package | module | class | method | function
+ | variable | parameter | type | attribute
+ | module_attribute | class_attribute | instance_attribute
+ | exception_class | warning_class ">
+ Generic DTD
+This DTD extends the Docutils Generic DTD, available from
+<!ENTITY % docutils PUBLIC
+ "+//IDN Docutils Generic//EN//XML"
+ "docutils.dtd">
+ Additional Section Elements
+<!ELEMENT package_section
+ (package, fullname?, import_list?, %structure.model;)>
+<!ATTLIST package_section %basic.atts;>
+<!ELEMENT module_section
+ (module, fullname?, import_list?, %structure.model;)>
+<!ATTLIST module_section %basic.atts;>
+<!ELEMENT class_section
+ (class, inheritance_list?, fullname?, subclasses?,
+ %structure.model;)>
+<!ATTLIST class_section %basic.atts;>
+<!ELEMENT method_section
+ (method, parameter_list?, fullname?, overrides?,
+ %structure.model;)>
+<!ATTLIST method_section %basic.atts;>
+<!ELEMENT function_section
+ (function, parameter_list?, fullname?, %structure.model;)>
+<!ATTLIST function_section %basic.atts;>
+<!ELEMENT module_attribute_section
+ (attribute, initial_value?, fullname?, %structure.model;)>
+<!ATTLIST module_attribute_section %basic.atts;>
+<!ELEMENT function_attribute_section
+ (attribute, initial_value?, fullname?, %structure.model;)>
+<!ATTLIST function_attribute_section %basic.atts;>
+<!ELEMENT class_attribute_section
+ (attribute, initial_value?, fullname?, overrides?,
+ %structure.model;)>
+<!ATTLIST class_attribute_section %basic.atts;>
+<!ELEMENT instance_attribute_section
+ (attribute, initial_value?, fullname?, overrides?,
+ %structure.model;)>
+<!ATTLIST instance_attribute_section %basic.atts;>
+ Section Subelements
+<!ELEMENT fullname
+ (package | module | class | method | function | attribute)+>
+<!ATTLIST fullname %basic.atts;>
+<!ELEMENT import_list (import_item+)>
+<!ATTLIST import_list %basic.atts;>
+Support ``import module``, ``import module as alias``, ``from module
+import identifier``, and ``from module import identifier as alias``.
+<!ELEMENT import_item (fullname, identifier?, alias?)>
+<!ATTLIST import_item %basic.atts;>
+<!ELEMENT inheritance_list (class+)>
+<!ATTLIST inheritance_list %basic.atts;>
+<!ELEMENT subclasses (class+)>
+<!ATTLIST subclasses %basic.atts;>
+<!ELEMENT parameter_list
+ ((parameter_item+, optional_parameters*) | optional_parameters+)>
+<!ATTLIST parameter_list %basic.atts;>
+<!ELEMENT parameter_item
+ ((parameter | parameter_tuple), parameter_default?)>
+<!ATTLIST parameter_item %basic.atts;>
+<!ELEMENT optional_parameters (parameter_item+, optional_parameters*)>
+<!ATTLIST optional_parameters %basic.atts;>
+<!ELEMENT parameter_tuple (parameter | parameter_tuple)+>
+<!ATTLIST parameter_tuple %basic.atts;>
+<!ELEMENT parameter_default (#PCDATA)>
+<!ATTLIST parameter_default %basic.atts;>
+<!ELEMENT overrides (fullname+)>
+<!ATTLIST overrides %basic.atts;>
+<!ELEMENT initial_value (#PCDATA)>
+<!ATTLIST initial_value %basic.atts;>
+ Additional Inline Elements
+<!-- Also used as the `package_section` identifier/title. -->
+<!ELEMENT package (#PCDATA)>
+<!ATTLIST package
+ %basic.atts;
+ %reference.atts;>
+<!-- Also used as the `module_section` identifier/title. -->
+<!ELEMENT module (#PCDATA)>
+<!ATTLIST module
+ %basic.atts;
+ %reference.atts;>
+Also used as the `class_section` identifier/title, and in the
+`inheritance_list` element.
+<!ELEMENT class (#PCDATA)>
+<!ATTLIST class
+ %basic.atts;
+ %reference.atts;>
+<!-- Also used as the `method_section` identifier/title. -->
+<!ELEMENT method (#PCDATA)>
+<!ATTLIST method
+ %basic.atts;
+ %reference.atts;>
+<!-- Also used as the `function_section` identifier/title. -->
+<!ELEMENT function (#PCDATA)>
+<!ATTLIST function
+ %basic.atts;
+ %reference.atts;>
+??? Use this instead of the ``*_attribute`` elements below? Add a
+"type" attribute to differentiate?
+Also used as the identifier/title for `module_attribute_section`,
+`class_attribute_section`, and `instance_attribute_section`.
+<!ELEMENT attribute (#PCDATA)>
+<!ATTLIST attribute
+ %basic.atts;
+ %reference.atts;>
+Also used as the `module_attribute_section` identifier/title. A module
+attribute is an exported module-level global variable.
+<!ELEMENT module_attribute (#PCDATA)>
+<!ATTLIST module_attribute
+ %basic.atts;
+ %reference.atts;>
+<!-- Also used as the `class_attribute_section` identifier/title. -->
+<!ELEMENT class_attribute (#PCDATA)>
+<!ATTLIST class_attribute
+ %basic.atts;
+ %reference.atts;>
+Also used as the `instance_attribute_section` identifier/title.
+<!ELEMENT instance_attribute (#PCDATA)>
+<!ATTLIST instance_attribute
+ %basic.atts;
+ %reference.atts;>
+<!ELEMENT variable (#PCDATA)>
+<!ATTLIST variable
+ %basic.atts;
+ %reference.atts;>
+<!-- Also used in `parameter_list`. -->
+<!ELEMENT parameter (#PCDATA)>
+<!ATTLIST parameter
+ %basic.atts;
+ %reference.atts;
+ excess_positional %yesorno; #IMPLIED
+ excess_keyword %yesorno; #IMPLIED>
+<!ELEMENT type (#PCDATA)>
+<!ATTLIST type
+ %basic.atts;
+ %reference.atts;>
+<!ELEMENT exception_class (#PCDATA)>
+<!ATTLIST exception_class
+ %basic.atts;
+ %reference.atts;>
+<!ELEMENT warning_class (#PCDATA)>
+<!ATTLIST warning_class
+ %basic.atts;
+ %reference.atts;>
+Local Variables:
+mode: sgml
+indent-tabs-mode: nil
+fill-column: 70
diff --git a/docs/dev/pysource.txt b/docs/dev/pysource.txt
new file mode 100644
index 000000000..6f173a709
--- /dev/null
+++ b/docs/dev/pysource.txt
@@ -0,0 +1,130 @@
+ Python Source Reader
+:Author: David Goodger
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This document has been placed in the public domain.
+This document explores issues around extracting and processing
+docstrings from Python modules.
+For definitive element hierarchy details, see the "Python Plaintext
+Document Interface DTD" XML document type definition, pysource.dtd_
+(which modifies the generic docutils.dtd_). Descriptions below list
+'DTD elements' (XML 'generic identifiers' or tag names) corresponding
+to syntax constructs.
+.. contents::
+The Python Source Reader ("PySource") model that's evolving in my mind
+goes something like this:
+1. Extract the docstring/namespace [#]_ tree from the module(s) and/or
+ package(s).
+ .. [#] See `Docstring Extractor`_ below.
+2. Run the parser on each docstring in turn, producing a forest of
+ doctrees (per
+3. Join the docstring trees together into a single tree, running
+ transforms:
+ - merge hyperlinks
+ - merge namespaces
+ - create various sections like "Module Attributes", "Functions",
+ "Classes", "Class Attributes", etc.; see pysource.dtd_
+ - convert the above special sections to ordinary doctree nodes
+4. Run transforms on the combined doctree. Examples: resolving
+ cross-references/hyperlinks (including interpreted text on Python
+ identifiers); footnote auto-numbering; first field list ->
+ bibliographic elements.
+ (Or should step 4's transforms come before step 3?)
+5. Pass the resulting unified tree to the writer/builder.
+I've had trouble reconciling the roles of input parser and output
+writer with the idea of modes ("readers" or "directors"). Does the
+mode govern the transformation of the input, the output, or both?
+Perhaps the mode should be split into two.
+For example, say the source of our input is a Python module. Our
+"input mode" should be the "Python Source Reader". It discovers (from
+``__docformat__``) that the input parser is "reStructuredText". If we
+want HTML, we'll specify the "HTML" output formatter. But there's a
+piece missing. What *kind* or *style* of HTML output do we want?
+PyDoc-style, LibRefMan style, etc. (many people will want to specify
+and control their own style). Is the output style specific to a
+particular output format (XML, HTML, etc.)? Is the style specific to
+the input mode? Or can/should they be independent?
+I envision interaction between the input parser, an "input mode", and
+the output formatter. The same intermediate data format would be used
+between each of these, being transformed as it progresses.
+Docstring Extractor
+We need code that scans a parsed Python module, and returns an ordered
+tree containing the names, docstrings (including attribute and
+additional docstrings), and additional info (in parentheses below) of
+all of the following objects:
+- packages
+- modules
+- module attributes (+ values)
+- classes (+ inheritance)
+- class attributes (+ values)
+- instance attributes (+ values)
+- methods (+ formal parameters & defaults)
+- functions (+ formal parameters & defaults)
+(Extract comments too? For example, comments at the start of a module
+would be a good place for bibliographic field lists.)
+In order to evaluate interpreted text cross-references, namespaces for
+each of the above will also be required.
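+A rough sketch of such an extractor follows. It assumes the standard
+library's ``ast`` module as the parsing mechanism, which is a choice
+made for this illustration and not something this document
+prescribes. It covers only modules, classes, methods, and functions;
+attribute docstrings, values, parameters, and namespaces are left
+out. The function names are hypothetical.

```python
import ast


def extract_docstrings(source, name='<module>'):
    """Return a (kind, name, docstring, children) tree for `source`."""
    return _extract(ast.parse(source), 'module', name)


def _extract(node, kind, name):
    """Recursively collect classes and functions, in source order."""
    children = []
    for child in node.body:
        if isinstance(child, ast.ClassDef):
            children.append(_extract(child, 'class', child.name))
        elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if kind == 'class':
                child_kind = 'method'
            else:
                child_kind = 'function'
            children.append(_extract(child, child_kind, child.name))
    return (kind, name, ast.get_docstring(node), children)


sample = '''"""Module docstring."""

class Keeper:
    """Keep some data."""

    def storedata(self):
        """Store the data."""
'''
print(extract_docstrings(sample, 'sample'))
```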
+See python-dev/docstring-develop thread "AST mining", started on
+Interpreted Text
+DTD elements: package, module, class, method, function,
+module_attribute, class_attribute, instance_attribute, variable,
+parameter, type, exception_class, warning_class.
+To classify identifiers explicitly, the role is given along with the
+identifier in either prefix or suffix form::
+ Use :method:`Keeper.storedata` to store the object's data in
+ ``:instance_attribute:.
+The role may be one of 'package', 'module', 'class', 'method',
+'function', 'module_attribute', 'class_attribute',
+'instance_attribute', 'variable', 'parameter', 'type',
+'exception_class', 'exception', 'warning_class', or 'warning'. Other
+roles may be defined.
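+To illustrate the prefix and suffix forms, here is a small sketch
+that pulls (role, identifier) pairs out of a line of text with a
+regular expression. It is not how the reStructuredText parser works;
+``ROLE_PATTERN``, ``find_roles``, and the identifier ``Keeper.data``
+in the sample are hypothetical.

```python
import re

ROLE_PATTERN = re.compile(
    r':(?P<prefix>[a-z_]+):`(?P<ptext>[^`]+)`'      # :role:`identifier`
    r'|`(?P<stext>[^`]+)`:(?P<suffix>[a-z_]+):')    # `identifier`:role:


def find_roles(text):
    """Return (role, identifier) pairs for explicit roles in `text`."""
    pairs = []
    for match in ROLE_PATTERN.finditer(text):
        if match.group('prefix'):
            pairs.append((match.group('prefix'), match.group('ptext')))
        else:
            pairs.append((match.group('suffix'), match.group('stext')))
    return pairs


# ``Keeper.data`` is a made-up identifier for the suffix form.
sample = ('Use :method:`Keeper.storedata` to store data in '
          '`Keeper.data`:instance_attribute:.')
print(find_roles(sample))
```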
+.. _pysource.dtd: pysource.dtd
+.. _docutils.dtd: ../ref/docutils.dtd
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ fill-column: 70
+ End:
diff --git a/docs/dev/release.txt b/docs/dev/release.txt
new file mode 100644
index 000000000..fa58bc46f
--- /dev/null
+++ b/docs/dev/release.txt
@@ -0,0 +1,168 @@
+ Docutils_ Release Procedure
+:Author: David Goodger; Felix Wiemann; open to all Docutils developers
+:Date: $Date$
+:Revision: $Revision$
+:Copyright: This document has been placed in the public domain.
+.. _Docutils:
+(Steps in boldface text are *not* covered by the release script at
+sandbox/fwiemann/ "Not covered" means that you aren't even
+reminded of them. Note: The script needs to be updated to
+reflect the recent move to Subversion!)
+* **Announce a check-in freeze on Docutils-develop. Post a list of
+ major changes since the last release and ask for additions.**
+ .. _CHANGES.txt:
+ **You may want to save this list of changes in a file
+ (e.g. CHANGES.txt) to have it at hand when you need it for posting
+ announcements or pasting it into forms.**
+* Change ``__version_details__`` in docutils/docutils/ to
+ "release" (from "repository").
+* Bump the _`version number` in the following files:
+ + docutils/
+ + docutils/docutils/
+ + docutils/test/functional/expected/* ("Generator: Docutils X.Y.Z")
+* Close the "Changes Since ..." section in docutils/HISTORY.txt.
+* Clear/unset the PYTHONPATH environment variable.
+* Create the release tarball:
+ (a) Create a new empty directory and ``cd`` into it.
+ (b) Get a clean snapshot of the main tree::
+ svn export svn://
+ (c) Use Distutils to create the release tarball::
+ cd docutils
+ python sdist
+* Expand and _`install` the release tarball in isolation:
+ (a) Expand the tarball in a new location, not over any existing
+ files.
+ (b) Remove the old installation from site-packages (including
+, and,
+ Install from expanded directory::
+ cd docutils-X.Y.Z
+ python install
+ The "install" command may require root permissions.
+ (c) Repeat step (b) for all supported Python versions.
+* Run the _`test suite` from the expanded archive directory with all
+ supported Python versions: ``cd test ; python -u``.
+* Add a directory X.Y.Z (where X.Y.Z is the current version number
+ of Docutils) in the webroot (i.e. the ``htdocs/`` directory).
+ Put all documentation files into it::
+ cd docutils-X.Y.Z
+ rm -rf build
+ cd tools/
+ ./ ..
+ cd ..
+ find -name test -type d -prune -o -name \*.css -print0 \
+ -o -name \*.html -print0 -o -name \*.txt -print0 \
+ | tar -cjvf docutils-docs.tar.bz2 -T - --null
+ scp docutils-docs.tar.bz2 <username>
+ Now log in to and::
+ cd /home/groups/d/do/docutils/htdocs/
+ mkdir -m g+rwxs X.Y.Z
+ cd X.Y.Z
+ tar -xjvf ~/docutils-docs.tar.bz2
+ rm ~/docutils-docs.tar.bz2
+* Upload the release tarball::
+ $ ftp
+ Connected to
+ ...
+ Name ( anonymous
+ 331 Anonymous login ok, send your complete e-mail address as password.
+ Password:
+ ...
+ 230 Anonymous access granted, restrictions apply.
+ ftp> bin
+ 200 Type set to I.
+ ftp> cd /incoming
+ 250 CWD command successful.
+ ftp> put docutils-X.Y.Z.tar.gz
+* Access the _`file release system` on SourceForge (Admin
+ interface). Fill in the fields:
+ :Package ID: docutils
+ :Release Name: <use release number only, e.g. 0.3>
+ :Release Date: <today's date>
+ :Status: Active
+ :File Name: <select the file just uploaded>
+ :File Type: Source .gz
+ :Processor Type: Platform-Independent
+ :Release Notes: <insert README.txt file here>
+ :Change Log: <insert summary from CHANGES.txt_>
+ Also check the "Preserve my pre-formatted text" box.
+* For verifying the integrity of the release, download the release
+ tarball (you may need to wait up to 30 minutes), install_ it, and
+ re-run the `test suite`_.
+* Register with PyPI (``python register``).
+* Restore ``__version_details__`` in docutils/docutils/ to
+ "repository" (from "release").
+* Bump the `version number`_ again.
+* Add a new empty section "Changes Since ..." in HISTORY.txt.
+* Update the web page (web/index.txt).
+* Run docutils-update on the server.
+* **Send announcement email to:**
+ * (also announcing the end
+ of the check-in freeze)
+ *
+ *
+ *
+* **Add a SourceForge News item, with title "Docutils X.Y.Z released"
+ and containing the release tarball's download URL.**
+* **Register with FreshMeat.** (Add a `new release`__ for the
+ `Docutils project`__).
+ __
+ __
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End:
diff --git a/docs/dev/repository.txt b/docs/dev/repository.txt
new file mode 100644
index 000000000..2c613b10e
--- /dev/null
+++ b/docs/dev/repository.txt
@@ -0,0 +1,217 @@
+ The Docutils_ Subversion Repository
+:Author: Felix Wiemann
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This document has been placed in the public domain.
+.. _Docutils:
+.. contents::
+Docutils uses a Subversion_ repository located at ````.
+Subversion is exhaustively documented in the `Subversion Book`_
+.. _Subversion:
+.. _Subversion Book:
+.. Note::
+ While the repository resides at BerliOS, all other project data
+ (web site, snapshots, releases, mailing lists, trackers) is hosted
+ at SourceForge.
+For the project policy on repository use (check-in requirements,
+branching, etc.), please see the `Docutils Project Policies`__.
+__ policies.html#subversion-repository
+Accessing the Repository
+Web Access
+The repository can be browsed and examined via the web at
+Anonymous Access
+Anonymous (read-only) access is available at ``svn://``.
+To check out the current main source tree of Docutils, type ::
+ svn checkout svn://
+To check out everything (main tree, sandboxes, and web site), type ::
+ svn checkout svn:// docutils
+This will create a working copy of the whole trunk in a new directory
+called ``docutils``.
+If you cannot use the ``svn`` port, you can also use the HTTP access
+method by substituting "" for
+Note that you should *not* check out ``svn://``
+(without "trunk"), because then you'd end up fetching the whole
+Docutils tree for every branch and tag over and over again, wasting
+your and BerliOS's bandwidth.
+To update your working copy later on, cd into the working copy and
+type ::
+ svn update
+Developer Access
+(Developers who had write-access for Docutils' CVS repository on should `register at BerliOS`__ and send a message with
+their BerliOS user name to `Felix Wiemann <>`_.)
+If you are a developer, you get read-write access via
+``svn+ssh://<user>``, where
+``<user>`` is your BerliOS user account name. So to retrieve a
+working copy, type ::
+ svn checkout svn+ssh://<user> \
+ docutils
+If you previously had an anonymous working copy and gained developer
+access, you can switch the URL associated with your working copy by
+typing ::
+ svn switch --relocate svn:// \
+ svn+ssh://<user>
+(Again, ``<user>`` is your BerliOS user account name.)
+If you cannot use the ``ssh`` port, you can also use the HTTPS access
+method by substituting "" for
+Setting Up Your Subversion Client For Development
+Before committing changes to the repository, please ensure that the
+following lines are contained (and uncommented) in your
+~/.subversion/config file, so that new files are added with the
+correct properties set::
+ [miscellany]
+ # For your convenience:
+ global-ignores = ... *.pyc ...
+ # For correct properties:
+ enable-auto-props = yes
+ [auto-props]
+ *.py = svn:eol-style=native;svn:keywords=Author Date Id Revision
+ *.txt = svn:eol-style=native;svn:keywords=Author Date Id Revision
+ *.html = svn:eol-style=native;svn:keywords=Author Date Id Revision
+ *.xml = svn:eol-style=native;svn:keywords=Author Date Id Revision
+ *.tex = svn:eol-style=native;svn:keywords=Author Date Id Revision
+ *.css = svn:eol-style=native;svn:keywords=Author Date Id Revision
+ *.patch = svn:eol-style=native
+ *.sh = svn:eol-style=native;svn:executable;svn:keywords=Author Date Id Revision
+ *.png = svn:mime-type=image/png
+ *.jpg = svn:mime-type=image/jpeg
+ *.gif = svn:mime-type=image/gif
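+As a rough sanity check of the settings above, the following sketch
+parses a config fragment and reports whether auto-props are enabled
+and which properties each file pattern receives. It is only an
+illustration: it uses Python's ``configparser`` rather than
+Subversion's own parser, and ``SAMPLE`` and ``auto_props`` are
+hypothetical names introduced here.

```python
import configparser

# Hypothetical, shortened config fragment modelled on the listing above.
SAMPLE = """\
[miscellany]
enable-auto-props = yes

[auto-props]
*.py = svn:eol-style=native;svn:keywords=Author Date Id Revision
*.png = svn:mime-type=image/png
"""


def auto_props(config_text):
    """Return (enabled, {pattern: [property, ...]}) for a config fragment."""
    # Only "=" separates key from value, so "svn:eol-style" stays intact.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(config_text)
    enabled = parser.getboolean("miscellany", "enable-auto-props",
                                fallback=False)
    props = {}
    if parser.has_section("auto-props"):
        for pattern, value in parser.items("auto-props"):
            # Properties for one pattern are separated by semicolons.
            props[pattern] = [p for p in value.split(";") if p]
    return enabled, props


if __name__ == "__main__":
    print(auto_props(SAMPLE))
```

+If ``enabled`` comes back false, or a pattern is missing from the
+result, new files of that type would be added without the expected
+properties.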
+Setting Up SSH Access
+With a public & private key pair, you can access the shell and
+Subversion servers without having to enter your password. There are
+two places to add your SSH public key on BerliOS: your web account and
+your shell account.
+* Adding your SSH key to your BerliOS web account:
+ 1. Log in on the web at Create your
+ account first if necessary. You should be taken to your "My
+ Personal Page" (
+ 2. Choose "Account Options" from the menu below the top banner.
+ 3. At the bottom of the "Account Maintenance" page
+ ( you'll find a "Shell
+ Account Information" section; click on "[Edit Keys]".
+ 4. Copy and paste your SSH public key into the edit box on this page
+ ( Further
+ instructions are available on this page.
+* Adding your SSH key to your BerliOS shell account:
+ 1. Log in to the BerliOS shell server::
+ ssh <user>
+ You'll be asked for your password, which you set when you created
+ your account.
+ 2. Create a .ssh directory in your home directory, and remove
+ permissions for group & other::
+ mkdir .ssh
+ chmod og-rwx .ssh
+ Exit the SSH session.
+ 3. Copy your public key to the .ssh directory on BerliOS::
+ scp .ssh/ <user>
+ Now you should be able to start an SSH session without needing your
+ password.
+Repository Layout
+The following tree shows the repository layout::
+ docutils/
+ |-- branches/
+ | |-- branch1/
+ | | |-- docutils/
+ | | |-- sandbox/
+ | | `-- web/
+ | `-- branch2/
+ | |-- docutils/
+ | |-- sandbox/
+ | `-- web/
+ |-- tags/
+ | |-- tag1/
+ | | |-- docutils/
+ | | |-- sandbox/
+ | | `-- web/
+ | `-- tag2/
+ | |-- docutils/
+ | |-- sandbox/
+ | `-- web/
+ `-- trunk/
+ |-- docutils/
+ |-- sandbox/
+ `-- web/
+``docutils/branches/`` and ``docutils/tags/`` contain (shallow) copies
+of the whole trunk.
+The main source tree lives at ``docutils/trunk/docutils/``, next to
+the sandboxes (``docutils/trunk/sandbox/``) and the web site files
diff --git a/docs/dev/rst/alternatives.txt b/docs/dev/rst/alternatives.txt
new file mode 100644
index 000000000..12874c5fb
--- /dev/null
+++ b/docs/dev/rst/alternatives.txt
@@ -0,0 +1,3129 @@
+ A Record of reStructuredText Syntax Alternatives
+:Author: David Goodger
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This document has been placed in the public domain.
+The following are ideas, alternatives, and justifications that were
+considered for reStructuredText syntax, which did not originate with
+Setext_ or StructuredText_. For an analysis of constructs which *did*
+originate with StructuredText or Setext, please see `Problems With
+StructuredText`_. See the `reStructuredText Markup Specification`_
+for full details of the established syntax.
+The ideas are divided into sections:
+* Implemented_: already done. The issues and alternatives are
+ recorded here for posterity.
+* `Not Implemented`_: these ideas won't be implemented.
+* Tabled_: these ideas should be revisited in the future.
+* `To Do`_: these ideas should be implemented. They're just waiting
+ for a champion to resolve issues and get them done.
+* `... Or Not To Do?`_: possible but questionable. These probably
+ won't be implemented, but you never know.
+.. _Setext:
+.. _StructuredText:
+.. _Problems with StructuredText: problems.html
+.. _reStructuredText Markup Specification:
+ ../../ref/rst/restructuredtext.html
+.. contents::
+ Implemented
+Field Lists
+Prior to the syntax for field lists being finalized, several
+alternatives were proposed.
+1. Unadorned RFC822_ everywhere::
+ Author: Me
+ Version: 1
+ Advantages: clean, precedent (RFC822-compliant). Disadvantage:
+ ambiguous (these paragraphs are a prime example).
+ Conclusion: rejected.
+2. Special case: use unadorned RFC822_ for the very first or very last
+ text block of a document::
+ """
+ Author: Me
+ Version: 1
+ The rest of the document...
+ """
+ Advantages: clean, precedent (RFC822-compliant). Disadvantages:
+ special case, flat (unnested) field lists only, still ambiguous::
+ """
+ Usage: cmdname [options] arg1 arg2 ...
+ We obviously *don't* want the line above to be interpreted as a
+ field list item. Or do we?
+ """
+ Conclusion: rejected for the general case, accepted for specific
+ contexts (PEPs, email).
+3. Use a directive::
+ .. fields::
+ Author: Me
+ Version: 1
+ Advantages: explicit and unambiguous, RFC822-compliant.
+ Disadvantage: cumbersome.
+ Conclusion: rejected for the general case (but such a directive
+ could certainly be written).
+4. Use Javadoc-style::
+ @Author: Me
+ @Version: 1
+ @param a: integer
+ Advantages: unambiguous, precedent, flexible. Disadvantages:
+ non-intuitive, ugly, not RFC822-compliant.
+ Conclusion: rejected.
+5. Use leading colons::
+ :Author: Me
+ :Version: 1
+ Advantages: unambiguous, obvious (*almost* RFC822-compliant),
+ flexible, perhaps even elegant. Disadvantages: no precedent, not
+ quite RFC822-compliant.
+ Conclusion: accepted!
+6. Use double colons::
+ Author:: Me
+ Version:: 1
+ Advantages: unambiguous, obvious? (*almost* RFC822-compliant),
+ flexible, similar to syntax already used for literal blocks and
+ directives. Disadvantages: no precedent, not quite
+ RFC822-compliant, similar to syntax already used for literal blocks
+ and directives.
+ Conclusion: rejected because of the syntax similarity & conflicts.
+Why is RFC822 compliance important? It's a universal Internet
+standard, and super obvious. Also, I'd like to support the PEP format
+(ulterior motive: get PEPs to use reStructuredText as their standard).
+But it *would* be easy to get used to an alternative (easy even to
+convert PEPs; probably harder to convert python-deviants ;-).
+Unfortunately, without well-defined context (such as in email headers:
+RFC822 only applies before any blank lines), the RFC822 format is
+ambiguous. It is very common in ordinary text. To implement field
+lists unambiguously, we need explicit syntax.
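+The ambiguity can be made concrete with a small sketch. The two
+regular expressions below are simplified assumptions written for this
+illustration, not the patterns used by the reStructuredText parser.
+The unadorned RFC822-style pattern also matches an ordinary "Usage:"
+line, while the leading-colon pattern only matches explicit fields.

```python
import re

# Simplified, assumed patterns for illustration only.
RFC822_LINE = re.compile(r"^(?P<name>[A-Za-z][\w-]*):\s+(?P<body>.+)$")
FIELD_LINE = re.compile(r"^:(?P<name>[^:\s][^:]*):\s+(?P<body>.+)$")


def classify(line):
    """Report which of the two sketch patterns match a single line."""
    return {
        "rfc822": RFC822_LINE.match(line) is not None,
        "field": FIELD_LINE.match(line) is not None,
    }


if __name__ == "__main__":
    samples = [
        "Author: Me",
        "Usage: cmdname [options] arg1 arg2 ...",
        ":Author: Me",
    ]
    for text in samples:
        print(text, "->", classify(text))
```

+Without surrounding context, nothing distinguishes the "Usage:" line
+from the "Author:" line under the unadorned pattern; the leading
+colon removes the guesswork.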
+The following question was posed in a footnote:
+ Should "bibliographic field lists" be defined at the parser level,
+ or at the DPS transformation level? In other words, are they
+ reStructuredText-specific, or would they also be applicable to
+ another (many/every other?) syntax?
+The answer is that bibliographic fields are a
+reStructuredText-specific markup convention. Other syntaxes may
+implement the bibliographic elements explicitly. For example, there
+would be no need for such a transformation for an XML-based markup
+.. _RFC822:
+Interpreted Text "Roles"
+The original purpose of interpreted text was as a mechanism for
+descriptive markup, to describe the nature or role of a word or
+phrase. For example, in XML we could say "<function>len</function>"
+to mark up "len" as a function. It is envisaged that within Python
+docstrings (inline documentation in Python module source files, the
+primary market for reStructuredText) the role of a piece of
+interpreted text can be inferred implicitly from the context of the
+docstring within the program source. For other applications, however,
+the role may have to be indicated explicitly.
+Interpreted text is enclosed in single backquotes (`).
+1. Initially, it was proposed that an explicit role could be indicated
+ as a word or phrase within the enclosing backquotes:
+ - As a prefix, separated by a colon and whitespace::
+ `role: interpreted text`
+ - As a suffix, separated by whitespace and a colon::
+ `interpreted text :role`
+ There are problems with the initial approach:
+ - There could be ambiguity with interpreted text containing colons.
+ For example, an index entry of "Mission: Impossible" would
+ require a backslash-escaped colon.
+ - The explicit role is descriptive markup, not content, and will
+ not be visible in the processed output. Putting it inside the
+ backquotes doesn't feel right; the *role* isn't being quoted.
+2. Tony Ibbs suggested that the role be placed outside the
+ backquotes::
+ role:`prefix` or `suffix`:role
+ This removes the embedded-colons ambiguity, but limits the role
+ identifier to be a single word (whitespace would be illegal).
+ Since roles are not meant to be visible after processing, the lack
+ of whitespace support is not important.
+ The suggested syntax remains ambiguous with respect to ratios and
+ some writing styles. For example, suppose there is a "signal"
+ identifier, and we write::
+ ...calculate the `signal`:noise ratio.
+ "noise" looks like a role.
+3. As an improvement on #2, we can bracket the role with colons::
+ :role:`prefix` or `suffix`:role:
+ This syntax is similar to that of field lists, which is fine since
+ both are doing similar things: describing.
+ This is the syntax chosen for reStructuredText.
+4. Another alternative is two colons instead of one::
+ role::`prefix` or `suffix`::role
+ But this is used for analogies ("A:B::C:D": "A is to B as C is to
+ D").
+ Alternatives #2 and #4 both lack delimiters on both sides of the
+ role, making them difficult to parse (by the reader).
+5. Some kind of bracketing could be used:
+ - Parentheses::
+ (role)`prefix` or `suffix`(role)
+ - Braces::
+ {role}`prefix` or `suffix`{role}
+ - Square brackets::
+ [role]`prefix` or `suffix`[role]
+ - Angle brackets::
+ <role>`prefix` or `suffix`<role>
+ (The overlap of \*ML tags with angle brackets would be too
+ confusing and precludes their use.)
+Syntax #3 was chosen for reStructuredText.
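+A minimal sketch of how syntax #3 could be recognized, assuming a
+simplified regular expression (``INTERPRETED`` and
+``find_interpreted`` are hypothetical names, and the real parser's
+inline markup rules are stricter). Because the role is bracketed by
+colons on both sides, the "signal:noise" case from alternative #2 is
+no longer mistaken for a role, and colons inside the backquotes need
+no escaping.

```python
import re

# Assumed pattern: optional :role: prefix or suffix around backquoted text.
INTERPRETED = re.compile(
    r"(?::(?P<prefix>[A-Za-z][\w-]*):)?"   # optional :role: prefix
    r"`(?P<text>[^`]+)`"                   # the interpreted text itself
    r"(?::(?P<suffix>[A-Za-z][\w-]*):)?"   # optional :role: suffix
)


def find_interpreted(line):
    """Return (role, text) pairs; role is None when none is given.

    This sketch does not reject text carrying both a prefix and a suffix.
    """
    results = []
    for match in INTERPRETED.finditer(line):
        role = match.group("prefix") or match.group("suffix")
        results.append((role, match.group("text")))
    return results


if __name__ == "__main__":
    print(find_interpreted(":function:`len` and `Mission: Impossible`:index:"))
    print(find_interpreted("...calculate the `signal`:noise ratio."))
```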
+Comments
+A problem with comments (actually, with all indented constructs) is
+that they cannot be followed by an indented block -- a block quote --
+without swallowing it up.
+I thought that perhaps comments should be one-liners only. But would
+this mean that footnotes, hyperlink targets, and directives must then
+also be one-liners? Not a good solution.
+Tony Ibbs suggested a "comment" directive. I added that we could
+limit a comment to a single text block, and that a "multi-block
+comment" could use "comment-start" and "comment-end" directives. This
+would remove the indentation incompatibility. A "comment" directive
+automatically suggests "footnote" and (hyperlink) "target" directives
+as well. This could go on forever! Bad choice.
+Garth Kidd suggested that an "empty comment", a ".." explicit markup
+start with nothing on the first line (except possibly whitespace) and
+a blank line immediately following, could serve as an "unindent". An
+empty comment does **not** swallow up indented blocks following it,
+so block quotes are safe. "A tiny but practical wart." Accepted.
+Anonymous Hyperlinks
+Alan Jaffray came up with this idea, along with the following syntax::
+ Search the `Python DOC-SIG mailing list archives`{}_.
+ .. _:
+The idea is sound and useful. I suggested a "double underscore"
+syntax::
+ Search the `Python DOC-SIG mailing list archives`__.
+ .. __:
+But perhaps single underscores are okay? The syntax looks better, but
+the hyperlink itself doesn't explicitly say "anonymous"::
+ Search the `Python DOC-SIG mailing list archives`_.
+ .. _:
+Mixing anonymous and named hyperlinks becomes confusing. The order of
+targets is not significant for named hyperlinks, but it is for
+anonymous hyperlinks::
+ Hyperlinks: anonymous_, named_, and another anonymous_.
+ .. _named: named
+ .. _: anonymous1
+ .. _: anonymous2
+Without the extra syntax of double underscores, determining which
+hyperlink references are anonymous may be difficult. We'd have to
+check which references don't have corresponding targets, and match
+those up with anonymous targets. Keeping to a simple consistent
+ordering (as with auto-numbered footnotes) seems simplest.
+reStructuredText will use the explicit double-underscore syntax for
+anonymous hyperlinks. An alternative (see `Reworking Explicit Markup
+(Round 1)`_ below) for the somewhat awkward ".. __:" syntax is "__"::
+ An anonymous__ reference.
+ __ http://anonymous
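+The order-based matching described above can be sketched as follows.
+This is an illustration under simplifying assumptions, not the
+Docutils implementation: only external anonymous targets (those with
+a URI) are handled, the regular expressions are approximations, and
+the ``example.org`` URLs are placeholders.

```python
import re

# Assumed patterns: a word or backquoted phrase followed by "__".
ANON_REF = re.compile(
    r"(?:`(?P<phrase>[^`]+)`|(?P<word>[A-Za-z0-9]+))__(?![A-Za-z0-9_])"
)
# Either the long ".. __:" form or the short "__" form, then a URI.
ANON_TARGET = re.compile(r"^(?:\.\. __:|__)\s+(?P<uri>\S+)\s*$")


def link_anonymous(source):
    """Pair anonymous references with anonymous targets by position."""
    refs, targets = [], []
    for line in source.splitlines():
        target = ANON_TARGET.match(line.strip())
        if target:
            targets.append(target.group("uri"))
            continue
        for ref in ANON_REF.finditer(line):
            refs.append(ref.group("phrase") or ref.group("word"))
    if len(refs) != len(targets):
        raise ValueError("%d anonymous references but %d anonymous targets"
                         % (len(refs), len(targets)))
    # The n-th reference belongs to the n-th target.
    return list(zip(refs, targets))


if __name__ == "__main__":
    text = (
        "Hyperlinks: first__, named_, and `a second`__.\n"
        "\n"
        "__ http://example.org/1\n"
        ".. __: http://example.org/2\n"
    )
    print(link_anonymous(text))
```

+Named references such as ``named_`` are ignored by the sketch, so
+their targets may appear anywhere without disturbing the pairing.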
+Reworking Explicit Markup (Round 1)
+Alan Jaffray came up with the idea of `anonymous hyperlinks`_, added
+to reStructuredText. Subsequently it was asserted that hyperlinks
+(especially anonymous hyperlinks) would play an increasingly important
+role in reStructuredText documents, and therefore they require a
+simpler and more concise syntax. This prompted a review of the
+current and proposed explicit markup syntaxes with regard to
+improving usability.
+1. Original syntax::
+ .. _blah: internal hyperlink target
+ .. _blah: http://somewhere external hyperlink target
+ .. _blah: blahblah_ indirect hyperlink target
+ .. __: anonymous internal target
+ .. __: http://somewhere anonymous external target
+ .. __: blahblah_ anonymous indirect target
+ .. [blah] http://somewhere footnote
+ .. blah:: http://somewhere directive
+ .. blah: http://somewhere comment
+ .. Note::
+ The comment text was intentionally made to look like a hyperlink
+ target.
+ Origins:
+ * Except for the colon (a delimiter necessary to allow for
+ phrase-links), hyperlink target ``.. _blah:`` comes from Setext.
+ * Comment syntax from Setext.
+ * Footnote syntax from StructuredText ("named links").
+ * Directives and anonymous hyperlinks original to reStructuredText.
+ Advantages:
+ + Consistent explicit markup indicator: "..".
+ + Consistent hyperlink syntax: ".. _" & ":".
+ Disadvantages:
+ - Anonymous target markup is awkward: ".. __:".
+ - The explicit markup indicator ("..") is excessively overloaded?
+ - Comment text is limited (can't look like a footnote, hyperlink,
+ or directive). But this is probably not important.
+2. Alan Jaffray's proposed syntax #1::
+ __ _blah internal hyperlink target
+ __ blah: http://somewhere external hyperlink target
+ __ blah: blahblah_ indirect hyperlink target
+ __ anonymous internal target
+ __ http://somewhere anonymous external target
+ __ blahblah_ anonymous indirect target
+ __ [blah] http://somewhere footnote
+ .. blah:: http://somewhere directive
+ .. blah: http://somewhere comment
+ The hyperlink-connoted underscores have become first-level syntax.
+ Advantages:
+ + Anonymous targets are simpler.
+ + All hyperlink targets are one character shorter.
+ Disadvantages:
+ - Inconsistent internal hyperlink targets. Unlike all other named
+ hyperlink targets, there's no colon. There's an extra leading
+ underscore, but we can't drop it because without it, "blah" looks
+ like a relative URI. Unless we restore the colon::
+ __ blah: internal hyperlink target
+ - Obtrusive markup?
+3. Alan Jaffray's proposed syntax #2::
+ .. _blah internal hyperlink target
+ .. blah: http://somewhere external hyperlink target
+ .. blah: blahblah_ indirect hyperlink target
+ .. anonymous internal target
+ .. http://somewhere anonymous external target
+ .. blahblah_ anonymous indirect target
+ .. [blah] http://somewhere footnote
+ !! blah: http://somewhere directive
+ ## blah: http://somewhere comment
+ Leading underscores have been (almost) replaced by "..", while
+ comments and directives have gained their own syntax.
+ Advantages:
+ + Anonymous hyperlinks are simpler.
+ + Unique syntax for comments. Connotation of "comment" from
+ some programming languages (including our favorite).
+ + Unique syntax for directives. Connotation of "action!".
+ Disadvantages:
+ - Inconsistent internal hyperlink targets. Again, unlike all other
+ named hyperlink targets, there's no colon. There's a leading
+ underscore, matching the trailing underscores of references,
+ which no other hyperlink targets have. We can't drop that one
+ leading underscore though: without it, "blah" looks like a
+ relative URI. Again, unless we restore the colon::
+ .. blah: internal hyperlink target
+ - All (except for internal) hyperlink targets lack their leading
+ underscores, losing the "hyperlink" connotation.
+ - Obtrusive syntax for comments. Alternatives::
+ ;; blah: http://somewhere
+ (also comment syntax in Lisp & others)
+ ,, blah: http://somewhere
+ ("comma comma": sounds like "comment"!)
+ - Iffy syntax for directives. Alternatives?
+4. Tony Ibbs' proposed syntax::
+ .. _blah: internal hyperlink target
+ .. _blah: http://somewhere external hyperlink target
+ .. _blah: blahblah_ indirect hyperlink target
+ .. anonymous internal target
+ .. http://somewhere anonymous external target
+ .. blahblah_ anonymous indirect target
+ .. [blah] http://somewhere footnote
+ .. blah:: http://somewhere directive
+ .. blah: http://somewhere comment
+ This is the same as the current syntax, except for anonymous
+ targets which drop their "__: ".
+ Advantage:
+ + Anonymous targets are simpler.
+ Disadvantages:
+ - Anonymous targets lack their leading underscores, losing the
+ "hyperlink" connotation.
+ - Anonymous targets are almost indistinguishable from comments.
+ (Better to know "up front".)
+5. David Goodger's proposed syntax: Perhaps going back to one of
+ Alan's earlier suggestions might be the best solution. How about
+ simply adding "__ " as a synonym for ".. __: " in the original
+ syntax? These would become equivalent::
+ .. __: anonymous internal target
+ .. __: http://somewhere anonymous external target
+ .. __: blahblah_ anonymous indirect target
+ __ anonymous internal target
+ __ http://somewhere anonymous external target
+ __ blahblah_ anonymous indirect target
+Alternative 5 has been adopted.
+Backquotes in Phrase-Links
+[From a 2001-06-05 Doc-SIG post in reply to questions from Doug
+The first draft of the spec, posted to the Doc-SIG in November 2000,
+used square brackets for phrase-links. I changed my mind because:
+1. In the first draft, I had already decided on single-backquotes for
+ inline literal text.
+2. However, I wanted to minimize the necessity for backslash escapes,
+ for example when quoting Python repr-equivalent syntax that uses
+ backquotes.
+3. The processing of identifiers (function/method/attribute/module
+ etc. names) into hyperlinks is a useful feature. PyDoc recognizes
+ identifiers heuristically, but it doesn't take much imagination to
+ come up with counter-examples where PyDoc's heuristics would result
+ in embarrassing failure. I wanted to do it deterministically, and
+ that called for syntax. I called this construct "interpreted
+ text".
+4. Leveraging off the ``*emphasis*/**strong**`` syntax led to the
+ idea of using double-backquotes as syntax.
+5. I worked out some rules for inline markup recognition.
+6. In combination with #5, double backquotes lent themselves to inline
+ literals, neatly satisfying #2, minimizing backslash escapes. In
+ fact, the spec says that no interpretation of any kind is done
+ within double-backquote inline literal text; backslashes do *no*
+ escaping within literal text.
+7. Single backquotes are then freed up for interpreted text.
+8. I already had square brackets required for footnote references.
+9. Since interpreted text will typically turn into hyperlinks, it was
+ a natural fit to use backquotes as the phrase-quoting syntax for
+ trailing-underscore hyperlinks.
+The original inspiration for the trailing underscore hyperlink syntax
+was Setext. But for phrases Setext used a very cumbersome
+``underscores_between_words_like_this_`` syntax.
+The underscores can be viewed as if they were right-pointing arrows:
+``-->``. So ``hyperlink_`` points away from the reference, and
+``.. _hyperlink:`` points toward the target.
+Substitution Mechanism
+Substitutions arose out of a Doc-SIG thread begun on 2001-10-28 by
+Alan Jaffray, "reStructuredText inline markup". It reminded me of a
+missing piece of the reStructuredText puzzle, first referred to in my
+contribution to "Documentation markup & processing / PEPs" (Doc-SIG
+Substitutions allow the power and flexibility of directives to be
+shared by inline text. They are a way to allow arbitrarily complex
+inline objects, while keeping the details out of the flow of text.
+They are the equivalent of SGML/XML's named entities. For example, an
+inline image (using reference syntax alternative 4d (vertical bars)
+and definition alternative 3, the alternatives chosen for inclusion in
+the spec)::
+ The |biohazard| symbol must be used on containers used to dispose
+ of medical waste.
+ .. |biohazard| image:: biohazard.png
+ [height=20 width=20]
+The ``|biohazard|`` substitution reference will be replaced in-line by
+whatever the ``.. |biohazard|`` substitution definition generates (in
+this case, an image). A substitution definition contains the
+substitution text bracketed with vertical bars, followed by an
+embedded inline-compatible directive, such as "image". A transform is
+required to complete the substitution.
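+A minimal sketch of such a transform, assuming one-line substitution
+definitions without option lines. The names ``render_directive`` and
+``substitute`` and the HTML-like output are hypothetical stand-ins;
+the Docutils transform operates on a document tree rather than on
+strings. Definitions are evaluated first, and references are replaced
+afterwards.

```python
import re

# Assumed shapes: ".. |name| directive:: argument" and "|name|".
SUB_DEF = re.compile(
    r"^\.\. \|(?P<name>[^|]+)\| (?P<directive>[\w-]+):: (?P<arg>.+)$"
)
SUB_REF = re.compile(r"\|(?P<name>[^|\s][^|]*)\|")


def render_directive(directive, argument):
    """Hypothetical stand-in for running an inline-compatible directive."""
    if directive == "image":
        return '<img src="%s">' % argument
    raise ValueError("unsupported directive: %s" % directive)


def substitute(source):
    """Evaluate definitions, then replace the references in the body."""
    definitions = {}
    body = []
    for line in source.splitlines():
        match = SUB_DEF.match(line)
        if match:
            definitions[match.group("name")] = render_directive(
                match.group("directive"), match.group("arg").strip())
        else:
            body.append(line)

    def replace(match):
        # Unknown references are left untouched.
        return definitions.get(match.group("name"), match.group(0))

    return "\n".join(SUB_REF.sub(replace, line) for line in body)


if __name__ == "__main__":
    text = (
        "The |biohazard| symbol must be used on containers.\n"
        "\n"
        ".. |biohazard| image:: biohazard.png\n"
    )
    print(substitute(text))
```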
+Syntax alternatives for the reference:
+1. Use the existing interpreted text syntax, with a predefined role
+ such as "sub"::
+ The `biohazard`:sub: symbol...
+ Advantages: existing syntax, explicit. Disadvantages: verbose,
+ obtrusive.
+2. Use a variant of the interpreted text syntax, with a new suffix
+ akin to the underscore in phrase-link references::
+ (a) `name`@
+ (b) `name`#
+ (c) `name`&
+ (d) `name`/
+ (e) `name`<
+ (f) `name`::
+ (g) `name`:
+ Due to incompatibility with other constructs and ordinary text
+ usage, (f) and (g) are not possible.
+3. Use interpreted text syntax with a fixed internal format::
+ (a) `:name:`
+ (b) `name:`
+ (c) `name::`
+ (d) `::name::`
+ (e) `%name%`
+ (f) `#name#`
+ (g) `/name/`
+ (h) `&name&`
+ (i) `|name|`
+ (j) `[name]`
+ (k) `<name>`
+ (l) `&name;`
+ (m) `'name'`
+ To avoid ML confusion (k) and (l) are definitely out. Square
+ brackets (j) won't work in the target (the substitution definition
+ would be indistinguishable from a footnote).
+ The ```/name/``` syntax (g) is reminiscent of "s/find/sub"
+ substitution syntax in ed-like languages. However, it may have a
+ misleading association with regexps, and looks like an absolute
+ POSIX path. (i) is visually equivalent and lacking the
+ connotations.
+ A disadvantage of all of these is that they limit interpreted text,
+ albeit only slightly.
+4. Use specialized syntax, something new::
+ (a) #name#
+ (b) @name@
+ (c) /name/
+ (d) |name|
+ (e) <<name>>
+ (f) //name//
+ (g) ||name||
+ (h) ^name^
+ (i) [[name]]
+ (j) ~name~
+ (k) !name!
+ (l) =name=
+ (m) ?name?
+ (n) >name<
+ "#" (a) and "@" (b) are obtrusive. "/" (c) without backquotes
+ looks just like a POSIX path; it is likely for such usage to appear
+ in text.
+ "|" (d) and "^" (h) are feasible.
+5. Redefine the trailing underscore syntax. See definition syntax
+ alternative 4, below.
+Syntax alternatives for the definition:
+1. Use the existing directive syntax, with a predefined directive such
+ as "sub". It contains a further embedded directive resolving to an
+ inline-compatible object::
+ .. sub:: biohazard
+ .. image:: biohazard.png
+ [height=20 width=20]
+ .. sub:: parrot
+ That bird wouldn't *voom* if you put 10,000,000 volts
+ through it!
+ The advantages and disadvantages are the same as in inline
+ alternative 1.
+2. Use syntax as in #1, but with the embedded directive compressed::
+ .. sub:: biohazard image:: biohazard.png
+ [height=20 width=20]
+ This is a bit better than alternative 1, but still too much.
+3. Use a variant of directive syntax, incorporating the substitution
+ text, obviating the need for a special "sub" directive name. If we
+ assume reference alternative 4d (vertical bars), the matching
+ definition would look like this::
+ .. |biohazard| image:: biohazard.png
+ [height=20 width=20]
+4. (Suggested by Alan Jaffray on Doc-SIG from 2001-11-06.)
+ Instead of adding new syntax, redefine the trailing underscore
+ syntax to mean "substitution reference" instead of "hyperlink
+ reference". Alan's example::
+ I had lunch with Jonathan_ today. We talked about Zope_.
+ .. _Jonathan: lj [user=jhl]
+ .. _Zope:
+ A problem with the proposed syntax is that URIs which look like
+ simple reference names (alphanum plus ".", "-", "_") would be
+ indistinguishable from substitution directive names. A more
+ consistent syntax would be::
+ I had lunch with Jonathan_ today. We talked about Zope_.
+ .. _Jonathan: lj:: user=jhl
+ .. _Zope:
+ (``::`` after ``.. _Jonathan: lj``.)
+ The "Zope" target is a simple external hyperlink, but the
+ "Jonathan" target contains a directive. Alan proposed that the
+ reference text be replaced by whatever the referenced directive
+ (the "directive target") produces. A directive reference becomes a
+ hyperlink reference if the contents of the directive target resolve
+ to a hyperlink. If the directive target resolves to an icon, the
+ reference is replaced by an inline icon.
+ This seems too indirect and complicated for easy comprehension.
+ The reference in the text will sometimes become a link, sometimes
+ not. Sometimes the reference text will remain, sometimes not. We
+ don't know *at the reference*::
+ This is a `hyperlink reference`_; its text will remain.
+ This is an `inline icon`_; its text will disappear.
+ That's a problem.
+The syntax that has been incorporated into the spec and parser is
+reference alternative 4d with definition alternative 3::
+ The |biohazard| symbol...
+ .. |biohazard| image:: biohazard.png
+ [height=20 width=20]
+We can also combine substitution references with hyperlink references,
+by appending a "_" (named hyperlink reference) or "__" (anonymous
+hyperlink reference) suffix to the substitution reference. This
+allows us to click on an image-link::
+ The |biohazard|_ symbol...
+ .. |biohazard| image:: biohazard.png
+ [height=20 width=20]
+ .. _biohazard:
+There have been several suggestions for the naming of these
+constructs, originally called "substitution references" and
+1. Candidate names for the reference construct:
+ (a) substitution reference
+ (b) tagging reference
+ (c) inline directive reference
+ (d) directive reference
+ (e) indirect inline directive reference
+ (f) inline directive placeholder
+ (g) inline directive insertion reference
+ (h) directive insertion reference
+ (i) insertion reference
+ (j) directive macro reference
+ (k) macro reference
+ (l) substitution directive reference
+2. Candidate names for the definition construct:
+ (a) substitution
+ (b) substitution directive
+ (c) tag
+ (d) tagged directive
+ (e) directive target
+ (f) inline directive
+ (g) inline directive definition
+ (h) referenced directive
+ (i) indirect directive
+ (j) indirect directive definition
+ (k) directive definition
+ (l) indirect inline directive
+ (m) named directive definition
+ (n) inline directive insertion definition
+ (o) directive insertion definition
+ (p) insertion definition
+ (q) insertion directive
+ (r) substitution definition
+ (s) directive macro definition
+ (t) macro definition
+ (u) substitution directive definition
+ (v) substitution definition
+"Inline directive reference" (1c) seems to be an appropriate term at
+first, but the term "inline" is redundant in the case of the
+reference. Its counterpart "inline directive definition" (2g) is
+awkward, because the directive definition itself is not inline.
+"Directive reference" (1d) and "directive definition" (2k) are too
+vague. "Directive definition" could be used to refer to any
+directive, not just those used for inline substitutions.
+One meaning of the term "macro" (1k, 2s, 2t) is too
+programming-language-specific. Also, macros are typically simple text
+substitution mechanisms: the text is substituted first and evaluated
+later. reStructuredText substitution definitions are evaluated in
+place at parse time and substituted afterwards.
+"Insertion" (1h, 1i, 2n-2q) is almost right, but it implies that
+something new is getting added rather than one construct being
+replaced by another.
+Which brings us back to "substitution". The overall best names are
+"substitution reference" (1a) and "substitution definition" (2v). A
+long way to go to add one word!
+Inline External Targets
+Currently reStructuredText has two hyperlink syntax variations:
+* Named hyperlinks::
+ This is a named reference_ of one word ("reference"). Here is
+ a `phrase reference`_. Phrase references may even cross `line
+ boundaries`_.
+ .. _reference:
+ .. _phrase reference:
+ .. _line boundaries:
+ + Advantages:
+ - The plaintext is readable.
+ - Each target may be reused multiple times (e.g., just write
+ ``"reference_"`` again).
+ - No synchronized ordering of references and targets is necessary.
+ + Disadvantages:
+ - The reference text must be repeated as target names; could lead
+ to mistakes.
+ - The target URLs may be located far from the references, and hard
+ to find in the plaintext.
+* Anonymous hyperlinks (in current reStructuredText)::
+ This is an anonymous reference__. Here is an anonymous
+ `phrase reference`__. Phrase references may even cross `line
+ boundaries`__.
+ __
+ __
+ __
+ + Advantages:
+ - The plaintext is readable.
+ - The reference text does not have to be repeated.
+ + Disadvantages:
+ - References and targets must be kept in sync.
+ - Targets cannot be reused.
+ - The target URLs may be located far from the references.
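+The trade-offs listed above follow from how each variation finds its
+target. As a rough Python sketch (the function names and the
+``example.org`` URL are hypothetical, not part of Docutils): named
+references are looked up by name, so a target can be reused, while
+anonymous references are paired with targets purely by position, so
+the two sequences must stay in sync.

```python
def resolve_named(references, targets):
    """Look up each named reference in a name -> URL mapping."""
    return [(name, targets[name]) for name in references]


def resolve_anonymous(references, urls):
    """Pair anonymous references with targets strictly by position."""
    if len(references) != len(urls):
        raise ValueError(
            "%d anonymous references but %d anonymous targets"
            % (len(references), len(urls)))
    return list(zip(references, urls))


# A named target may be referenced any number of times.
named = resolve_named(["reference", "reference"],
                      {"reference": "http://example.org/"})

# Anonymous targets are consumed in order, one per reference.
anonymous = resolve_anonymous(["reference", "phrase reference"],
                              ["http://example.org/a",
                               "http://example.org/b"])
```

+Dropping or reordering one anonymous target silently shifts every
+later pairing, which is the maintenance cost noted above.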
+For comparison and historical background, StructuredText also has two
+syntaxes for hyperlinks:
+* First, ``"reference text":URL``::
+ This is a "reference":
+ of one word ("reference"). Here is a "phrase
+ reference":
+* Second, ``"reference text",``::
+ This is a "reference",
+ of one word ("reference"). Here is a "phrase reference",
+Both syntaxes share advantages and disadvantages:
++ Advantages:
+ - The target is specified immediately adjacent to the reference.
++ Disadvantages:
+ - Poor plaintext readability.
+ - Targets cannot be reused.
+ - Both syntaxes use double quotes, common in ordinary text.
+ - In the first syntax, the URL and the last word are stuck
+ together, exacerbating the line wrap problem.
+ - The second syntax is too magical; text could easily be written
+ that way by accident (although only absolute URLs are recognized
+ here, perhaps because of the potential for ambiguity).
+A new type of "inline external hyperlink" has been proposed.
+1. On 2002-06-28, Simon Budig proposed__ a new syntax for
+ reStructuredText hyperlinks::
+ This is a reference_( of one
+ word ("reference"). Here is a `phrase
+ reference`_( Are
+ these examples, (single-underscore), named? If so, `anonymous
+ references`__( using two
+ underscores would probably be preferable.
+ __
+ The syntax, advantages, and disadvantages are similar to those of
+ StructuredText.
+ + Advantages:
+ - The target is specified immediately adjacent to the reference.
+ + Disadvantages:
+ - Poor plaintext readability.
+ - Targets cannot be reused (unless named, but the semantics are
+ unclear).
+ + Problems:
+ - The ``"`ref`_(URL)"`` syntax forces the last word of the
+ reference text to be joined to the URL, making a potentially
+ very long word that can't be wrapped (URLs can be very long).
+ The reference and the URL should be separate. This is a
+ symptom of the following point:
+ - The syntax produces a single compound construct made up of two
+ equally important parts, *with syntax in the middle*, *between*
+ the reference and the target. This is unprecedented in
+ reStructuredText.
+ - The "inline hyperlink" text is *not* a named reference (there's
+ no lookup by name), so it shouldn't look like one.
+ - According to the IETF standards RFC 2396 and RFC 2732,
+ parentheses are legal URI characters and curly braces are legal
+ email characters, making their use prohibitively difficult.
+ - The named/anonymous semantics are unclear.
+2. After an analysis__ of the syntax of (1) above, we came up with the
+ following compromise syntax::
+ This is an anonymous reference__
+ __<> of one word
+ ("reference"). Here is a `phrase reference`__
+ __<>. `Named
+ references`_ _<> use single
+ underscores.
+ __
+ The syntax builds on that of the existing "inline internal
+ targets": ``an _`inline internal target`.``
+ + Advantages:
+ - The target is specified immediately adjacent to the reference,
+ improving maintainability:
+ - References and targets are easily kept in sync.
+ - The reference text does not have to be repeated.
+ - The construct is executed in two parts: references identical to
+ existing references, and targets that are new but not too big a
+ stretch from current syntax.
+ - There's overwhelming precedent for quoting URLs with angle
+ brackets [#]_.
+ + Disadvantages:
+ - Poor plaintext readability.
+ - Lots of "line noise".
+ - Targets cannot be reused (unless named; see below).
+ To alleviate the readability issue slightly, we could allow the
+ target to appear later, such as after the end of the sentence::
+ This is a named reference__ of one word ("reference").
+ __<> Here is a `phrase
+ reference`__. __<>
+ Problem: this could only work for one reference at a time
+ (reference/target pairs must be proximate [refA trgA refB trgB],
+ not interleaved [refA refB trgA trgB] or nested [refA refB trgB
+ trgA]). This variation is too problematic; references and inline
+ external targets will have to be kept immediately adjacent (see (3)
+ below).
+ The ``"reference__ __<target>"`` syntax is actually for "anonymous
+ inline external targets", emphasized by the double underscores. It
+ follows that single trailing and leading underscores would lead to
+ *implicitly named* inline external targets. This would allow the
+ reuse of targets by name. So after ``"reference_ _<target>"``,
+ another ``"reference_"`` would point to the same target.
+ .. [#]
+ From RFC 2396 (URI syntax):
+ The angle-bracket "<" and ">" and double-quote (")
+ characters are excluded [from URIs] because they are often
+ used as the delimiters around URI in text documents and
+ protocol fields.
+ Using <> angle brackets around each URI is especially
+ recommended as a delimiting style for URI that contain
+ whitespace.
+ From RFC 822 (email headers):
+ Angle brackets ("<" and ">") are generally used to indicate
+ the presence of a one machine-usable reference (e.g.,
+ delimiting mailboxes), possibly including source-routing to
+ the machine.
+3. If it is best for references and inline external targets to be
+ immediately adjacent, then they might as well be integrated.
+ Here's an alternative syntax embedding the target URL in the
+ reference::
+ This is an anonymous `reference <
+ /reference/>`__ of one word ("reference"). Here is a `phrase
+ reference <>`__.
+ Advantages and disadvantages are similar to those in (2).
+ Readability is still an issue, but the syntax is a bit less
+ heavyweight (reduced line noise). Backquotes are required, even
+ for one-word references; the target URL is included within the
+ reference text, forcing a phrase context.
+ We'll call this variant "embedded URIs".
+ Problem: how to refer to a title like "HTML Anchors: <a>" (which
+ ends with an HTML/SGML/XML tag)? We could either require more
+ syntax on the target (like ``"`reference text
+ __<>`__"``), or require the odd conflicting
+ title to be escaped (like ``"`HTML Anchors: \<a>`__"``). The
+ latter seems preferable, and not too onerous.
+ Similarly to (2) above, a single trailing underscore would convert
+ the reference & inline external target from anonymous to implicitly
+ named, allowing reuse of targets by name.
+ I think this is the least objectionable of the syntax alternatives.
+Other syntax variations have been proposed (by Brett Cannon and Benja
+ `phrase reference`->
+ `phrase reference`@
+ `phrase reference`__ ->
+ `phrase reference` [->]
+ `phrase reference`__ [->]
+ `phrase reference` <>_
+None of these variations are clearly superior to #3 above. Some have
+problems that exclude their use.
+With any kind of inline external target syntax it comes down to the
+conflict between maintainability and plaintext readability. I don't
+see a major problem with reStructuredText's maintainability, and I
+don't want to sacrifice plaintext readability to "improve" it.
+The proponents of inline external targets want them for easily
+maintainable web pages. The arguments go something like this:
+- Named hyperlinks are difficult to maintain because the reference
+ text is duplicated as the target name.
+ To which I said, "So use anonymous hyperlinks."
+- Anonymous hyperlinks are difficult to maintain because the
+ references and targets have to be kept in sync.
+ "So keep the targets close to the references, grouped after each
+ paragraph. Maintenance is trivial."
+- But targets grouped after paragraphs break the flow of text.
+ "Surely less than URLs embedded in the text! And if the intent is
+ to produce web pages, not readable plaintext, then who cares about
+ the flow of text?"
+Many participants have voiced their objections to the proposed syntax:
+ Garth Kidd: "I strongly prefer the current way of doing it.
+ Inline is spectacularly messy, IMHO."
+ Tony Ibbs: "I vehemently agree... that the inline alternatives
+ being suggested look messy - there are/were good reasons they've
+ been taken out... I don't believe I would gain from the new
+ syntaxes."
+ Paul Moore: "I agree as well. The proposed syntax is far too
+ punctuation-heavy, and any of the alternatives discussed are
+ ambiguous or too subtle."
+Others have voiced their support:
+ fantasai: "I agree with Simon. In many cases, though certainly
+ not in all, I find parenthesizing the url in plain text flows
+ better than relegating it to a footnote."
+ Ken Manheimer: "I'd like to weigh in requesting some kind of easy,
+ direct inline reference link."
+(Interesting that those *against* the proposal have been using
+reStructuredText for a while, and those *for* the proposal are either
+new to the list ["fantasai", background unknown] or longtime
+StructuredText users [Ken Manheimer].)
+I was initially ambivalent/against the proposed "inline external
+targets". I value reStructuredText's readability very highly, and
+although the proposed syntax offers convenience, I don't know if the
+convenience is worth the cost in ugliness. Does the proposed syntax
+compromise readability too much, or should the choice be left up to
+the author? Perhaps if the syntax is *allowed* but its use strongly
+*discouraged*, for aesthetic/readability reasons?
+After a great deal of thought and much input from users, I've decided
+that there are reasonable use cases for this construct. The
+documentation should strongly caution against its use in most
+situations, recommending independent block-level targets instead.
+Syntax #3 above ("embedded URIs") will be used.
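+To make the "embedded URI" rule concrete, here is a minimal Python
+sketch of how the contents between the backquotes might be split
+into reference text and target. It is an illustration under stated
+assumptions, not the parser's actual code: the regular expression,
+the ``split_embedded`` name, and the ``example.org`` URL are all
+hypothetical. A trailing ``<...>`` preceded by whitespace is taken
+as the URI; an escaped ``\<`` keeps the angle brackets as text.

```python
import re

# Reference text, whitespace, then a final <...> group holding the URI.
EMBEDDED_URI = re.compile(r"^(?P<text>.*?)\s+<(?P<uri>[^<>]+)>$", re.DOTALL)


def split_embedded(content):
    """Return (reference_text, uri); uri is None if none is embedded."""
    match = EMBEDDED_URI.match(content)
    if match is None:
        # No embedded URI: an escaped "\<" is just a literal "<".
        return content.replace("\\<", "<"), None
    # A URI wrapped across source lines is rejoined without whitespace.
    uri = "".join(match.group("uri").split())
    return match.group("text"), uri


plain = split_embedded("phrase reference <http://example.org/phrase/>")
wrapped = split_embedded("reference <http://example.org\n/reference/>")
escaped = split_embedded("HTML Anchors: \\<a>")
```

+The escaped title falls through to the no-URI branch, which matches
+the preference above for escaping the odd conflicting title.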
+Doctree Representation of Transitions
+(Although not reStructuredText-specific, this section fits best in
+this document.)
+Having added the "horizontal rule" construct to the `reStructuredText
+Markup Specification`_, a decision had to be made as to how to reflect
+the construct in the implementation of the document tree. Given this
+ Document
+ ========
+ Paragraph 1
+ --------
+ Paragraph 2
+The horizontal rule indicates a "transition" (in prose terms) or the
+start of a new "division". Before implementation, the parsed document
+tree would be::
+ <document>
+ <section names="document">
+ <title>
+ Document
+ <paragraph>
+ Paragraph 1
+ -------- <--- error here
+ <paragraph>
+ Paragraph 2
+There are several possibilities for the implementation:
+1. Implement horizontal rules as "divisions" or segments. A
+ "division" is a title-less, non-hierarchical section. The first
+ try at an implementation looked like this::
+ <document>
+ <section names="document">
+ <title>
+ Document
+ <paragraph>
+ Paragraph 1
+ <division>
+ <paragraph>
+ Paragraph 2
+ But the two paragraphs are really at the same level; they shouldn't
+ appear to be at different levels. There's really an invisible
+ "first division". The horizontal rule splits the document body
+ into two segments, which should be treated uniformly.
+2. Treating "divisions" uniformly brings us to the second
+ possibility::
+ <document>
+ <section names="document">
+ <title>
+ Document
+ <division>
+ <paragraph>
+ Paragraph 1
+ <division>
+ <paragraph>
+ Paragraph 2
+ With this change, documents and sections will directly contain
+ divisions and sections, but not body elements. Only divisions will
+ directly contain body elements. Even without a horizontal rule
+ anywhere, the body elements of a document or section would be
+ contained within a division element. This makes the document tree
+ deeper. This is similar to the way HTML_ treats document contents:
+ grouped within a ``<body>`` element.
+3. Implement them as "transitions", empty elements::
+ <document>
+ <section names="document">
+ <title>
+ Document
+ <paragraph>
+ Paragraph 1
+ <transition>
+ <paragraph>
+ Paragraph 2
+ A transition would be a "point element", not containing anything,
+ only identifying a point within the document structure. This keeps
+ the document tree flatter, but the idea of a "point element" like
+ "transition" smells bad. A transition isn't a thing itself, it's
+ the space between two divisions. However, transitions are a
+ practical solution.
+Solution 3 was chosen for incorporation into the document tree model.
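+Nothing is lost by keeping the tree flat: the "divisions" of
+solutions 1 and 2 can be recovered whenever a Writer wants them.
+The Python sketch below is only an illustration; it models a
+section's children as plain strings, with the string
+``"transition"`` standing in for the empty element, and
+``split_divisions`` is a hypothetical helper.

```python
def split_divisions(children):
    """Group a flat list of body elements into divisions.

    Each "transition" marker closes the current division and opens
    the next one; the marker itself is not part of any division.
    """
    divisions = [[]]
    for child in children:
        if child == "transition":
            divisions.append([])
        else:
            divisions[-1].append(child)
    return divisions


flat = ["Paragraph 1", "transition", "Paragraph 2"]
divisions = split_divisions(flat)
```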
+.. _HTML:
+Syntax for Line Blocks
+* An early idea: How about a literal-block-like prefix, perhaps
+ "``;;``"? (It is, after all, a *semi-literal* literal block, no?)
+ Example::
+ Take it away, Eric the Orchestra Leader! ;;
+ A one, two, a one two three four
+ Half a bee, philosophically,
+ must, *ipso facto*, half not be.
+ But half the bee has got to be,
+ *vis a vis* its entity. D'you see?
+ But can a bee be said to be
+ or not to be an entire bee,
+ when half the bee is not a bee,
+ due to some ancient injury?
+ Singing...
+ Kinda lame.
+* Another idea: in an ordinary paragraph, if the first line ends with
+ a backslash (escaping the newline), interpret the entire paragraph
+ as a verse block? For example::
+ Add just one backslash\
+ And this paragraph becomes
+ An awful haiku
+ (Awful, and arguably invalid, since in Japanese the word "haiku"
+ contains three syllables not two.)
+ This idea was superseded by the rules for escaped whitespace, useful
+ for `character-level inline markup`_.
+* In a `2004-02-22 docutils-develop message`__, Jarno Elonen proposed
+ a "plain list" syntax (and also provided a patch)::
+ | John Doe
+ | President, SuperDuper Corp.
+ |
+ __
+ This syntax is very natural. However, these "plain lists" seem very
+ similar to line blocks, and I see so little intrinsic "list-ness"
+ that I'm loath to add a new object. I used the term "blurbs" to
+ remove the "list" connotation from the originally proposed name.
+ Perhaps line blocks could be refined to add the two properties they
+ currently lack:
+ A) long lines wrap nicely
+ B) HTML output doesn't look like program code in non-CSS web
+ browsers
+ (A) is an issue of all 3 aspects of Docutils: syntax (construct
+ behaviour), internal representation, and output. (B) is partly an
+ issue of internal representation but mostly of output.
+ReStructuredText will redefine line blocks with the "|"-quoting
+syntax. The following is my current thinking.
+Perhaps line block syntax like this would do::
+ | M6: James Bond
+ | MIB: Mr. J.
+ | IMF: not decided yet, but probably one of the following:
+ | Ethan Hunt
+ | Jim Phelps
+ | Claire Phelps
+ | CIA: Felix Leiter
+Note that the "nested" list does not have nested syntax (the "|" are
+not further indented); the leading whitespace would still be
+significant somehow (more below). As for long lines in the input,
+this could suffice::
+ | John Doe
+ | Founder, President, Chief Executive Officer, Cook, Bottle
+ Washer, and All-Round Great Guy
+ | SuperDuper Corp.
+ |
+The lack of "|" on the third line indicates that it's a continuation
+of the second line, wrapped.
+I don't see much point in allowing arbitrary nested content. Multiple
+paragraphs or bullet lists inside a "blurb" don't make sense to me.
+Simple nested line blocks should suffice.
+Internal Representation
+Line blocks are currently represented as text blobs as follows::
+ <!ELEMENT line_block %text.model;>
+ <!ATTLIST line_block
+ %basic.atts;
+ %fixedspace.att;>
+Instead, we could represent each line by a separate element::
+ <!ELEMENT line_block (line+)>
+ <!ATTLIST line_block %basic.atts;>
+ <!ELEMENT line %text.model;>
+ <!ATTLIST line %basic.atts;>
+We'd keep the significance of the leading whitespace of each line
+either by converting it to non-breaking spaces at output, or with a
+per-line margin. Non-breaking spaces are simpler (for HTML, anyway)
+but kludgey, and wouldn't support indented long lines that wrap. But
+should inter-word whitespace (i.e., not leading whitespace) be
+preserved? Currently it is preserved in line blocks.
+Representing a more complex line block may be tricky::
+ | But can a bee be said to be
+ | or not to be an entire bee,
+ | when half the bee is not a bee,
+ | due to some ancient injury?
+Perhaps the representation could allow for nested line blocks::
+ <!ELEMENT line_block (line | line_block)+>
+With this model, leading whitespace would no longer be significant.
+Instead, left margins are implied by the nesting. The example above
+could be represented as follows::
+ <line_block>
+ <line>
+ But can a bee be said to be
+ <line_block>
+ <line>
+ or not to be an entire bee,
+ <line_block>
+ <line>
+ when half the bee is not a bee,
+ <line_block>
+ <line>
+ due to some ancient injury?
+I wasn't sure what to do about even more complex line blocks::
+ | Indented
+ | Not indented
+ | Indented a bit
+ | A bit more
+ | Only one space
+How should that be parsed and nested? Should the first line have
+the same nesting level (== indentation in the output) as the fourth
+line, or the same as the last line? Mark Nodine suggested that such
+line blocks be parsed similarly to complexly-nested block quotes,
+which seems reasonable. In the example above, this would result in
+the nesting of the first line matching the last line's nesting. In
+other words, the nesting would be relative to neighboring lines
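+One way to read this suggestion is to nest each run of lines
+relative to the least-indented line in that run. The Python sketch
+below illustrates the idea; it is not the parser's code. Lines are
+modelled as hypothetical ``(indent, text)`` pairs, nested line
+blocks as nested lists, and the indents given to the example are
+assumptions (the original spacing is not shown here).

```python
def nest_lines(lines):
    """Nest (indent, text) pairs relative to neighboring lines.

    Lines at the least indent of the current run stay at this level;
    each stretch of deeper lines becomes a nested block, processed
    the same way.
    """
    if not lines:
        return []
    least = min(indent for indent, _ in lines)
    result, run = [], []
    for indent, text in lines:
        if indent > least:
            run.append((indent, text))
        else:
            if run:
                result.append(nest_lines(run))
                run = []
            result.append(text)
    if run:
        result.append(nest_lines(run))
    return result


example = [
    (4, "Indented"),
    (0, "Not indented"),
    (2, "Indented a bit"),
    (3, "A bit more"),
    (1, "Only one space"),
]
tree = nest_lines(example)
```

+With these assumed indents, "Indented" and "Only one space" both end
+up one level deep, even though their absolute indentation differs.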
+In HTML, line blocks are currently output as "<pre>" blocks, which
+gives us significant whitespace and line breaks, but doesn't allow
+long lines to wrap and causes monospaced output without stylesheets.
+Instead, we could output "<div>" elements parallelling the
+representation above, where each nested <div class="line_block"> would
+have an increased left margin (specified in the stylesheet).
+Jarno suggested the following HTML output::
+ <div class="line_block">
+ <span class="line">First, top level line</span><br class="hidden"/>
+ <div class="line_block"><span class="hidden">&nbsp;</span>
+ <span class="line">Second, once nested</span><br class="hidden"/>
+ <span class="line">Third, once nested</span><br class="hidden"/>
+ ...
+ </div>
+ ...
+ </div>
+The ``<br class="hidden" />`` and ``<span
+class="hidden">&nbsp;</span>`` are meant to support non-CSS and
+non-graphical browsers. I understand the case for "br", but I'm not
+so sure about hidden "&nbsp;". I question how much effort should be
+put toward supporting non-graphical and especially non-CSS browsers,
+at least for output.
+Should the lines themselves be ``<span>`` or ``<div>``? I don't like
+mixing inline and block-level elements.
+Implementation Plan
+We'll leave the old implementation in place (via the "line-block"
+directive only) until all Writers have been updated to support the new
+syntax & implementation. The "line-block" directive can then be
+updated to use the new internal representation, and its documentation
+will be updated to recommend the new syntax.
+List-Driven Tables
+The original idea came from Dylan Jay:
+ ... to use a two level bulleted list with something to
+ indicate it should be rendered as a table ...
+It's an interesting idea. It could be implemented as a directive
+which transforms a uniform two-level list into a table. Using a
+directive would allow the author to explicitly set the table's
+orientation (by column or by row), the presence of row headers, etc.
+1. (Implemented in Docutils 0.3.8).
+ Bullet-list-tables might look like this::
+ .. list-table::
+ * - Treat
+ - Quantity
+ - Description
+ * - Albatross!
+ - 299
+ - On a stick!
+ * - Crunchy Frog!
+ - 1499
+ - If we took the bones out, it wouldn't be crunchy,
+ now would it?
+ * - Gannet Ripple!
+ - 199
+ - On a stick!
+ This list must be written in two levels. This wouldn't work::
+ .. list-table::
+ * Treat
+ * Albatross!
+ * Gannet!
+ * Crunchy Frog!
+ * Quantity
+ * 299
+ * 199
+ * 1499
+ * Description
+ * On a stick!
+ * On a stick!
+ * If we took the bones out...
+ The above is a single list of 12 items. The blank lines are not
+ significant to the markup. We'd have to explicitly specify how
+ many columns or rows to use, which isn't a good idea.
+2. Beni Cherniavsky suggested a field list alternative. It could look
+ like this::
+ .. field-list-table::
+ :headrows: 1
+ - :treat: Treat
+ :quantity: Quantity
+ :descr: Description
+ - :treat: Albatross!
+ :quantity: 299
+ :descr: On a stick!
+ - :treat: Crunchy Frog!
+ :quantity: 1499
+ :descr: If we took the bones out, it wouldn't be
+ crunchy, now would it?
+ Column order is determined from the order of fields in the first
+ row. Field order in all other rows is ignored. As a side-effect,
+ this allows trivial re-arrangement of columns. By using named
+ fields, it becomes possible to omit fields in some rows without
+ losing track of things, which is important for spans.
+3. An alternative to two-level bullet lists would be to use enumerated
+ lists for the table cells::
+ .. list-table::
+ * 1. Treat
+ 2. Quantity
+ 3. Description
+ * 1. Albatross!
+ 2. 299
+ 3. On a stick!
+ * 1. Crunchy Frog!
+ 2. 1499
+ 3. If we took the bones out, it wouldn't be crunchy,
+ now would it?
+ That provides better correspondence between cells in the same
+ column than does bullet-list syntax, but not as good as field list
+ syntax. I think that were only field-list-tables available, a lot
+ of users would use the equivalent degenerate case::
+ .. field-list-table::
+ - :1: Treat
+ :2: Quantity
+ :3: Description
+ ...
+4. Another natural variant is to allow a description list with field
+ lists as descriptions::
+ .. list-table::
+ :headrows: 1
+ Treat
+ :quantity: Quantity
+ :descr: Description
+ Albatross!
+ :quantity: 299
+ :descr: On a stick!
+ Crunchy Frog!
+ :quantity: 1499
+ :descr: If we took the bones out, it wouldn't be
+ crunchy, now would it?
+ This would make the whole first column a header column ("stub").
+ It's limited to a single column and a single paragraph fitting on
+ one source line. Also it wouldn't allow for empty cells or row
+ spans in the first column. But these are limitations that we could
+ live with, like those of simple tables.
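+The "uniform two-level list" requirement of alternative 1 can be
+sketched in Python as follows. This is an illustration only: the
+bullet list is modelled as a plain list of lists of cell strings,
+and ``list_to_table`` and its ``header_rows`` parameter are
+hypothetical names, not the directive's implementation.

```python
def list_to_table(items, header_rows=0):
    """Turn a uniform two-level list into head and body rows."""
    if not items:
        raise ValueError("a list-table needs at least one row")
    for item in items:
        if not isinstance(item, list):
            # A flat list gives no hint of where rows begin and end.
            raise ValueError("not a two-level list: %r" % (item,))
    width = len(items[0])
    for number, item in enumerate(items, 1):
        if len(item) != width:
            raise ValueError("row %d has %d cells, expected %d"
                             % (number, len(item), width))
    return {"head": items[:header_rows], "body": items[header_rows:]}


table = list_to_table(
    [["Treat", "Quantity", "Description"],
     ["Albatross!", "299", "On a stick!"],
     ["Gannet Ripple!", "199", "On a stick!"]],
    header_rows=1)
```

+A flat list of 12 items, as in the counter-example of alternative 1,
+is rejected here rather than guessed at.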
+The list-driven table feature could be done in many ways. Each user
+will have their preferred usage. Perhaps a single "list-table"
+directive could handle them all, depending on which options and
+content are present.
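+For the field list alternative (2), the column-ordering rule might
+look like the following Python sketch. It is an assumption-laden
+illustration: each row is modelled as a dict of field name to cell
+text (dicts keep insertion order in Python 3.7+), omitted fields
+become ``None``, and ``field_rows_to_table`` is a hypothetical name.

```python
def field_rows_to_table(rows):
    """Order cells by the field order of the first row."""
    if not rows:
        raise ValueError("a field-list-table needs at least one row")
    columns = list(rows[0])
    table = []
    for number, row in enumerate(rows, 1):
        unknown = [name for name in row if name not in columns]
        if unknown:
            raise ValueError("row %d has unknown fields: %s"
                             % (number, ", ".join(unknown)))
        # Field order within later rows is ignored; omitted fields
        # leave an empty (None) cell.
        table.append([row.get(name) for name in columns])
    return columns, table


columns, cells = field_rows_to_table([
    {"treat": "Treat", "quantity": "Quantity", "descr": "Description"},
    {"descr": "On a stick!", "treat": "Albatross!", "quantity": "299"},
    {"treat": "Crunchy Frog!", "quantity": "1499"},
])
```

+Rearranging columns then only requires reordering the fields of the
+first row.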
+* How to indicate that there's 1 header row? Perhaps two lists? ::
+ .. list-table::
+ + - Treat
+ - Quantity
+ - Description
+ * - Albatross!
+ - 299
+ - On a stick!
+ This is probably too subtle though. Better would be a directive
+ option, like ``:headrows: 1``. An early suggestion for the header
+ row(s) was to use a directive option::
+ .. field-list-table::
+ :header:
+ - :treat: Treat
+ :quantity: Quantity
+ :descr: Description
+ - :treat: Albatross!
+ :quantity: 299
+ :descr: On a stick!
+ But the table data is at two levels and looks inconsistent.
+ In general, we cannot extract the header row from field lists' field
+ names because field names cannot contain everything one might put in
+ a table cell. A separate header row also allows shorter field names
+ and doesn't force one to rewrite the whole table when the header
+ text changes. But for simpler cases, we can offer a ":header:
+ fields" option, which does extract header cells from field names::
+ .. field-list-table::
+ :header: fields
+ - :Treat: Albatross!
+ :Quantity: 299
+ :Description: On a stick!
+* How to indicate the column widths? A directive option? ::
+ .. list-table::
+ :widths: 15 10 35
+ Automatic defaults from the text used?
+* How to handle row and/or column spans?
+ In a field list, column-spans can be indicated by specifying the
+ first and last fields, separated by space-dash-space or ellipsis::
+ - :foo - baz: quuux
+ - :foo ... baz: quuux
+ Commas were proposed for column spans::
+ - :foo, bar: quux
+ But non-adjacent columns become problematic. Should we report an
+ error, or duplicate the value into each span of adjacent columns (as
+ was suggested)? The latter suggestion is appealing but may be too
+ clever. Best perhaps to simply specify the two ends.
+ It was suggested that comma syntax should be allowed, too, in order
+ to allow the user to avoid trouble when changing the column order.
+ But changing the column order of a table with spans is not trivial;
+ we shouldn't make it easier to mess up.
+ One possible syntax for row-spans is to simply treat any row where a
+ field is missing as a row-span from the last row where it appeared.
+ Leaving a field empty would still be possible by writing a field
+ with empty content. But this is too implicit.
+ Another way would be to require an explicit continuation marker
+ (``...``/``-"-``/``"``?) in all but the first row of a spanned
+ field. Empty comments could work (".."). If implemented, the same
+ marker could also be supported in simple tables, which lack
+ row-spanning abilities.
+ Explicit markup like ":rowspan:" and ":colspan:" was also suggested.
+ Sometimes in a table, the first header row contains spans. It may
+ be necessary to provide a way to specify the column field names
+ independently of data rows. A directive option would do it.
+* We could specify "column-wise" or "row-wise" ordering, with the same
+ markup structure. For example, with definition data::
+ .. list-table::
+ :column-wise:
+ Treat
+ - Albatross!
+ - Crunchy Frog!
+ Quantity
+ - 299
+ - 1499
+ Description
+ - On a stick!
+ - If we took the bones out, it wouldn't be
+ crunchy, now would it?
+* A syntax for _`stubs in grid tables` is easy to imagine::
+ +------------------------++------------+----------+
+ | Header row, column 1 || Header 2 | Header 3 |
+ +========================++============+==========+
+ | body row 1, column 1 || column 2 | column 3 |
+ +------------------------++------------+----------+
+ Or this idea from Nick Moffitt::
+ +-----+---+---+
+ | XOR # T | F |
+ +=====+===+===+
+ | T # F | T |
+ +-----+---+---+
+ | F # T | F |
+ +-----+---+---+
+Auto-Enumerated Lists
+Implemented 2005-03-24: combination of variation 1 & 2.
+The advantage of auto-numbered enumerated lists would be similar to
+that of auto-numbered footnotes: lists could be written and rearranged
+without having to manually renumber them. The disadvantages are also
+the same: input and output wouldn't match exactly; the markup may be
+ugly or confusing (depending on which alternative is chosen).
+1. Use the "#" symbol. Example::
+ #. Item 1.
+ #. Item 2.
+ #. Item 3.
+ Advantages: simple, explicit. Disadvantage: enumeration sequence
+ cannot be specified (limited to arabic numerals); ugly.
+2. As a variation on #1, first initialize the enumeration sequence?
+ For example::
+ a) Item a.
+ #) Item b.
+ #) Item c.
+ Advantages: simple, explicit, any enumeration sequence possible.
+ Disadvantages: ugly; perhaps confusing with mixed concrete/abstract
+ enumerators.
+3. Alternative suggested by Fred Bremmer, from experience with MoinMoin::
+ 1. Item 1.
+ 1. Item 2.
+ 1. Item 3.
+ Advantages: enumeration sequence is explicit (could be multiple
+ "a." or "(I)" tokens). Disadvantages: perhaps confusing; otherwise
+ erroneous input (e.g., a duplicate item "1.") would pass silently,
+ either causing a problem later in the list (if no blank lines
+ between items) or creating two lists (with blanks).
+ Take this input for example::
+ 1. Item 1.
+ 1. Unintentional duplicate of item 1.
+ 2. Item 2.
+ Currently the parser will produce two lists, "1" and "1,2" (no
+ warnings, because of the presence of blank lines). Using Fred's
+ notation, the current behavior is "1,1,2 -> 1 1,2" (without blank
+ lines between items, it would be "1,1,2 -> 1 [WARNING] 1,2"). What
+ should the behavior be with auto-numbering?
+ Fred has produced a patch__, whose initial behavior is as follows::
+ 1,1,1 -> 1,2,3
+ 1,2,2 -> 1,2,3
+ 3,3,3 -> 3,4,5
+ 1,2,2,3 -> 1,2,3 [WARNING] 3
+ 1,1,2 -> 1,2 [WARNING] 2
+ (After the "[WARNING]", the "3" would begin a new list.)
+ I have mixed feelings about adding this functionality to the spec &
+ parser. It would certainly be useful to some users (myself
+ included; I often have to renumber lists). Perhaps it's too
+ clever, asking the parser to guess too much. What if you *do* want
+ three one-item lists in a row, each beginning with "1."? You'd
+ have to use empty comments to force breaks. Also, I question
+ whether "1,2,2 -> 1,2,3" is optimal behavior.
+ In response, Fred came up with "a stricter and more explicit rule
+ [which] would be to only auto-number silently if *all* the
+ enumerators of a list were identical". In that case::
+ 1,1,1 -> 1,2,3
+ 1,2,2 -> 1,2 [WARNING] 2
+ 3,3,3 -> 3,4,5
+ 1,2,2,3 -> 1,2 [WARNING] 2,3
+ 1,1,2 -> 1,2 [WARNING] 2
+ Should any start-value be allowed ("3,3,3"), or should
+ auto-numbered lists be limited to begin with ordinal-1 ("1", "A",
+ "a", "I", or "i")?
+ __
+ &group_id=38414&atid=422032
+4. Alternative proposed by Tony Ibbs::
+ #1. First item.
+ #3. Aha - I edited this in later.
+ #2. Second item.
+ The initial proposal required unique enumerators within a list, but
+ this limits the convenience of a feature of already limited
+ applicability and convenience. Not a useful requirement; dropped.
+ Instead, simply prepend a "#" to a standard list enumerator to
+ indicate auto-enumeration. The numbers (or letters) of the
+ enumerators themselves are not significant, except:
+ - as a sequence indicator (arabic, roman, alphabetic; upper/lower),
+ - and perhaps as a start value (first list item).
+ Advantages: explicit, any enumeration sequence possible.
+ Disadvantages: a bit ugly.
+ Not Implemented
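+For reference, the stricter rule Fred proposed under alternative 3
+can be sketched in Python. This is one interpretation of the table
+given there, not his patch: enumerators are modelled as integers, a
+list either repeats its first enumerator (auto-numbered) or counts
+up by one (ordinary), and any other value ends the list with a
+warning and starts a new one. ``split_and_number`` is a
+hypothetical name.

```python
def split_and_number(enumerators):
    """Return the rendered lists; each break between lists is a warning."""
    lists = []
    current, first, mode = [], None, None
    for value in enumerators:
        if not current:
            current, first, mode = [value], value, None
            continue
        if mode is None:
            # The second item decides what kind of list this is.
            if value == first:
                mode = "auto"
            elif value == first + 1:
                mode = "sequential"
        fits = ((mode == "auto" and value == first) or
                (mode == "sequential" and value == current[-1] + 1))
        if fits:
            current.append(current[-1] + 1)
        else:
            lists.append(current)  # [WARNING] would be reported here
            current, first, mode = [value], value, None
    if current:
        lists.append(current)
    return lists


results = {
    "1,1,1": split_and_number([1, 1, 1]),
    "1,2,2": split_and_number([1, 2, 2]),
    "3,3,3": split_and_number([3, 3, 3]),
    "1,2,2,3": split_and_number([1, 2, 2, 3]),
    "1,1,2": split_and_number([1, 1, 2]),
}
```

+Each inner list is one rendered list, so ``[[1, 2], [2, 3]]``
+corresponds to "1,2 [WARNING] 2,3" in the notation used above.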
+Reworking Footnotes
+As a further wrinkle (see `Reworking Explicit Markup (Round 1)`_
+above), in the wee hours of 2002-02-28 I posted several ideas for
+changes to footnote syntax:
+ - Change footnote syntax from ``.. [1]`` to ``_[1]``? ...
+ - Differentiate (with new DTD elements) author-date "citations"
+ (``[GVR2002]``) from numbered footnotes? ...
+ - Render footnote references as superscripts without "[]"? ...
+These ideas are all related, and suggest changes in the
+reStructuredText syntax as well as the docutils tree model.
+The footnote has been used for both true footnotes (asides expanding
+on points or defining terms) and for citations (references to external
+works). Rather than dealing with one amalgam construct, we could
+separate the current footnote concept into strict footnotes and
+citations. Citations could be interpreted and treated differently
+from footnotes. Footnotes would be limited to numerical labels:
+manual ("1") and auto-numbered (anonymous "#", named "#label").
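+The proposed split can be expressed as a small label classifier. The
+following is a minimal sketch, not parser code: the name
+``classify_label`` and the exact set of characters allowed in an
+auto-numbered label are my assumptions. Anything that is not a
+numeric, "#", or "#label" footnote label is treated as a citation.

```python
import re

# Footnote labels: all digits, a bare "#", or "#" followed by a name.
# The character set allowed in the name is an assumption.
_FOOTNOTE_LABEL = re.compile(r"(?P<manual>[0-9]+)|#(?P<name>[A-Za-z0-9_.-]*)")


def classify_label(label):
    """Return ("footnote", kind) or ("citation", label) for a label."""
    match = _FOOTNOTE_LABEL.fullmatch(label)
    if match is None:
        return ("citation", label)
    if match.group("manual") is not None:
        return ("footnote", "manual")
    if match.group("name"):
        return ("footnote", "auto-named")
    return ("footnote", "auto-anonymous")


for label in ("1", "#", "#label", "GVR2002"):
    print(label, classify_label(label))
```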
+The footnote is the only explicit markup construct (starts with ".. ")
+that directly translates to a visible body element. I've always been
+a little bit uncomfortable with the ".. " marker for footnotes because
+of this; ".. " has a connotation of "special", but footnotes aren't
+especially "special". Printed texts often put footnotes at the bottom
+of the page where the reference occurs (thus "foot note"). Some HTML
+designs would leave footnotes to be rendered in the same positions where
+they're defined. Other online and printed designs will gather
+footnotes into a section near the end of the document, converting them
+to "endnotes" (perhaps using a directive in our case); but this
+"special processing" is not an intrinsic property of the footnote
+itself, but a decision made by the document author or processing
+Citations are almost invariably collected in a section at the end of a
+document or section. Citations "disappear" from where they are
+defined and are magically reinserted at some well-defined point.
+There's more of a connection to the "special" connotation of the ".. "
+syntax. The point at which the list of citations is inserted could be
+defined manually by a directive (e.g., ".. citations::"), and/or have
+default behavior (e.g., a section automatically inserted at the end of
+the document) that might be influenced by options to the Writer.
+Syntax proposals:
++ Footnotes:
+ - Current syntax::
+ .. [1] Footnote 1
+ .. [#] Auto-numbered footnote.
+ .. [#label] Auto-labeled footnote.
+ - The syntax proposed in the original 2002-02-28 Doc-SIG post:
+ remove the ".. ", prefix a "_"::
+ _[1] Footnote 1
+ _[#] Auto-numbered footnote.
+ _[#label] Auto-labeled footnote.
+ The leading underscore syntax (earlier dropped because
+ ``.. _[1]:`` was too verbose) is a useful reminder that footnotes
+ are hyperlink targets.
+ - Minimal syntax: remove the ".. [" and "]", prefix a "_", and
+ suffix a "."::
+ _1. Footnote 1.
+ _#. Auto-numbered footnote.
+ _#label. Auto-labeled footnote.
+ ``_1.``, ``_#.``, and ``_#label.`` are markers,
+ like list markers.
+ Footnotes could be rendered something like this in HTML
+ | 1. This is a footnote. The brackets could be dropped
+ | from the label, and a vertical bar could set them
+ | off from the rest of the document in the HTML.
+ Two-way hyperlinks on the footnote marker ("1." above) would also
+ help to differentiate footnotes from enumerated lists.
+ If converted to endnotes (by a directive/transform), a horizontal
+ half-line might be used instead. Page-oriented output formats
+ would typically use the horizontal line for true footnotes.
++ Footnote references:
+ - Current syntax::
+ [1]_, [#]_, [#label]_
+ - Minimal syntax to match the minimal footnote syntax above::
+ 1_, #_, #label_
+ As a consequence, pure-numeric hyperlink references would not be
+ possible; they'd be interpreted as footnote references.
++ Citation references: no change is proposed from the current footnote
+ reference syntax::
+ [GVR2001]_
++ Citations:
+ - Current syntax (footnote syntax)::
+ .. [GVR2001] Python Documentation; van Rossum, Drake, et al.;
+ - Possible new syntax::
+ _[GVR2001] Python Documentation; van Rossum, Drake, et al.;
+ _[DJG2002]
+ Docutils: Python Documentation Utilities project; Goodger
+ et al.;
+ Without the ".. " marker, subsequent lines would either have to
+ align as in one of the above, or we'd have to allow loose
+ alignment (I'd rather not)::
+ _[GVR2001] Python Documentation; van Rossum, Drake, et al.;
+I proposed adopting the "minimal" syntax for footnotes and footnote
+references, and adding citations and citation references to
+reStructuredText's repertoire. The current footnote syntax for
+citations is better than the alternatives given.
+From a reply by Tony Ibbs on 2002-03-01:
+ However, I think easier with examples, so let's create one::
+ Fans of Terry Pratchett are perhaps more likely to use
+ footnotes [1]_ in their own writings than other people
+ [2]_. Of course, in *general*, one only sees footnotes
+ in academic or technical writing - it's use in fiction
+ and letter writing is not normally considered good
+ style [4]_, particularly in emails (not a medium that
+ lends itself to footnotes).
+ .. [1] That is, little bits of referenced text at the
+ bottom of the page.
+ .. [2] Because Terry himself does, of course [3]_.
+ .. [3] Although he has the distinction of being
+ *funny* when he does it, and his fans don't always
+ achieve that aim.
+ .. [4] Presumably because it detracts from linear
+ reading of the text - this is, of course, the point.
+ and look at it with the second syntax proposal::
+ Fans of Terry Pratchett are perhaps more likely to use
+ footnotes [1]_ in their own writings than other people
+ [2]_. Of course, in *general*, one only sees footnotes
+ in academic or technical writing - it's use in fiction
+ and letter writing is not normally considered good
+ style [4]_, particularly in emails (not a medium that
+ lends itself to footnotes).
+ _[1] That is, little bits of referenced text at the
+ bottom of the page.
+ _[2] Because Terry himself does, of course [3]_.
+ _[3] Although he has the distinction of being
+ *funny* when he does it, and his fans don't always
+ achieve that aim.
+ _[4] Presumably because it detracts from linear
+ reading of the text - this is, of course, the point.
+ (I note here that if I have gotten the indentation of the
+ footnotes themselves correct, this is clearly not as nice. And if
+ the indentation should be to the left margin instead, I like that
+ even less).
+ and the third (new) proposal::
+ Fans of Terry Pratchett are perhaps more likely to use
+ footnotes 1_ in their own writings than other people
+ 2_. Of course, in *general*, one only sees footnotes
+ in academic or technical writing - it's use in fiction
+ and letter writing is not normally considered good
+ style 4_, particularly in emails (not a medium that
+ lends itself to footnotes).
+ _1. That is, little bits of referenced text at the
+ bottom of the page.
+ _2. Because Terry himself does, of course 3_.
+ _3. Although he has the distinction of being
+ *funny* when he does it, and his fans don't always
+ achieve that aim.
+ _4. Presumably because it detracts from linear
+ reading of the text - this is, of course, the point.
+ I think I don't, in practice, mind the targets too much (the use
+ of a dot after the number helps a lot here), but I do have a
+ problem with the body text, in that I don't naturally separate out
+ the footnotes as different than the rest of the text - instead I
+ keep wondering why there are numbers interspersed in the text. The
+ use of brackets around the numbers ([ and ]) made me somehow parse
+ the footnote references as "odd" - i.e., not part of the body text
+ - and thus both easier to skip, and also (paradoxically) easier to
+ pick out so that I could follow them.
+ Thus, for the moment (and as always susceptible to argument), I'd
+ say -1 on the new form of footnote reference (i.e., I much prefer
+ the existing ``[1]_`` over the proposed ``1_``), and ambivalent
+ over the proposed target change.
+ That leaves David's problem of wanting to distinguish footnotes
+ and citations - and the only thing I can propose there is that
+ footnotes are numeric or # and citations are not (which, as a
+ human being, I can probably cope with!).
+From a reply by Paul Moore on 2002-03-01:
+ I think the current footnote syntax ``[1]_`` is *exactly* the
+ right balance of distinctness vs unobtrusiveness. I very
+ definitely don't think this should change.
+ On the target change, it doesn't matter much to me.
+From a further reply by Tony Ibbs on 2002-03-01, referring to the
+"[1]" form and actual usage in email:
+ Clearly this is a form people are used to, and thus we should
+ consider it strongly (in the same way that the usage of ``*..*``
+ to mean emphasis was taken partly from email practise).
+ Equally clearly, there is something "magical" for people in the
+ use of a similar form (i.e., ``[1]``) for both footnote reference
+ and footnote target - it seems natural to keep them similar.
+ ...
+ I think that this established plaintext usage leads me to strongly
+ believe we should retain square brackets at both ends of a
+ footnote. The markup of the reference end (a single trailing
+ underscore) seems about as minimal as we can get away with. The
+ markup of the target end depends on how one envisages the thing -
+ if ".." means "I am a target" (as I tend to see it), then that's
+ good, but one can also argue that the "_[1]" syntax has a neat
+ symmetry with the footnote reference itself, if one wishes (in
+ which case ".." presumably means "hidden/special" as David seems
+ to think, which is why one needs a ".." *and* a leading underline
+ for hyperlink targets).
+Given the persuasive arguments voiced, we'll leave footnote & footnote
+reference syntax alone, except that these discussions gave rise to
+the "auto-symbol footnote" concept, which has been added. Citations
+and citation references have also been added.
+Syntax for Questions & Answers
+Implement as a generic two-column marked list? As a standalone
+(non-directive) construct? (Is the markup ambiguous?) Add support to
+New elements would be required. Perhaps::
+ <!ELEMENT question_list (question_list_item+)>
+ <!ATTLIST question_list
+ numbering (none | local | global)
+ <!ELEMENT question_list_item (question, answer*)>
+ <!ELEMENT question %text.model;>
+ <!ELEMENT answer (%body.elements;)+>
+Originally I thought of implementing a Q&A list with special syntax::
+ Q: What am I?
+ A: You are a question-and-answer
+ list.
+ Q: What are you?
+ A: I am the omniscient "we".
+Where each "Q" and "A" could also be numbered (e.g., "Q1"). However,
+a simple enumerated or bulleted list will do just fine for syntax. A
+directive could treat the list specially; e.g. the first paragraph
+could be treated as a question, the remainder as the answer (multiple
+answers could be represented by nested lists). Without special
+syntax, this directive becomes low priority.
+As described in the FAQ__, no special syntax or directive is needed
+for this application.
+ #how-can-i-mark-up-a-faq-or-other-list-of-questions-answers
+ Tabled
+Reworking Explicit Markup (Round 2)
+See `Reworking Explicit Markup (Round 1)`_ for an earlier discussion.
+In April 2004, a new thread began on docutils-develop: `Inconsistency
+in RST markup`__. Several arguments were made; the first argument
+begat later arguments. Below, the arguments are paraphrased "in
+quotes", with responses.
+1. References and targets take this form::
+ targetname_
+ .. _targetname: stuff
+ But footnotes, "which generate links just like targets do", are
+ written as::
+ [1]_
+ .. [1] stuff
+ "Footnotes should be written as"::
+ [1]_
+ .. _[1]: stuff
+ But they're not the same type of animal. That's not a "footnote
+ target", it's a *footnote*. Being a target is not a footnote's
+ primary purpose (an arguable point). It just happens to grow a
+ target automatically, for convenience. Just as a section title::
+ Title
+ =====
+ isn't a "title target", it's a *title*, which happens to grow a
+ target automatically. The consistency is there, it's just deeper
+ than at first glance.
+ Also, ".. [1]" was chosen for footnote syntax because it closely
+ resembles one form of actual footnote rendering. ".. _[1]:" is too
+ verbose; excessive punctuation is required to get the job done.
+ For more of the reasoning behind the syntax, see `Problems With
+ StructuredText (Hyperlinks) <problems.html#hyperlinks>`__ and
+ `Reworking Footnotes`_.
+2. "I expect directives to also look like ``.. this:`` [one colon]
+ because that also closely parallels the link and footnote target
+ markup."
+ There are good reasons for the two-colon syntax:
+ Two colons are used after the directive type for these reasons:
+ - Two colons are distinctive, and unlikely to be used in common
+ text.
+ - Two colons avoids clashes with common comment text like::
+ .. Danger: modify at your own risk!
+ - If an implementation of reStructuredText does not recognize a
+ directive (i.e., the directive-handler is not installed), a
+ level-3 (error) system message is generated, and the entire
+ directive block (including the directive itself) will be
+ included as a literal block. Thus "::" is a natural choice.
+ -- `restructuredtext.html#directives
+ <../../ref/rst/restructuredtext.html#directives>`__
+ The last reason is not particularly compelling; it's more of a
+ convenient coincidence or mnemonic.
+3. "Comments always seemed too easy. I almost never write comments.
+ I'd have no problem writing '.. comment:' in front of my comments.
+ In fact, it would probably be more readable, as comments *should*
+ be set off strongly, because they are very different from normal
+ text."
+ Many people do use comments though, and some applications of
+ reStructuredText require it. For example, all reStructuredText
+ PEPs (and this document!) have an Emacs stanza at the bottom, in a
+ comment. Having to write ".. comment::" would be very obtrusive.
+ Comments *should* be dirt-easy to do. It should be easy to
+ "comment out" a block of text. Comments in programming languages
+ and other markup languages are invariably easy.
+ Any author is welcome to preface their comments with "Comment:" or
+ "Do Not Print" or "Note to Editor" or anything they like. A
+ "comment" directive could easily be implemented. It might be
+ confused with admonition directives, like "note" and "caution"
+ though. In unrelated (and unpublished and unfinished) work, adding
+ a "comment" directive as a true document element was considered::
+ If structure is necessary, we could use a "comment" directive
+ (to avoid nonsensical DTD changes, the "comment" directive
+ could produce an untitled topic element).
+4. "One of the goals of reStructuredText is to be *readable* by people
+ who don't know it. This construction violates that: it is not at
+ all obvious to the uninitiated that text marked by '..' is a
+ comment. On the other hand, '.. comment:' would be totally
+ transparent."
+ Totally transparent, perhaps, but also very obtrusive. Another of
+ `reStructuredText's goals`_ is to be unobtrusive, and
+ ".. comment::" would violate that. The goals of reStructuredText
+ are many, and they conflict. Determining the right set of goals
+ and finding solutions that best fit is done on a case-by-case
+ basis.
+ Even readability has two aspects. Being readable without any
+ prior knowledge is one. Being as easily read in raw form as in
+ processed form is the other. ".." may not contribute to the former
+ aspect, but ".. comment::" would certainly detract from the latter.
+ .. _author's note:
+ .. _reStructuredText's goals: ../../ref/rst/introduction.html#goals
+5. "Recently I sent someone an rst document, and they got confused; I
+ had to explain to them that '..' marks comments, *unless* it's a
+ directive, etc..."
+ The explanation of directives *is* roundabout, defining comments in
+ terms of not being other things. That's definitely a wart.
+6. "Under the current system, a mistyped directive (with ':' instead
+ of '::') will be silently ignored. This is an error that could
+ easily go unnoticed."
+ A parser option/setting like "--comments-on-stderr" would help.
+7. "I'd prefer to see double-dot-space / command / double-colon as the
+ standard Docutils markup-marker. It's unusual enough to avoid
+ being accidentally used. Everything that starts with a double-dot
+ should end with a double-colon."
+ That would increase the punctuation verbosity of some constructs
+ considerably.
+8. Edward Loper proposed the following plan for backwards
+ compatibility:
+ 1. ".. foo" will generate a deprecation warning to stderr, and
+ nothing in the output (no system messages).
+ 2. ".. foo: bar" will be treated as a directive foo. If there
+ is no foo directive, then do the normal error output.
+ 3. ".. foo:: bar" will generate a deprecation warning to
+ stderr, and be treated as a directive. Or leave it valid?
+ So some existing documents might start printing deprecation
+ warnings, but the only existing documents that would *break*
+ would be ones that say something like::
+ .. warning: this should be a comment
+ instead of::
+ .. warning:: this should be a comment
+ Here, we're trading a fairly common silent error (directive
+ falsely treated as a comment) for a fairly uncommon explicitly
+ flagged error (comment falsely treated as directive). To make
+ things even easier, we could add a sentence to the
+ unknown-directive error. Something like "If you intended to
+ create a comment, please use '.. comment:' instead".
+On one hand, I understand and sympathize with the points raised. On
+the other hand, I think the current syntax strikes the right balance
+(but I acknowledge a possible lack of objectivity). On the gripping
+hand, the comment and directive syntax has become well established, so
+even if it's a wart, it may be a wart we have to live with.
+Making any of these changes would cause a lot of breakage or at least
+deprecation warnings. I'm not sure the benefit is worth the cost.
+For now, we'll treat this as an unresolved legacy issue.
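+The "--comments-on-stderr" idea from point 6 could be approximated
+by a check like the one below. This is a rough sketch under stated
+assumptions: no such setting exists, the function name
+``find_suspect_comments`` is hypothetical, and only comments that
+start in the first column are examined. It reports comments of the
+form ".. name: text" (one colon), which may be mistyped directives.

```python
import re

# ".. name: text" with exactly one colon.  Hyperlink targets
# (".. _name:"), footnotes (".. [1]") and real directives
# (".. name::") do not match this pattern.
_SUSPECT = re.compile(r"^\.\. ([A-Za-z][A-Za-z0-9_-]*):(?!:)(?:\s|$)")


def find_suspect_comments(source):
    """Return (line number, name) pairs for possible mistyped directives."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = _SUSPECT.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


sample = (
    ".. warning: this should be a directive\n"
    ".. warning:: a real directive\n"
    ".. _targetname: stuff\n"
    ".. [1] stuff\n"
    ".. Danger: modify at your own risk!\n"
)
print(find_suspect_comments(sample))
```

+Note that the legitimate comment ".. Danger: modify at your own risk!"
+is reported too. A check of this kind can only hint at a problem; it
+cannot tell a mistyped directive from an intentional comment.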
+ To Do
+Nested Inline Markup
+These are collected notes on a long-discussed issue. The original
+mailing list messages should be referred to for details.
+* In a 2001-10-31 discussion I wrote:
+ Try, for example, `Ed Loper's 2001-03-21 post`_, which details
+ some rules for nested inline markup. I think the complexity is
+ prohibitive for the marginal benefit. (And if you can understand
+ that tree without going mad, you're a better man than I. ;-)
+ Inline markup is already fragile. Allowing nested inline markup
+ would only be asking for trouble IMHO. If it proves absolutely
+ necessary, it can be added later. The rules for what can appear
+ inside what must be well thought out first though.
+ .. _Ed Loper's 2001-03-21 post:
+ --
+* In a 2001-11-09 Doc-SIG post, I wrote:
+ The problem is that in the
+ what-you-see-is-more-or-less-what-you-get markup language that
+ is reStructuredText, the symbols used for inline markup ("*",
+ "**", "`", "``", etc.) may preclude nesting.
+ I've rethought this position. Nested markup is not precluded, just
+ tricky. People and software parse "double and 'single' quotes" all
+ the time. Continuing,
+ I've thought over how we might implement nested inline
+ markup. The first algorithm ("first identify the outer inline
+ markup as we do now, then recursively scan for nested inline
+ markup") won't work; counterexamples were given in my `last post
+ <>`__.
+ The second algorithm makes my head hurt::
+ while 1:
+ scan for start-string
+ if found:
+ push on stack
+ scan for start or end string
+ if new start string found:
+ recurse
+ elif matching end string found:
+ pop stack
+ elif non-matching end string found:
+ if it's a markup error:
+ generate warning
+ elif the initial start-string was misinterpreted:
+ # e.g. in this case: ***strong** in emphasis*
+ restart with the other interpretation
+ # but it might be several layers back ...
+ ...
+ This is similar to how the parser does section title
+ recognition, but sections are much more regular and
+ deterministic.
+ Bottom line is, I don't think the benefits are worth the effort,
+ even if it is possible. I'm not going to try to write the code,
+ at least not now. If somebody codes up a consistent, working,
+ general solution, I'll be happy to consider it.
+ --
+* In a `2003-05-06 Docutils-Users post`__ Paul Tremblay proposed a new
+ syntax to allow for easier nesting. It eventually evolved into
+ this::
+ :role:[inline text]
+ The duplication with the existing interpreted text syntax is
+ problematic though.
+ __
+* Could the parser be extended to parse nested interpreted text? ::
+ :emphasis:`Some emphasized text with :strong:`some more
+ emphasized text` in it and **perhaps** :reference:`a link``
+* In a `2003-06-18 Docutils-Develop post`__, Mark Nodine reported on
+ his implementation of a form of nested inline markup in his
+ Perl-based parser (unpublished). He brought up some interesting
+ ideas. The implementation was flawed, however, by the change in
+ semantics required for backslash escapes.
+ __
+* Docutils-develop threads between David Abrahams, David Goodger, and
+ Mark Nodine (beginning 2004-01-16__ and 2004-01-19__) hashed out
+ many of the details of a potentially successful implementation, as
+ described below. David Abrahams checked in code to the "nesting"
+ branch of CVS, awaiting thorough review.
+ __
+ __
+It may be possible to accomplish nested inline markup in general with
+a more powerful inline markup parser. There may be some issues, but
+I'm not averse to the idea of nested inline markup in general. I just
+don't have the time or inclination to write a new parser now. Of
+course, a good patch would be welcome!
+I envisage something like this. Explicit-role interpreted text must
+be nestable. Prefix-based is probably preferred, since suffix-based
+will look like inline literals::
+ ``text`:role1:`:role2:
+But it can be disambiguated, so it ought to be left up to the author::
+ `\ `text`:role1:`:role2:
+In addition, other forms of inline markup may be nested if
+ *emphasized ``literal`` and |substitution ref| and link_*
+IOW, the parser ought to be as permissive as possible.
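+To make the idea concrete, here is a toy stack-based parser for
+nested ``*emphasis*``, ``**strong**``, and inline literals. It is a
+sketch only, and deliberately naive: the names ``parse_inline`` and
+``_add_text`` are hypothetical, the start-string and end-string
+context rules of reStructuredText are ignored, roles, substitutions
+and references are left as plain text, and ambiguous input is rejected
+rather than reinterpreted.

```python
import re

_MARKERS = {"*": "emphasis", "**": "strong", "``": "literal"}
_TOKEN = re.compile(r"(``|\*\*|\*)")


def _add_text(children, text):
    """Append text, merging it with a preceding text child if any."""
    if children and isinstance(children[-1], str):
        children[-1] += text
    else:
        children.append(text)


def parse_inline(text):
    """Return a nested (name, children) tree for the inline markup."""
    root = ("root", [])
    stack = [(None, root)]  # (open marker, node) pairs
    for piece in _TOKEN.split(text):
        if not piece:
            continue
        marker, node = stack[-1]
        children = node[1]
        if marker == "``" and piece != "``":
            _add_text(children, piece)   # nothing nests inside a literal
        elif piece == marker:
            stack.pop()                  # matching end-string
        elif piece in _MARKERS:
            child = (_MARKERS[piece], [])
            children.append(child)
            stack.append((piece, child))  # nested start-string
        else:
            _add_text(children, piece)
    if len(stack) != 1:
        raise ValueError("unclosed inline markup: %r" % stack[-1][0])
    return root


print(parse_inline("*emphasized ``literal`` and **strong** text*"))
```

+The troublesome ``***strong** in emphasis*`` case from the 2001-11-09
+post makes this sketch raise ``ValueError``, because the leading
+``***`` is tokenized as ``**`` followed by ``*``. Handling it would
+need exactly the backtracking that the quoted pseudocode describes.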
+Index Entries & Indexes
+Were I writing a book with an index, I guess I'd need two
+different kinds of index targets: inline/implicit and
+out-of-line/explicit. For example::
+ In this `paragraph`:index:, several words are being
+ `marked`:index: inline as implicit `index`:index:
+ entries.
+ .. index:: markup
+ .. index:: syntax
+ The explicit index directives above would refer to
+ this paragraph. It might also make sense to allow multiple
+ entries in an ``index`` directive:
+ .. index::
+ markup
+ syntax
+The words "paragraph", "marked", and "index" would become index
+entries pointing at the words in the first paragraph. The index
+entry words appear verbatim in the text. (Don't worry about the
+ugly ":index:" part; if indexing is the only/main application of
+interpreted text in your documents, it can be implicit and
+omitted.) The two directives provide manual indexing, where the
+index entry words ("markup" and "syntax") do not appear in the
+main text. We could combine the two directives into one::
+ .. index:: markup; syntax
+Semicolons instead of commas because commas could *be* part of the
+index target, like::
+ .. index:: van Rossum, Guido
+Another reason for index directives is that other inline markup
+wouldn't be possible within inline index targets.
+Sometimes index entries have multiple levels. Given::
+ .. index:: statement syntax: expression statements
+In a hypothetical index, combined with other entries, it might
+look like this::
+ statement syntax
+ expression statements ..... 56
+ assignment ................ 57
+ simple statements ......... 58
+ compound statements ....... 60
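+The proposed argument syntax is simple to take apart. The sketch
+below assumes the conventions described above (semicolons between
+entries, a colon between levels, commas left alone); the function
+name ``parse_index_argument`` is hypothetical and no "index"
+directive is implied to exist in this form.

```python
def parse_index_argument(argument):
    """Split an "index" directive argument into entries.

    Each entry is a tuple of levels, e.g. ("statement syntax",
    "expression statements").  Commas are kept, since they may be
    part of the entry text itself.
    """
    entries = []
    for raw in argument.split(";"):
        raw = raw.strip()
        if not raw:
            continue
        levels = tuple(part.strip() for part in raw.split(":"))
        entries.append(levels)
    return entries


print(parse_index_argument("markup; syntax"))
print(parse_index_argument("van Rossum, Guido"))
print(parse_index_argument("statement syntax: expression statements"))
```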
+Inline multi-level index targets could be done too. Perhaps
+something like::
+ When dealing with `expression statements <statement syntax:>`,
+ we must remember ...
+The opposite sense could also be possible::
+ When dealing with `index entries <:multi-level>`, there are
+ many permutations to consider.
+Also "see / see also" index entries.
+ Here's a paragraph.
+ .. index:: paragraph
+(The "index" directive above actually targets the *preceding*
+object.) The directive should produce something like this XML::
+ <paragraph>
+ <index_entry text="paragraph"/>
+ Here's a paragraph.
+ </paragraph>
+This kind of content model would also allow true inline
+ Here's a `paragraph`:index:.
+If the "index" role were the default for the application, it could be
+ Here's a `paragraph`.
+Both of these would result in this XML::
+ <paragraph>
+ Here's a <index_entry>paragraph</index_entry>.
+ </paragraph>
+from 2002-06-24 docutils-develop posts
+ If all of your index entries will appear verbatim in the text,
+ this should be sufficient. If not (e.g., if you want "Van Rossum,
+ Guido" in the index but "Guido van Rossum" in the text), we'll
+ have to figure out a supplemental mechanism, perhaps using
+ substitutions.
+I've thought a bit more on this, and I came up with two possibilities:
+1. Using interpreted text, embed the index entry text within the
+ interpreted text::
+ ... by `Guido van Rossum [Van Rossum, Guido]` ...
+ The problem with this is obvious: the text becomes cluttered and
+ hard to read. The processed output would drop the text in
+ brackets, which goes against the spirit of interpreted text.
+2. Use substitutions::
+ ... by |Guido van Rossum| ...
+ .. |Guido van Rossum| index:: Van Rossum, Guido
+ A problem with this is that each substitution definition must have
+ a unique name. A subsequent ``.. |Guido van Rossum| index:: BDFL``
+ would be illegal. Some kind of anonymous substitution definition
+ mechanism would be required, but I think that's going too far.
+Both of these alternatives are flawed. Any other ideas?
+ ... Or Not To Do?
+This is the realm of the possible but questionably probable. These
+ideas are kept here as a record of what has been proposed, for
+posterity and in case any of them prove to be useful.
+Compound Enumerated Lists
+Allow for compound enumerators, such as "1.1." or "1.a." or "1(a)", to
+allow for nested enumerated lists without indentation?
+Indented Lists
+Allow for variant styles by interpreting indented lists as if they
+weren't indented? For example, currently the list below will be
+parsed as a list within a block quote::
+ paragraph
+ * list item 1
+ * list item 2
+But a lot of people seem to write that way, and HTML browsers make it
+look as if that's the way it should be. The parser could check the
+contents of block quotes, and if they contain only a single list,
+remove the block quote wrapper. There would be two problems:
+1. What if we actually *do* want a list inside a block quote?
+2. What if such a list comes immediately after an indented construct,
+ such as a literal block?
+Both could be solved using empty comments (problem 2 already exists
+for a block quote after a literal block). But that's a hack.
+Perhaps a runtime setting, allowing or disabling this convenience,
+would be appropriate. But that raises issues too:
+ User A, who writes lists indented (and their config file is set up
+ to allow it), sends a file to user B, who doesn't (and their
+ config file disables indented lists). The result of processing by
+ the two users will be different.
+It may seem minor, but it adds ambiguity to the parser, which is bad.
+See the `Doc-SIG discussion starting 2001-04-18`__ with Ed Loper's
+"Structuring: a summary; and an attempt at EBNF", item 4 (and
+follow-ups, here__ and here__). Also `docutils-users, 2003-02-17`__
+and `beginning 2003-08-04`__.
+Sloppy Indentation of List Items
+Perhaps the indentation shouldn't be so strict. Currently, this is
+ 1. First line,
+ second line.
+Anything wrong with this? ::
+ 1. First line,
+ second line.
+Problem? ::
+ 1. First para.
+ Block quote. (no good: requires some indent relative to first
+ para)
+ Second Para.
+ 2. Have to carefully define where the literal block ends::
+ Literal block
+ Literal block?
+Hmm... Non-strict indentation isn't such a good idea.
+Lazy Indentation of List Items
+Another approach: Going back to the first draft of reStructuredText
+(2000-11-27 post to Doc-SIG)::
+ - This is the fourth item of the main list (no blank line above).
+ The second line of this item is not indented relative to the
+ bullet, which precludes it from having a second paragraph.
+Change that to *require* a blank line above and below, to reduce
+ambiguity. This "loosening" may be added later, once the parser's
+been nailed down. However, a serious drawback of this approach is that
+it limits the content of each list item to a single paragraph.
+David's Idea for Lazy Indentation
+Consider a paragraph in a word processor. It is a single logical line
+of text which ends with a newline, soft-wrapped arbitrarily at the
+right edge of the page or screen. We can think of a plaintext
+paragraph in the same way, as a single logical line of text, ending
+with two newlines (a blank line) instead of one, and which may contain
+arbitrary line breaks (newlines) where it was accidentally
+hard-wrapped by an application. We can compensate for the accidental
+hard-wrapping by "unwrapping" every unindented second and subsequent
+line. The indentation of the first line of a paragraph or list item
+would determine the indentation for the entire element. Blank lines
+would be required between list items when using lazy indentation.
+The following example shows the lazy indentation of multiple body
+ - This is the first paragraph
+ of the first list item.
+ Here is the second paragraph
+ of the first list item.
+ - This is the first paragraph
+ of the second list item.
+ Here is the second paragraph
+ of the second list item.
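+The "unwrapping" step can be sketched as follows. This is an
+illustration only, assuming that blank lines separate elements and
+that the first line of each element carries the significant
+indentation; the function name ``unwrap_lazy`` is hypothetical.

```python
import re


def unwrap_lazy(text):
    """Return (indent, logical line) pairs, one per blank-line-separated block.

    The indentation of a block's first line is taken as the indentation
    of the whole block; second and subsequent lines are joined to it
    regardless of their own indentation.
    """
    blocks = []
    for chunk in re.split(r"\n(?:[ \t]*\n)+", text.strip("\n")):
        lines = chunk.splitlines()
        first = lines[0]
        indent = len(first) - len(first.lstrip(" "))
        logical = " ".join(line.strip() for line in lines)
        blocks.append((indent, logical))
    return blocks


source = (
    "- This is the first paragraph\n"
    "of the first list item.\n"
    "\n"
    "  Here is the second paragraph\n"
    "of the first list item.\n"
    "\n"
    "- This is the first paragraph\n"
    "of the second list item.\n"
)
for indent, logical in unwrap_lazy(source):
    print(indent, logical)
```

+After unwrapping, each block could be handed to the ordinary
+indentation-based rules as if it had been written on one line.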
+A more complex example shows the limitations of lazy indentation::
+ - This is the first paragraph
+ of the first list item.
+ Next is a definition list item:
+ Term
+ Definition. The indentation of the term is
+ required, as is the indentation of the definition's
+ first line.
+ When the definition extends to more than
+ one line, lazy indentation may occur. (This is the second
+ paragraph of the definition.)
+ - This is the first paragraph
+ of the second list item.
+ - Here is the first paragraph of
+ the first item of a nested list.
+ So this paragraph would be outside of the nested list,
+ but inside the second list item of the outer list.
+ But this paragraph is not part of the list at all.
+And the ambiguity remains::
+ - Look at the hyphen at the beginning of the next line
+ - is it a second list item marker, or a dash in the text?
+ Similarly, we may want to refer to numbers inside enumerated
+ lists:
+ 1. How many socks in a pair? There are
+ 2. How many pants in a pair? Exactly
+ 1. Go figure.
+Literal blocks and block quotes would still require consistent
+indentation for all their lines. For block quotes, we might be able
+to get away with only requiring that the first line of each contained
+element be indented. For example::
+ Here's a paragraph.
+ This is a paragraph inside a block quote.
+ Second and subsequent lines need not be indented at all.
+ - A bullet list inside
+ the block quote.
+ Second paragraph of the
+ bullet list inside the block quote.
+Although feasible, this form of lazy indentation has problems. The
+document structure and hierarchy are not obvious from the indentation,
+making the source plaintext difficult to read. This will also make
+keeping track of the indentation while writing difficult and
+error-prone. However, these problems may be acceptable for Wikis and
+email mode, where we may be able to rely on less complex structure
+(few nested lists, for example).
+Multiple Roles in Interpreted Text
+In reStructuredText, inline markup cannot be nested (yet; `see
+above`__). This also applies to interpreted text. In order to
+simultaneously combine multiple roles for a single piece of text, a
+syntax extension would be necessary. Ideas:
+1. Initial idea::
+ `interpreted text`:role1,role2:
+2. Suggested by Jason Diamond::
+ `interpreted text`:role1:role2:
+If a document is so complex as to require nested inline markup,
+perhaps another markup system should be considered. By design,
+reStructuredText does not have the flexibility of XML.
+__ `Nested Inline Markup`_
+Parameterized Interpreted Text
+In some cases it may be expedient to pass parameters to interpreted
+text, analogous to function calls. Ideas:
+1. Parameterize the interpreted text role itself (suggested by Jason
+ Diamond)::
+ `interpreted text`:role1(foo=bar):
+ Positional parameters could also be supported::
+ `CSS`:acronym(Cascading Style Sheets): is used for HTML, and
+ `CSS`:acronym(Content Scrambling System): is used for DVDs.
+ Technical problem: current interpreted text syntax does not
+ recognize roles containing whitespace. Design problem: this smells
+ like programming language syntax, but reStructuredText is not a
+ programming language.
+2. Put the parameters inside the interpreted text::
+ `CSS (Cascading Style Sheets)`:acronym: is used for HTML, and
+ `CSS (Content Scrambling System)`:acronym: is used for DVDs.
+ Although this could be defined on an individual basis (per role),
+ we ought to have a standard. Hyperlinks with embedded URIs already
+ use angle brackets; perhaps they could be used here too::
+ `CSS <Cascading Style Sheets>`:acronym: is used for HTML, and
+ `CSS <Content Scrambling System>`:acronym: is used for DVDs.
+ Do angle brackets connote URLs too much for this to be acceptable?
+ How about the "tag" connotation -- does it save them or doom them?
+3. `Nested inline markup`_ could prove useful here::
+ `CSS :def:`Cascading Style Sheets``:acronym: is used for HTML,
+ and `CSS :def:`Content Scrambling System``:acronym: is used for
+ DVDs.
+ Inline markup roles could even define the default roles of nested
+ inline markup, allowing this cleaner syntax::
+ `CSS `Cascading Style Sheets``:acronym: is used for HTML, and
+ `CSS `Content Scrambling System``:acronym: is used for DVDs.
+Does this push inline markup too far? Readability becomes a serious
+issue. Substitutions may provide a better alternative (at the expense
+of verbosity and duplication) by pulling the details out of the text
+ |CSS| is used for HTML, and |CSS-DVD| is used for DVDs.
+ .. |CSS| acronym:: Cascading Style Sheets
+ .. |CSS-DVD| acronym:: Content Scrambling System
+ :text: CSS
+This whole idea may be going beyond the scope of reStructuredText.
+Documents requiring this functionality may be better off using XML or
+another markup system.
+This argument comes up regularly when pushing the envelope of
+reStructuredText syntax. I think it's a useful argument in that it
+provides a check on creeping featurism. In many cases, the resulting
+verbosity produces such unreadable plaintext that there's a natural
+desire *not* to use it unless absolutely necessary. It's a matter of
+finding the right balance.
+Syntax for Interpreted Text Role Bindings
+The following syntax (idea from Jeffrey C. Jacobs) could be used to
+associate directives with roles::
+ .. :rewrite: class:: rewrite
+ `She wore ribbons in her hair and it lay with streaks of
+ grey`:rewrite:
+The syntax is similar to that of substitution declarations, and the
+directive/role association may resolve implementation issues. The
+semantics, ramifications, and implementation details would need to be
+worked out.
+The example above would implement the "rewrite" role as adding a
+``class="rewrite"`` attribute to the interpreted text ("inline"
+element). The stylesheet would then pick up on the "class" attribute
+to do the actual formatting.
+The advantage of the new syntax would be flexibility. Uses other than
+"class" may present themselves. The disadvantage is complexity:
+having to implement new syntax for a relatively specialized operation,
+and having new semantics in existing directives ("class::" would do
+something different).
+The `"role" directive`__ has been implemented.
+__ ../../ref/rst/directives.html#role
+Character Processing
+Several people have suggested adding some form of character processing
+to reStructuredText:
+* Some sort of automated replacement of ASCII sequences:
+ - ``--`` to em-dash (or ``--`` to en-dash, and ``---`` to em-dash).
+ - Convert quotes to curly quote entities. (Essentially impossible
+ for HTML? Unnecessary for TeX.)
+ - Various forms of ``:-)`` to smiley icons.
+ - ``"\ "`` to &nbsp;. Problem with line-wrapping though: it could
+ end up escaping the newline.
+ - Escaped newlines to <BR>.
+ - Escaped period or quote or dash as a disappearing catalyst to
+ allow character-level inline markup?
+* XML-style character entities, such as "&copy;" for the copyright
+ symbol.
+Docutils has no need of a character entity subsystem. Because
+Docutils supports Unicode and text encodings, characters should be
+represented directly in the text: a copyright symbol should be
+represented by the copyright symbol character. If this is not
+possible in an authoring environment, a pre-processing stage can be
+added, or a table of substitution definitions can be devised.
+A "unicode" directive has been implemented to allow direct
+specification of esoteric characters. In combination with the
+substitution construct, "include" files defining common sets of
+character entities can be defined and used. `A set of character
+entity set definition files has been defined`__ (`tarball`__).
+There's also `a description and instructions for use`__.
+To allow for `character-level inline markup`_, a limited form of
+character processing has been added to the spec and parser: escaped
+whitespace characters are removed from the processed document. Any
+further character processing will be of this functional type, rather
+than of the character-encoding type.
+.. _character-level inline markup:
+ ../../ref/rst/restructuredtext.html#character-level-inline-markup
+* Directive idea::
+ .. text-replace:: "pattern" "replacement"
+ - Support Unicode "U+XXXX" codes.
+ - Support regexps, perhaps with alternative "regexp-replace"
+ directive.
+ - Flags for regexps; ":flags:" option, or individuals.
+ - Specifically, should the default be case-sensitive or
+ -insensitive?
+Page Or Line Breaks
+* Should ^L (or something else in reST) be defined to mean
+ force/suggest page breaks in whatever output we have?
+ A "break" or "page-break" directive would be easy to add. A new
+ doctree element would be required though (perhaps "break"). The
+ final behavior would be up to the Writer. The directive argument
+ could be one of page/column/recto/verso for added flexibility.
+ Currently ^L (Python's ``\f``) characters are treated as whitespace.
+ They're converted to single spaces, actually, as are vertical tabs
+ (^K, Python's ``\v``). It would be possible to recognize form feeds
+ as markup, but it requires some thought and discussion first. Are
+ there any downsides? Many editing environments do not allow the
+ insertion of control characters. Will it cause any harm? It would
+ be useful as a shorthand for the directive.
+ It's common practice to use ^L before Emacs "Local Variables"
+ lists::
+ ^L
+ ..
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End:
+ These are already present in many PEPs and Docutils project
+ documents. From the Emacs manual (info):
+ A "local variables list" goes near the end of the file, in the
+ last page. (It is often best to put it on a page by itself.)
+ It would be unfortunate if this construct caused a final blank page
+ to be generated (for those Writers that recognize the page breaks).
+ We'll have to add a transform that looks for a "break" plus zero or
+ more comments at the end of a document, and removes them.
+ Probably a bad idea because there is no such thing as a page in a
+ generic document format.
+* Could the "break" concept above be extended to inline forms?
+ E.g. "^L" in the middle of a sentence could cause a line break.
+ Only recognize it at the end of a line (i.e., ``\f\n``)?
+ Or is formfeed inappropriate? Perhaps vertical tab (``\v``), but
+ even that's a stretch. Can't use carriage returns, since they're
+ commonly used for line endings.
+ Probably a bad idea as well, because control characters do not
+ belong in markup that is meant to be easy to read and write, and
+ the line block syntax already covers line breaks.
+Superscript Markup
+Add ``^superscript^`` inline markup? The only common non-markup uses
+of "^" I can think of are as shorthand for "superscript" itself and
+for describing control characters ("^C to cancel"). The former
+supports the proposed syntax, and it could be argued that the latter
+ought to be literal text anyhow (e.g. "``^C`` to cancel").
+However, superscripts are seldom needed, and new syntax would break
+existing documents. When it's needed, the ``:superscript:``
+(``:sup:``) role can be used instead.
+Code Execution
+Add the following directives?
+- "exec": Execute Python code & insert the results. Call it
+ "python" to allow for other languages?
+- "system": Execute an ``os.system()`` call, and insert the results
+ (possibly as a literal block). Definitely dangerous! How to make
+ it safe? Perhaps such processing should be left outside of the
+ document, in the user's production system (a makefile or a script or
+ whatever). Or, the directive could be disabled by default and only
+ enabled with an explicit command-line option or config file setting.
+ Even then, an interactive prompt may be useful, such as:
+ The file.txt document you are processing contains a "system"
+ directive requesting that the ``sudo rm -rf /`` command be
+ executed. Allow it to execute? (y/N)
+- "eval": Evaluate an expression & insert the text. At parse
+ time or at substitution time? Dangerous? Perhaps limit to canned
+ macros; see text.date_.
+ .. _text.date: ../todo.html#text-date
+It's too dangerous (or too complicated in the case of "eval"). We do
+not want to have such things in the core.
+``encoding`` Directive
+Add an "encoding" directive to specify the character encoding of the
+input data? Not a good idea for the following reasons:
+- When it sees the directive, the parser will already have read the
+ input data, and encoding determination will already have been done.
+- If a file with an "encoding" directive is edited and saved with
+ a different encoding, the directive may cause data corruption.
+Support for Annotations
+Add an "annotation" role, as the equivalent of the HTML "title"
+attribute? This is secondary information that may "pop up" when the
+pointer hovers over the main text. A corresponding directive would be
+required to associate annotations with the original text (by name, or
+positionally as in anonymous targets?).
+There have not been many requests for such a feature, though. Also,
+cluttering WYSIWYG plaintext with annotations may not seem like a good
+idea, and there is no "tool tip" in formats other than HTML.
+``term`` Role
+Add a "term" role for unfamiliar or specialized terminology? Probably
+not; there is no real use case, and emphasis is enough for most cases.
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End:
diff --git a/docs/dev/rst/problems.txt b/docs/dev/rst/problems.txt
new file mode 100644
index 000000000..bc0101cbf
--- /dev/null
+++ b/docs/dev/rst/problems.txt
@@ -0,0 +1,872 @@
+ Problems With StructuredText
+:Author: David Goodger
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This document has been placed in the public domain.
+There are several problems, unresolved issues, and areas of
+controversy within StructuredText_ (Classic and Next Generation). In
+order to resolve all these issues, this analysis brings all of the
+issues out into the open, enumerates all the alternatives, and
+proposes solutions to be incorporated into the reStructuredText_
+.. contents::
+Formal Specification
+The description in the original has been criticized
+for being vague. For practical purposes, "the code *is* the spec."
+Tony Ibbs has been working on deducing a `detailed description`_ from
+the documentation and code of StructuredTextNG_. Edward Loper's
+STMinus_ is another attempt to formalize a spec.
+For this kind of a project, the specification should always precede
+the code. Otherwise, the markup is a moving target which can never be
+adopted as a standard. Of course, a specification may be revised
+during the lifetime of the code, but without a spec there is no visible
+control and thus no confidence.
+Understanding and Extending the Code
+The original StructuredText_ is a dense mass of sparsely commented
+code and inscrutable regular expressions. It was not designed to be
+extended and is very difficult to understand. StructuredTextNG_ has
+been designed to allow input (syntax) and output extensions, but its
+documentation (both internal [comments & docstrings], and external) is
+inadequate for the complexity of the code itself.
+For reStructuredText to become truly useful, perhaps even part of
+Python's standard library, it must have clear, understandable
+documentation and implementation code. For the implementation of
+reStructuredText to be taken seriously, it must be a sterling example
+of the potential of docstrings; the implementation must practice what
+the specification preaches.
+Section Structure via Indentation
+Setext_ required that body text be indented by 2 spaces. The original
+StructuredText_ and StructuredTextNG_ require that section structure
+be indicated through indentation, as "inspired by Python". For
+certain structures with a very limited, local extent (such as lists,
+block quotes, and literal blocks), indentation naturally indicates
+structure or hierarchy. For sections (which may have a very large
+extent), structure via indentation is unnecessary, unnatural and
+ambiguous. Rather, the syntax of the section title *itself* should
+indicate that it is a section title.
+The original StructuredText states that "A single-line paragraph whose
+immediately succeeding paragraphs are lower level is treated as a
+header." Requiring indentation in this way is:
+- Unnecessary. The vast majority of docstrings and standalone
+ documents will have no more than one level of section structure.
+ Requiring indentation for such docstrings is unnecessary and
+ irritating.
+- Unnatural. Most published works use title style (type size, face,
+ weight, and position) and/or section/subsection numbering rather
+ than indentation to indicate hierarchy. This is a tradition with a
+ very long history.
+- Ambiguous. A StructuredText header is indistinguishable from a
+ one-line paragraph followed by a block quote (precluding the use of
+ block quotes). Enumerated section titles are ambiguous (is it a
+ header? is it a list item?). Some additional adornment must be
+ required to confirm the line's role as a title, both to a parser and
+ to the human reader of the source text.
+Python's use of significant whitespace is a wonderful (if not
+original) innovation; however, requiring indentation in ordinary
+written text is hypergeneralization.
+reStructuredText_ indicates section structure through title adornment
+style (as exemplified by this document). This is far more natural.
+In fact, it is already in widespread use in plain text documents,
+including in Python's standard distribution (such as the toplevel
+README_ file).
+Character Escaping Mechanism
+No matter what characters are chosen for markup, some day someone will
+want to write documentation *about* that markup or using markup
+characters in a non-markup context. Therefore, any complete markup
+language must have an escaping or encoding mechanism. For a
+lightweight markup system, encoding mechanisms like SGML/XML's '&ast;'
+are out. So an escaping mechanism is in. However, with carefully
+chosen markup, it should be necessary to use the escaping mechanism
+only infrequently.
+reStructuredText_ needs an escaping mechanism: a way to treat
+markup-significant characters as the characters themselves. Currently
+there is no such mechanism (although ZWiki uses '!'). What are the
+1. ``!``
+ (
+2. ``\``
+3. ``~``
+4. doubling of characters
+The best choice for this is the backslash (``\``). It's "the single
+most popular escaping character in the world!", therefore familiar and
+unsurprising. Since characters only need to be escaped under special
+circumstances, which are typically those explaining technical
+programming issues, the use of the backslash is natural and
+understandable. Python docstrings can be raw (prefixed with an 'r',
+as in 'r""'), which would obviate the need for gratuitous doubling-up
+of backslashes.
+(On 2001-03-29 on the Doc-SIG mailing list, GvR endorsed backslash
+escapes, saying, "'nuff said. Backslash it is." Although neither
+legally binding nor irrevocable nor any kind of guarantee of anything,
+it is a good sign.)
+The rule would be: An unescaped backslash followed by any markup
+character escapes the character. The escaped character represents the
+character itself, and is prevented from playing a role in any markup
+interpretation. The backslash is removed from the output. A literal
+backslash is represented by an "escaped backslash," two backslashes in
+a row.
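+The rule above can be sketched in modern Python. This is only an
+illustration, not the Docutils parser: the helper names
+``split_escapes`` and ``strip_escapes`` are hypothetical, and the
+handling of a lone trailing backslash (kept as-is) is an assumption.
+The "escaped" flag marks characters that a markup recognizer would
+have to skip:

```python
def split_escapes(text):
    """Return (char, escaped) pairs for text that uses backslash escapes.

    A backslash followed by another character yields that character,
    flagged as escaped so later markup recognition can ignore it.
    Assumption: a lone backslash at the very end is kept unchanged.
    """
    pairs = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            # Drop the backslash, keep the character it protects.
            pairs.append((text[i + 1], True))
            i += 2
        else:
            pairs.append((text[i], False))
            i += 1
    return pairs


def strip_escapes(text):
    """Return the output text with the escaping backslashes removed."""
    return "".join(char for char, _ in split_escapes(text))
```

+Two backslashes in a row come out as one literal backslash, because
+the first escapes the second.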
+A carefully constructed set of recognition rules for inline markup
+will obviate the need for backslash-escapes in almost all cases; see
+`Delimitation of Inline Markup`_ below.
+When an expression (requiring backslashes and other characters used
+for markup) becomes too complicated and therefore unreadable, a
+literal block may be used instead. Inside literal blocks, no markup
+is recognized, therefore backslashes (for the purpose of escaping
+markup) become unnecessary.
+We could allow backslashes preceding non-markup characters to remain
+in the output. This would make describing regular expressions and
+other uses of backslashes easier. However, this would complicate the
+markup rules and would be confusing.
+Blank Lines in Lists
+Oft-requested in Doc-SIG (the earliest reference is dated 1996-08-13)
+is the ability to write lists without requiring blank lines between
+items. In docstrings, space is at a premium. Authors want to convey
+their API or usage information in as compact a form as possible.
+StructuredText_ requires blank lines between all body elements,
+including list items, even when boundaries are obvious from the markup
+In reStructuredText, blank lines are optional between list items.
+However, in order to eliminate ambiguity, a blank line is required
+before the first list item and after the last. Nested lists also
+require blank lines before the list start and after the list end.
+Bullet List Markup
+StructuredText_ includes 'o' as a bullet character. This is dangerous
+and counter to the language-independent nature of the markup. There
+are many languages in which 'o' is a word. For example, in Spanish::
+ Llamame a la casa
+ o al trabajo.
+ (Call me at home or at work.)
+And in Japanese (when romanized)::
+ Senshuu no doyoubi ni tegami
+ o kakimashita.
+ ([I] wrote a letter on Saturday last week.)
+If a paragraph containing an 'o' word wraps such that the 'o' is the
+first text on a line, or if a paragraph begins with such a word, it
+could be misinterpreted as a bullet list.
+In reStructuredText_, 'o' is not used as a bullet character. '-',
+'*', and '+' are the possible bullet characters.
+Enumerated List Markup
+StructuredText enumerated lists are allowed to begin with numbers and
+letters followed by a period or right-parenthesis, then whitespace.
+This has surprising consequences for writing styles. For example,
+this is recognized as an enumerated list item by StructuredText::
+ Mr. Creosote.
+People will write enumerated lists in all different ways. It is folly
+to try to come up with the "perfect" format for an enumerated list,
+and limit the docstring parser's recognition to that one format only.
+Rather, the parser should recognize a variety of enumerator styles.
+It is also recommended that the enumerator of the first list item be
+ordinal-1 ('1', 'A', 'a', 'I', or 'i'), as output formats may not be
+able to begin a list at an arbitrary enumeration.
+An initial idea was to require two or more consistent enumerated list
+items in a row. This idea proved impractical and was dropped. In
+practice, the presence of a proper enumerator is enough to reliably
+recognize an enumerated list item; any ambiguities are reported by the
+parser. Here's the original idea for posterity:
+ The parser should recognize a variety of enumerator styles, mark
+ each block as a potential enumerated list item (PELI), and
+ interpret the enumerators of adjacent PELIs to decide whether they
+ make up a consistent enumerated list.
+ If a PELI is labeled with a "1.", and is immediately followed by a
+ PELI labeled with a "2.", we've got an enumerated list. Or "(A)"
+ followed by "(B)". Or "i)" followed by "ii)", etc. The chances
+ of accidentally recognizing two adjacent and consistently labeled
+ PELIs are acceptably small.
+ For an enumerated list to be recognized, the following must be
+ true:
+ - the list must consist of multiple adjacent list items (2 or
+ more)
+ - the enumerators must all have the same format
+ - the enumerators must be sequential
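+For illustration only, the three conditions of the dropped idea
+could be checked as in the Python sketch below. The function names
+are hypothetical, and the sketch assumes only arabic numbers and
+single letters as enumerators (roman numerals are left out to keep
+it short):

```python
import re

# Optional "(", then a number or a single letter, then "." or ")",
# then whitespace.
ENUMERATOR = re.compile(r"^(\(?)([0-9]+|[A-Za-z])([.)])\s")


def parse_enumerator(line):
    """Return ((format, kind), ordinal) for a candidate item, else None."""
    match = ENUMERATOR.match(line)
    if not match:
        return None
    prefix, label, suffix = match.groups()
    if prefix and suffix != ")":
        return None  # "(1." is not treated as an enumerator here
    if label.isdigit():
        kind, ordinal = "arabic", int(label)
    elif label.isupper():
        kind, ordinal = "upper", ord(label) - ord("A") + 1
    else:
        kind, ordinal = "lower", ord(label) - ord("a") + 1
    return (prefix + suffix, kind), ordinal


def is_consistent_list(lines):
    """Check: two or more items, one shared format, sequential ordinals."""
    parsed = [parse_enumerator(line) for line in lines]
    if len(parsed) < 2 or None in parsed:
        return False
    formats = {fmt for fmt, _ in parsed}
    ordinals = [ordinal for _, ordinal in parsed]
    sequential = all(b == a + 1 for a, b in zip(ordinals, ordinals[1:]))
    return len(formats) == 1 and sequential
```

+Under this check a single line such as "A. Einstein was a physicist."
+is never a list by itself, which was the point of requiring two
+adjacent items.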
+Definition List Markup
+StructuredText uses ' -- ' (whitespace, two hyphens, whitespace) on
+the first line of a paragraph to indicate a definition list item. The
+' -- ' serves to separate the term (on the left) from the definition
+(on the right).
+Many people use ' -- ' as an em-dash in their text, conflicting with
+the StructuredText usage. Although the Chicago Manual of Style says
+that spaces should not be used around an em-dash, Peter Funk pointed
+out that this is standard usage in German (according to the Duden, the
+official German reference), and possibly in other languages as well.
+The widespread use of ' -- ' precludes its use for definition lists;
+it would violate the "unsurprising" criterion.
+A simpler, and at least equally visually distinctive construct
+(proposed by Guido van Rossum, who incidentally is a frequent user of
+' -- ') would do just as well::
+ term 1
+ Definition.
+ term 2
+ Definition 2, paragraph 1.
+ Definition 2, paragraph 2.
+A reStructuredText definition list item consists of a term and a
+definition. A term is a simple one-line paragraph. A definition is a
+block indented relative to the term, and may contain multiple
+paragraphs and other body elements. No blank line precedes a
+definition (this distinguishes definition lists from block quotes).
+Literal Blocks
+The StructuredText_ specification has literal blocks indicated by
+'example', 'examples', or '::' ending the preceding paragraph. STNG
+only recognizes '::'; 'example'/'examples' are not implemented. This
+is good; it fixes an unnecessary language dependency. The problem is
+what to do with the sometimes-unwanted '::'.
+In reStructuredText_ '::' at the end of a paragraph indicates that
+subsequent *indented* blocks are treated as literal text. No further
+markup interpretation is done within literal blocks (not even
+backslash-escapes). If the '::' is preceded by whitespace, '::' is
+omitted from the output; if '::' was the sole content of a paragraph,
+the entire paragraph is removed (no 'empty' paragraph remains). If
+'::' is preceded by a non-whitespace character, '::' is replaced by
+':' (i.e., the extra colon is removed).
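+The three cases above can be written as a small Python function.
+This is a sketch of the rule as stated here, not the actual parser
+code; the name ``resolve_literal_marker`` and the return convention
+(output text, or ``None`` when the paragraph disappears, plus a flag
+saying whether a literal block follows) are assumptions made for the
+example:

```python
def resolve_literal_marker(paragraph):
    """Apply the '::' rule to one paragraph of text.

    Returns (text, starts_literal). ``text`` is None when the
    paragraph consisted solely of '::' and is removed entirely.
    """
    if not paragraph.endswith("::"):
        return paragraph, False
    body = paragraph[:-2]
    if body.strip() == "":
        # '::' was the sole content: no empty paragraph remains.
        return None, True
    if body[-1].isspace():
        # '::' preceded by whitespace: omit it from the output.
        return body.rstrip(), True
    # '::' preceded by non-whitespace: keep a single colon.
    return body + ":", True
```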
+Thus, a section could begin with a literal block as follows::
+ Section Title
+ -------------
+ ::
+ print "this is example literal"
+The table markup scheme in classic StructuredText was horrible. Its
+omission from StructuredTextNG is welcome, and its markup will not be
+repeated here. However, tables themselves are useful in
+documentation. Alternatives:
+1. This format is the most natural and obvious. It was independently
+ invented (no great feat of creation!), and later found to be the
+ format supported by the `Emacs table mode`_::
+ +------------+------------+------------+--------------+
+ | Header 1 | Header 2 | Header 3 | Header 4 |
+ +============+============+============+==============+
+ | Column 1 | Column 2 | Column 3 & 4 span (Row 1) |
+ +------------+------------+------------+--------------+
+ | Column 1 & 2 span | Column 3 | - Column 4 |
+ +------------+------------+------------+ - Row 2 & 3 |
+ | 1 | 2 | 3 | - span |
+ +------------+------------+------------+--------------+
+ Tables are described with a visual outline made up of the
+ characters '-', '=', '|', and '+':
+ - The hyphen ('-') is used for horizontal lines (row separators).
+ - The equals sign ('=') is optionally used as a header separator
+ (as of version 1.5.24, this is not supported by the Emacs table
+ mode).
+ - The vertical bar ('|') is used for vertical lines (column
+ separators).
+ - The plus sign ('+') is used for intersections of horizontal and
+ vertical lines.
+ Row and column spans are possible simply by omitting the column or
+ row separators, respectively. The header row separator must be
+ complete; in other words, a header cell may not span into the table
+ body. Each cell contains body elements, and may have multiple
+ paragraphs, lists, etc. Initial spaces for a left margin are
+ allowed; the first line of text in a cell determines its left
+ margin.
+2. Below is a simpler table structure. It may be better suited to
+ manual input than alternative #1, but there is no Emacs editing
+ mode available. One disadvantage is that it resembles section
+ titles; a one-column table would look exactly like section &
+ subsection titles. ::
+ ============ ============ ============ ==============
+ Header 1 Header 2 Header 3 Header 4
+ ============ ============ ============ ==============
+ Column 1 Column 2 Column 3 & 4 span (Row 1)
+ ------------ ------------ ---------------------------
+ Column 1 & 2 span Column 3 - Column 4
+ ------------------------- ------------ - Row 2 & 3
+ 1 2 3 - span
+ ============ ============ ============ ==============
+ The table begins with a top border of equals signs with a space at
+ each column boundary (regardless of spans). Each row is
+ underlined. Internal row separators are underlines of '-', with
+ spaces at column boundaries. The last of the optional head rows is
+ underlined with '=', again with spaces at column boundaries.
+ Column spans have no spaces in their underline. Row spans simply
+ lack an underline at the row boundary. The bottom boundary of the
+ table consists of '=' underlines. A blank line is required
+ following a table.
+3. A minimalist alternative is as follows::
+ ==== ===== ======== ======== ======= ==== ===== =====
+ Old State Input Action New State Notes
+ ----------- -------- ----------------- -----------
+ ids types new type sys.msg. dupname ids types
+ ==== ===== ======== ======== ======= ==== ===== =====
+ -- -- explicit -- -- new True
+ -- -- implicit -- -- new False
+ None False explicit -- -- new True
+ old False explicit implicit old new True
+ None True explicit explicit new None True
+ old True explicit explicit new,old None True [1]
+ None False implicit implicit new None False
+ old False implicit implicit new,old None False
+ None True implicit implicit new None True
+ old True implicit implicit new old True
+ ==== ===== ======== ======== ======= ==== ===== =====
+ The table begins with a top border of equals signs with one or more
+ spaces at each column boundary (regardless of spans). There must
+ be at least two columns in the table (to differentiate it from
+ section headers). Each line starts a new row. The rightmost
+ column is unbounded; text may continue past the edge of the table.
+ Each row/line must contain spaces at column boundaries, except for
+ explicit column spans. Underlines of '-' can be used to indicate
+ column spans, but should be used sparingly if at all. Lines
+ containing column span underlines may not contain any other text.
+ The last of the optional head rows is underlined with '=', again
+ with spaces at column boundaries. The bottom boundary of the table
+ consists of '=' underlines. A blank line is required following a
+ table.
+ This table sums up the features. Using all the features in such a
+ small space is not pretty though::
+ ======== ======== ========
+ Header 2 & 3 Span
+ ------------------
+ Header 1 Header 2 Header 3
+ ======== ======== ========
+ Each line is a new row.
+ Each row consists of one line only.
+ Row spans are not possible.
+ The last column may spill over to the right.
+ Column spans are possible with an underline joining columns.
+ ----------------------------
+ The span is limited to the row above the underline.
+ ======== ======== ========
+4. As a variation of alternative 3, bullet list syntax in the first
+ column could be used to indicate row starts. Multi-line rows are
+ possible, but row spans are not. For example::
+ ===== =====
+ col 1 col 2
+ ===== =====
+ - 1 Second column of row 1.
+ - 2 Second column of row 2.
+ Second line of paragraph.
+ - 3 Second column of row 3.
+ Second paragraph of row 3,
+ column 2
+ ===== =====
+ Column spans would be indicated on the line after the last line of
+ the row. To indicate a real bullet list within a first-column
+ cell, simply nest the bullets.
+5. In a further variation, we could simply assume that whitespace in
+ the first column implies a multi-line row; the text in other
+ columns is continuation text. For example::
+ ===== =====
+ col 1 col 2
+ ===== =====
+ 1 Second column of row 1.
+ 2 Second column of row 2.
+ Second line of paragraph.
+ 3 Second column of row 3.
+ Second paragraph of row 3,
+ column 2
+ ===== =====
+ Limitations of this approach:
+ - Cells in the first column are limited to one line of text.
+ - Cells in the first column *must* contain some text; blank cells
+ would lead to a misinterpretation. An empty comment ("..") is
+ sufficient.
+6. Combining alternatives 3 and 4, a bullet list in the first column
+ could mean multi-line rows, and no bullet list means single-line
+ rows only.
+Alternatives 1 and 5 have been adopted by reStructuredText.
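+As a rough illustration of the border convention shared by
+alternatives 3 to 5 (runs of '=' separated by spaces mark the
+columns, and the rightmost column is unbounded), column boundaries
+can be read off the top border as in the Python sketch below. The
+function names are hypothetical, and column spans and multi-line
+rows are deliberately not handled:

```python
import re


def column_spans(border):
    """Return (start, end) offsets for each run of '=' in a border line."""
    spans = [match.span() for match in re.finditer(r"=+", border)]
    if len(spans) < 2:
        # One column would be indistinguishable from a section title.
        raise ValueError("a simple table needs at least two columns")
    return spans


def split_row(line, spans):
    """Cut one text line into cells at the given column offsets."""
    cells = []
    for index, (start, end) in enumerate(spans):
        is_last = index == len(spans) - 1
        # The rightmost column may run past the edge of the table.
        cells.append(line[start:] if is_last else line[start:end])
    return [cell.strip() for cell in cells]
```

+A fuller version would also verify that each row has spaces at the
+column boundaries, as the description of alternative 3 requires.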
+Delimitation of Inline Markup
+StructuredText specifies that inline markup must begin with
+whitespace, precluding such constructs as parenthesized or quoted
+emphatic text::
+ "**What?**" she cried. (*exit stage left*)
+The `reStructuredText markup specification`_ allows for such
+constructs and disambiguates inline markup through a set of
+recognition rules. These recognition rules define the context of
+markup start-strings and end-strings, allowing markup characters to be
+used in most non-markup contexts without a problem (or a backslash).
+So we can say, "Use asterisks (*) around words or phrases to
+*emphasize* them." The '(*)' will not be recognized as markup. This
+reduces the need for markup escaping to the point where an escape
+character is *almost* (but not quite!) unnecessary.
+StructuredText uses '_text_' to indicate underlining. To quote David
+Ascher in his 2000-01-21 Doc-SIG mailing list post, "Docstring
+grammar: a very revised proposal":
+ The tagging of underlined text with _'s is suboptimal. Underlines
+ shouldn't be used from a typographic perspective (underlines were
+ designed to be used in manuscripts to communicate to the
+ typesetter that the text should be italicized -- no well-typeset
+ book ever uses underlines), and conflict with double-underscored
+ Python variable names (__init__ and the like), which would get
+ truncated and underlined when that effect is not desired. Note
+ that while *complete* markup would prevent that truncation
+ ('__init__'), I think of docstring markups much like I think of
+ type annotations -- they should be optional and above all do no
+ harm. In this case the underline markup does harm.
+Underlining is not part of the reStructuredText specification.
+Inline Literals
+StructuredText's markup for inline literals (text left as-is,
+verbatim, usually in a monospaced font; as in HTML <TT>) is single
+quotes ('literals'). The problem with single quotes is that they are
+too often used for other purposes:
+- Apostrophes: "Don't blame me, 'cause it ain't mine, it's Chris'.";
+- Quoting text:
+ First Bruce: "Well Bruce, I heard the prime minister use it.
+ 'S'hot enough to boil a monkey's bum in 'ere your Majesty,' he
+ said, and she smiled quietly to herself."
+ In the UK, single quotes are used for dialogue in published works.
+- String literals: s = ''
+ 'text' \'text\' ''text'' "text" \"text\" ""text""
+ #text# @text@ `text` ^text^ ``text'' ``text``
+The examples below contain inline literals, quoted text, and
+apostrophes. Each example should evaluate to the following HTML::
+ Some <TT>code</TT>, with a 'quote', "double", ain't it grand?
+ Does <TT>a[b] = 'c' + "d" + `2^3`</TT> work?
+ 0. Some code, with a quote, double, ain't it grand?
+ Does a[b] = 'c' + "d" + `2^3` work?
+ 1. Some 'code', with a \'quote\', "double", ain\'t it grand?
+ Does 'a[b] = \'c\' + "d" + `2^3`' work?
+ 2. Some \'code\', with a 'quote', "double", ain't it grand?
+ Does \'a[b] = 'c' + "d" + `2^3`\' work?
+ 3. Some ''code'', with a 'quote', "double", ain't it grand?
+ Does ''a[b] = 'c' + "d" + `2^3`'' work?
+ 4. Some "code", with a 'quote', \"double\", ain't it grand?
+ Does "a[b] = 'c' + "d" + `2^3`" work?
+ 5. Some \"code\", with a 'quote', "double", ain't it grand?
+ Does \"a[b] = 'c' + "d" + `2^3`\" work?
+ 6. Some ""code"", with a 'quote', "double", ain't it grand?
+ Does ""a[b] = 'c' + "d" + `2^3`"" work?
+ 7. Some #code#, with a 'quote', "double", ain't it grand?
+ Does #a[b] = 'c' + "d" + `2^3`# work?
+ 8. Some @code@, with a 'quote', "double", ain't it grand?
+ Does @a[b] = 'c' + "d" + `2^3`@ work?
+ 9. Some `code`, with a 'quote', "double", ain't it grand?
+ Does `a[b] = 'c' + "d" + \`2^3\`` work?
+ 10. Some ^code^, with a 'quote', "double", ain't it grand?
+ Does ^a[b] = 'c' + "d" + `2\^3`^ work?
+ 11. Some ``code'', with a 'quote', "double", ain't it grand?
+ Does ``a[b] = 'c' + "d" + `2^3`'' work?
+ 12. Some ``code``, with a 'quote', "double", ain't it grand?
+ Does ``a[b] = 'c' + "d" + `2^3\``` work?
+Backquotes (#9 & #12) are the best choice. They are unobtrusive and
+relatively rarely used (more rarely than ' or ", anyhow). Backquotes
+have the connotation of 'quotes', which other options (like carets,
+#10) don't.
+Analogously with ``*emph*`` & ``**strong**``, double-backquotes (#12)
+could be used for inline literals. If single-backquotes are used for
+'interpreted text' (context-sensitive domain-specific descriptive
+markup) such as function name hyperlinks in Python docstrings, then
+double-backquotes could be used for absolute-literals, wherein no
+processing whatsoever takes place. An advantage of double-backquotes
+would be that backslash-escaping would no longer be necessary for
+embedded single-backquotes; however, embedded double-backquotes (in an
+end-string context) would be illegal. See `Backquotes in
+Phrase-Links`__ in `Record of reStructuredText Syntax Alternatives`__.
+__ alternatives.html#backquotes-in-phrase-links
+__ alternatives.html
+Alternative choices are carets (#10) and TeX-style quotes (#11). For
+examples of TeX-style quoting, see
+Some existing uses of backquotes:
+1. As a synonym for repr() in Python.
+2. For command-interpolation in shell scripts.
+3. Used as open-quotes in TeX code (and carried over into plaintext
+ by TeXies).
+The inline markup start-string and end-string recognition rules
+defined by the `reStructuredText markup specification`_ would allow
+all of these cases inside inline literals, with very few exceptions.
+As a fallback, literal blocks could handle all cases.
+Outside of inline literals, the above uses of backquotes would require
+backslash-escaping. However, these are all prime examples of text
+that should be marked up with inline literals.
+If either backquotes or straight single-quotes are used as markup,
+TeX-quotes are too troublesome to support, so no special-casing of
+TeX-quotes should be done (at least at first). If TeX-quotes have to
+be used outside of literals, a single backslash-escape would suffice:
+\``TeX quote''. Ugly, true, but very infrequently used.
+Using literal blocks is a fallback option which removes the need for
+backslash-escaping::
+ like this::
+ Here, we can do ``absolutely'' anything `'`'\|/|\ we like!
+No mechanism for inline literals is perfect, just as no escaping
+mechanism is perfect. No matter what we use, complicated inline
+expressions involving the inline literal quote and/or the backslash
+will end up looking ugly. We can only choose the least often ugly
+option.
+reStructuredText will use double backquotes for inline literals, and
+single backquotes for interpreted text.
+There are three forms of hyperlink currently in StructuredText_:
+1. (Absolute & relative URIs.) Text enclosed by double quotes
+ followed by a colon, a URI, and concluded by punctuation plus white
+ space, or just white space, is treated as a hyperlink::
+ "Python":
+2. (Absolute URIs only.) Text enclosed by double quotes followed by a
+ comma, one or more spaces, an absolute URI and concluded by
+ punctuation plus white space, or just white space, is treated as a
+ hyperlink::
+ "mail me",
+3. (Endnotes.) Text enclosed by brackets links to an endnote at the
+ end of the document: at the beginning of the line, two dots, a
+ space, and the same text in brackets, followed by the end note
+ itself::
+ Please refer to the fine manual [GVR2001].
+ .. [GVR2001] Python Documentation, Release 2.1, van Rossum,
+ Drake, et al.,
+The problem with forms 1 and 2 is that they are neither intuitive nor
+unobtrusive (they break design goals 5 & 2). They overload
+double-quotes, which are too often used in ordinary text (potentially
+breaking design goal 4). The brackets in form 3 are also too common
+in ordinary text (such as [nested] asides and Python lists like [12]).
+1. Have no special markup for hyperlinks.
+2. A. Interpret and mark up hyperlinks as any contiguous text
+ containing '://' or ':...@' (absolute URI) or '@' (email
+ address) after an alphanumeric word. To de-emphasize the URI,
+ simply enclose it in parentheses:
+ Python (
+ B. Leave special hyperlink markup as a domain-specific extension.
+ Hyperlinks in ordinary reStructuredText documents would be
+ required to be standalone (i.e. the URI text inline in the
+ document text). Processed hyperlinks (where the URI text is
+ hidden behind the link) are important enough to warrant syntax.
+3. The original Setext_ introduced a mechanism of indirect hyperlinks.
+ A source link word ('hot word') in the text was given a trailing
+ underscore::
+ Here is some text with a hyperlink_ built in.
+ The hyperlink itself appeared at the end of the document on a line
+ by itself, beginning with two dots, a space, the link word with a
+ leading underscore, whitespace, and the URI itself::
+ .. _hyperlink
+ Setext used ``underscores_instead_of_spaces_`` for phrase links.
+With some modification, alternative 3 best satisfies the design goals.
+It has the advantage of being readable and relatively unobtrusive.
+Since each source link must match up to a target, the odd variable
+ending in an underscore can be spared being marked up (although it
+should generate a "no such link target" warning). The only
+disadvantage is that phrase-links aren't possible without some
+obtrusive syntax.
+We could achieve phrase-links if we enclose the link text:
+1. in double quotes::
+ "like this"_
+2. in brackets::
+ [like this]_
+3. or in backquotes::
+ `like this`_
+Each gives us somewhat obtrusive markup, but that is unavoidable. The
+bracketed syntax (#2) is reminiscent of links on many web pages
+(intuitive), although it is somewhat obtrusive. Alternative #3 is
+much less obtrusive, and is consistent with interpreted text: the
+trailing underscore indicates the interpretation of the phrase, as a
+hyperlink. #3 also disambiguates hyperlinks from footnote references.
+Alternative #3 wins.
+The same trailing underscore markup can also be used for footnote and
+citation references, removing the problem with ordinary bracketed text
+and Python lists::
+ Please refer to the fine manual [GVR2000]_.
+ .. [GVR2000] Python Documentation, van Rossum, Drake, et al.,
+The two-dots-and-a-space syntax was generalized by Setext for
+comments, which are removed from the (visible) processed output.
+reStructuredText uses this syntax for comments, footnotes, and link
+targets, collectively termed "explicit markup". For link targets, in
+order to eliminate ambiguity with comments and footnotes,
+reStructuredText specifies that a colon always follow the link target
+word/phrase. The colon denotes 'maps to'. There is no reason to
+restrict target links to the end of the document; they could just as
+easily be interspersed.
+Internal hyperlinks (links from one point to another within a single
+document) can be expressed by a source link as before, and a target
+link with a colon but no URI. In effect, these targets 'map to' the
+element immediately following.
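+The matching of source links to targets can be sketched in a few
+lines of Python. This is only an illustration under stated
+assumptions: the regular expressions are simplified, the function
+names and the ``example.org`` URIs are hypothetical, and names are
+compared case-insensitively with whitespace collapsed. A source link
+without a target is reported, corresponding to the "no such link
+target" warning mentioned above.

```python
import re

# word_ or `a phrase`_ ; simplified patterns assumed for this sketch.
REFERENCE = re.compile(r"(?<![\w`])(?:`([^`]+)`|([A-Za-z0-9][\w.-]*?))_(?!\w)")
# ".. _name:" or ".. _`name`:" at the start of a line.
TARGET = re.compile(r"^\.\. _`?([^`:]+)`?:", re.MULTILINE)


def normalize(name):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def unresolved_references(text):
    """Return the names of source links that have no matching target."""
    targets = {normalize(name) for name in TARGET.findall(text)}
    missing = []
    for phrase, word in REFERENCE.findall(text):
        name = normalize(phrase or word)
        if name not in targets and name not in missing:
            missing.append(name)
    return missing


SAMPLE = """\
Here is some text with a hyperlink_ built in, `a phrase link`_,
and a dangling_ reference.

.. _hyperlink: http://example.org/
.. _a phrase link: http://example.org/phrase
"""
```

+Calling ``unresolved_references(SAMPLE)`` reports only the reference
+that has no target.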
+As an added bonus, we now have a perfect candidate for
+reStructuredText directives, a simple extension mechanism: explicit
+markup containing a single word followed by two colons and whitespace.
+The interpretation of subsequent data on the directive line or
+following is directive-dependent.
+To summarize::
+ .. This is a comment.
+ .. The line below is an example of a directive.
+ .. version:: 1
+ This is a footnote [1]_.
+ This internal hyperlink will take us to the footnotes_ area below.
+ Here is a one-word_ external hyperlink.
+ Here is `a hyperlink phrase`_.
+ .. _footnotes:
+ .. [1] Footnote text goes here.
+ .. external hyperlink target mappings:
+ .. _one-word:
+ .. _a hyperlink phrase:
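+The explicit markup forms summarized above can be told apart
+mechanically. The following Python sketch classifies a single
+explicit markup line; it is a minimal illustration, not the parser's
+code. The ``classify`` function, its return tuples, and the
+``example.org`` URI are assumptions made for this example.

```python
import re

# ".. _name: URI" (URI optional); the name may be enclosed in backquotes.
TARGET = re.compile(r"^\.\. _(`[^`]+`|[^:`]+):(?:\s+(.*))?$")
# ".. [label] footnote text"
FOOTNOTE = re.compile(r"^\.\. \[([^\]]+)\]\s+(.*)$")
# ".. word:: data": a single word followed by two colons.
DIRECTIVE = re.compile(r"^\.\. ([A-Za-z][\w-]*)::(?:\s+(.*))?$")


def classify(line):
    """Return (kind, name, data) for one line of explicit markup."""
    match = TARGET.match(line)
    if match:
        return ("target", match.group(1).strip("`"), match.group(2) or None)
    match = FOOTNOTE.match(line)
    if match:
        return ("footnote", match.group(1), match.group(2))
    match = DIRECTIVE.match(line)
    if match:
        return ("directive", match.group(1), match.group(2) or None)
    if line.startswith(".. "):
        # Anything else after the two dots and a space is a comment.
        return ("comment", line[3:], None)
    return ("text", line, None)
```

+A target with no URI, such as ``.. _footnotes:``, comes back with
+``None`` as its data, which is how this sketch marks an internal
+hyperlink target.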
+The presence or absence of a colon after the target link
+differentiates an indirect hyperlink from a footnote, respectively. A
+footnote requires brackets. Backquotes around a target link word or
+phrase are required if the phrase contains a colon, optional
+otherwise.
+Below are examples using no markup, the two StructuredText hypertext
+styles, and the reStructuredText hypertext style. Each example
+contains an indirect link, a direct link, a footnote/endnote, and
+bracketed text. In HTML, each example should evaluate to::
+ <P>A <A HREF="">URI</A>, see <A HREF="#eggs2000">
+ [eggs2000]</A> (in Bacon [Publisher]). Also see
+ <A HREF=""></A>.</P>
+ <P><A NAME="eggs2000">[eggs2000]</A> "Spam, Spam, Spam, Eggs,
+ Bacon, and Spam"</P>
+1. No markup::
+ A URI, see eggs2000 (in Bacon [Publisher]).
+ Also see
+ eggs2000 "Spam, Spam, Spam, Eggs, Bacon, and Spam"
+2. StructuredText absolute/relative URI syntax
+ ("text":
+ A "URI":, see [eggs2000] (in Bacon [Publisher]).
+ Also see "":
+ .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"
+ Note that StructuredText does not recognize standalone URIs,
+ forcing doubling up as shown in the second line of the example
+ above.
+3. StructuredText absolute-only URI syntax
+ ("text",
+ A "URI",, see [eggs2000] (in Bacon
+ [Publisher]). Also see "",
+ .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"
+4. reStructuredText syntax::
+ 4. A URI_, see [eggs2000]_ (in Bacon [Publisher]).
+ Also see
+ .. _URI: http:/
+ .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"
+The bracketed text '[Publisher]' may be problematic with
+StructuredText (syntax 2 & 3).
+reStructuredText's syntax (#4) is definitely the most readable. The
+text is separated from the link URI and the footnote, resulting in
+cleanly readable text.
+.. _StructuredText:
+.. _Setext:
+.. _reStructuredText:
+.. _detailed description:
+.. _STMinus:
+.. _StructuredTextNG:
+.. _README:
+ python/python/dist/src/README
+.. _Emacs table mode:
+.. _reStructuredText Markup Specification:
+ ../../ref/rst/restructuredtext.html
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End:
diff --git a/docs/dev/semantics.txt b/docs/dev/semantics.txt
new file mode 100644
index 000000000..cd20e15f6
--- /dev/null
+++ b/docs/dev/semantics.txt
@@ -0,0 +1,119 @@
+ Docstring Semantics
+:Author: David Goodger
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This document has been placed in the public domain.
+These are notes for a possible future PEP providing the final piece of
+the Python docstring puzzle: docstring semantics or documentation
+methodology. `PEP 257`_, Docstring Conventions, sketches out some
+guidelines, but does not get into methodology details.
+I haven't explored documentation methodology more because, in my
+opinion, it is a completely separate issue from syntax, and it's even
+more controversial than syntax. Nobody wants to be told how to lay
+out their documentation, a la JavaDoc_. I think the JavaDoc way is
+butt-ugly, but it *is* an established standard for the Java world.
+Any standard documentation methodology has to be formal enough to be
+useful but remain light enough to be usable. If the methodology is
+too strict, too heavy, or too ugly, many/most will not want to use it.
+I think a standard methodology could benefit the Python community, but
+it would be a hard sell. A PEP would be the place to start. For most
+human-readable documentation needs, the free-form text approach is
+adequate. We'd only need a formal methodology if we want to extract
+the parameters into a data dictionary, index, or summary of some kind.
+(Not to be confused with Daniel Larsson's pythondoc_ project.)
+A Python version of the JavaDoc_ semantics (not syntax). A set of
+conventions which are understood by the Docutils. What JavaDoc has
+done is to establish a syntax that enables a certain documentation
+methodology, or standard *semantics*. JavaDoc is not just syntax; it
+prescribes a methodology.
+- Use field lists or definition lists for "tagged blocks". By this I
+ mean that field lists can be used similarly to JavaDoc's ``@tag``
+ syntax. That's actually one of the motivators behind field lists.
+ For example, we could have::
+ """
+ :Parameters:
+ - `lines`: a list of one-line strings without newlines.
+ - `until_blank`: Stop collecting at the first blank line if
+ true (1).
+ - `strip_indent`: Strip common leading indent if true (1,
+ default).
+ :Return:
+ - a list of indented lines with minimum indent removed;
+ - the amount of the indent;
+ - whether or not the block finished with a blank line or at
+ the end of `lines`.
+ """
+ This is taken straight out of docutils/, in which I
+ experimented with a simple documentation methodology. Another
+ variation I've thought of exploits the Grouch_-compatible
+ "classifier" element of definition lists. For example::
+ :Parameters:
+ `lines` : [string]
+ List of one-line strings without newlines.
+ `until_blank` : boolean
+ Stop collecting at the first blank line if true (1).
+ `strip_indent` : boolean
+ Strip common leading indent if true (1, default).
+- Field lists could even be used in a one-to-one correspondence with
+ JavaDoc ``@tags``, although I doubt if I'd recommend it. Several
+ ports of JavaDoc's ``@tag`` methodology exist in Python, most
+ recently Ed Loper's "epydoc_".
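+To show how such a convention could feed a data dictionary or index,
+here is a small Python sketch that reads the ``:Parameters:`` field
+of a docstring written in the first style above. Everything in it is
+an assumption made for illustration: the ``documented_parameters``
+helper, the sample function, and the item pattern are hypothetical,
+and a real tool would parse the docstring with the reStructuredText
+parser rather than with a regular expression.

```python
import inspect
import re

ITEM = re.compile(r"^- `(\w+)`:\s*(.*)$")


def documented_parameters(func):
    """Map parameter names to descriptions from a ':Parameters:' field."""
    params = {}
    in_params = False
    current = None
    for line in (inspect.getdoc(func) or "").splitlines():
        stripped = line.strip()
        if len(stripped) > 2 and stripped.startswith(":") and stripped.endswith(":"):
            # A field name such as ':Parameters:' or ':Return:'.
            in_params = stripped == ":Parameters:"
            current = None
            continue
        if not in_params:
            continue
        match = ITEM.match(stripped)
        if match:
            current = match.group(1)
            params[current] = match.group(2)
        elif current and stripped:
            # Continuation line of the previous list item.
            params[current] += " " + stripped
    return params


def get_indented(lines, until_blank=False, strip_indent=True):
    """
    Hypothetical example function.

    :Parameters:
        - `lines`: a list of one-line strings without newlines.
        - `until_blank`: Stop collecting at the first blank line if
          true.
        - `strip_indent`: Strip common leading indent if true
          (default).
    :Return:
        - a list of lines.
    """
    return list(lines)
```

+Comparing the result with ``inspect.signature(get_indented)`` would
+be one way to check that every parameter has been documented.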
+Other Ideas
+- Can we extract comments from parsed modules? Could be handy for
+ documenting function/method parameters::
+      def method(self,
+                 source,  # path of input file
+                 dest     # path of output file
+                 ):
+ This would save having to repeat parameter names in the docstring.
+ Idea from Mark Hammond's 1998-06-23 Doc-SIG post, "Re: [Doc-SIG]
+ Documentation tool":
+ it would be quite hard to add a new param to this method without
+ realising you should document it
+- Frederic Giacometti's `iPhrase Python documentation conventions`_ is
+ an attachment to his Doc-SIG post of 2001-05-30.
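+As for extracting comments, Python's standard ``tokenize`` module
+does preserve them, so one possible approach is sketched below. It
+is only a sketch: the ``parameter_comments`` helper is hypothetical,
+and it simply pairs each comment with the last name on the same
+source line.

```python
import io
import tokenize

SOURCE = '''\
def method(self,
           source,  # path of input file
           dest     # path of output file
           ):
    pass
'''


def parameter_comments(source):
    """Map each name to the comment that trails it on the same line."""
    comments = {}
    last_name = None  # (name, line number) of the most recent NAME token
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            last_name = (tok.string, tok.start[0])
        elif (tok.type == tokenize.COMMENT and last_name
              and last_name[1] == tok.start[0]):
            comments[last_name[0]] = tok.string.lstrip("#").strip()
    return comments
```

+For the source shown, ``parameter_comments(SOURCE)`` pairs ``source``
+and ``dest`` with their trailing comments.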
+.. _PEP 257:
+.. _JavaDoc:
+.. _pythondoc:
+.. _Grouch:
+.. _epydoc:
+.. _iPhrase Python documentation conventions:
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End:
diff --git a/docs/dev/testing.txt b/docs/dev/testing.txt
new file mode 100644
index 000000000..bde54116f
--- /dev/null
+++ b/docs/dev/testing.txt
@@ -0,0 +1,246 @@
+ Docutils_ Testing
+:Author: Felix Wiemann
+:Author: David Goodger
+:Revision: $Revision$
+:Date: $Date$
+:Copyright: This document has been placed in the public domain.
+.. _Docutils:
+.. contents::
+When adding new functionality (or fixing bugs), be sure to add test
+cases to the test suite. Practise test-first programming; it's fun,
+it's addictive, and it works!
+This document describes how to run the Docutils test suite, how the
+tests are organized and how to add new tests or modify existing tests.
+Running the Test Suite
+Before checking in any changes, run the entire Docutils test suite to
+be sure that you haven't broken anything. From a shell::
+ cd docutils/test
+ ./
+Python Versions
+The Docutils 0.4 release supports Python 2.1 [#py21]_ or later, with
+some features only working (and being tested) with Python 2.3+.
+Therefore, you should actually have Pythons 2.1 [#py21]_, 2.2, 2.3, as
+well as the latest Python installed and always run the tests on all of
+them. (A good way to do that is to always run the test suite through
+a short script that runs ```` under each version of
+Python.) If you can't afford installing 3 or more Python versions, the
+edge cases (2.1 and 2.3) should cover most of it.
+.. [#py21] Python 2.1 may be used providing the compiler package is
+ installed. The compiler package can be found in the Tools/
+ directory of Python 2.1's source distribution.
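+Such a wrapper script could look like the following Python sketch.
+It is an illustration only: the ``run_under_each`` helper is
+hypothetical, the interpreter names passed to it are whatever happens
+to be installed on your system, and the test script to run is given
+by the caller in ``script_args``.

```python
import shutil
import subprocess
import sys


def run_under_each(script_args, interpreters=(sys.executable,)):
    """Run ``script_args`` under each interpreter; return exit codes by name."""
    results = {}
    for name in interpreters:
        path = shutil.which(name)
        if path is None:
            results[name] = None  # this interpreter is not installed
            continue
        results[name] = subprocess.call([path] + list(script_args))
    return results
```

+An exit code of 0 means the run succeeded under that interpreter;
+``None`` marks an interpreter that could not be found.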
+Good resources covering the differences between Python versions:
+* `What's New in Python 2.2`__
+* `What's New in Python 2.3`__
+* `What's New in Python 2.4`__
+* `PEP 290 - Code Migration and Modernization`__
+.. _Python Check-in Policies:
+.. _sandbox directory:
+.. _nightly repository tarball:
+Unit Tests
+Unit tests test single functions or modules (i.e. whitebox testing).
+If you are implementing a new feature, be sure to write a test case
+covering its functionality. It happens very frequently that your
+implementation (or even only a part of it) doesn't work with an older
+(or even newer) Python version, and the only reliable way to detect
+those cases is using tests.
+Often, it's easier to write the test first and then implement the
+functionality required to make the test pass.
+Writing New Tests
+When writing new tests, it very often helps to see how a similar test
+is implemented. For example, the files in the
+``test_parsers/test_rst/`` directory all look very similar. So when
+adding a test, you don't have to reinvent the wheel.
+If there is no similar test, you can write a new test from scratch
+using Python's ``unittest`` module. For an example, please have a
+look at the following imaginary ````::
+    #! /usr/bin/env python
+    # Author: your name
+    # Contact: your email address
+    # Revision: $Revision$
+    # Date: $Date$
+    # Copyright: This module has been placed in the public domain.
+    """
+    Test module for docutils.square.
+    """
+    import unittest
+    import docutils.square
+    class SquareTest(unittest.TestCase):
+        def test_square(self):
+            self.assertEqual(docutils.square.square(0), 0)
+            self.assertEqual(docutils.square.square(5), 25)
+            self.assertEqual(docutils.square.square(7), 49)
+        def test_square_root(self):
+            self.assertEqual(docutils.square.sqrt(49), 7)
+            self.assertEqual(docutils.square.sqrt(0), 0)
+            self.assertRaises(docutils.square.SquareRootError,
+                              docutils.square.sqrt, 20)
+    if __name__ == '__main__':
+        unittest.main()
+For more details on how to write tests, please refer to the
+documentation of the ``unittest`` module.
+.. _functional:
+Functional Tests
+The directory ``test/functional/`` contains data for functional tests.
+Performing functional testing means testing the Docutils system as a
+whole (i.e. blackbox testing).
+Directory Structure
++ ``functional/`` The main data directory.
+  + ``input/`` The input files.
+    - ``some_test.txt``, for example.
+  + ``output/`` The actual output.
+    - ``some_test.html``, for example.
+  + ``expected/`` The expected output.
+    - ``some_test.html``, for example.
+  + ``tests/`` The config files for processing the input files.
+    - ````, for example.
+    - ````, the `default configuration file`_.
+The Testing Process
+When running ````, all config files in
+``functional/tests/`` are processed. (Config files whose names begin
+with an underscore are ignored.) The current working directory is
+always Docutils' main test directory (``test/``).
+For example, ``functional/tests/`` could read like this::
+ # Source and destination file names.
+ test_source = "some_test.txt"
+ test_destination = "some_test.html"
+ # Keyword parameters passed to publish_file.
+ reader_name = "standalone"
+ parser_name = "rst"
+ writer_name = "html"
+ settings_overrides['output-encoding'] = 'utf-8'
+ # Relative to main ``test/`` directory.
+ settings_overrides['stylesheet_path'] = '../docutils/writers/html4css1/html4css1.css'
+The two variables ``test_source`` and ``test_destination`` contain the
+input file name (relative to ``functional/input/``) and the output
+file name (relative to ``functional/output/`` and
+``functional/expected/``). Note that the file names can be chosen
+arbitrarily. However, the file names in ``functional/output/`` *must*
+match the file names in ``functional/expected/``.
+If defined, ``_test_more`` must be a function with the following
+signature::
+ def _test_more(expected_dir, output_dir, test_case, parameters):
+This function is called from the test case to perform tests beyond the
+simple comparison of expected and actual output files.
+``test_source`` and ``test_destination`` are removed from the
+namespace, as are all variables whose names begin with an underscore
+("_"). The remaining names are passed as keyword arguments to
+``docutils.core.publish_file``, so you can set reader, parser, writer
+and anything else you want to configure. Note that
+``settings_overrides`` is already initialized as a dictionary *before*
+the execution of the config file.
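+The namespace handling just described can be sketched as follows.
+This is not the test suite's actual code: the ``load_config`` helper
+and the ``CONFIG`` text are assumptions made for illustration, and
+the call to ``docutils.core.publish_file`` is left out so that the
+sketch runs on its own.

```python
CONFIG = '''
test_source = "some_test.txt"
test_destination = "some_test.html"
reader_name = "standalone"
parser_name = "rst"
writer_name = "html"
settings_overrides['output-encoding'] = 'utf-8'

def _test_more(expected_dir, output_dir, test_case, parameters):
    pass
'''


def load_config(text):
    """Execute a config file's text and split the resulting namespace."""
    # settings_overrides exists *before* the config file runs.
    namespace = {"settings_overrides": {}}
    exec(text, namespace)
    source = namespace.pop("test_source")
    destination = namespace.pop("test_destination")
    test_more = namespace.get("_test_more")
    # Names beginning with an underscore are dropped; the rest would be
    # passed on as keyword arguments.
    params = {key: value for key, value in namespace.items()
              if not key.startswith("_")}
    return source, destination, test_more, params
```

+Here ``params`` ends up holding ``reader_name``, ``parser_name``,
+``writer_name`` and ``settings_overrides``, the names that would be
+handed to ``publish_file`` as keyword arguments.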
+Creating New Tests
+In order to create a new test, put the input test file into
+``functional/input/``. Then create a config file in
+``functional/tests/`` which sets at least input and output file names,
+reader, parser and writer.
+Now run ````. The test will fail, of course,
+because you do not have an expected output yet. However, an output
+file will have been generated in ``functional/output/``. Check this
+output file for validity and correctness. Then copy the file to
+If you rerun ```` now, it should pass.
+If you run ```` later and the actual output doesn't
+match the expected output anymore, the test will fail.
+If this is the case and you made an intentional change, check the
+actual output for validity and correctness, copy it to
+``functional/expected/`` (overwriting the old expected output), and
+commit the change.
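+The comparison of actual and expected output is, in essence, a file
+diff. A minimal sketch using Python's standard ``difflib`` module is
+shown here; the ``compare_output`` helper is hypothetical and is not
+the code the test suite uses.

```python
import difflib
from pathlib import Path


def compare_output(expected_path, output_path):
    """Return a unified diff as a list of lines; empty means no difference."""
    expected = Path(expected_path).read_text(encoding="utf-8").splitlines(keepends=True)
    output = Path(output_path).read_text(encoding="utf-8").splitlines(keepends=True)
    return list(difflib.unified_diff(expected, output,
                                     fromfile=str(expected_path),
                                     tofile=str(output_path)))
```

+A non-empty result shows exactly which lines changed, which helps
+when deciding whether a change in the output was intentional.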
+.. _default configuration file:
+The Default Configuration File
+The file ``functional/tests/`` contains default settings.
+It is executed just before the actual configuration files, which has
+the same effect as if the contents of ```` were prepended
+to every configuration file.
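+The "prepended" behaviour follows from executing both files in the
+same namespace, defaults first. The Python sketch below illustrates
+the idea; the ``load_with_defaults`` helper and the sample settings
+in the usage note are assumptions, not the suite's actual code.

```python
def load_with_defaults(default_text, config_text):
    """Execute the defaults, then the test's config, in one namespace."""
    namespace = {"settings_overrides": {}}
    exec(default_text, namespace)  # the defaults run first ...
    exec(config_text, namespace)   # ... so the test's config can override them
    return {key: value for key, value in namespace.items()
            if not key.startswith("_")}
```

+For example, if the defaults set ``writer_name = "html"`` and a test's
+config sets ``writer_name = "latex"``, the test's value wins, while
+any other default settings remain in effect.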
diff --git a/docs/dev/todo.txt b/docs/dev/todo.txt
new file mode 100644
index 000000000..6f1c6291d
--- /dev/null
+++ b/docs/dev/todo.txt
@@ -0,0 +1,1964 @@
+ Docutils_ To Do List
+:Author: David Goodger (with input from many); open to all Docutils
+ developers
+:Date: $Date$
+:Revision: $Revision$
+:Copyright: This document has been placed in the public domain.
+.. _Docutils:
+.. contents::
+Priority items are marked with "@" symbols. The more @s, the higher
+the priority. Items in question form (containing "?") are ideas which
+require more thought and debate; they are potential to-do's.
+Many of these items are awaiting champions. If you see something
+you'd like to tackle, please do! If there's something you'd like to
+see done but are unable to implement it yourself, please consider
+donating to Docutils: |donate|
+.. |donate| image::
+ :target:
+ :align: middle
+ :width: 88
+ :height: 32
+ :alt: Support the Docutils project!
+Please see also the Bugs_ document for a list of bugs in Docutils.
+.. _bugs: ../../BUGS.html
+Release 0.4
+We should get Docutils 0.4 out soon, but we shouldn't just cut a
+"frozen snapshot" release. Here's a list of features (achievable in
+the short term) to include:
+* [DONE in rev. 3901] Move support files to docutils/writers/support.
+* [DONE in rev. 4163] Convert ``docutils/writers/support/*`` into
+ individual writer packages.
+* [DONE in rev. 3901] Remove docutils.transforms.html.StylesheetCheck
+ (no longer needed because of the above change).
+* [DONE in rev. 3962] Incorporate new branch policy into the docs.
+ ("Development strategy" thread on Docutils-develop)
+* [DONE in rev. 4152] Added East-Asian double-width character support.
+* [DONE in rev. 4156] Merge the S5 branch.
+Anything else?
+Once released,
+* Tag it and create a maintenance branch (perhaps "maint-0-4").
+* Declare that:
+ - Docutils 0.4.x is the last version that will support Python 2.1
+ (and perhaps higher?)
+ - Docutils 0.4.x is the last version that will support (make
+ compromises for) Netscape Navigator 4
+Minimum Requirements for Python Standard Library Candidacy
+Below are action items that must be added and issues that must be
+addressed before Docutils can be considered suitable to be proposed
+for inclusion in the Python standard library.
+* Support for `document splitting`_. May require some major code
+ rework.
+* Support for subdocuments (see `large documents`_).
+* `Object numbering and object references`_.
+* `Nested inline markup`_.
+* `Python Source Reader`_.
+* The HTML writer needs to be rewritten (or a second HTML writer
+ added) to allow for custom classes, and for arbitrary splitting
+ (stack-based?).
+* Documentation_ of the architecture. Other docs too.
+* Plugin support.
+* A LaTeX writer making use of (La)TeX's power, so that the rendering
+ of the resulting documents is more easily customizable. (Similar to
+ what you wrote about a new HTML Writer.)
+* Suitability for `Python module documentation
+ <>`_.
+* Allow different report levels for STDERR and system_messages inside
+ the document?
+* Change the docutils-update script (in sandbox/infrastructure), to
+ support arbitrary branch snapshots.
+* Add a generic "container" element, equivalent to "inline", to which
+ a "class" attribute can be attached. Will require a reST directive
+ also.
+* Move some general-interest sandboxes out of individuals'
+ directories, into subprojects?
+* Add option for file (and URL) access restriction to make Docutils
+ usable in Wikis and similar applications.
+ 2005-03-21: added ``file_insertion_enabled`` & ``raw_enabled``
+ settings. These partially solve the problem, allowing or disabling
+ **all** file accesses, but not limited access.
+* Configuration file handling needs discussion:
+ - There should be some error checking on the contents of config
+ files. How much checking should be done? How loudly should
+ Docutils complain if it encounters an error/problem?
+ - Docutils doesn't complain when it doesn't find a configuration
+ file supplied with the ``--config`` option. Should it? (If yes,
+ error or warning?)
+* Internationalization:
+ - I18n needs refactoring; the language dictionaries are difficult to
+ maintain. Maybe have a look at gettext or similar tools.
+ - Language modules: in accented languages it may be useful to have
+ both accented and unaccented entries in the
+ ``bibliographic_fields`` mapping for versatility.
+ - Add a "--strict-language" option & setting: no English fallback
+ for language-dependent features.
+ - Add internationalization to _`footer boilerplate text` (resulting
+ from "--generator", "--source-link", and "--date" etc.), allowing
+ translations.
+* Add validation? See, RELAX NG, pyRXP.
+* In ``docutils.readers.get_reader_class`` (& ``parsers`` &
+ ``writers`` too), should we be importing "standalone" or
+ "docutils.readers.standalone"? (This would avoid importing
+ top-level modules if the module name is not in docutils/readers.
+ Potential nastiness.)
+* Perhaps store a _`name-to-id mapping file`? This could be stored
+ permanently, read by subsequent processing runs, and updated with
+ new entries. ("Persistent ID mapping"?)
+* Perhaps the ``Component.supports`` method should deal with
+ individual features ("meta" etc.) instead of formats ("html" etc.)?
+* Add _`object numbering and object references` (tables & figures).
+ These would be the equivalent of DocBook's "formal" elements.
+ We may need _`persistent sequences`, such as chapter numbers. See
+ ` XML`_ "fields". Should the sequences be automatic
+ or manual (user-specifiable)?
+ We need to name the objects:
+ - "name" option for the "figure" directive? ::
+ .. figure:: image.png
+ :name: image's name
+ Same for the "table" directive::
+ .. table:: optional title here
+ :name: table's name
+ ===== =====
+ x not x
+ ===== =====
+ True False
+ False True
+ ===== =====
+ This would also allow other options to be set, like border
+ styles. The same technique could be used for other objects.
+ A preliminary "table" directive has been implemented, supporting
+ table titles. Perhaps the name should derive from the title.
+ - The object could also be done this way::
+ .. _figure name:
+ .. figure:: image.png
+ This may be a more general solution, equally applicable to tables.
+ However, explicit naming using an option seems simpler to users.
+ - Perhaps the figure name could be incorporated into the figure
+ definition, as an optional inline target part of the directive
+ argument::
+ .. figure:: _`figure name` image.png
+ Maybe with a delimiter::
+ .. figure:: _`figure name`: image.png
+ Or some other, simpler syntax.
+ We'll also need syntax for object references. See `
+ XML`_ "reference fields":
+ - Parameterized substitutions? For example::
+ See |figure (figure name)| on |page (figure name)|.
+ .. |figure (name)| figure-ref:: (name)
+ .. |page (name)| page-ref:: (name)
+ The result would be::
+ See figure 3.11 on page 157.
+ But this would require substitution directives to be processed at
+ reference-time, not at definition-time as they are now. Or,
+ perhaps the directives could just leave ``pending`` elements
+ behind, and the transforms do the work? How to pass the data
+ through? Too complicated.
+ - An interpreted text approach is simpler and better::
+ See :figure:`figure name` on :page:`figure name`.
+ The "figure" and "page" roles could generate appropriate
+ boilerplate text. The position of the role (prefix or suffix)
+ could also be utilized.
+ See `Interpreted Text`_ below.
+ - We could leave the boilerplate text up to the document::
+ See Figure :fig:`figure name` on page :pg:`figure name`.
+ - Reference boilerplate could be specified in the document
+ (defaulting to nothing)::
+ .. fignum::
+ :prefix-ref: "Figure "
+ :prefix-caption: "Fig. "
+ :suffix-caption: :
+ .. XML:
+* Think about _`large documents` made up of multiple subdocument
+ files. Issues: continuity (`persistent sequences`_ above),
+ cross-references (`name-to-id mapping file`_ above and `targets in
+ other documents`_ below), splitting (`document splitting`_ below).
+ When writing a book, the author probably wants to split it up into
+ files, perhaps one per chapter (but perhaps even more detailed).
+ However, we'd like to be able to have references from one chapter to
+ another, and have continuous numbering (pages and chapters, as
+ applicable). Of course, none of this is implemented yet. There has
+ been some thought put into some aspects; see `the "include"
+ directive`__ and the `Reference Merging`_ transform below.
+ When I was working with SGML in Japan, we had a system where there
+ was a top-level coordinating file, book.sgml, which contained the
+ top-level structure of a book: the <book> element, containing the
+ book <title> and empty component elements (<preface>, <chapter>,
+ <appendix>, etc.), each with filename attributes pointing to the
+ actual source for the component. Something like this::
+ <book id="bk01">
+ <title>Title of the Book</title>
+ <preface inrefid="pr01"></preface>
+ <chapter inrefid="ch01"></chapter>
+ <chapter inrefid="ch02"></chapter>
+ <chapter inrefid="ch03"></chapter>
+ <appendix inrefid="ap01"></appendix>
+ </book>
+ (The "inrefid" attribute stood for "insertion reference ID".)
+ The processing system would process each component separately, but
+ it would recognize and use the book file to coordinate chapter and
+ page numbering, and keep a persistent ID to (title, page number)
+ mapping database for cross-references. Docutils could use a similar
+ system for large-scale, multipart documents.
+ __ ../ref/rst/directives.html#including-an-external-document-fragment
+ Aahz's idea:
+ First the ToC::
+ .. ToC-list::
+ Introduction.txt
+ Objects.txt
+ Data.txt
+ Control.txt
+ Then a sample use::
+ .. include:: ToC.txt
+ As I said earlier in chapter :chapter:`Objects.txt`, the
+ reference count gets increased every time a binding is made.
+ Which produces::
+ As I said earlier in chapter 2, the
+ reference count gets increased every time a binding is made.
+ The ToC in this form doesn't even need to be references to actual
+ reST documents; I'm simply doing it that way for a minimum of
+ future-proofing, in case I do want to add the ability to pick up
+ references within external chapters.
+ Perhaps, instead of ToC (which would overload the "contents"
+ directive concept already in use), we could use "manifest". A
+ "manifest" directive might associate local reference names with
+ files::
+ .. manifest::
+ intro: Introduction.txt
+ objects: Objects.txt
+ data: Data.txt
+ control: Control.txt
+ Then the sample becomes::
+ .. include:: manifest.txt
+ As I said earlier in chapter :chapter:`objects`, the
+ reference count gets increased every time a binding is made.
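+ A minimal Python sketch of the "manifest" idea, for discussion only.
+ Nothing here is an existing Docutils API: ``parse_manifest`` and
+ ``resolve_chapter_roles`` are hypothetical helpers, and chapter
+ numbers are assumed to follow the order of the manifest entries.

```python
import re


def parse_manifest(text):
    """Map each local reference name to a (chapter number, file name) pair."""
    chapters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, filename = line.partition(":")
        # Assumption: chapters are numbered in manifest order, from 1.
        chapters[name.strip()] = (len(chapters) + 1, filename.strip())
    return chapters


CHAPTER_ROLE = re.compile(r":chapter:`([^`]+)`")


def resolve_chapter_roles(text, chapters):
    """Replace each :chapter:`name` reference with the chapter's number."""
    def number(match):
        return str(chapters[match.group(1)][0])
    return CHAPTER_ROLE.sub(number, text)


manifest = parse_manifest("""
intro: Introduction.txt
objects: Objects.txt
data: Data.txt
control: Control.txt
""")
print(resolve_chapter_roles(
    "As I said earlier in chapter :chapter:`objects`, ...", manifest))
```

+ An unknown reference name raises ``KeyError`` in this sketch; a real
+ implementation would presumably emit a system message instead.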
+* Add support for _`multiple output files`.
+* Add testing for Docutils' front end tools?
+* Publisher: "Ordinary setup" shouldn't require specific ordering; at
+ the very least, there ought to be error checking higher up in the
+ call chain. [Aahz]
+ ``Publisher.get_settings`` requires that all components be set up
+ before it's called. Perhaps the I/O *objects* shouldn't be set, but
+ I/O *classes*. Then options are set up (``.set_options``), and
+ ``Publisher.set_io`` (or equivalent code) is called with source &
+ destination paths, creating the I/O objects.
+ Perhaps I/O objects shouldn't be instantiated until required. For
+ split output, the Writer may be called multiple times, once for each
+ doctree, and each doctree should have a separate Output object (with
+ a different path). Is the "Builder" pattern applicable here?
+* Perhaps I/O objects should become full-fledged components (i.e.
+ subclasses of ``docutils.Component``, as are Readers, Parsers, and
+ Writers now), and thus have associated option/setting specs and
+ transforms.
+* Multiple file I/O suggestion from Michael Hudson: use a file-like
+ object or something you can iterate over to get file-like objects.
+* Add an "--input-language" option & setting? Specify a different
+ language module for input (bibliographic fields, directives) than
+ for output. The "--language" option would set both input & output
+ languages.
+* Auto-generate reference tables for language-dependent features?
+ Could be generated from the source modules. A special command-line
+ option could be added to Docutils front ends to do this. (Idea from
+ Engelbert Gruber.)
+* Enable feedback of some kind from internal decisions, such as
+ reporting the successful input encoding. Modify runtime settings?
+ System message? Simple stderr output?
+* Rationalize Writer settings (HTML/LaTeX/PEP) -- share settings.
+* Merge docs/user/latex.txt info into tools.txt and config.txt.
+* Add an "--include file" command-line option (config setting too?),
+ equivalent to ".. include:: file" as the first line of the doc text?
+ Especially useful for character entity sets, text transform specs,
+ boilerplate, etc.
+* Parameterize the Reporter object or class? See the `2004-02-18
+ "rest checking and source path"`_ thread.
+ .. _2004-02-18 "rest checking and source path":
+* Add a "disable_transforms" setting? And a dummy Writer subclass
+ that does nothing when its .write() method is called? Would allow
+ for easy syntax checking. See the `2004-02-18 "rest checking and
+ source path"`_ thread.
+* Add a generic meta-stylesheet mechanism? An external file could
+ associate style names ("class" attributes) with specific elements.
+ Could be generalized to arbitrary output attributes; useful for HTML
+ & XMLs. Aahz implemented something like this in
+ sandbox/aahz/Effective/
+* .. _classes for table cells:
+ William Dode suggested that table cells be assigned "class"
+ attributes by columns, so that stylesheets can affect text
+ alignment. Unfortunately, there doesn't seem to be a way (in HTML
+ at least) to leverage the "colspec" elements (HTML "col" tags) by
+ adding classes to them. The resulting HTML is very verbose::
+ <td class="col1">111</td>
+ <td class="col2">222</td>
+ ...
+ At the very least, it should be an option. People who don't use it
+ shouldn't be penalized by increases in their HTML file sizes.
+ Table rows could also be assigned classes (like odd/even). That
+ would be easier to implement.
+ How should it be implemented?
+ * There could be writer options (column classes & row classes) with
+ standard values.
+ * The table directive could grow some options. Something like
+ ":cell-classes: col1 col2 col3" (either must match the number of
+ columns, or repeat to fill?) and ":row-classes: odd even" (repeat
+ to fill; body rows only, or header rows too?).
+ Probably per-table directive options are best. The "class" values
+ could be used by any writer, and applying such classes to all tables
+ in a document with writer options is too broad.
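+ The "repeat to fill" behaviour for ":row-classes:" and
+ ":cell-classes:" could be sketched in Python as below. This is only
+ an illustration of the proposal; ``fill_classes`` and
+ ``classify_rows`` are made-up names, and tables are modelled as plain
+ lists of rows rather than doctree nodes.

```python
from itertools import cycle, islice


def fill_classes(classes, count):
    """Repeat the class names as often as needed to cover `count` items."""
    return list(islice(cycle(classes), count))


def classify_rows(rows, row_classes, cell_classes):
    """Pair every row and every cell with its class name."""
    result = []
    for row, row_class in zip(rows, fill_classes(row_classes, len(rows))):
        cells = list(zip(fill_classes(cell_classes, len(row)), row))
        result.append((row_class, cells))
    return result


table = [["111", "222"], ["333", "444"], ["555", "666"]]
for row_class, cells in classify_rows(table, ["odd", "even"], ["col1", "col2"]):
    print(row_class, cells)
```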
+* Add file-specific settings support to config files, like::
+ [file index.txt]
+ compact-lists: no
+ Is this even possible? Should the criterion be the name of the
+ input file or the output file?
+* The "validator" support added to OptionParser is very similar to
+ "traits_" in SciPy_. Perhaps something could be done with them?
+ (Had I known about traits when I was implementing docutils.frontend,
+ I may have used them instead of rolling my own.)
+ .. _traits:
+ .. _SciPy:
+* tools/ Extend the --prune option ("prune" config
+ setting) to accept file names (generic path) in addition to
+ directories (e.g. --prune=docs/user/rst/cheatsheet.txt, which should
+ *not* be converted to HTML).
+* Add support for _`plugins`.
+* _`Config directories`: Currently, ~/.docutils, ./docutils.conf, &
+ /etc/docutils.conf are read as configuration files. Proposal: allow
+ ~/.docutils to be a configuration *directory*, along with
+ /etc/docutils/ and ./docutils.conf/. Within these directories,
+ check for config.txt files. We can also have subdirectories here,
+ for plugins, S5 themes, components (readers/writers/parsers) etc.
+ Docutils will continue to support configuration files for backwards
+ compatibility.
+* Add support for document decorations other than headers & footers?
+ For example, top/bottom/side navigation bars for web pages. Generic
+ decorations?
+ Seems like a bad idea as long as it isn't independent from the output
+ format (for example, navigation bars are only useful for web pages).
+* docutils_update: Check for a ``Makefile`` in a directory, and run
+ ``make`` if found? This would allow for variant processing on
+ specific source files, such as running instead of
+* Add a "disable table of contents" setting? The S5 writer could set
+ it as a default. Rationale:
+ The ``contents`` (table of contents) directive must not be used
+ [in S5/HTML documents]. It changes the CSS class of headings
+ and they won't show up correctly in the screen presentation.
+ -- `Easy Slide Shows With reStructuredText & S5
+ <../user/slide-shows.html>`_
+User Docs
+* Add a FAQ entry explaining that running Docutils (with
+ reStructuredText) on a server is terribly slow. See the first
+ paragraphs in
+ <>.
+* Add document about what Docutils has previously been used for
+ (web/use-cases.txt?).
+Developer Docs
+* Complete `Docutils Runtime Settings <../api/runtime-settings.html>`_.
+* Improve the internal module documentation (docstrings in the code).
+ Specific deficiencies listed below.
+ - docutils.parsers.rst.states.State.build_table: data structure
+ required (including StringList).
+ - docutils.parsers.rst.states: more complete documentation of parser
+ internals.
+* docs/ref/doctree.txt: DTD element structural relationships,
+ semantics, and attributes. In progress; element descriptions to be
+ completed.
+* Document the ``pending`` elements, how they're generated and what
+ they do.
+* Document the transforms (perhaps in docstrings?): how they're used,
+ what they do, dependencies & order considerations.
+* Document the HTML classes used by
+* Write an overview of the Docutils architecture, as an introduction
+ for developers. What connects to what, why, and how. Either update
+ PEP 258 (see PEPs_ below) or write it as a separate doc.
+* Give information about unit tests. Maybe as a howto?
+* Document the docutils.nodes APIs.
+* Complete the docs/api/publisher.txt docs.
+* Creating Docutils Writers
+* Creating Docutils Readers
+* Creating Docutils Transforms
+* Creating Docutils Parsers
+* Using Docutils as a Library
+* Complete PEP 258 Docutils Design Specification.
+ - Fill in the blanks in API details.
+ - Specify the internal data structure implementation?
+ [Tibs:] Eventually we need to have direct documentation in
+ there on how it all hangs together - the DTD is not enough
+ (indeed, is it still meant to be correct? [Yes, it is.
+ --DG]).
+* Rework PEP 257, separating style from spec from tools, wrt Docutils?
+ See Doc-SIG from 2001-06-19/20.
+Python Source Reader
+* Analyze Tony Ibbs' PySource code.
+* Analyze Doug Hellmann's HappyDoc project.
+* Investigate how POD handles literate programming.
+* Take the best ideas and integrate them into Docutils.
+Miscellaneous ideas:
+* Ask Python-dev for opinions (GvR for a pronouncement) on special
+ variables (__author__, __version__, etc.): convenience vs. namespace
+ pollution. Ask opinions on whether or not Docutils should recognize
+ & use them.
+* If we can detect that a comment block begins with ``##``, a la
+ JavaDoc, it might be useful to indicate interspersed section headers
+ & explanatory text in a module. For example::
+ """Module docstring."""
+ ##
+ # Constants
+ # =========
+ a = 1
+ b = 2
+ ##
+ # Exception Classes
+ # =================
+ class MyException(Exception): pass
+ # etc.
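+ A rough Python sketch of how such ``##`` blocks might be collected.
+ It is line-based and therefore naive (a ``##`` line inside a string
+ literal would fool it), and it assumes, as in the example above, that
+ a block starts with a line containing only ``##``.
+ ``doc_comment_blocks`` is a hypothetical name.

```python
def doc_comment_blocks(source):
    """Return the text of each comment block introduced by a '##' line."""
    blocks, current = [], None
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == "##":
            # A lone '##' opens a new doc-comment block.
            current = []
            blocks.append(current)
        elif current is not None and stripped.startswith("#"):
            text = stripped[1:]
            # Drop the single space that conventionally follows '#'.
            current.append(text[1:] if text.startswith(" ") else text)
        else:
            # Any non-comment line closes the current block.
            current = None
    return ["\n".join(block) for block in blocks]


example = '''"""Module docstring."""
##
# Constants
# =========
a = 1
b = 2
##
# Exception Classes
# =================
class MyException(Exception): pass
# etc.
'''
for block in doc_comment_blocks(example):
    print(block)
```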
+* Should standalone strings also become (module/class) docstrings?
+ Under what conditions? We want to prevent arbitrary strings from
+ becoming docstrings of prior attribute assignments etc. Assume
+ that there must be no blank lines between attributes and attribute
+ docstrings? (Use lineno of NEWLINE token.)
+ Triple-quotes are sometimes used for multi-line comments (such as
+ commenting out blocks of code). How to reconcile?
+* HappyDoc's idea of using comment blocks when there's no docstring
+ may be useful to get around the conflict between `additional
+ docstrings`_ and ``from __future__ import`` for module docstrings.
+ A module could begin like this::
+ #!/usr/bin/env python
+ # :Author: Me
+ # :Copyright: whatever
+ """This is the public module docstring (``__doc__``)."""
+ # More docs, in comments.
+ # All comments at the beginning of a module could be
+ # accumulated as docstrings.
+ # We can't have another docstring here, because of the
+ # ``__future__`` statement.
+ from __future__ import division
+ Using the JavaDoc convention of a doc-comment block beginning with
+ ``##`` is useful though. It allows doc-comments and implementation
+ comments.
+ .. _additional docstrings:
+ ../peps/pep-0258.html#additional-docstrings
+* HappyDoc uses an initial comment block to set "parser configuration
+ values". Do the same thing for Docutils, to set runtime settings on
+ a per-module basis? I.e.::
+ # Docutils:setting=value
+ Could be used to turn on/off function parameter comment recognition
+ & other marginal features. Could be used as a general mechanism to
+ augment config files and command-line options (but which takes
+ precedence?).
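+ One possible reading of the idea, sketched in Python. The
+ ``# Docutils:setting=value`` form comes from the note above;
+ everything else is an assumption made for illustration: the
+ ``module_settings`` name, the tolerance for extra whitespace, and the
+ rule that scanning stops at the first line of code.

```python
import re

SETTING = re.compile(r"^#\s*Docutils:\s*([\w-]+)\s*=\s*(.*?)\s*$")


def module_settings(source):
    """Collect 'Docutils:setting=value' pairs from the initial comments."""
    settings = {}
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue            # blank lines do not end the comment block
        if not stripped.startswith("#"):
            break               # first line of code ends the scan
        match = SETTING.match(stripped)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


module = """#!/usr/bin/env python
# Docutils:setting=value
# Docutils: tab-width = 4

import os
# Docutils:late=ignored
"""
print(module_settings(module))
```

+ Values are kept as strings; deciding how they interact with config
+ files and command-line options is exactly the open precedence
+ question.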
+* Multi-file output should be divisible at arbitrary level.
+* Support all forms of ``import`` statements:
+ - ``import module``: listed as "module"
+ - ``import module as alias``: "alias (module)"
+ - ``from module import identifier``: "identifier (from module)"
+ - ``from module import identifier as alias``: "alias (identifier
+ from module)"
+ - ``from module import *``: "all identifiers (``*``) from module"
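+ A sketch of how the listing formats above could be produced with
+ the standard library's ``ast`` module. ``describe_imports`` is a
+ hypothetical helper, not part of any existing reader, and the
+ handling of relative imports (leading dots) is an addition not
+ covered by the list.

```python
import ast


def describe_imports(source):
    """Describe every import statement in the formats listed above."""
    descriptions = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    descriptions.append(f"{alias.asname} ({alias.name})")
                else:
                    descriptions.append(alias.name)
        elif isinstance(node, ast.ImportFrom):
            # Assumption: relative imports keep their leading dots.
            module = "." * node.level + (node.module or "")
            for alias in node.names:
                if alias.name == "*":
                    descriptions.append(
                        f"all identifiers (*) from {module}")
                elif alias.asname:
                    descriptions.append(
                        f"{alias.asname} ({alias.name} from {module})")
                else:
                    descriptions.append(f"{alias.name} (from {module})")
    return descriptions


sample = """import os
import numpy as np
from os.path import join
from os.path import join as j
from sys import *
"""
for description in describe_imports(sample):
    print(description)
```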
+* Have links to colorized Python source files from API docs? And
+ vice-versa: backlinks from the colorized source files to the API
+ docs!
+* In summaries, use the first *sentence* of a docstring if the first
+ line is not followed by a blank line.
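+ A naive Python sketch of this rule. ``summary`` is a made-up name,
+ and the sentence splitter simply looks for ".", "!" or "?" followed
+ by whitespace, so abbreviations such as "e.g." would cut the
+ summary short.

```python
import re

SENTENCE_END = re.compile(r"(?<=[.!?])\s")


def summary(docstring):
    """Return the summary line, or the first sentence if there is none."""
    lines = docstring.strip().splitlines()
    if not lines:
        return ""
    # A first line standing alone (followed by a blank line, or by
    # nothing at all) is used as the summary unchanged.
    if len(lines) == 1 or not lines[1].strip():
        return lines[0].strip()
    # Otherwise join the first paragraph and cut at the first sentence end.
    paragraph = []
    for line in lines:
        if not line.strip():
            break
        paragraph.append(line.strip())
    return SENTENCE_END.split(" ".join(paragraph), maxsplit=1)[0]


print(summary("One-line summary.\n\nDetails follow."))
print(summary("This sentence spans\ntwo lines. Second sentence."))
```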
+reStructuredText Parser
+Also see the `... Or Not To Do?`__ list.
+__ rst/alternatives.html#or-not-to-do
+* Treat enumerated lists that are not arabic and consist of only one
+ item in a single line as ordinary paragraphs. See
+ <>.
+* The citation syntax could use some enhancements. See
+ <> and
+ <>.
+* The current list-recognition logic has too many false positives, as
+ in ::
+ * Aorta
+ * V. cava superior
+ * V. cava inferior
+ Here ``V.`` is recognized as an enumerator, which leads to
+ confusion. We need to find a solution that resolves such problems
+ without complicating the spec too much.
+ See <>.
+* Add indirect links via citation references & footnote references.
+ Example::
+ `Goodger (2005)`_ is helpful.
+ .. _Goodger (2005): [goodger2005]_
+ .. [goodger2005] citation text
+ See <>.
+* Allow multiple block quotes, only separated by attributions
+ (, e.g.::
+ quote 1
+ ---Attrib 1
+ quote 2
+ ---Attrib 2
+* Change the specification so that more punctuation is allowed
+ before/after inline markup start/end string
+ (
+* Complain about bad URI characters
+ ( and
+ disallow internal whitespace
+ (
+* Create ``info``-level system messages for unnecessarily
+ backslash-escaped characters (as in ``"\something"``, rendered as
+ "something") to allow checking for errors which silently slipped
+ through.
+* Add (functional) tests for untested roles.
+* Add test for ":figwidth: image" option of "figure" directive. (Test
+ code needs to check if PIL is available on the system.)
+* Add support for CJK double-width whitespace (indentation) &
+ punctuation characters (markup; e.g. double-width "*", "-", "+")?
+* Add motivation sections for constructs in spec.
+* Support generic hyperlink references to _`targets in other
+ documents`? Not in an HTML-centric way, though (it's trivial to say
+ ````, and useless in non-HTML
+ contexts). XLink/XPointer? ``.. baseref::``? See Doc-SIG
+ 2001-08-10.
+* .. _adaptable file extensions:
+ In target URLs, it would be useful to not explicitly specify the
+ file extension. If we're generating HTML, then ".html" is
+ appropriate; if PDF, then ".pdf"; etc. How about using ".*" to
+ indicate "choose the most appropriate filename extension"? For
+ example::
+ .. _Another Document: another.*
+ What is to be done for output formats that don't *have* hyperlinks?
+ For example, LaTeX targeted at print. Hyperlinks may be "called
+ out", as footnotes with explicit URLs.
+ But then there's also LaTeX targeted at PDFs, which *can* have
+ links. Perhaps a runtime setting for "*" could explicitly provide
+ the extension, defaulting to the output file's extension.
+ Should the system check for existing files? No, not practical.
+ Handle documents only, or objects (images, etc.) also?
+ If this handles images also, how to differentiate between document
+ and image links? Element context (within "image")? Which image
+ extension to use for which document format? Again, a runtime
+ setting would suffice.
+ This may not be just a parser issue; it may need framework support.
+ Mailing list threads: `Images in both HTML and LaTeX`__ (especially
+ `this summary of Felix's objections`__), `more-universal links?`__,
+ `Output-format-sensitive link targets?`__
+ __
+ __
+ __
+ __
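+ The ".*" idea, including the suggested runtime setting that
+ defaults to the output file's extension, might look roughly like
+ this in Python. ``resolve_target`` and its parameters are
+ hypothetical; no such function or setting exists in Docutils.

```python
import os.path


def resolve_target(uri, output_path, extension=None):
    """Replace a trailing '.*' with an explicit or inferred extension."""
    if not uri.endswith(".*"):
        return uri
    if extension is None:
        # Default: reuse the extension of the file being written.
        extension = os.path.splitext(output_path)[1]
    return uri[:-2] + extension


print(resolve_target("another.*", "index.html"))
print(resolve_target("another.*", "index.html", extension=".pdf"))
```

+ The sketch does not check whether the resolved file exists, in line
+ with the "not practical" verdict above.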
+* Implement the header row separator modification to table.el. (Wrote
+ to Takaaki Ota & the table.el mailing list on 2001-08-12, suggesting
+ support for "=====" header rows. On 2001-08-17 he replied, saying
+ he'd put it on his to-do list, but "don't hold your breath".)
+* Fix the parser's indentation handling to conform with the stricter
+ definition in the spec. (Should explicit markup blocks be strict or
+ forgiving?)
+ .. XXX What does this mean? Can you elaborate, David?
+* Make the parser modular. Allow syntax constructs to be added or
+ disabled at run-time. Subclassing is probably not enough because it
+ makes it difficult to apply multiple extensions.
+* Generalize the "doctest block" construct (which is overly
+ Python-centric) to other interactive sessions? "Doctest block"
+ could be renamed to "I/O block" or "interactive block", and each of
+ these could also be recognized as such by the parser:
+ - Shell sessions::
+ $ cat example1.txt
+ A block beginning with a "$ " prompt is interpreted as a shell
+ session interactive block. As with Doctest blocks, the
+ interactive block ends with the first blank line, and wouldn't
+ have to be indented.
+ - Root shell sessions::
+ # cat example2.txt
+ A block beginning with a "# " prompt is interpreted as a root
+ shell session (the user is or has to be logged in as root)
+ interactive block. Again, the block ends with a blank line.
+ Other standard (and unambiguous) interactive session prompts could
+ easily be added (such as "> " for WinDOS).
+ Tony Ibbs spoke out against this idea (2002-06-14 Doc-SIG thread
+ "docutils feedback").
+* The "doctest" element should go away. The construct could simply be
+ a front-end to generic literal blocks. We could immediately (in
+ 0.4, or 0.5) remove the doctest node from the doctree, but leave the
+ syntax in reST. The reST parser could represent doctest blocks as
+ literal blocks with a class attribute. The syntax could be left in
+ reST for a set period of time.
+* Add support for pragma (syntax-altering) directives.
+ Some pragma directives could be local-scope unless explicitly
+ specified as global/pragma using ":global:" options.
+* Support whitespace in angle-bracketed standalone URLs according to
+ Appendix E ("Recommendations for Delimiting URI in Context") of `RFC
+ 2396`_.
+ .. _RFC 2396:
+* Use the vertical spacing of the source text to determine the
+ corresponding vertical spacing of the output?
+* [From Mark Nodine] For cells in simple tables that comprise a
+ single line, the justification can be inferred according to the
+ following rules:
+ 1. If the text begins at the leftmost column of the cell,
+ then left justification, ELSE
+ 2. If the text ends at the rightmost column of the cell,
+ then right justification, ELSE
+ 3. Center justification.
+ The onus is on the author to make the text unambiguous by adding
+ blank columns as necessary. There should be a parser setting to
+ turn off justification-recognition (normally on would be fine).
+ Decimal justification?
+ All this shouldn't be done automatically. Only when it's requested
+ by the user, e.g. with something like this::
+ .. table::
+ :auto-indent:
+ (Table goes here.)
+ Otherwise it will break existing documents.
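+ The three rules can be sketched in Python as follows. The sketch
+ reads rule 2 as "the text *ends* at the rightmost column", and it
+ assumes the function receives the raw cell text padded to the full
+ column width. ``infer_justification`` is a hypothetical name.

```python
def infer_justification(cell_text):
    """Infer 'left', 'right' or 'center' for a single-line table cell."""
    if not cell_text.strip():
        return None                 # empty cell: nothing to infer
    if not cell_text[0].isspace():
        return "left"               # rule 1: text starts at the left edge
    if not cell_text[-1].isspace():
        return "right"              # rule 2: text reaches the right edge
    return "center"                 # rule 3: blank columns on both sides


for cell in ["abc   ", "   abc", " abc  ", "abcdef"]:
    print(repr(cell), infer_justification(cell))
```

+ A cell filled edge to edge counts as left-justified here, because
+ rule 1 is tested first.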
+* Generate a warning or info message for paragraphs which should have
+ been lists, like this one::
+ 1. line one
+ 3. line two
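+ A possible heuristic for spotting such paragraphs, sketched in
+ Python. It only handles arabic enumerators followed by a period and
+ is not based on the parser's actual state machine;
+ ``looks_like_broken_list`` is a made-up name.

```python
import re

ENUMERATOR = re.compile(r"^(\d+)\.\s+\S")


def looks_like_broken_list(paragraph):
    """True if every line looks enumerated but the numbers skip."""
    numbers = []
    for line in paragraph.splitlines():
        match = ENUMERATOR.match(line.strip())
        if not match:
            return False            # an ordinary line: plain paragraph
        numbers.append(int(match.group(1)))
    if len(numbers) < 2:
        return False
    # A well-formed list increases by exactly one at each step.
    return any(b != a + 1 for a, b in zip(numbers, numbers[1:]))


print(looks_like_broken_list("1. line one\n3. line two"))
print(looks_like_broken_list("1. line one\n2. line two"))
```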
+* Generalize the "target-notes" directive into a command-line option
+ somehow? See docutils-develop 2003-02-13.
+* Allow a "::"-only paragraph (first line, actually) to introduce a
+ _`literal block without a blank line`? (Idea from Paul Moore.) ::
+ ::
+ This is a literal block
+ Is indentation enough to make the separation between a paragraph
+ which contains just a ``::`` and the literal text unambiguous?
+ (There's one problem with this concession: If one wants a definition
+ list item which defines the term "::", we'd have to escape it.) It
+ would only be reasonable to apply it to "::"-only paragraphs though.
+ I think the blank line is visually necessary if there's text before
+ the "::"::
+ The text in this paragraph needs separation
+ from the literal block following::
+ This doesn't look right.
+* Add new syntax for _`nested inline markup`? Or extend the parser to
+ parse nested inline markup somehow? See the `collected notes
+ <rst/alternatives.html#nested-inline-markup>`__.
+* Drop the backticks from embedded URIs with omitted reference text?
+ Should the angle brackets be kept in the output or not? ::
+ <file_name>_
+ Probably not worth the trouble.
+* Add _`math markup`. We should try for a general solution, that's
+ applicable to any output format. Using a standard, such as MathML_,
+ would be best. TeX (or itex_) would be acceptable as a *front-end*
+ to MathML. See `the culmination of a relevant discussion
+ <>`__.
+ Both a directive and an interpreted text role will be necessary (for
+ each markup). Directive example::
+ .. itex::
+ \alpha_t(i) = P(O_1, O_2, \dots O_t, q_t = S_i \lambda)
+ The same thing inline::
+ The equation in question is :itex:`\alpha_t(i) = P(O_1, O_2,
+ \dots O_t, q_t = S_i \lambda)`.
+ .. _MathML:
+ .. _itex:
+* How about a syntax for alternative hyperlink behavior, such as "open
+ in a new window" (as in HTML's ``<a target="_blank">``)? Double
+ angle brackets might work for inline targets::
+ The `reference docs <<url>>`__ may be handy.
+ But what about explicit targets?
+ The MoinMoin wiki uses a caret ("^") at the beginning of the URL
+ ("^" is not a legal URI character). That could work for both inline
+ and explicit targets::
+ The `reference docs <^url>`__ may be handy.
+ .. _name: ^url
+ This may be too specific to HTML. It hasn't been requested very
+ often either.
+* Add an option to add URI schemes at runtime.
+* _`Segmented lists`::
+ : segment : segment : segment
+ : segment : segment : very long
+ segment
+ : segment : segment : segment
+ The initial colon (":") can be thought of as a type of bullet.
+ We could even have segment titles::
+ :: title : title : title
+ : segment : segment : segment
+ : segment : segment : segment
+ This would correspond well to DocBook's SegmentedList. Output could
+ be tabular or "name: value" pairs, as described in DocBook's docs.
+* Allow backslash-escaped colons in field names::
+ :Case Study\: Event Handling: This chapter will be dropped.
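+ A regular-expression sketch of how a field marker with escaped
+ colons could be recognized, in Python. This is an illustration
+ only and does not reflect the parser's real patterns;
+ ``parse_field`` is a hypothetical helper.

```python
import re

# The field name is a run of escaped characters or characters other
# than backslash and colon, so only an unescaped colon can end it.
FIELD = re.compile(r"^:((?:\\.|[^\\:])+):(?:\s+(.*))?$")


def parse_field(line):
    """Split a field line into (name, body), unescaping the name."""
    match = FIELD.match(line)
    if not match:
        return None
    name = re.sub(r"\\(.)", r"\1", match.group(1))
    return name, match.group(2) or ""


print(parse_field(r":Case Study\: Event Handling: This chapter will be dropped."))
```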
+* _`footnote spaces`:
+ When supplying the command line options
+ --footnote-references=brackets and --use-latex-footnotes with the
+ LaTeX writer (which might very well happen when using configuration
+ files), the spaces in front of footnote references aren't trimmed.
+* Enable grid _`tables inside XML comments`, where "--" ends comments.
+ I see three implementation possibilities:
+ 1. Make the table syntax characters into "table" directive options.
+ This is the most flexible but most difficult, and we probably
+ don't need that much flexibility.
+ 2. Substitute "~" for "-" with a specialized directive option
+ (e.g. ":tildes:").
+ 3. Make the standard table syntax recognize "~" as well as "-", even
+ without a directive option. Individual tables would have to be
+ internally consistent.
+ Directive options are preferable to configuration settings, because
+ tables are document-specific. A pragma directive would be another
+ approach, to set the syntax once for a whole document.
+ In the meantime, the list-table_ directive is a good replacement for
+ grid tables inside XML comments.
+ .. _list-table: ../ref/rst/directives.html#list-table
+* Generalize docinfo contents (bibliographic fields): remove specific
+ fields, and have only a single generic "field"?
+Directives below are often referred to as "module.directive", the
+directive function. The "module." is not part of the directive name
+when used in a document.
+* Make the _`directive interface` object-oriented
+ (
+* Allow for field lists in list tables. See
+ <>.
+* .. _unify tables:
+ Unify table implementations and unify options of table directives
+ (
+* Allow directives to be added at run-time?
+* Use the language module for directive option names?
+* Add "substitution_only" and "substitution_ok" function attributes,
+ and automate context checking?
+* Change directive functions to directive classes? Superclass'
+ ``__init__()`` could handle all the bookkeeping.
+* Implement options or features on existing directives:
+ - Add a "name" option to directives, to set an author-supplied
+ identifier?
+ - All directives that produce titled elements should grow implicit
+ reference names based on the titles.
+ - Allow the _`:trim:` option for all directives when they occur in a
+ substitution definition, not only the unicode_ directive.
+ .. _unicode: ../ref/rst/directives.html#unicode-character-codes
+ - _`images.figure`: "title" and "number", to indicate a formal
+ figure?
+ - _`parts.sectnum`: "local"?, "refnum"
+ A "local" option could enable numbering for sections from a
+ certain point down, and sections in the rest of the document are
+ not numbered. For example, a reference section of a manual might
+ be numbered, but not the rest. OTOH, an all-or-nothing approach
+ would probably be enough.
+ The "sectnum" directive should be usable multiple times in a
+ single document. For example, in a long document with "chapter"
+ and "appendix" sections, there could be a second "sectnum" before
+ the first appendix, changing the sequence used (from 1,2,3... to
+ A,B,C...). This is where the "local" concept comes in. This part
+ of the implementation can be left for later.
+ A "refnum" option (better name?) would insert reference names
+ (targets) consisting of the reference number. Then a URL could be
+ of the form ``http://host/document.html#2.5`` (or "2-5"?). Allow
+ internal references by number? Allow name-based *and*
+ number-based ids at the same time, or only one or the other (which
+ would the table of contents use)? Usage issue: altering the
+ section structure of a document could render hyperlinks invalid.
+ - _`parts.contents`: Add a "suppress" or "prune" option? It would
+ suppress contents display for sections in a branch from that point
+ down. Or a new directive, like "prune-contents"?
+ Add an option to include topics in the TOC? Another for sidebars?
+ The "topic" directive could have a "contents" option, or the
+ "contents" directive could have an "include-topics" option. See
+ docutils-develop 2003-01-29.
+ - _`parts.header` & _`parts.footer`: Support multiple, named headers
+ & footers? For example, separate headers & footers for odd, even,
+ and the first page of a document.
+ This may be too specific to output formats which have a notion of
+ "pages".
+ - _`misc.class`:
+ - Add a ``:parent:`` option for setting the parent's class
+ (
+ - _`misc.include`:
+ - Option to select a range of lines?
+ - Option to label lines?
+ - How about an environment variable, say RSTINCLUDEPATH or
+ RSTPATH, for standard includes (as in ``.. include:: <name>``)?
+ This could be combined with a setting/option to allow
+ user-defined include directories.
+ - Add support for inclusion by URL? ::
+ .. include::
+ :url:
+ - _`misc.raw`: add a "destination" option to the "raw" directive? ::
+ .. raw:: html
+ :destination: head
+ <link ...>
+ It needs thought & discussion though, to come up with a consistent
+ set of destination labels and consistent behavior.
+ And placing HTML code inside the <head> element of an HTML
+ document is rather the job of a templating system.
+ - _`body.sidebar`: Allow internal section structure? Adornment
+ styles would be independent of the main document.
+ That is really complicated, however, and the document model
+ greatly benefits from its simplicity.
+* Implement directives. Each of the list items below begins with an
+ identifier of the form, "module_name.directive_function_name". The
+ directive name itself could be the same as the
+ directive_function_name, or it could differ.
+ - _`html.imagemap`
+ It has the disadvantage that it's only easily implementable for
+ HTML, so it's specific to one output format.
+ (For non-HTML writers, the imagemap would have to be replaced with
+ the image only.)
+ - _`parts.endnotes` (or "footnotes"): See `Footnote & Citation Gathering`_.
+ - _`parts.citations`: See `Footnote & Citation Gathering`_.
+ - _`misc.language`: Specify (= change) the language of a document at
+ parse time.
+ - _`misc.settings`: Set any(?) Docutils runtime setting from within
+ a document? Needs much thought and discussion.
+ - _`misc.gather`: Gather (move, or copy) all instances of a specific
+ element. A generalization of the "endnotes" & "citations" ideas.
+ - Add a custom "directive" directive, equivalent to "role"? For
+ example::
+ .. directive:: incr
+ .. class:: incremental
+ .. incr::
+ "``.. incr::``" above is equivalent to "``.. class:: incremental``".
+ Another example::
+ .. directive:: printed-links
+ .. topic:: Links
+ :class: print-block
+ .. target-notes::
+ :class: print-inline
+ This acts like macros. The directive contents will have to be
+ evaluated when referenced, not when defined.
+ * Needs a better name? "Macro", "substitution"?
+ * What to do with directive arguments & options when the
+ macro/directive is referenced?
+ - .. _conditional directives:
+ Docutils already has the ability to say "use this content for
+ Writer X" (via the "raw" directive), but it doesn't have the
+ ability to say "use this content for any Writer other than X". It
+ wouldn't be difficult to add this ability though.
+ My first idea would be to add a set of conditional directives.
+ Let's call them "writer-is" and "writer-is-not" for discussion
+ purposes (don't worry about implementation details). We might
+ have::
+ .. writer-is:: text-only
+ ::
+ +----------+
+ | SNMP |
+ +----------+
+ | UDP |
+ +----------+
+ | IP |
+ +----------+
+ | Ethernet |
+ +----------+
+ .. writer-is:: pdf
+ .. figure:: protocol_stack.eps
+ .. writer-is-not:: text-only pdf
+ .. figure:: protocol_stack.png
+ This could be an interface to the Filter transform
+ (docutils.transforms.components.Filter).
+ The ideas in `adaptable file extensions`_ above may also be
+ applicable here.
+ SVG's "switch" statement may provide inspiration.
+ Here's an example of a directive that could produce multiple
+ outputs (*both* raw troff pass-through *and* a GIF, for example)
+ and allow the Writer to select. ::
+ .. eqn::
+ .EQ
+ delim %%
+ .EN
+ %sum from i=0 to inf c sup i~=~lim from {m -> inf}
+ sum from i=0 to m sup i%
+ .EQ
+ delim off
+ .EN
+ - _`body.example`: Examples; suggested by Simon Hefti. Semantics as
+ per Docbook's "example"; admonition-style, numbered, reference,
+ with a caption/title.
+ - _`body.index`: Index targets.
+ See `Index Entries & Indexes
+ <./rst/alternatives.html#index-entries-indexes>`__.
+ - _`body.literal`: Literal block, possibly "formal" (see `object
+ numbering and object references`_ above). Possible options:
+ - "highlight" a range of lines
+ - include only a specified range of lines
+ - "number" or "line-numbers"
+ - "styled" could indicate that the directive should check for
+ style comments at the end of lines to indicate styling or
+ markup.
+ Specific derivatives (e.g., a "python-interactive" directive)
+ could interpret style based on cues, like the ">>> " prompt and
+ "input()"/"raw_input()" calls.
+ See docutils-users 2003-03-03.
+ - _`body.listing`: Code listing with title (to be numbered
+ eventually), equivalent of "figure" and "table" directives.
+ - _`colorize.python`: Colorize Python code. Fine for HTML output,
+ but what about other formats? Revert to a literal block? Do we
+ need some kind of "alternate" mechanism? Perhaps use a "pending"
+ transform, which could switch its output based on the "format" in
+ use. Use a factory function "transformFF()" which returns either
+ an "HTMLTransform()" or a "GenericTransform()" instance?
+ If we take a Python-to-HTML pretty-printer and make it output a
+ Docutils internal doctree (as per instead of HTML, then
+ each output format's stylesheet (or equivalent) mechanism could
+ take care of the rest. The pretty-printer code could turn this
+ doctree fragment::
+ <literal_block xml:space="preserve">
+ print 'This is Python code.'
+ for i in range(10):
+ print i
+ </literal_block>
+ into something like this ("</>" is end-tag shorthand)::
+ <literal_block xml:space="preserve" class="python">
+ <keyword>print</> <string>'This is Python code.'</>
+ <keyword>for</> <identifier>i</> <keyword
+ >in</> <expression>range(10)</>:
+ <keyword>print</> <expression>i</>
+ </literal_block>
+ But I'm leaning toward adding a single new general-purpose
+ element, "phrase", equivalent to HTML's <span>. Here's the
+ example rewritten using the generic "phrase"::
+ <literal_block xml:space="preserve" class="python">
+ <phrase class="keyword">print</> <phrase
+ class="string">'This is Python code.'</>
+ <phrase class="keyword">for</> <phrase
+ class="identifier">i</> <phrase class="keyword">in</> <phrase
+ class="expression">range(10)</>:
+ <phrase class="keyword">print</> <phrase
+ class="expression">i</>
+ </literal_block>
+ It's more verbose but more easily extensible and more appropriate
+ for the case at hand. It allows us to edit style sheets to add
+ support for new formats, not the Docutils code itself.
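+ As a rough sketch of the classification step, the standard
+ library's ``tokenize`` and ``keyword`` modules can label each token
+ with a class name that a "phrase" element could carry. The
+ ``classify`` function and its class names are assumptions made for
+ illustration; it targets Python 3 source (unlike the Python 2
+ example above), has no "expression" class, and does not build
+ doctree nodes or preserve whitespace.

```python
import io
import keyword
import tokenize

# Token types that carry no visible text worth classifying.
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER}


def classify(source):
    """Return (class_name, text) pairs for the tokens in Python source."""
    pairs = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME:
            cls = "keyword" if keyword.iskeyword(tok.string) else "identifier"
        elif tok.type == tokenize.STRING:
            cls = "string"
        elif tok.type == tokenize.NUMBER:
            cls = "number"
        elif tok.type == tokenize.COMMENT:
            cls = "comment"
        else:
            # Everything else (punctuation, operators) shares one class.
            cls = "operator"
        pairs.append((cls, tok.string))
    return pairs


pairs = classify("for i in range(10):\n    print(i)\n")
```

+ Each pair maps directly onto one "phrase" element, with the first
+ item as its class and the second as its text.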
+ Perhaps a single directive with a format parameter would be
+ better::
+ .. colorize:: python
+ print 'This is Python code.'
+ for i in range(10):
+ print i
+ But directives can have synonyms for convenience. "format::
+ python" was suggested, but "format" seems too generic.
+ - _`pysource.usage`: Extract a usage message from the program,
+ either by running it at the command line with a ``--help`` option
+ or through an exposed API. [Suggestion for Optik.]
+Interpreted Text
+Interpreted text is entirely a reStructuredText markup construct, a
+way to get around built-in limitations of the medium. Some roles are
+intended to introduce new doctree elements, such as "title-reference".
+Others are merely convenience features, like "RFC".
+All supported interpreted text roles must already be known to the
+Parser when they are encountered in a document. Whether pre-defined
+in core/client code, or in the document, doesn't matter; the roles
+just need to have already been declared. Adding a new role may
+involve adding a new element to the DTD and may require extensive
+support, therefore such additions should be well thought-out. There
+should be a limited number of roles.
+The only place where no limit is placed on variation is at the start,
+at the Reader/Parser interface. Transforms are inserted by the Reader
+into the Transformer's queue, where non-standard elements are
+converted. Once past the Transformer, no variation from the standard
+Docutils doctree is possible.
+An example is the Python Source Reader, which will use interpreted
+text extensively. The default role will be "Python identifier", which
+will be further interpreted by namespace context into <class>,
+<method>, <module>, <attribute>, etc. elements (see pysource.dtd),
+which will be transformed into standard hyperlink references, which
+will be processed by the various Writers. No Writer will need to have
+any knowledge of the Python-Reader origin of these elements.
+* Add explicit interpreted text roles for the rest of the implicit
+ inline markup constructs: named-reference, anonymous-reference,
+ footnote-reference, citation-reference, substitution-reference,
+ target, uri-reference (& synonyms).
+* Add directives for each role as well? This would allow indirect
+ nested markup::
+ This text contains |nested inline markup|.
+ .. |nested inline markup| emphasis::
+ nested ``inline`` markup
+* Implement roles:
+ - "_`raw-wrapped`" (or "_`raw-wrap`"): Base role to wrap raw text
+ around role contents.
+ For example, the following reStructuredText source ... ::
+ .. role:: red(raw-formatting)
+ :prefix:
+ :html: <font color="red">
+ :latex: {\color{red}
+ :suffix:
+ :html: </font>
+ :latex: }
+ colored :red:`text`
+ ... will yield the following document fragment::
+ <paragraph>
+ colored
+ <inline classes="red">
+ <raw format="html">
+ <font color="red">
+ <raw format="latex">
+ {\color{red}
+ <inline classes="red">
+ text
+ <raw format="html">
+ </font>
+ <raw format="latex">
+ }
+ Possibly without the intermediate "inline" node.
+ - "acronym" and "abbreviation": Associate the full text with a short
+ form. Jason Diamond's description:
+ I want to translate ```reST`:acronym:`` into ``<acronym
+ title='reStructuredText'>reST</acronym>``. The value of the
+ title attribute has to be defined out-of-band since you can't
+ parameterize interpreted text. Right now I have them in a
+ separate file but I'm experimenting with creating a directive
+ that will use some form of reST syntax to let you define them.
+ Should Docutils complain about undefined acronyms or
+ abbreviations?
+ What to do if there are multiple definitions? How to
+ differentiate between CSS (Content Scrambling System) and CSS
+ (Cascading Style Sheets) in a single document? David Priest
+ responds,
+ The short answer is: you don't. Anyone who did such a thing
+ would be writing very poor documentation indeed. (Though I
+ note that `somewhere else in the docs`__, there's mention of
+ allowing replacement text to be associated with the
+ abbreviation. That takes care of the duplicate
+ acronyms/abbreviations problem, though a writer would be
+ foolish to ever need it.)
+ __ `inline parameter syntax`_
+ How to define the full text? Possibilities:
+ 1. With a directive and a definition list? ::
+ .. acronyms::
+ reST
+ reStructuredText
+ Docstring Processing System
+ Would this list remain in the document as a glossary, or would
+ it simply build an internal lookup table? A "glossary"
+ directive could be used to make the intention clear.
+ Acronyms/abbreviations and glossaries could work together.
+ Then again, a glossary could be formed by gathering individual
+ definitions from around the document.
+ 2. Some kind of `inline parameter syntax`_? ::
+ `reST <reStructuredText>`:acronym: is `WYSIWYG <what you
+ see is what you get>`:acronym: plaintext markup.
+ .. _inline parameter syntax:
+ rst/alternatives.html#parameterized-interpreted-text
+ 3. A combination of 1 & 2?
+ The multiple definitions issue could be handled by establishing
+ rules of priority. For example, directive-based lookup tables
+ have highest priority, followed by the first inline definition.
+ Multiple definitions in directive-based lookup tables would
+ trigger warnings, similar to the rules of `implicit hyperlink
+ targets`__.
+ __ ../ref/rst/restructuredtext.html#implicit-hyperlink-targets
+ 4. Using substitutions? ::
+ .. |reST| acronym:: reST
+ :text: reStructuredText
+ What do we do for formats other than HTML, which do not support
+ tool tips? Put the full text in parentheses?
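+ The priority rules proposed under possibility 3 can be sketched as
+ a small lookup table. ``AcronymTable`` and its method names are
+ hypothetical, and keeping the first directive-based definition when
+ a conflicting one arrives is an assumption; the text only says that
+ such duplicates would trigger warnings.

```python
import warnings


class AcronymTable:
    """Resolve short forms to full text (illustrative sketch only)."""

    def __init__(self):
        self.from_directive = {}   # highest priority
        self.from_inline = {}      # the first inline definition wins

    def define_in_directive(self, short, full):
        known = self.from_directive.get(short)
        if known is not None and known != full:
            # Assumption: warn and keep the earlier definition.
            warnings.warn("multiple definitions of %r" % short)
            return
        self.from_directive[short] = full

    def define_inline(self, short, full):
        # Later inline definitions of the same short form are ignored.
        self.from_inline.setdefault(short, full)

    def lookup(self, short):
        """Return the full text, or None if the short form is undefined."""
        if short in self.from_directive:
            return self.from_directive[short]
        return self.from_inline.get(short)
```

+ A ``None`` result from ``lookup`` marks an undefined acronym;
+ whether that should produce a complaint is the open question
+ raised above.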
+ - "figure", "table", "listing", "chapter", "page", etc: See `object
+ numbering and object references`_ above.
+ - "glossary-term": This would establish a link to a glossary. It
+ would require an associated "glossary-entry" directive, whose
+ contents could be a definition list::
+ .. glossary-entry::
+ term1
+ definition1
+ term2
+ definition2
+ This would allow entries to be defined anywhere in the document,
+ and collected (via a "glossary" directive perhaps) at one point.
+Unimplemented Transforms
+* _`Footnote & Citation Gathering`
+ Collect and move footnotes & citations to the end of a document.
+ (Separate transforms.)
+* _`Reference Merging`
+ When merging two or more subdocuments (such as docstrings),
+ conflicting references may need to be resolved. There may be:
+ * duplicate reference and/or substitution names that need to be made
+ unique; and/or
+ * duplicate footnote numbers that need to be renumbered.
+ Should this be done before or after reference-resolving transforms
+ are applied? What about references from within one subdocument to
+ inside another?
+* _`Document Splitting`
+ If the processed document is written to multiple files (possibly in
+ a directory tree), it will need to be split up. Internal references
+ will have to be adjusted.
+ (HTML only? Initially, yes. Eventually, anything should be
+ splittable.)
+ Ideas:
+ - Insert a "destination" attribute into the root element of each
+ split-out document, containing the path/filename. The Output
+ object or Writer will recognize this attribute and split out the
+ files accordingly. Must allow for common headers & footers,
+ prev/next, breadcrumbs, etc.
+ - Transform a single-root document into a document containing
+ multiple subdocuments, recursively. The content model of the
+ "document" element would have to change to::
+ <!ELEMENT document
+ ( (title, subtitle?)?,
+ decoration?,
+ (docinfo, transition?)?,
+ %structure.model;,
+ document* )>
+ (I.e., add the last line -- 0 or more document elements.)
+ Let's look at the case of hierarchical (directories and files)
+ HTML output. Each document element containing further document
+ elements would correspond to a directory (with an index.html file
+ for the content preceding the subdocuments). Each document
+ element containing no subdocuments (i.e., structure model elements
+ only) corresponds to a concrete file with no directory.
+ The natural transform would be to map sections to subdocuments,
+ but possibly only a given number of levels deep.
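+ A minimal sketch of that mapping, assuming sections are modelled
+ as a simple tree: a section that is split further becomes a
+ directory with an index.html file, and any other section becomes a
+ single file. ``Section``, ``output_paths`` and ``max_depth`` are
+ hypothetical names, and real Docutils nodes are not used here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """A section with a file-system-safe name and nested subsections."""
    name: str
    children: List["Section"] = field(default_factory=list)


def output_paths(section, max_depth, prefix="", depth=0):
    """Yield one output path per subdocument, splitting max_depth levels deep."""
    split_further = bool(section.children) and depth < max_depth
    if split_further:
        # Directory case: index.html holds the content that precedes
        # the subdocuments.
        directory = prefix + section.name + "/"
        yield directory + "index.html"
        for child in section.children:
            yield from output_paths(child, max_depth, directory, depth + 1)
    else:
        # Leaf case: a concrete file with no directory of its own.
        yield prefix + section.name + ".html"


manual = Section("manual", [
    Section("intro"),
    Section("usage", [Section("install"), Section("run")]),
])
```

+ With ``max_depth=1`` the "usage" section stays a single file; with
+ ``max_depth=2`` it becomes a directory holding "install" and "run".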
+* _`Navigation`
+ If a document is split up, each segment will need navigation links:
+ parent, children (small TOC), previous (preorder), next (preorder).
+ Part of `Document Splitting`_?
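+ The four links can be computed in one preorder walk. The sketch
+ below assumes each segment is a ``(name, children)`` tuple with a
+ unique name; the ``navigation`` function is a hypothetical helper,
+ not an existing transform.

```python
def navigation(tree):
    """Map each segment name to its parent, children, previous and next."""
    order = []   # segment names in preorder
    links = {}

    def walk(node, parent):
        name, children = node
        order.append(name)
        links[name] = {
            "parent": parent,
            "children": [child[0] for child in children],  # small TOC
            "previous": None,
            "next": None,
        }
        for child in children:
            walk(child, name)

    walk(tree, None)
    # Previous and next follow preorder, ignoring nesting depth.
    for before, after in zip(order, order[1:]):
        links[before]["next"] = after
        links[after]["previous"] = before
    return links


links = navigation(("root", [("a", [("a1", [])]), ("b", [])]))
```

+ Here "a1" links back to "a" and forward to "b", even though "b"
+ sits one level higher in the tree.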
+* _`List of System Messages`
+ The ``system_message`` elements are inserted into the document tree,
+ adjacent to the problems themselves where possible. Some (those
+ generated post-parse) are kept until later, in
+ ``document.messages``, and added as a special final section,
+ "Docutils System Messages".
+ Docutils could be made to generate hyperlinks to all known
+ system_messages and add them to the document, perhaps to the end of
+ the "Docutils System Messages" section.
+ Fred L. Drake, Jr. wrote:
+ I'd like to propose that both parse- and transformation-time
+ messages are included in the "Docutils System Messages" section.
+ If there are no objections, I can make the change.
+ The advantage of the current way of doing things is that parse-time
+ system messages don't require a transform; they're already in the
+ document. This is valuable for testing (unit tests,
+ tools/ So if we do decide to make a change, I think
+ the insertion of parse-time system messages ought to remain as-is
+ and the Messages transform ought to move all parse-time system
+ messages (remove from their originally inserted positions, insert in
+ System Messages section).
+* _`Index Generation`
+HTML Writer
+* Add support for _`multiple stylesheets`. See
+ <>.
+* Idea for field-list rendering: hanging indent::
+ Field name (bold): First paragraph of field body begins
+ with the field name inline.
+ If the first item of a field body is not a paragraph,
+ it would begin on the following line.
+* Add more support for <link> elements, especially for navigation
+ bars.
+ The framework does not have a notion of document relationships, so
+ probably raw.destination_ should be used.
+ We'll have framework support for document relationships when support
+ for `multiple output files`_ is added. The HTML writer could
+ automatically generate <link> elements then.
+ .. _raw.destination: misc.raw_
+* Base list compaction on the spacing of source list? Would require
+ parser support. (Idea: fantasai, 16 Dec 2002, doc-sig.)
+* Add a tool tip ("title" attribute?) to footnote back-links
+ identifying them as such. Text in Docutils language module.
+PEP/HTML Writer
+* Remove the generic style information (duplicated from html4css1.css)
+ from pep.css to avoid redundancy.
+ We need support for `multiple stylesheets`_ first, though.
+LaTeX writer
+* Add an ``--embed-stylesheet`` (and ``--link-stylesheet``) option.
+HTML SlideShow Writer
+Add a Writer for presentations, derivative of the HTML Writer. Given
+an input document containing one section per slide, the output would
+consist of a master document for the speaker, and a slide file (or set
+of files, one (or more) for each slide). Each slide would contain
+the slide text (large, stylesheet-controlled) and images, plus "next"
+and "previous" links in consistent places. The speaker's master
+document would contain a small version of the slide text with
+speaker's notes interspersed. The master document could use
+``target="whatever"`` to direct links to a separate window on a second
+monitor (e.g., a projector).
+* Base the output on |S5|_. I discovered |S5| a few weeks before it
+ appeared on Slashdot, after writing most of this section. It turns
+ out that |S5| does most of what I wanted.
+ Chris Liechti has `integrated S5 with the HTML writer
+ <>`__.
+ .. |S5| replace:: S\ :sup:`5`
+ .. _S5:
+Below, "[S5]" indicates that |S5| already implements the feature or
+may implement all or part of the feature. "[S5 1.1]" indicates that
+|S5| version 1.1 implements the feature (a preview of the 1.1 beta is
+available in the `S5 testbed`_).
+.. _S5 testbed:
+Features & issues:
+* [S5 1.1] Incremental slides, where each slide adds to the one before
+ (ticking off items in a list, delaying display of later items). The
+ speaker's master document would list each transition in the TOC and
+ provide links in the content.
+ * Use transitions to separate stages. The problem with transitions
+ is that they can't be used everywhere -- not, for example, within
+ a list (see the example below).
+ * Use a special directive to separate stages. Possible names:
+ pause, delay, break, cut, continue, suspend, hold, stay, stop.
+ Should the directive be available in all contexts (and ineffectual
+ in all but SlideShow context), or added at runtime by the
+ SlideShow Writer? Probably such a "pause" directive should only
+ be available for slide shows; slide shows are too much of a
+ special case to justify adding a directive (and node?) to the
+ core.
+ The directive could accept text content, which would be rendered
+ while paused but would disappear when the slide is continued (the
+ text could also be a link to the next slide). In the speaker's
+ master document, the text "paused:" could appear, prefixed to the
+ directive text.
+ * Use a special directive or class to declare incremental content.
+ This works best with the S5 ideas. For example::
+ Slide Title
+ ===========
+ .. incremental::
+ * item one
+ * item two
+ * item three
+ Add an option to make all bullet lists implicitly incremental?
+* Speaker's notes -- how to intersperse? Could use reST comments
+ (".."), but make them visible in the speaker's master document. If
+ structure is necessary, we could use a "comment" directive (to avoid
+ nonsensical DTD changes, the "comment" directive could produce an
+ untitled topic element).
+ The speaker's notes could (should?) be separate from S5's handout
+ content.
+* The speaker's master document could use frames for easy navigation:
+ TOC on the left, content on the right.
+ - It would be nice if clicking in the TOC frame simultaneously
+ linked to both the speaker's notes frame and to the slide window,
+ synchronizing both. Needs JavaScript?
+ - TOC would have to be tightly formatted -- minimal indentation.
+ - TOC auto-generated, as in the PEP Reader. (What if there already
+ is a "contents" directive in the document?)
+ - There could be another frame on the left (top-left or bottom-left)
+ containing a single "Next" link, always pointing to the next slide
+ (synchronized, of course). Also "Previous" link? FF/Rew go to
+ the beginning of the next/current parent section? First/Last
+ also? Tape-player-style buttons like ``|<< << < > >> >>|``?
+* [S5] Need to support templating of some kind, for uniform slide
+ layout. S5 handles this via CSS.
+ Build in support for limited features? E.g., top/bottom or
+ left/right banners, images on each page, background color and/or
+ image, etc.
+* [S5?] One layout for all slides, or allow some variation?
+ While S5 seems to support only one style per HTML file, it's
+ pretty easy to split a presentation in different files and
+ insert a hyperlink to the last slide of the first part and load
+ the second part by a click on it.
+ -- Chris Liechti
+* For nested sections, do we show the section's ancestry on each
+ slide? Optional? No -- leave the implementation to someone who
+ wants it.
+* [S5] Stylesheets for slides:
+ - Tweaked for different resolutions, 1024x768 etc.
+ - Some layout elements have fixed positions.
+ - Text must be quite large.
+ - Allow 10 lines of text per slide? 15?
+ - Title styles vary by level, but not so much?
+* [not required with S5.] Need a transform to number slides for
+ output filenames, and for hyperlinks?
+* Directive to begin a new, untitled (blank) slide?
+* Directive to begin a new slide, continuation, using the same title
+ as the previous slide? (Unnecessary?)
+* Have a timeout on incremental items, so the colour goes away after 1
+ second.
+Here's an example that I was hoping to show at PyCon DC 2005::
+ ========================
+ The Docutils SlideShow
+ ========================
+ Welcome To The Docutils SlideShow!
+ ==================================
+ .. pause::
+ David Goodger
+ .. (introduce yourself)
+ Hi, I'm David Goodger from Montreal, Canada.
+ I've been working on Docutils since 2000.
+ Time flies!
+ .. pause::
+ Docutils
+ .. I also volunteer as a Python Enhancement Proposal (or PEP)
+ editor.
+ .. SlideShow is a new feature of Docutils. This presentation was
+ written using the Docutils SlideShow system. The slides you
+ are seeing are HTML, rendered by a standard Mozilla Firefox
+ browser.
+ The Docutils SlideShow System
+ =============================
+ .. The Docutils SlideShow System provides
+ Easy and open presentations.
+ Features
+ ========
+ * reStructuredText-based input files.
+ .. reStructuredText is a what-you-see-is-what-you-get
+ plaintext format. Easy to read & write, non-proprietary,
+ editable in your favourite text editor.
+ .. Parsers for other markup languages can be added to Docutils.
+ In the future, I hope some are.
+ .. pause:: ...
+ * Stylesheet-driven HTML output.
+ .. The format of all elements of the output slides is
+ controlled by CSS (cascading stylesheets).
+ .. pause:: ...
+ * Works with any modern browser.
+ .. that supports CSS, frames, and JavaScript.
+ Tested with Mozilla Firefox.
+ .. pause:: ...
+ * Works on any OS.
+ Etc.
+ ====
+ That's as far as I got, but you get the idea...
+Front-End Tools
+* What if we don't know which Reader and/or Writer we are
+ going to use? If the Reader/Writer is specified on the
+ command-line? (Will this ever happen?)
+ Perhaps have different types of front ends:
+ a) _`Fully qualified`: Reader and Writer are hard-coded into the
+ front end (e.g. ``pep2html [options]``, ``pysource2pdf
+ [options]``).
+ b) _`Partially qualified`: Reader is hard-coded, and the Writer is
+ specified as a sub-command (e.g. ``pep2 html [options]``,
+ ``pysource2 pdf [options]``). The Writer is known before option
+ processing happens, allowing the OptionParser to be built
+ dynamically. Alternatively, the Writer could be hard-coded and
+ the Reader specified as a sub-command (e.g. ``htmlfrom pep
+ [options]``).
+ c) _`Unqualified`: Reader and Writer are specified as subcommands
+ (e.g. ``publish pep html [options]``, ``publish pysource pdf
+ [options]``). A single front end would be sufficient, but
+ probably only useful for testing purposes.
+ d) _`Dynamic`: Reader and/or Writer are specified by options, with
+ defaults if unspecified (e.g. ``publish --writer pdf
+ [options]``). Is this possible? The option parser would have
+ to be told about new options it needs to handle, on the fly.
+ Component-specific options would have to be specified *after*
+ the component-specifying option.
+ Allow common options before subcommands, as in CVS? Or group all
+ options together? In the case of the `fully qualified`_
+ front ends, all the options will have to be grouped together
+ anyway, so there's no advantage (we can't use it to avoid
+ conflicts) to splitting common and component-specific options
+ apart.
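+ A partially qualified front end with common options placed before
+ the sub-command can be sketched with the standard library's
+ ``argparse`` module, used here in place of the OptionParser
+ mentioned above. The ``WRITER_OPTIONS`` table, the option names
+ and the "pep2" program name are made up for illustration and are
+ not real Docutils settings.

```python
import argparse

# Hypothetical per-Writer options, for illustration only.
WRITER_OPTIONS = {
    "html": [("--stylesheet", "path or URL of a CSS stylesheet")],
    "pdf": [("--paper-size", "paper size, such as a4 or letter")],
}


def build_parser(prog="pep2"):
    """Build a parser whose sub-command names the Writer."""
    parser = argparse.ArgumentParser(prog=prog)
    # Common options are accepted before the sub-command.
    parser.add_argument("--verbose", action="store_true")
    subparsers = parser.add_subparsers(dest="writer", required=True)
    for writer, options in WRITER_OPTIONS.items():
        sub = subparsers.add_parser(writer)
        # Writer-specific options are known only to that sub-command.
        for flag, help_text in options:
            sub.add_argument(flag, help=help_text)
        sub.add_argument("source")
    return parser


args = build_parser().parse_args(
    ["--verbose", "html", "--stylesheet", "pep.css", "pep-0001.txt"])
```

+ Because the Writer name is consumed before its options, each
+ sub-parser only needs to know about its own component, which avoids
+ option-name conflicts between Writers.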
+* Parameterize help text & defaults somehow? Perhaps a callback? Or
+ initialize ``settings_spec`` in ``__init__`` or ``init_options``?
+* Disable common options that don't apply?
+* Add ``--section-numbering`` command line option. The "sectnum"
+ directive should override the ``--no-section-numbering`` command
+ line option then.
+* Create a single dynamic_ or unqualified_ front end that can be
+ installed?
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End:
diff --git a/docs/dev/website.txt b/docs/dev/website.txt
new file mode 100644
index 000000000..193e9c0f2
--- /dev/null
+++ b/docs/dev/website.txt
@@ -0,0 +1,46 @@
+ Docutils Web Site
+:Author: David Goodger; open to all Docutils developers
+:Date: $Date$
+:Revision: $Revision$
+:Copyright: This document has been placed in the public domain.
+The Docutils web site, <>, is
+maintained automatically by the ``docutils-update`` script, run as an
+hourly cron job on (by user "felixwiemann"). The
+script will process any .txt file which is newer than the
+corresponding .html file in the project's web directory on (``/home/groups/docutils/htdocs/aux/htdocs/``) and
+upload the changes to the web site at SourceForge. For a new .txt
+file, just SSH to ``<username>`` and ::
+ cd /home/groups/docutils/htdocs/aux/htdocs/
+ touch filename.html
+ chmod g+w filename.html
+ sleep 1
+ touch filename.txt
+The script will take care of the rest within an hour. Thereafter
+whenever the .txt file is modified (checked in to SVN), the .html will
+be regenerated automatically.
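+The "newer than" check can be illustrated in a few lines of Python.
+This is not the ``docutils-update`` script itself, only a sketch of
+the rule described above; ``stale_sources`` is a hypothetical name,
+and skipping a .txt file that has no .html counterpart is an
+assumption inferred from the ``touch filename.html`` step.

```python
import os


def stale_sources(directory):
    """Return the .txt files that are newer than their .html counterpart."""
    stale = []
    for name in sorted(os.listdir(directory)):
        base, ext = os.path.splitext(name)
        if ext != ".txt":
            continue
        txt = os.path.join(directory, name)
        html = os.path.join(directory, base + ".html")
        # Assumption: a .txt file without an .html file is ignored.
        if os.path.exists(html) and os.path.getmtime(txt) > os.path.getmtime(html):
            stale.append(name)
    return stale
```

+The sketch looks at a single directory only and does not descend
+into subdirectories.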
+After adding directories to SVN, allow the script to run once to
+create the directories in the filesystem before preparing for HTML
+processing as described above.
+The docutils-update__ script is located at
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ End: