diff options
Diffstat (limited to 'docs/dev')
-rw-r--r-- | docs/dev/distributing.txt | 146 | ||||
-rw-r--r-- | docs/dev/enthought-plan.txt | 480 | ||||
-rw-r--r-- | docs/dev/enthought-rfp.txt | 146 | ||||
-rw-r--r-- | docs/dev/hacking.txt | 264 | ||||
-rw-r--r-- | docs/dev/policies.txt | 549 | ||||
-rw-r--r-- | docs/dev/pysource.dtd | 259 | ||||
-rw-r--r-- | docs/dev/pysource.txt | 130 | ||||
-rw-r--r-- | docs/dev/release.txt | 168 | ||||
-rw-r--r-- | docs/dev/repository.txt | 217 | ||||
-rw-r--r-- | docs/dev/rst/alternatives.txt | 3129 | ||||
-rw-r--r-- | docs/dev/rst/problems.txt | 872 | ||||
-rw-r--r-- | docs/dev/semantics.txt | 119 | ||||
-rw-r--r-- | docs/dev/testing.txt | 246 | ||||
-rw-r--r-- | docs/dev/todo.txt | 1964 | ||||
-rw-r--r-- | docs/dev/website.txt | 46 |
15 files changed, 8735 insertions, 0 deletions
diff --git a/docs/dev/distributing.txt b/docs/dev/distributing.txt new file mode 100644 index 000000000..c81807279 --- /dev/null +++ b/docs/dev/distributing.txt @@ -0,0 +1,146 @@ +=============================== + Docutils_ Distributor's Guide +=============================== + +:Author: Felix Wiemann +:Contact: Felix.Wiemann@ososo.de +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This document has been placed in the public domain. + +.. _Docutils: http://docutils.sourceforge.net/ + +.. contents:: + +This document describes how to create packages of Docutils (e.g. for +shipping with a Linux distribution). If you have any questions, +please direct them to the Docutils-develop_ mailing list. + +First, please download the most current `release tarball`_ and unpack +it. + +.. _Docutils-develop: ../user/mailing-lists.html#docutils-develop +.. _release tarball: http://docutils.sourceforge.net/#download + + +Dependencies +============ + +Docutils has the following dependencies: + +* Python 2.1 or later is required. While the compiler package from + the Tools/ directory of Python's source distribution must be + installed for the test suite to pass with Python 2.1, the + functionality available to end users should be available without the + compiler package as well. So just use ">= Python 2.1" in the + dependencies. + +* Docutils may optionally make use of the PIL (`Python Imaging + Library`_). If PIL is present, it is automatically detected by + Docutils. + +* There are three files in the ``extras/`` directory of the Docutils + distribution, ``optparse.py``, ``textwrap.py``, and ``roman.py``. + For Python 2.1/2.2, all of them must be installed (into the + ``site-packages/`` directory). Python 2.3 and later versions have + ``textwrap`` and ``optparse`` included in the standard library, so + only ``roman.py`` is required here; installing the other files won't + hurt, though. + + These files are automatically installed by the setup script (when + calling "python setup.py install"). + +.. _Python Imaging Library: http://www.pythonware.com/products/pil/ + + +Python Files +============ + +The Docutils Python files must be installed into the +``site-packages/`` directory of Python. Running ``python setup.py +install`` should do the trick, but if you want to place the files +yourself, you can just install the ``docutils/`` directory of the +Docutils tarball to ``/usr/lib/python/site-packages/docutils/``. In +this case you should also compile the Python files to ``.pyc`` and/or +``.pyo`` files so that Docutils doesn't need to be recompiled every +time it's executed. + + +Executables +=========== + +The executable front-end tools are located in the ``tools/`` directory +of the Docutils tarball. + +The ``rst2*.py`` tools (except ``rst2newlatex.py``) are intended for +end-users. You should install them to ``/usr/bin/``. You do not need +to change the names (e.g. to ``docutils-rst2html.py``) because the +``rst2`` prefix is unique. + + +Documentation +============= + +The documentation should be generated using ``buildhtml.py``. To +generate HTML for all documentation files, go to the ``tools/`` +directory and run:: + + # Place html4css1.css in base directory. + cp ../docutils/writers/html4css1/html4css1.css .. + ./buildhtml.py --stylesheet-path=../html4css1.css .. + +Then install the following files to ``/usr/share/doc/docutils/`` (or +wherever you install documentation): + +* All ``.html`` and ``.txt`` files in the base directory. + +* The ``docs/`` directory. + + Do not install the contents of the ``docs/`` directory directly to + ``/usr/share/doc/docutils/``; it's incomplete and would contain + invalid references! + +* The ``licenses/`` directory. + +* ``html4css1.css`` in the base directory. + + +Removing the ``.txt`` Files +--------------------------- + +If you are tight with disk space, you can remove all ``.txt`` files in +the tree except for: + +* those in the ``licenses/`` directory because they have not been + processed to HTML and + +* ``user/rst/cheatsheet.txt`` and ``user/rst/demo.txt``, which should + be readable in source form. + +Before you remove the ``.txt`` files you should rerun ``buildhtml.py`` +with the ``--no-source-link`` switch to avoid broken references to the +source files. + + +Other Files +=========== + +You may want to install the Emacs-Lisp files +``tools/editors/emacs/*.el`` into the appropriate directory. + + +Configuration File +================== + +It is possible to have a system-wide configuration file at +``/etc/docutils.conf``. However, this is usually not necessary. You +should *not* install ``tools/docutils.conf`` into ``/etc/``. + + +Tests +===== + +While you probably do not need to ship the tests with your +distribution, you can test your package by installing it and then +running ``alltests.py`` from the ``tests/`` directory of the Docutils +tarball. diff --git a/docs/dev/enthought-plan.txt b/docs/dev/enthought-plan.txt new file mode 100644 index 000000000..0ab0d3c83 --- /dev/null +++ b/docs/dev/enthought-plan.txt @@ -0,0 +1,480 @@ +=========================================== + Plan for Enthought API Documentation Tool +=========================================== + +:Author: David Goodger +:Contact: goodger@python.org +:Date: $Date$ +:Revision: $Revision$ +:Copyright: 2004 by `Enthought, Inc. <http://www.enthought.com>`_ +:License: `Enthought License`_ (BSD-style) + +.. _Enthought License: http://docutils.sf.net/licenses/enthought.txt + +This document should be read in conjunction with the `Enthought API +Documentation Tool RFP`__ prepared by Janet Swisher. + +__ enthought-rfp.html + +.. contents:: +.. sectnum:: + + +Introduction +============ + +In March 2004 at I met Eric Jones, president and CTO of `Enthought, +Inc.`_, at `PyCon 2004`_ in Washington DC. He told me that Enthought +was using reStructuredText_ for source code documentation, but they +had some issues. He asked if I'd be interested in doing some work on +a customized API documentation tool. Shortly after PyCon, Janet +Swisher, Enthought's senior technical writer, contacted me to work out +details. Some email, a trip to Austin in May, and plenty of Texas +hospitality later, we had a project. This document will record the +details, milestones, and evolution of the project. + +In a nutshell, Enthought is sponsoring the implementation of an open +source API documentation tool that meets their needs. Fortuitously, +their needs coincide well with the "Python Source Reader" description +in `PEP 258`_. In other words, Enthought is funding some significant +improvements to Docutils, improvements that were planned but never +implemented due to time and other constraints. The implementation +will take place gradually over several months, on a part-time basis. + +This is an ideal example of cooperation between a corporation and an +open-source project. The corporation, the project, I personally, and +the community all benefit. Enthought, whose commitment to open source +is also evidenced by their sponsorship of SciPy_, benefits by +obtaining a useful piece of software, much more quickly than would +have been possible without their support. Docutils benefits directly +from the implementation of one of its core subsystems. I benefit from +the funding, which allows me to justify the long hours to my wife and +family. All the corporations, projects, and individuals that make up +the community will benefit from the end result, which will be great. + +All that's left now is to actually do the work! + +.. _PyCon 2004: http://pycon.org/dc2004/ +.. _reStructuredText: http://docutils.sf.net/rst.html +.. _SciPy: http://www.scipy.org/ + + +Development Plan +================ + +1. Analyze prior art, most notably Epydoc_ and HappyDoc_, to see how + they do what they do. I have no desire to reinvent wheels + unnecessarily. I want to take the best ideas from each tool, + combined with the outline in `PEP 258`_ (which will evolve), and + build at least the foundation of the definitive Python + auto-documentation tool. + + .. _Epydoc: http://epydoc.sourceforge.net/ + .. _HappyDoc: http://happydoc.sourceforge.net/ + .. _PEP 258: + http://docutils.sf.net/docs/peps/pep-0258.html#python-source-reader + +2. Decide on a base platform. The best way to achieve Enthought's + goals in a reasonable time frame may be to extend Epydoc or + HappyDoc. Or it may be necessary to start fresh. + +3. Extend the reStructuredText parser. See `Proposed Changes to + reStructuredText`_ below. + +4. Depending on the base platform chosen, build or extend the + docstring & doc comment extraction tool. This may be the biggest + part of the project, but I won't be able to break it down into + details until more is known. + + +Repository +========== + +If possible, all software and documentation files will be stored in +the Subversion repository of Docutils and/or the base project, which +are all publicly-available via anonymous pserver access. + +The Docutils project is very open about granting Subversion write +access; so far, everyone who asked has been given access. Any +Enthought staff member who would like Subversion write access will get +it. + +If either Epydoc or HappyDoc is chosen as the base platform, I will +ask the project's administrator for CVS access for myself and any +Enthought staff member who wants it. If sufficient access is not +granted -- although I doubt that there would be any problem -- we may +have to begin a fork, which could be hosted on SourceForge, on +Enthought's Subversion server, or anywhere else deemed appropriate. + + +Copyright & License +=================== + +Most existing Docutils files have been placed in the public domain, as +follows:: + + :Copyright: This document has been placed in the public domain. + +This is in conjunction with the "Public Domain Dedication" section of +COPYING.txt__. + +__ http://docutils.sourceforge.net/COPYING.html + +The code and documentation originating from Enthought funding will +have Enthought's copyright and license declaration. While I will try +to keep Enthought-specific code and documentation separate from the +existing files, there will inevitably be cases where it makes the most +sense to extend existing files. + +I propose the following: + +1. New files related to this Enthought-funded work will be identified + with the following field-list headers:: + + :Copyright: 2004 by Enthought, Inc. + :License: Enthought License (BSD Style) + + The license field text will be linked to the license file itself. + +2. For significant or major changes to an existing file (more than 10% + change), the headers shall change as follows (for example):: + + :Copyright: 2001-2004 by David Goodger + :Copyright: 2004 by Enthought, Inc. + :License: BSD-style + + If the Enthought-funded portion becomes greater than the previously + existing portion, Enthought's copyright line will be shown first. + +3. In cases of insignificant or minor changes to an existing file + (less than 10% change), the public domain status shall remain + unchanged. + +A section describing all of this will be added to the Docutils +`COPYING`__ instructions file. + +If another project is chosen as the base project, similar changes +would be made to their files, subject to negotiation. + +__ http://docutils.sf.net/COPYING.html + + +Proposed Changes to reStructuredText +==================================== + +Doc Comment Syntax +------------------ + +The "traits" construct is implemented as dictionaries, where +standalone strings would be Python syntax errors. Therefore traits +require documentation in comments. We also need a way to +differentiate between ordinary "internal" comments and documentation +comments (doc comments). + +Javadoc uses the following syntax for doc comments:: + + /** + * The first line of a multi-line doc comment begins with a slash + * and *two* asterisks. The doc comment ends normally. + */ + +Python doesn't have multi-line comments; only single-line. A similar +convention in Python might look like this:: + + ## + # The first line of a doc comment begins with *two* hash marks. + # The doc comment ends with the first non-comment line. + 'data' : AnyValue, + + ## The double-hash-marks could occur on the first line of text, + # saving a line in the source. + 'data' : AnyValue, + +How to indicate the end of the doc comment? :: + + ## + # The first line of a doc comment begins with *two* hash marks. + # The doc comment ends with the first non-comment line, or another + # double-hash-mark. + ## + # This is an ordinary, internal, non-doc comment. + 'data' : AnyValue, + + ## First line of a doc comment, terse syntax. + # Second (and last) line. Ends here: ## + # This is an ordinary, internal, non-doc comment. + 'data' : AnyValue, + +Or do we even need to worry about this case? A simple blank line +could be used:: + + ## First line of a doc comment, terse syntax. + # Second (and last) line. Ends with a blank line. + + # This is an ordinary, internal, non-doc comment. + 'data' : AnyValue, + +Other possibilities:: + + #" Instead of double-hash-marks, we could use a hash mark and a + # quotation mark to begin the doc comment. + 'data' : AnyValue, + + ## We could require double-hash-marks on every line. This has the + ## added benefit of delimiting the *end* of the doc comment, as + ## well as working well with line wrapping in Emacs + ## ("fill-paragraph" command). + # Ordinary non-doc comment. + 'data' : AnyValue, + + #" A hash mark and a quotation mark on each line looks funny, and + #" it doesn't work well with line wrapping in Emacs. + 'data' : AnyValue, + +These styles (repeated on each line) work well with line wrapping in +Emacs:: + + ## #> #| #- #% #! #* + +These styles do *not* work well with line wrapping in Emacs:: + + #" #' #: #) #. #/ #@ #$ #^ #= #+ #_ #~ + +The style of doc comment indicator used could be a runtime, global +and/or per-module setting. That may add more complexity than it's +worth though. + + +Recommendation +`````````````` + +I recommend adopting "#*" on every line:: + + # This is an ordinary non-doc comment. + + #* This is a documentation comment, with an asterisk after the + #* hash marks on every line. + 'data' : AnyValue, + +I initially recommended adopting double-hash-marks:: + + # This is an ordinary non-doc comment. + + ## This is a documentation comment, with double-hash-marks on + ## every line. + 'data' : AnyValue, + +But Janet Swisher rightly pointed out that this could collide with +ordinary comments that are then block-commented. This applies to +double-hash-marks on the first line only as well. So they're out. + +On the other hand, the JavaDoc-comment style ("##" on the first line +only, "#" after that) is used in Fredrik Lundh's PythonDoc_. It may +be worthwhile to conform to this syntax, reinforcing it as a standard. +PythonDoc does not support terse doc comments (text after "##" on the +first line). + +.. _PythonDoc: http://effbot.org/zone/pythondoc.htm + + +Update +`````` + +Enthought's Traits system has switched to a metaclass base, and traits +are now defined via ordinary attributes. Therefore doc comments are +no longer absolutely necessary; attribute docstrings will suffice. +Doc comments may still be desirable though, since they allow +documentation to precede the thing being documented. + + +Docstring Density & Whitespace Minimization +------------------------------------------- + +One problem with extensively documented classes & functions, is that +there is a lot of screen space wasted on whitespace. Here's some +current Enthought code (from lib/cp/fluids/gassmann.py):: + + def max_gas(temperature, pressure, api, specific_gravity=.56): + """ + Computes the maximum dissolved gas in oil using Batzle and + Wang (1992). + + Parameters + ---------- + temperature : sequence + Temperature in degrees Celsius + pressure : sequence + Pressure in MPa + api : sequence + Stock tank oil API + specific_gravity : sequence + Specific gravity of gas at STP, default is .56 + + Returns + ------- + max_gor : sequence + Maximum dissolved gas in liters/liter + + Description + ----------- + This estimate is based on equations given by Mavko, Mukerji, + and Dvorkin, (1998, pp. 218-219, or 2003, p. 236) obtained + originally from Batzle and Wang (1992). + """ + code... + +The docstring is 24 lines long. + +Rather than using subsections, field lists (which exist now) can save +6 lines:: + + def max_gas(temperature, pressure, api, specific_gravity=.56): + """ + Computes the maximum dissolved gas in oil using Batzle and + Wang (1992). + + :Parameters: + temperature : sequence + Temperature in degrees Celsius + pressure : sequence + Pressure in MPa + api : sequence + Stock tank oil API + specific_gravity : sequence + Specific gravity of gas at STP, default is .56 + :Returns: + max_gor : sequence + Maximum dissolved gas in liters/liter + :Description: This estimate is based on equations given by + Mavko, Mukerji, and Dvorkin, (1998, pp. 218-219, or 2003, + p. 236) obtained originally from Batzle and Wang (1992). + """ + code... + +As with the "Description" field above, field bodies may begin on the +same line as the field name, which also saves space. + +The output for field lists is typically a table structure. For +example: + + :Parameters: + temperature : sequence + Temperature in degrees Celsius + pressure : sequence + Pressure in MPa + api : sequence + Stock tank oil API + specific_gravity : sequence + Specific gravity of gas at STP, default is .56 + :Returns: + max_gor : sequence + Maximum dissolved gas in liters/liter + :Description: + This estimate is based on equations given by Mavko, + Mukerji, and Dvorkin, (1998, pp. 218-219, or 2003, p. 236) + obtained originally from Batzle and Wang (1992). + +But the definition lists describing the parameters and return values +are still wasteful of space. There are a lot of half-filled lines. + +Definition lists are currently defined as:: + + term : classifier + definition + +Where the classifier part is optional. Ideas for improvements: + +1. We could allow multiple classifiers:: + + term : classifier one : two : three ... + definition + +2. We could allow the definition on the same line as the term, using + some embedded/inline markup: + + * "--" could be used, but only in limited and well-known contexts:: + + term -- definition + + This is the syntax used by StructuredText (one of + reStructuredText's predecessors). It was not adopted for + reStructuredText because it is ambiguous -- people often use "--" + in their text, as I just did. But given a constrained context, + the ambiguity would be acceptable (or would it?). That context + would be: in docstrings, within a field list, perhaps only with + certain well-defined field names (parameters, returns). + + * The "constrained context" above isn't really enough to make the + ambiguity acceptable. Instead, a slightly more verbose but far + less ambiguous syntax is possible:: + + term === definition + + This syntax has advantages. Equals signs lend themselves to the + connotation of "definition". And whereas one or two equals signs + are commonly used in program code, three equals signs in a row + have no conflicting meanings that I know of. (Update: there + *are* uses out there.) + + The problem with this approach is that using inline markup for + structure is inherently ambiguous in reStructuredText. For + example, writing *about* definition lists would be difficult:: + + ``term === definition`` is an example of a compact definition list item + + The parser checks for structural markup before it does inline + markup processing. But the "===" should be protected by its inline + literal context. + +3. We could allow the definition on the same line as the term, using + structural markup. A variation on bullet lists would work well:: + + : term :: definition + : another term :: and a definition that + wraps across lines + + Some ambiguity remains:: + + : term ``containing :: double colons`` :: definition + + But the likelihood of such cases is negligible, and they can be + covered in the documentation. + + Other possibilities for the definition delimiter include:: + + : term : classifier -- definition + : term : classifier --- definition + : term : classifier : : definition + : term : classifier === definition + +The third idea currently has the best chance of being adopted and +implemented. + + +Recommendation +`````````````` + +Combining these ideas, the function definition becomes:: + + def max_gas(temperature, pressure, api, specific_gravity=.56): + """ + Computes the maximum dissolved gas in oil using Batzle and + Wang (1992). + + :Parameters: + : temperature : sequence :: Temperature in degrees Celsius + : pressure : sequence :: Pressure in MPa + : api : sequence :: Stock tank oil API + : specific_gravity : sequence :: Specific gravity of gas at + STP, default is .56 + :Returns: + : max_gor : sequence :: Maximum dissolved gas in liters/liter + :Description: This estimate is based on equations given by + Mavko, Mukerji, and Dvorkin, (1998, pp. 218-219, or 2003, + p. 236) obtained originally from Batzle and Wang (1992). + """ + code... + +The docstring is reduced to 14 lines, from the original 24. For +longer docstrings with many parameters and return values, the +difference would be more significant. diff --git a/docs/dev/enthought-rfp.txt b/docs/dev/enthought-rfp.txt new file mode 100644 index 000000000..986f5604f --- /dev/null +++ b/docs/dev/enthought-rfp.txt @@ -0,0 +1,146 @@ +================================== + Enthought API Documentation Tool +================================== +----------------------- + Request for Proposals +----------------------- + +:Author: Janet Swisher, Senior Technical Writer +:Organization: `Enthought, Inc. <http://www.enthought.com>`_ +:Copyright: 2004 by Enthought, Inc. +:License: `Enthought License`_ (BSD Style) + +.. _Enthought License: http://docutils.sf.net/licenses/enthought.txt + +The following is excerpted from the full RFP, and is published here +with permission from `Enthought, Inc.`_ See the `Plan for Enthought +API Documentation Tool`__. + +__ enthought-plan.html + +.. contents:: +.. sectnum:: + + +Requirements +============ + +The documentation tool will address the following high-level goals: + + +Documentation Extraction +------------------------ + +1. Documentation will be generated directly from Python source code, + drawing from the code structure, docstrings, and possibly other + comments. + +2. The tool will extract logical constructs as appropriate, minimizing + the need for comments that are redundant with the code structure. + The output should reflect both documented and undocumented + elements. + + +Source Format +------------- + +1. The docstrings will be formatted in as terse syntax as possible. + Required tags, syntax, and white space should be minimized. + +2. The tool must support the use of Traits. Special comment syntax + for Traits may be necessary. Information about the Traits package + is available at http://code.enthought.com/traits/. In the + following example, each trait definition is prefaced by a plain + comment:: + + __traits__ = { + + # The current selection within the frame. + 'selection' : Trait([], TraitInstance(list)), + + # The frame has been activated or deactivated. + 'activated' : TraitEvent(), + + 'closing' : TraitEvent(), + + # The frame is closed. + 'closed' : TraitEvent(), + } + +3. Support for ReStructuredText (ReST) format is desirable, because + much of the existing docstrings uses ReST. However, the complete + ReST specification need not be supported, if a subset can achieve + the project goals. If the tool does not support ReST, the + contractor should also provide a tool or path to convert existing + docstrings. + + +Output Format +------------- + +1. Documentation will be output as a navigable suite of HTML + files. + +2. The style of the HTML files will be customizable by a cascading + style sheet and/or a customizable template. + +3. Page elements such as headers and footer should be customizable, to + support differing requirements from one documentation project to + the next. + + +Output Structure and Navigation +------------------------------- + +1. The navigation scheme for the HTML files should not rely on frames, + and should harmonize with conversion to Microsoft HTML Help (.chm) + format. + +2. The output should be structured to make navigable the architecture + of the Python code. Packages, modules, classes, traits, and + functions should be presented in clear, logical hierarchies. + Diagrams or trees for inheritance, collaboration, sub-packaging, + etc. are desirable but not required. + +3. The output must include indexes that provide a comprehensive view + of all packages, modules, and classes. These indexes will provide + readers with a clear and exhaustive view of the code base. These + indexes should be presented in a way that is easily accessible and + allows easy navigation. + +4. Cross-references to other documented elements will be used + throughout the documentation, to enable the reader to move quickly + relevant information. For example, where type information for an + element is available, the type definition should be + cross-referenced. + +5. The HTML suite should provide consistent navigation back to the + home page, which will include the following information: + + * Bibliographic information + + - Author + - Copyright + - Release date + - Version number + + * Abstract + + * References + + - Links to related internal docs (i.e., other docs for the same + product) + + - Links to related external docs (e.g., supporting development + docs, Python support docs, docs for included packages) + + It should be possible to specify similar information at the top + level of each package, so that packages can be included as + appropriate for a given application. + + +License +======= + +Enthought intends to release the software under an open-source +("BSD-style") license. diff --git a/docs/dev/hacking.txt b/docs/dev/hacking.txt new file mode 100644 index 000000000..d0ec9a3fb --- /dev/null +++ b/docs/dev/hacking.txt @@ -0,0 +1,264 @@ +========================== + Docutils_ Hacker's Guide +========================== + +:Author: Felix Wiemann +:Contact: Felix.Wiemann@ososo.de +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This document has been placed in the public domain. + +:Abstract: This is the introduction to Docutils for all persons who + want to extend Docutils in some way. +:Prerequisites: You have used reStructuredText_ and played around with + the `Docutils front-end tools`_ before. Some (basic) Python + knowledge is certainly helpful (though not necessary, strictly + speaking). + +.. _Docutils: http://docutils.sourceforge.net/ +.. _reStructuredText: http://docutils.sourceforge.net/rst.html +.. _Docutils front-end tools: ../user/tools.html + +.. contents:: + + +Overview of the Docutils Architecture +===================================== + +To give you an understanding of the Docutils architecture, we'll dive +right into the internals using a practical example. + +Consider the following reStructuredText file:: + + My *favorite* language is Python_. + + .. _Python: http://www.python.org/ + +Using the ``rst2html.py`` front-end tool, you would get an HTML output +which looks like this:: + + [uninteresting HTML code removed] + <body> + <div class="document"> + <p>My <em>favorite</em> language is <a class="reference" href="http://www.python.org/">Python</a>.</p> + </div> + </body> + </html> + +While this looks very simple, it's enough to illustrate all internal +processing stages of Docutils. Let's see how this document is +processed from the reStructuredText source to the final HTML output: + + +Reading the Document +-------------------- + +The **Reader** reads the document from the source file and passes it +to the parser (see below). The default reader is the standalone +reader (``docutils/readers/standalone.py``) which just reads the input +data from a single text file. Unless you want to do really fancy +things, there is no need to change that. + +Since you probably won't need to touch readers, we will just move on +to the next stage: + + +Parsing the Document +-------------------- + +The **Parser** analyzes the the input document and creates a **node +tree** representation. In this case we are using the +**reStructuredText parser** (``docutils/parsers/rst/__init__.py``). +To see what that node tree looks like, we call ``quicktest.py`` (which +can be found in the ``tools/`` directory of the Docutils distribution) +with our example file (``test.txt``) as first parameter (Windows users +might need to type ``python quicktest.py test.txt``):: + + $ quicktest.py test.txt + <document source="test.txt"> + <paragraph> + My + <emphasis> + favorite + language is + <reference name="Python" refname="python"> + Python + . + <target ids="python" names="python" refuri="http://www.python.org/"> + +Let us now examine the node tree: + +The top-level node is ``document``. It has a ``source`` attribute +whose value is ``text.txt``. There are two children: A ``paragraph`` +node and a ``target`` node. The ``paragraph`` in turn has children: A +text node ("My "), an ``emphasis`` node, a text node (" language is "), +a ``reference`` node, and again a ``Text`` node ("."). + +These node types (``document``, ``paragraph``, ``emphasis``, etc.) are +all defined in ``docutils/nodes.py``. The node types are internally +arranged as a class hierarchy (for example, both ``emphasis`` and +``reference`` have the common superclass ``Inline``). To get an +overview of the node class hierarchy, use epydoc (type ``epydoc +nodes.py``) and look at the class hierarchy tree. + + +Transforming the Document +------------------------- + +In the node tree above, the ``reference`` node does not contain the +target URI (``http://www.python.org/``) yet. + +Assigning the target URI (from the ``target`` node) to the +``reference`` node is *not* done by the parser (the parser only +translates the input document into a node tree). + +Instead, it's done by a **Transform**. In this case (resolving a +reference), it's done by the ``ExternalTargets`` transform in +``docutils/transforms/references.py``. + +In fact, there are quite a lot of Transforms, which do various useful +things like creating the table of contents, applying substitution +references or resolving auto-numbered footnotes. + +The Transforms are applied after parsing. To see how the node tree +has changed after applying the Transforms, we use the +``rst2pseudoxml.py`` tool: + +.. parsed-literal:: + + $ rst2pseudoxml.py test.txt + <document source="test.txt"> + <paragraph> + My + <emphasis> + favorite + language is + <reference name="Python" **refuri="http://www.python.org/"**> + Python + . + <target ids="python" names="python" ``refuri="http://www.python.org/"``> + +For our small test document, the only change is that the ``refname`` +attribute of the reference has been replaced by a ``refuri`` +attribute |---| the reference has been resolved. + +While this does not look very exciting, transforms are a powerful tool +to apply any kind of transformation on the node tree. + +By the way, you can also get a "real" XML representation of the node +tree by using ``rst2xml.py`` instead of ``rst2pseudoxml.py``. + + +Writing the Document +-------------------- + +To get an HTML document out of the node tree, we use a **Writer**, the +HTML writer in this case (``docutils/writers/html4css1.py``). + +The writer receives the node tree and returns the output document. +For HTML output, we can test this using the ``rst2html.py`` tool:: + + $ rst2html.py --link-stylesheet test.txt + <?xml version="1.0" encoding="utf-8" ?> + <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> + <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> + <meta name="generator" content="Docutils 0.3.10: http://docutils.sourceforge.net/" /> + <title></title> + <link rel="stylesheet" href="../docutils/writers/html4css1/html4css1.css" type="text/css" /> + </head> + <body> + <div class="document"> + <p>My <em>favorite</em> language is <a class="reference" href="http://www.python.org/">Python</a>.</p> + </div> + </body> + </html> + +So here we finally have our HTML output. The actual document contents +are in the fourth-last line. Note, by the way, that the HTML writer +did not render the (invisible) ``target`` node |---| only the +``paragraph`` node and its children appear in the HTML output. + + +Extending Docutils +================== + +Now you'll ask, "how do I actually extend Docutils?" + +First of all, once you are clear about *what* you want to achieve, you +have to decide *where* to implement it |---| in the Parser (e.g. by +adding a directive or role to the reStructuredText parser), as a +Transform, or in the Writer. There is often one obvious choice among +those three (Parser, Transform, Writer). If you are unsure, ask on +the Docutils-develop_ mailing list. + +In order to find out how to start, it is often helpful to look at +similar features which are already implemented. For example, if you +want to add a new directive to the reStructuredText parser, look at +the implementation of a similar directive in +``docutils/parsers/rst/directives/``. + + +Modifying the Document Tree Before It Is Written +------------------------------------------------ + +You can modify the document tree right before the writer is called. +One possibility is to use the publish_doctree_ and +publish_from_doctree_ functions. + +To retrieve the document tree, call:: + + document = docutils.core.publish_doctree(...) + +Please see the docstring of publish_doctree for a list of parameters. + +.. XXX Need to write a well-readable list of (commonly used) options + of the publish_* functions. Probably in api/publisher.txt. + +``document`` is the root node of the document tree. You can now +change the document by accessing the ``document`` node and its +children |---| see `The Node Interface`_ below. + +When you're done with modifying the document tree, you can write it +out by calling:: + + output = docutils.core.publish_from_doctree(document, ...) + +.. _publish_doctree: ../api/publisher.html#publish_doctree +.. _publish_from_doctree: ../api/publisher.html#publish_from_doctree + + +The Node Interface +------------------ + +As described in the overview above, Docutils' internal representation +of a document is a tree of nodes. We'll now have a look at the +interface of these nodes. + +(To be completed.) + + +What Now? +========= + +This document is not complete. Many topics could (and should) be +covered here. To find out with which topics we should write about +first, we are awaiting *your* feedback. So please ask your questions +on the Docutils-develop_ mailing list. + + +.. _Docutils-develop: ../user/mailing-lists.html#docutils-develop + + +.. |---| unicode:: 8212 .. em-dash + :trim: + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/dev/policies.txt b/docs/dev/policies.txt new file mode 100644 index 000000000..25fb4f2e9 --- /dev/null +++ b/docs/dev/policies.txt @@ -0,0 +1,549 @@ +=========================== + Docutils Project Policies +=========================== + +:Author: David Goodger; open to all Docutils developers +:Contact: goodger@python.org +:Date: $Date$ +:Revision: $Revision$ +:Copyright: This document has been placed in the public domain. + +.. contents:: + +The Docutils project group is a meritocracy based on code contribution +and lots of discussion [#bcs]_. A few quotes sum up the policies of +the Docutils project. The IETF's classic credo (by MIT professor Dave +Clark) is an ideal we can aspire to: + + We reject: kings, presidents, and voting. We believe in: rough + consensus and running code. + +As architect, chief cook and bottle-washer, David Goodger currently +functions as BDFN (Benevolent Dictator For Now). (But he would +happily abdicate the throne given a suitable candidate. Any takers?) + +Eric S. Raymond, anthropologist of the hacker subculture, writes in +his essay `The Magic Cauldron`_: + + The number of contributors [to] projects is strongly and inversely + correlated with the number of hoops each project makes a user go + through to contribute. + +We will endeavour to keep the barrier to entry as low as possible. +The policies below should not be thought of as barriers, but merely as +a codification of experience to date. These are "best practices"; +guidelines, not absolutes. Exceptions are expected, tolerated, and +used as a source of improvement. Feedback and criticism is welcome. + +As for control issues, Emmett Plant (CEO of the Xiph.org Foundation, +originators of Ogg Vorbis) put it well when he said: + + Open source dictates that you lose a certain amount of control + over your codebase, and that's okay with us. + +.. [#bcs] Phrase borrowed from `Ben Collins-Sussman of the Subversion + project <http://www.red-bean.com/sussman/svn-anti-fud.html>`__. + +.. _The Magic Cauldron: + http://www.catb.org/~esr/writings/magic-cauldron/ + + +Python Coding Conventions +========================= + +Contributed code will not be refused merely because it does not +strictly adhere to these conditions; as long as it's internally +consistent, clean, and correct, it probably will be accepted. But +don't be surprised if the "offending" code gets fiddled over time to +conform to these conventions. + +The Docutils project shall follow the generic coding conventions as +specified in the `Style Guide for Python Code`_ and `Docstring +Conventions`_ PEPs, summarized, clarified, and extended as follows: + +* 4 spaces per indentation level. No hard tabs. + +* Use only 7-bit ASCII, no 8-bit strings. See `Docutils + Internationalization`_. + +* No one-liner compound statements (i.e., no ``if x: return``: use two + lines & indentation), except for degenerate class or method + definitions (i.e., ``class X: pass`` is OK.). + +* Lines should be no more than 78 characters long. + +* Use "StudlyCaps" for class names (except for element classes in + docutils.nodes). + +* Use "lowercase" or "lowercase_with_underscores" for function, + method, and variable names. For short names, maximum two words, + joined lowercase may be used (e.g. "tagname"). For long names with + three or more words, or where it's hard to parse the split between + two words, use lowercase_with_underscores (e.g., + "note_explicit_target", "explicit_target"). If in doubt, use + underscores. + +* Avoid lambda expressions, which are inherently difficult to + understand. Named functions are preferable and superior: they're + faster (no run-time compilation), and well-chosen names serve to + document and aid understanding. + +* Avoid functional constructs (filter, map, etc.). Use list + comprehensions instead. + +* Avoid ``from __future__ import`` constructs. They are inappropriate + for production code. + +* Use 'single quotes' for string literals, and """triple double + quotes""" for docstrings. + +.. _Style Guide for Python Code: + http://www.python.org/peps/pep-0008.html +.. _Docstring Conventions: http://www.python.org/peps/pep-0257.html +.. _Docutils Internationalization: ../howto/i18n.html#python-code + + +Documentation Conventions +========================= + +* Docutils documentation is written using reStructuredText, of course. + +* Use 7-bit ASCII if at all possible, and Unicode substitutions when + necessary. + +* Use the following section title adornment styles:: + + ================ + Document Title + ================ + + -------------------------------------------- + Document Subtitle, or Major Division Title + -------------------------------------------- + + Section + ======= + + Subsection + ---------- + + Sub-Subsection + `````````````` + + Sub-Sub-Subsection + .................. + +* Use two blank lines before each section/subsection/etc. title. One + blank line is sufficient between immediately adjacent titles. + +* Add a bibliographic field list immediately after the document + title/subtitle. See the beginning of this document for an example. + +* Add an Emacs "local variables" block in a comment at the end of the + document. See the end of this document for an example. + + +Copyrights and Licensing +======================== + +The majority of the Docutils project code and documentation has been +placed in the public domain. Unless clearly and explicitly indicated +otherwise, any patches (modifications to existing files) submitted to +the project for inclusion (via Subversion, SourceForge trackers, +mailing lists, or private email) are assumed to be in the public +domain as well. + +Any new files contributed to the project should clearly state their +intentions regarding copyright, in one of the following ways: + +* Public domain (preferred): include the statement "This + module/document has been placed in the public domain." + +* Copyright & open source license: include a copyright notice, along + with either an embedded license statement, a reference to an + accompanying license file, or a license URL. + +One of the goals of the Docutils project, once complete, is to be +incorporated into the Python standard library. At that time copyright +of the Docutils code will be assumed by or transferred to the Python +Software Foundation (PSF), and will be released under Python's +license. If the copyright/license option is chosen for new files, the +license should be compatible with Python's current license, and the +author(s) of the files should be willing to assign copyright to the +PSF. The PSF accepts the `Academic Free License v. 2.1 +<http://opensource.org/licenses/afl-2.1.php>`_ and the `Apache +License, Version 2.0 <http://opensource.org/licenses/apache2.0.php>`_. + + +Subversion Repository +===================== + +Please see the `repository documentation`_ for details on how to +access Docutils' Subversion repository. Anyone can access the +repository anonymously. Only project developers can make changes. +(If you would like to become a project developer, just ask!) Also see +`Setting Up For Docutils Development`_ below for some useful info. + +Unless you really *really* know what you're doing, please do *not* use +``svn import``. It's quite easy to mess up the repository with an +import. + +.. _repository documentation: repository.html + + +Branches +-------- + +(These branch policies go into effect with Docutils 0.4.) + +The "docutils" directory of the **trunk** (a.k.a. the **Docutils +core**) is used for active -- but stable, fully tested, and reviewed +-- development. + +There will be at least one active **maintenance branch** at a time, +based on at least the latest feature release. For example, when +Docutils 0.5 is released, its maintenance branch will take over, and +the 0.4.x maintenance branch may be retired. Maintenance branches +will receive bug fixes only; no new features will be allowed here. + +Obvious and uncontroversial bug fixes *with tests* can be checked in +directly to the core and to the maintenance branches. Don't forget to +add test cases! Many (but not all) bug fixes will be applicable both +to the core and to the maintenance branches; these should be applied +to both. No patches or dedicated branches are required for bug fixes, +but they may be used. It is up to the discretion of project +developers to decide which mechanism to use for each case. + +Feature additions and API changes will be done in **feature +branches**. Feature branches will not be managed in any way. +Frequent small checkins are encouraged here. Feature branches must be +discussed on the docutils-develop mailing list and reviewed before +being merged into the core. + + +Review Criteria +``````````````` + +Before a new feature, an API change, or a complex, disruptive, or +controversial bug fix can be checked in to the core or into a +maintenance branch, it must undergo review. These are the criteria: + +* The branch must be complete, and include full documentation and + tests. + +* There should ideally be one branch merge commit per feature or + change. In other words, each branch merge should represent a + coherent change set. + +* The code must be stable and uncontroversial. Moving targets and + features under debate are not ready to be merged. + +* The code must work. The test suite must complete with no failures. + See `Docutils Testing`_. + +The review process will ensure that at least one other set of eyeballs +& brains sees the code before it enters the core. In addition to the +above, the general `Check-ins`_ policy (below) also applies. + +.. _Docutils Testing: testing.html + + +Check-ins +--------- + +Changes or additions to the Docutils core and maintenance branches +carry a commitment to the Docutils user community. Developers must be +prepared to fix and maintain any code they have committed. + +The Docutils core (``trunk/docutils`` directory) and maintenance +branches should always be kept in a stable state (usable and as +problem-free as possible). All changes to the Docutils core or +maintenance branches must be in `good shape`_, usable_, documented_, +tested_, and `reasonably complete`_. + +* _`Good shape` means that the code is clean, readable, and free of + junk code (unused legacy code; by analogy to "junk DNA"). + +* _`Usable` means that the code does what it claims to do. An "XYZ + Writer" should produce reasonable XYZ output. + +* _`Documented`: The more complete the documentation the better. + Modules & files must be at least minimally documented internally. + `Docutils Front-End Tools`_ should have a new section for any + front-end tool that is added. `Docutils Configuration Files`_ + should be modified with any settings/options defined. For any + non-trivial change, the HISTORY.txt_ file should be updated. + +* _`Tested` means that unit and/or functional tests, that catch all + bugs fixed and/or cover all new functionality, have been added to + the test suite. These tests must be checked by running the test + suite under all supported Python versions, and the entire test suite + must pass. See `Docutils Testing`_. + +* _`Reasonably complete` means that the code must handle all input. + Here "handle" means that no input can cause the code to fail (cause + an exception, or silently and incorrectly produce nothing). + "Reasonably complete" does not mean "finished" (no work left to be + done). For example, a writer must handle every standard element + from the Docutils document model; for unimplemented elements, it + must *at the very least* warn that "Output for element X is not yet + implemented in writer Y". + +If you really want to check code directly into the Docutils core, +you can, but you must ensure that it fulfills the above criteria +first. People will start to use it and they will expect it to work! +If there are any issues with your code, or if you only have time for +gradual development, you should put it on a branch or in the sandbox +first. It's easy to move code over to the Docutils core once it's +complete. + +It is the responsibility and obligation of all developers to keep the +Docutils core and maintenance branches stable. If a commit is made to +the core or maintenance branch which breaks any test, the solution is +simply to revert the change. This is not vindictive; it's practical. +We revert first, and discuss later. + +Docutils will pursue an open and trusting policy for as long as +possible, and deal with any aberrations if (and hopefully not when) +they happen. We'd rather see a torrent of loose contributions than +just a trickle of perfect-as-they-stand changes. The occasional +mistake is easy to fix. That's what Subversion is for! + +.. _Docutils Front-End Tools: ../user/tools.html +.. _Docutils Configuration Files: ../user/config.html +.. _HISTORY.txt: ../../HISTORY.txt + + +Version Numbering +================= + +Docutils version numbering uses a ``major.minor.micro`` scheme (x.y.z; +for example, 0.4.1). + +**Major releases** (x.0, e.g. 1.0) will be rare, and will represent +major changes in API, functionality, or commitment. For example, as +long as the major version of Docutils is 0, it is to be considered +*experimental code*. When Docutils reaches version 1.0, the major +APIs will be considered frozen and backward compatibility will become +of paramount importance. + +Releases that change the minor number (x.y, e.g. 0.5) will be +**feature releases**; new features from the `Docutils core`_ will be +included. + +Releases that change the micro number (x.y.z, e.g. 0.4.1) will be +**bug-fix releases**. No new features will be introduced in these +releases; only bug fixes off of `maintenance branches`_ will be +included. + +This policy was adopted in October 2005, and will take effect with +Docutils version 0.4. Prior to version 0.4, Docutils didn't have an +official version numbering policy, and micro releases contained both +bug fixes and new features. + +.. _Docutils core: + http://svn.berlios.de/viewcvs/docutils/trunk/docutils/ +.. _maintenance branches: + http://svn.berlios.de/viewcvs/docutils/branches/ + + +Snapshots +========= + +Snapshot tarballs will be generated regularly from + +* the Docutils core, representing the current cutting-edge state of + development; + +* each active maintenance branch, for bug fixes; + +* each development branch, representing the unstable + seat-of-your-pants bleeding edge. + +The ``sandbox/infrastructure/docutils-update`` shell script, run as an +hourly cron job on the BerliOS server, is responsible for +automatically generating the snapshots and updating the web site. See +the `web site docs <website.html>`__. + + +Setting Up For Docutils Development +=================================== + +When making changes to the code, testing is a must. The code should +be run to verify that it produces the expected results, and the entire +test suite should be run too. The modified Docutils code has to be +accessible to Python for the tests to have any meaning. There are two +ways to keep the Docutils code accessible during development: + +1. Update your ``PYTHONPATH`` environment variable so that Python + picks up your local working copy of the code. This is the + recommended method. + + We'll assume that the Docutils trunk is checked out under your + ~/projects/ directory as follows:: + + svn co svn+ssh://<user>@svn.berlios.de/svnroot/repos/docutils/trunk \ + docutils + + For the bash shell, add this to your ``~/.profile``:: + + PYTHONPATH=$HOME/projects/docutils/docutils + PYTHONPATH=$PYTHONPATH:$HOME/projects/docutils/docutils/extras + export PYTHONPATH + + The first line points to the directory containing the ``docutils`` + package. The second line adds the directory containing the + third-party modules Docutils depends on. The third line exports + this environment variable. You may also wish to add the ``tools`` + directory to your ``PATH``:: + + PATH=$PATH:$HOME/projects/docutils/docutils/tools + export PATH + +2. Before you run anything, every time you make a change, reinstall + Docutils:: + + python setup.py install + + .. CAUTION:: + + This method is **not** recommended for day-to-day development; + it's too easy to forget. Confusion inevitably ensues. + + If you install Docutils this way, Python will always pick up the + last-installed copy of the code. If you ever forget to + reinstall the "docutils" package, Python won't see your latest + changes. + +A useful addition to the ``docutils`` top-level directory in branches +and alternate copies of the code is a ``set-PATHS`` file +containing the following lines:: + + # source this file + export PYTHONPATH=$PWD:$PWD/extras + export PATH=$PWD/tools:$PATH + +Open a shell for this branch, ``cd`` to the ``docutils`` top-level +directory, and "source" this file. For example, using the bash +shell:: + + $ cd some-branch/docutils + $ . set-PATHS + + +Mailing Lists +============= + +Developers are recommended to subscribe to all `Docutils mailing +lists`_. + +.. _Docutils mailing lists: ../user/mailing-lists.html + + +The Wiki +======== + +There is a development wiki at http://docutils.python-hosting.com/ as +a scratchpad for transient notes. Please use the repository for +permament document storage. + + +The Sandbox +=========== + +The `sandbox directory`_ is a place to play around, to try out and +share ideas. It's a part of the Subversion repository but it isn't +distributed as part of Docutils releases. Feel free to check in code +to the sandbox; that way people can try it out but you won't have to +worry about it working 100% error-free, as is the goal of the Docutils +core. Each developer who wants to play in the sandbox should create +either a project-specific subdirectory or personal subdirectory +(suggested name: SourceForge ID, nickname, or given name + family +initial). It's OK to make a mess in your personal space! But please, +play nice. + +Please update the `sandbox README`_ file with links and a brief +description of your work. + +In order to minimize the work necessary for others to install and try +out new, experimental components, the following sandbox directory +structure is recommended:: + + sandbox/ + project_name/ # For a collaborative project. + # Structure as in userid/component_name below. + userid/ # For personal space. + component_name/ # A verbose name is best. + README.txt # Please explain the requirements, + # purpose/goals, and usage. + docs/ + ... + component.py # The component is a single module. + # *OR* (but *not* both) + component/ # The component is a package. + __init__.py # Contains the Reader/Writer class. + other1.py # Other modules and data files used + data.txt # by this component. + ... + test/ # Test suite. + ... + tools/ # For front ends etc. + ... + setup.py # Use Distutils to install the component + # code and tools/ files into the right + # places in Docutils. + +Some sandbox projects are destined to become Docutils components once +completed. Others, such as add-ons to Docutils or applications of +Docutils, graduate to become `parallel projects`_. + +.. _sandbox README: http://docutils.sf.net/sandbox/README.html +.. _sandbox directory: + http://svn.berlios.de/viewcvs/docutils/trunk/sandbox/ + + +.. _parallel project: + +Parallel Projects +================= + +Parallel projects contain useful code that is not central to the +functioning of Docutils. Examples are specialized add-ons or +plug-ins, and applications of Docutils. They use Docutils, but +Docutils does not require their presence to function. + +An official parallel project will have its own directory beside (or +parallel to) the main ``docutils`` directory in the Subversion +repository. It can have its own web page in the +docutils.sourceforge.net domain, its own file releases and +downloadable snapshots, and even a mailing list if that proves useful. +However, an official parallel project has implications: it is expected +to be maintained and continue to work with changes to the core +Docutils. + +A parallel project requires a project leader, who must commit to +coordinate and maintain the implementation: + +* Answer questions from users and developers. +* Review suggestions, bug reports, and patches. +* Monitor changes and ensure the quality of the code and + documentation. +* Coordinate with Docutils to ensure interoperability. +* Put together official project releases. + +Of course, related projects may be created independently of Docutils. +The advantage of a parallel project is that the SourceForge +environment and the developer and user communities are already +established. Core Docutils developers are available for consultation +and may contribute to the parallel project. It's easier to keep the +projects in sync when there are changes made to the core Docutils +code. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/dev/pysource.dtd b/docs/dev/pysource.dtd new file mode 100644 index 000000000..fb8af4091 --- /dev/null +++ b/docs/dev/pysource.dtd @@ -0,0 +1,259 @@ +<!-- +====================================================================== + Docutils Python Source DTD +====================================================================== +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This DTD has been placed in the public domain. +:Filename: pysource.dtd + +This DTD (document type definition) extends the Generic DTD (see +below). + +More information about this DTD and the Docutils project can be found +at http://docutils.sourceforge.net/. The latest version of this DTD +is available from +http://docutils.sourceforge.net/docs/dev/pysource.dtd. + +The formal public identifier for this DTD is:: + + +//IDN docutils.sourceforge.net//DTD Docutils Python Source//EN//XML +--> + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Parameter Entity Overrides +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--> + +<!ENTITY % additional.section.elements + " | package_section | module_section | class_section + | method_section | function_section + | module_attribute_section | function_attribute_section + | class_attribute_section | instance_attribute_section "> + +<!ENTITY % additional.inline.elements + " | package | module | class | method | function + | variable | parameter | type | attribute + | module_attribute | class_attribute | instance_attribute + | exception_class | warning_class "> + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Generic DTD +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This DTD extends the Docutils Generic DTD, available from +http://docutils.sourceforge.net/docs/ref/docutils.dtd. +--> + +<!ENTITY % docutils PUBLIC + "+//IDN python.org//DTD Docutils Generic//EN//XML" + "docutils.dtd"> +%docutils; + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Additional Section Elements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--> + +<!ELEMENT package_section + (package, fullname?, import_list?, %structure.model;)> +<!ATTLIST package_section %basic.atts;> + +<!ELEMENT module_section + (module, fullname?, import_list?, %structure.model;)> +<!ATTLIST module_section %basic.atts;> + +<!ELEMENT class_section + (class, inheritance_list?, fullname?, subclasses?, + %structure.model;)> +<!ATTLIST class_section %basic.atts;> + +<!ELEMENT method_section + (method, parameter_list?, fullname?, overrides?, + %structure.model;)> +<!ATTLIST method_section %basic.atts;> + +<!ELEMENT function_section + (function, parameter_list?, fullname?, %structure.model;)> +<!ATTLIST function_section %basic.atts;> + +<!ELEMENT module_attribute_section + (attribute, initial_value?, fullname?, %structure.model;)> +<!ATTLIST module_attribute_section %basic.atts;> + +<!ELEMENT function_attribute_section + (attribute, initial_value?, fullname?, %structure.model;)> +<!ATTLIST function_attribute_section %basic.atts;> + +<!ELEMENT class_attribute_section + (attribute, initial_value?, fullname?, overrides?, + %structure.model;)> +<!ATTLIST class_attribute_section %basic.atts;> + +<!ELEMENT instance_attribute_section + (attribute, initial_value?, fullname?, overrides?, + %structure.model;)> +<!ATTLIST instance_attribute_section %basic.atts;> + +<!-- + Section Subelements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--> + +<!ELEMENT fullname + (package | module | class | method | function | attribute)+> +<!ATTLIST fullname %basic.atts;> + +<!ELEMENT import_list (import_item+)> +<!ATTLIST import_list %basic.atts;> + +<!-- +Support ``import module``, ``import module as alias``, ``from module +import identifier``, and ``from module import identifier as alias``. +--> +<!ELEMENT import_item (fullname, identifier?, alias?)> +<!ATTLIST import_item %basic.atts;> + +<!ELEMENT inheritance_list (class+)> +<!ATTLIST inheritance_list %basic.atts;> + +<!ELEMENT subclasses (class+)> +<!ATTLIST subclasses %basic.atts;> + +<!ELEMENT parameter_list + ((parameter_item+, optional_parameters*) | optional_parameters+)> +<!ATTLIST parameter_list %basic.atts;> + +<!ELEMENT parameter_item + ((parameter | parameter_tuple), parameter_default?)> +<!ATTLIST parameter_item %basic.atts;> + +<!ELEMENT optional_parameters (parameter_item+, optional_parameters*)> +<!ATTLIST optional_parameters %basic.atts;> + +<!ELEMENT parameter_tuple (parameter | parameter_tuple)+> +<!ATTLIST parameter_tuple %basic.atts;> + +<!ELEMENT parameter_default (#PCDATA)> +<!ATTLIST parameter_default %basic.atts;> + +<!ELEMENT overrides (fullname+)> +<!ATTLIST overrides %basic.atts;> + +<!ELEMENT initial_value (#PCDATA)> +<!ATTLIST initial_value %basic.atts;> + +<!-- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Additional Inline Elements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--> + +<!-- Also used as the `package_section` identifier/title. --> +<!ELEMENT package (#PCDATA)> +<!ATTLIST package + %basic.atts; + %reference.atts;> + +<!-- Also used as the `module_section` identifier/title. --> +<!ELEMENT module (#PCDATA)> +<!ATTLIST module + %basic.atts; + %reference.atts;> + +<!-- +Also used as the `class_section` identifier/title, and in the +`inheritance` element. +--> +<!ELEMENT class (#PCDATA)> +<!ATTLIST class + %basic.atts; + %reference.atts;> + +<!-- Also used as the `method_section` identifier/title. --> +<!ELEMENT method (#PCDATA)> +<!ATTLIST method + %basic.atts; + %reference.atts;> + +<!-- Also used as the `function_section` identifier/title. --> +<!ELEMENT function (#PCDATA)> +<!ATTLIST function + %basic.atts; + %reference.atts;> + +<!-- +??? Use this instead of the ``*_attribute`` elements below? Add a +"type" attribute to differentiate? + +Also used as the identifier/title for `module_attribute_section`, +`class_attribute_section`, and `instance_attribute_section`. +--> +<!ELEMENT attribute (#PCDATA)> +<!ATTLIST attribute + %basic.atts; + %reference.atts;> + +<!-- +Also used as the `module_attribute_section` identifier/title. A module +attribute is an exported module-level global variable. +--> +<!ELEMENT module_attribute (#PCDATA)> +<!ATTLIST module_attribute + %basic.atts; + %reference.atts;> + +<!-- Also used as the `class_attribute_section` identifier/title. --> +<!ELEMENT class_attribute (#PCDATA)> +<!ATTLIST class_attribute + %basic.atts; + %reference.atts;> + +<!-- +Also used as the `instance_attribute_section` identifier/title. +--> +<!ELEMENT instance_attribute (#PCDATA)> +<!ATTLIST instance_attribute + %basic.atts; + %reference.atts;> + +<!ELEMENT variable (#PCDATA)> +<!ATTLIST variable + %basic.atts; + %reference.atts;> + +<!-- Also used in `parameter_list`. --> +<!ELEMENT parameter (#PCDATA)> +<!ATTLIST parameter + %basic.atts; + %reference.atts; + excess_positional %yesorno; #IMPLIED + excess_keyword %yesorno; #IMPLIED> + +<!ELEMENT type (#PCDATA)> +<!ATTLIST type + %basic.atts; + %reference.atts;> + +<!ELEMENT exception_class (#PCDATA)> +<!ATTLIST exception_class + %basic.atts; + %reference.atts;> + +<!ELEMENT warning_class (#PCDATA)> +<!ATTLIST warning_class + %basic.atts; + %reference.atts;> + +<!-- +Local Variables: +mode: sgml +indent-tabs-mode: nil +fill-column: 70 +End: +--> diff --git a/docs/dev/pysource.txt b/docs/dev/pysource.txt new file mode 100644 index 000000000..6f173a709 --- /dev/null +++ b/docs/dev/pysource.txt @@ -0,0 +1,130 @@ +====================== + Python Source Reader +====================== +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This document has been placed in the public domain. + +This document explores issues around extracting and processing +docstrings from Python modules. + +For definitive element hierarchy details, see the "Python Plaintext +Document Interface DTD" XML document type definition, pysource.dtd_ +(which modifies the generic docutils.dtd_). Descriptions below list +'DTD elements' (XML 'generic identifiers' or tag names) corresponding +to syntax constructs. + + +.. contents:: + + +Model +===== + +The Python Source Reader ("PySource") model that's evolving in my mind +goes something like this: + +1. Extract the docstring/namespace [#]_ tree from the module(s) and/or + package(s). + + .. [#] See `Docstring Extractor`_ below. + +2. Run the parser on each docstring in turn, producing a forest of + doctrees (per nodes.py). + +3. Join the docstring trees together into a single tree, running + transforms: + + - merge hyperlinks + - merge namespaces + - create various sections like "Module Attributes", "Functions", + "Classes", "Class Attributes", etc.; see pysource.dtd_ + - convert the above special sections to ordinary doctree nodes + +4. Run transforms on the combined doctree. Examples: resolving + cross-references/hyperlinks (including interpreted text on Python + identifiers); footnote auto-numbering; first field list -> + bibliographic elements. + + (Or should step 4's transforms come before step 3?) + +5. Pass the resulting unified tree to the writer/builder. + +I've had trouble reconciling the roles of input parser and output +writer with the idea of modes ("readers" or "directors"). Does the +mode govern the tranformation of the input, the output, or both? +Perhaps the mode should be split into two. + +For example, say the source of our input is a Python module. Our +"input mode" should be the "Python Source Reader". It discovers (from +``__docformat__``) that the input parser is "reStructuredText". If we +want HTML, we'll specify the "HTML" output formatter. But there's a +piece missing. What *kind* or *style* of HTML output do we want? +PyDoc-style, LibRefMan style, etc. (many people will want to specify +and control their own style). Is the output style specific to a +particular output format (XML, HTML, etc.)? Is the style specific to +the input mode? Or can/should they be independent? + +I envision interaction between the input parser, an "input mode" , and +the output formatter. The same intermediate data format would be used +between each of these, being transformed as it progresses. + + +Docstring Extractor +=================== + +We need code that scans a parsed Python module, and returns an ordered +tree containing the names, docstrings (including attribute and +additional docstrings), and additional info (in parentheses below) of +all of the following objects: + +- packages +- modules +- module attributes (+ values) +- classes (+ inheritance) +- class attributes (+ values) +- instance attributes (+ values) +- methods (+ formal parameters & defaults) +- functions (+ formal parameters & defaults) + +(Extract comments too? For example, comments at the start of a module +would be a good place for bibliographic field lists.) + +In order to evaluate interpreted text cross-references, namespaces for +each of the above will also be required. + +See python-dev/docstring-develop thread "AST mining", started on +2001-08-14. + + +Interpreted Text +================ + +DTD elements: package, module, class, method, function, +module_attribute, class_attribute, instance_attribute, variable, +parameter, type, exception_class, warning_class. + +To classify identifiers explicitly, the role is given along with the +identifier in either prefix or suffix form:: + + Use :method:`Keeper.storedata` to store the object's data in + `Keeper.data`:instance_attribute:. + +The role may be one of 'package', 'module', 'class', 'method', +'function', 'module_attribute', 'class_attribute', +'instance_attribute', 'variable', 'parameter', 'type', +'exception_class', 'exception', 'warning_class', or 'warning'. Other +roles may be defined. + +.. _pysource.dtd: pysource.dtd +.. _docutils.dtd: ../ref/docutils.dtd + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + fill-column: 70 + End: diff --git a/docs/dev/release.txt b/docs/dev/release.txt new file mode 100644 index 000000000..fa58bc46f --- /dev/null +++ b/docs/dev/release.txt @@ -0,0 +1,168 @@ +============================= + Docutils_ Release Procedure +============================= +:Author: David Goodger; Felix Wiemann; open to all Docutils developers +:Contact: goodger@python.org +:Date: $Date$ +:Revision: $Revision$ +:Copyright: This document has been placed in the public domain. + +.. _Docutils: http://docutils.sourceforge.net/ + + +(Steps in boldface text are *not* covered by the release script at +sandbox/fwiemann/release.sh. "Not covered" means that you aren't even +reminded of them. Note: The release.sh script needs to be updated to +reflect the recent move to Subversion!) + +* **Announce a check-in freeze on Docutils-develop. Post a list of + major changes since the last release and ask for additions.** + + .. _CHANGES.txt: + + **You may want to save this list of changes in a file + (e.g. CHANGES.txt) to have it at hand when you need it for posting + announcements or pasting it into forms.** + +* Change ``__version_details__`` in docutils/docutils/__init__.py to + "release" (from "repository"). + +* Bump the _`version number` in the following files: + + + docutils/setup.py + + docutils/docutils/__init__.py + + docutils/test/functional/expected/* ("Generator: Docutils X.Y.Z") + +* Close the "Changes Since ..." section in docutils/HISTORY.txt. + +* Clear/unset the PYTHONPATH environment variable. + +* Create the release tarball: + + (a) Create a new empty directory and ``cd`` into it. + + (b) Get a clean snapshot of the main tree:: + + svn export svn://svn.berlios.de/docutils/trunk/docutils + + (c) Use Distutils to create the release tarball:: + + cd docutils + python setup.py sdist + +* Expand and _`install` the release tarball in isolation: + + (a) Expand the tarball in a new location, not over any existing + files. + + (b) Remove the old installation from site-packages (including + roman.py, and optparse.py, textwrap.py). + + Install from expanded directory:: + + cd docutils-X.Y.Z + python setup.py install + + The "install" command may require root permissions. + + (c) Repeat step b) for all supported Python versions. + +* Run the _`test suite` from the expanded archive directory with all + supported Python versions: ``cd test ; python -u alltests.py``. + +* Add a directory X.Y.Z (where X.Y.Z is the current version number + of Docutils) in the webroot (i.e. the ``htdocs/`` directory). + Put all documentation files into it:: + + cd docutils-X.Y.Z + rm -rf build + cd tools/ + ./buildhtml.py .. + cd .. + find -name test -type d -prune -o -name \*.css -print0 \ + -o -name \*.html -print0 -o -name \*.txt -print0 \ + | tar -cjvf docutils-docs.tar.bz2 -T - --null + scp docutils-docs.tar.bz2 <username>@shell.sourceforge.net: + + Now log in to shell.sourceforge.net and:: + + cd /home/groups/d/do/docutils/htdocs/ + mkdir -m g+rwxs X.Y.Z + cd X.Y.Z + tar -xjvf ~/docutils-docs.tar.bz2 + rm ~/docutils-docs.tar.bz2 + +* Upload the release tarball:: + + $ ftp upload.sourceforge.net + Connected to osdn.dl.sourceforge.net. + ... + Name (upload.sourceforge.net:david): anonymous + 331 Anonymous login ok, send your complete e-mail address as password. + Password: + ... + 230 Anonymous access granted, restrictions apply. + ftp> bin + 200 Type set to I. + ftp> cd /incoming + 250 CWD command successful. + ftp> put docutils-X.Y.Z.tar.gz + +* Access the _`file release system` on SourceForge (Admin + interface). Fill in the fields: + + :Package ID: docutils + :Release Name: <use release number only, e.g. 0.3> + :Release Date: <today's date> + :Status: Active + :File Name: <select the file just uploaded> + :File Type: Source .gz + :Processor Type: Platform-Independent + :Release Notes: <insert README.txt file here> + :Change Log: <insert summary from CHANGES.txt_> + + Also check the "Preserve my pre-formatted text" box. + +* For verifying the integrity of the release, download the release + tarball (you may need to wait up to 30 minutes), install_ it, and + re-run the `test suite`_. + +* Register with PyPI (``python setup.py register``). + +* Restore ``__version_details__`` in docutils/docutils/__init__.py to + "repository" (from "release"). + +* Bump the `version number`_ again. + +* Add a new empty section "Changes Since ..." in HISTORY.txt. + +* Update the web page (web/index.txt). + +* Run docutils-update on the server. + +* **Send announcement email to:** + + * docutils-develop@lists.sourceforge.net (also announcing the end + of the check-in freeze) + * docutils-users@lists.sourceforge.net + * doc-sig@python.org + * python-announce@python.org + +* **Add a SourceForge News item, with title "Docutils X.Y.Z released" + and containing the release tarball's download URL.** + +* **Register with FreshMeat.** (Add a `new release`__ for the + `Docutils project`__). + + __ http://freshmeat.net/add-release/48702/ + __ http://freshmeat.net/projects/docutils/ + + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/dev/repository.txt b/docs/dev/repository.txt new file mode 100644 index 000000000..2c613b10e --- /dev/null +++ b/docs/dev/repository.txt @@ -0,0 +1,217 @@ +===================================== + The Docutils_ Subversion Repository +===================================== + +:Author: Felix Wiemann +:Contact: Felix.Wiemann@ososo.de +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This document has been placed in the public domain. + +.. _Docutils: http://docutils.sourceforge.net/ + +.. contents:: + +Docutils uses a Subversion_ repository located at ``svn.berlios.de``. +Subversion is exhaustively documented in the `Subversion Book`_ +(svnbook). + +.. _Subversion: http://subversion.tigris.org/ +.. _Subversion Book: http://svnbook.red-bean.com/ + +.. Note:: + + While the repository resides at BerliOS, all other project data + (web site, snapshots, releases, mailing lists, trackers) is hosted + at SourceForge. + +For the project policy on repository use (check-in requirements, +branching, etc.), please see the `Docutils Project Policies`__. + +__ policies.html#subversion-repository + + +Accessing the Repository +======================== + +Web Access +---------- + +The repository can be browsed and examined via the web at +http://svn.berlios.de/viewcvs/docutils/. + + +Anonymous Access +---------------- + +Anonymous (read-only) access is available at ``svn://svn.berlios.de/docutils/``. + +To check out the current main source tree of Docutils, type :: + + svn checkout svn://svn.berlios.de/docutils/trunk/docutils + +To check out everything (main tree, sandboxes, and web site), type :: + + svn checkout svn://svn.berlios.de/docutils/trunk docutils + +This will create a working copy of the whole trunk in a new directory +called ``docutils``. + +If you cannot use the ``svn`` port, you can also use the HTTP access +method by substituting "http://svn.berlios.de/svnroot/repos" for +"svn://svn.berlios.de". + +Note that you should *not* check out ``svn://svn.berlios.de/docutils`` +(without "trunk"), because then you'd end up fetching the whole +Docutils tree for every branch and tag over and over again, wasting +your and BerliOS's bandwidth. + +To update your working copy later on, cd into the working copy and +type :: + + svn update + + +Developer Access +---------------- + +(Developers who had write-access for Docutils' CVS repository on +SourceForge.net should `register at BerliOS`__ and send a message with +their BerliOS user name to `Felix Wiemann <Felix.Wiemann@ososo.de>`_.) + +__ https://developer.berlios.de/account/register.php + +If you are a developer, you get read-write access via +``svn+ssh://<user>@svn.berlios.de/svnroot/repos/docutils/``, where +``<user>`` is your BerliOS user account name. So to retrieve a +working copy, type :: + + svn checkout svn+ssh://<user>@svn.berlios.de/svnroot/repos/docutils/trunk \ + docutils + +If you previously had an anonymous working copy and gained developer +access, you can switch the URL associated with your working copy by +typing :: + + svn switch --relocate svn://svn.berlios.de/docutils/trunk/docutils \ + svn+ssh://<user>@svn.berlios.de/svnroot/repos/docutils + +(Again, ``<user>`` is your BerliOS user account name.) + +If you cannot use the ``ssh`` port, you can also use the HTTPS access +method by substituting "https://svn.berlios.de" for +"svn+ssh://svn.berlios.de". + + +Setting Up Your Subversion Client For Development +````````````````````````````````````````````````` + +Before commiting changes to the repository, please ensure that the +following lines are contained (and uncommented) in your +~/.subversion/config file, so that new files are added with the +correct properties set:: + + [miscellany] + # For your convenience: + global-ignores = ... *.pyc ... + # For correct properties: + enable-auto-props = yes + + [auto-props] + *.py = svn:eol-style=native;svn:keywords=Author Date Id Revision + *.txt = svn:eol-style=native;svn:keywords=Author Date Id Revision + *.html = svn:eol-style=native;svn:keywords=Author Date Id Revision + *.xml = svn:eol-style=native;svn:keywords=Author Date Id Revision + *.tex = svn:eol-style=native;svn:keywords=Author Date Id Revision + *.css = svn:eol-style=native;svn:keywords=Author Date Id Revision + *.patch = svn:eol-style=native + *.sh = svn:eol-style=native;svn:executable;svn:keywords=Author Date Id Revision + *.png = svn:mime-type=image/png + *.jpg = svn:mime-type=image/jpeg + *.gif = svn:mime-type=image/gif + + +Setting Up SSH Access +````````````````````` + +With a public & private key pair, you can access the shell and +Subversion servers without having to enter your password. There are +two places to add your SSH public key on BerliOS: your web account and +your shell account. + +* Adding your SSH key to your BerliOS web account: + + 1. Log in on the web at https://developer.berlios.de/. Create your + account first if necessary. You should be taken to your "My + Personal Page" (https://developer.berlios.de/my/). + + 2. Choose "Account Options" from the menu below the top banner. + + 3. At the bottom of the "Account Maintenance" page + (https://developer.berlios.de/account/) you'll find a "Shell + Account Information" section; click on "[Edit Keys]". + + 4. Copy and paste your SSH public key into the edit box on this page + (https://developer.berlios.de/account/editsshkeys.php). Further + instructions are available on this page. + +* Adding your SSH key to your BerliOS shell account: + + 1. Log in to the BerliOS shell server:: + + ssh <user>@shell.berlios.de + + You'll be asked for your password, which you set when you created + your account. + + 2. Create a .ssh directory in your home directory, and remove + permissions for group & other:: + + mkdir .ssh + chmod og-rwx .ssh + + Exit the SSH session. + + 3. Copy your public key to the .ssh directory on BerliOS:: + + scp .ssh/id_dsa.pub <user>@shell.berlios.de:.ssh/authorized_keys + + Now you should be able to start an SSH session without needing your + password. + + +Repository Layout +================= + +The following tree shows the repository layout:: + + docutils/ + |-- branches/ + | |-- branch1/ + | | |-- docutils/ + | | |-- sandbox/ + | | `-- web/ + | `-- branch2/ + | |-- docutils/ + | |-- sandbox/ + | `-- web/ + |-- tags/ + | |-- tag1/ + | | |-- docutils/ + | | |-- sandbox/ + | | `-- web/ + | `-- tag2/ + | |-- docutils/ + | |-- sandbox/ + | `-- web/ + `-- trunk/ + |-- docutils/ + |-- sandbox/ + `-- web/ + +``docutils/branches/`` and ``docutils/tags/`` contain (shallow) copies +of the whole trunk. + +The main source tree lives at ``docutils/trunk/docutils/``, next to +the sandboxes (``docutils/trunk/sandbox/``) and the web site files +(``docutils/trunk/web/``). diff --git a/docs/dev/rst/alternatives.txt b/docs/dev/rst/alternatives.txt new file mode 100644 index 000000000..12874c5fb --- /dev/null +++ b/docs/dev/rst/alternatives.txt @@ -0,0 +1,3129 @@ +================================================== + A Record of reStructuredText Syntax Alternatives +================================================== + +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This document has been placed in the public domain. + +The following are ideas, alternatives, and justifications that were +considered for reStructuredText syntax, which did not originate with +Setext_ or StructuredText_. For an analysis of constructs which *did* +originate with StructuredText or Setext, please see `Problems With +StructuredText`_. See the `reStructuredText Markup Specification`_ +for full details of the established syntax. + +The ideas are divided into sections: + +* Implemented_: already done. The issues and alternatives are + recorded here for posterity. + +* `Not Implemented`_: these ideas won't be implemented. + +* Tabled_: these ideas should be revisited in the future. + +* `To Do`_: these ideas should be implemented. They're just waiting + for a champion to resolve issues and get them done. + +* `... Or Not To Do?`_: possible but questionable. These probably + won't be implemented, but you never know. + +.. _Setext: http://docutils.sourceforge.net/mirror/setext.html +.. _StructuredText: + http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/FrontPage +.. _Problems with StructuredText: problems.html +.. _reStructuredText Markup Specification: + ../../ref/rst/restructuredtext.html + + +.. contents:: + +------------- + Implemented +------------- + +Field Lists +=========== + +Prior to the syntax for field lists being finalized, several +alternatives were proposed. + +1. Unadorned RFC822_ everywhere:: + + Author: Me + Version: 1 + + Advantages: clean, precedent (RFC822-compliant). Disadvantage: + ambiguous (these paragraphs are a prime example). + + Conclusion: rejected. + +2. Special case: use unadorned RFC822_ for the very first or very last + text block of a document:: + + """ + Author: Me + Version: 1 + + The rest of the document... + """ + + Advantages: clean, precedent (RFC822-compliant). Disadvantages: + special case, flat (unnested) field lists only, still ambiguous:: + + """ + Usage: cmdname [options] arg1 arg2 ... + + We obviously *don't* want the like above to be interpreted as a + field list item. Or do we? + """ + + Conclusion: rejected for the general case, accepted for specific + contexts (PEPs, email). + +3. Use a directive:: + + .. fields:: + + Author: Me + Version: 1 + + Advantages: explicit and unambiguous, RFC822-compliant. + Disadvantage: cumbersome. + + Conclusion: rejected for the general case (but such a directive + could certainly be written). + +4. Use Javadoc-style:: + + @Author: Me + @Version: 1 + @param a: integer + + Advantages: unambiguous, precedent, flexible. Disadvantages: + non-intuitive, ugly, not RFC822-compliant. + + Conclusion: rejected. + +5. Use leading colons:: + + :Author: Me + :Version: 1 + + Advantages: unambiguous, obvious (*almost* RFC822-compliant), + flexible, perhaps even elegant. Disadvantages: no precedent, not + quite RFC822-compliant. + + Conclusion: accepted! + +6. Use double colons:: + + Author:: Me + Version:: 1 + + Advantages: unambiguous, obvious? (*almost* RFC822-compliant), + flexible, similar to syntax already used for literal blocks and + directives. Disadvantages: no precedent, not quite + RFC822-compliant, similar to syntax already used for literal blocks + and directives. + + Conclusion: rejected because of the syntax similarity & conflicts. + +Why is RFC822 compliance important? It's a universal Internet +standard, and super obvious. Also, I'd like to support the PEP format +(ulterior motive: get PEPs to use reStructuredText as their standard). +But it *would* be easy to get used to an alternative (easy even to +convert PEPs; probably harder to convert python-deviants ;-). + +Unfortunately, without well-defined context (such as in email headers: +RFC822 only applies before any blank lines), the RFC822 format is +ambiguous. It is very common in ordinary text. To implement field +lists unambiguously, we need explicit syntax. + +The following question was posed in a footnote: + + Should "bibliographic field lists" be defined at the parser level, + or at the DPS transformation level? In other words, are they + reStructuredText-specific, or would they also be applicable to + another (many/every other?) syntax? + +The answer is that bibliographic fields are a +reStructuredText-specific markup convention. Other syntaxes may +implement the bibliographic elements explicitly. For example, there +would be no need for such a transformation for an XML-based markup +syntax. + +.. _RFC822: http://www.rfc-editor.org/rfc/rfc822.txt + + +Interpreted Text "Roles" +======================== + +The original purpose of interpreted text was as a mechanism for +descriptive markup, to describe the nature or role of a word or +phrase. For example, in XML we could say "<function>len</function>" +to mark up "len" as a function. It is envisaged that within Python +docstrings (inline documentation in Python module source files, the +primary market for reStructuredText) the role of a piece of +interpreted text can be inferred implicitly from the context of the +docstring within the program source. For other applications, however, +the role may have to be indicated explicitly. + +Interpreted text is enclosed in single backquotes (`). + +1. Initially, it was proposed that an explicit role could be indicated + as a word or phrase within the enclosing backquotes: + + - As a prefix, separated by a colon and whitespace:: + + `role: interpreted text` + + - As a suffix, separated by whitespace and a colon:: + + `interpreted text :role` + + There are problems with the initial approach: + + - There could be ambiguity with interpreted text containing colons. + For example, an index entry of "Mission: Impossible" would + require a backslash-escaped colon. + + - The explicit role is descriptive markup, not content, and will + not be visible in the processed output. Putting it inside the + backquotes doesn't feel right; the *role* isn't being quoted. + +2. Tony Ibbs suggested that the role be placed outside the + backquotes:: + + role:`prefix` or `suffix`:role + + This removes the embedded-colons ambiguity, but limits the role + identifier to be a single word (whitespace would be illegal). + Since roles are not meant to be visible after processing, the lack + of whitespace support is not important. + + The suggested syntax remains ambiguous with respect to ratios and + some writing styles. For example, suppose there is a "signal" + identifier, and we write:: + + ...calculate the `signal`:noise ratio. + + "noise" looks like a role. + +3. As an improvement on #2, we can bracket the role with colons:: + + :role:`prefix` or `suffix`:role: + + This syntax is similar to that of field lists, which is fine since + both are doing similar things: describing. + + This is the syntax chosen for reStructuredText. + +4. Another alternative is two colons instead of one:: + + role::`prefix` or `suffix`::role + + But this is used for analogies ("A:B::C:D": "A is to B as C is to + D"). + + Both alternative #2 and #4 lack delimiters on both sides of the + role, making it difficult to parse (by the reader). + +5. Some kind of bracketing could be used: + + - Parentheses:: + + (role)`prefix` or `suffix`(role) + + - Braces:: + + {role}`prefix` or `suffix`{role} + + - Square brackets:: + + [role]`prefix` or `suffix`[role] + + - Angle brackets:: + + <role>`prefix` or `suffix`<role> + + (The overlap of \*ML tags with angle brackets would be too + confusing and precludes their use.) + +Syntax #3 was chosen for reStructuredText. + + +Comments +======== + +A problem with comments (actually, with all indented constructs) is +that they cannot be followed by an indented block -- a block quote -- +without swallowing it up. + +I thought that perhaps comments should be one-liners only. But would +this mean that footnotes, hyperlink targets, and directives must then +also be one-liners? Not a good solution. + +Tony Ibbs suggested a "comment" directive. I added that we could +limit a comment to a single text block, and that a "multi-block +comment" could use "comment-start" and "comment-end" directives. This +would remove the indentation incompatibility. A "comment" directive +automatically suggests "footnote" and (hyperlink) "target" directives +as well. This could go on forever! Bad choice. + +Garth Kidd suggested that an "empty comment", a ".." explicit markup +start with nothing on the first line (except possibly whitespace) and +a blank line immediately following, could serve as an "unindent". An +empty comment does **not** swallow up indented blocks following it, +so block quotes are safe. "A tiny but practical wart." Accepted. + + +Anonymous Hyperlinks +==================== + +Alan Jaffray came up with this idea, along with the following syntax:: + + Search the `Python DOC-SIG mailing list archives`{}_. + + .. _: http://mail.python.org/pipermail/doc-sig/ + +The idea is sound and useful. I suggested a "double underscore" +syntax:: + + Search the `Python DOC-SIG mailing list archives`__. + + .. __: http://mail.python.org/pipermail/doc-sig/ + +But perhaps single underscores are okay? The syntax looks better, but +the hyperlink itself doesn't explicitly say "anonymous":: + + Search the `Python DOC-SIG mailing list archives`_. + + .. _: http://mail.python.org/pipermail/doc-sig/ + +Mixing anonymous and named hyperlinks becomes confusing. The order of +targets is not significant for named hyperlinks, but it is for +anonymous hyperlinks:: + + Hyperlinks: anonymous_, named_, and another anonymous_. + + .. _named: named + .. _: anonymous1 + .. _: anonymous2 + +Without the extra syntax of double underscores, determining which +hyperlink references are anonymous may be difficult. We'd have to +check which references don't have corresponding targets, and match +those up with anonymous targets. Keeping to a simple consistent +ordering (as with auto-numbered footnotes) seems simplest. + +reStructuredText will use the explicit double-underscore syntax for +anonymous hyperlinks. An alternative (see `Reworking Explicit Markup +(Round 1)`_ below) for the somewhat awkward ".. __:" syntax is "__":: + + An anonymous__ reference. + + __ http://anonymous + + +Reworking Explicit Markup (Round 1) +=================================== + +Alan Jaffray came up with the idea of `anonymous hyperlinks`_, added +to reStructuredText. Subsequently it was asserted that hyperlinks +(especially anonymous hyperlinks) would play an increasingly important +role in reStructuredText documents, and therefore they require a +simpler and more concise syntax. This prompted a review of the +current and proposed explicit markup syntaxes with regards to +improving usability. + +1. Original syntax:: + + .. _blah: internal hyperlink target + .. _blah: http://somewhere external hyperlink target + .. _blah: blahblah_ indirect hyperlink target + .. __: anonymous internal target + .. __: http://somewhere anonymous external target + .. __: blahblah_ anonymous indirect target + .. [blah] http://somewhere footnote + .. blah:: http://somewhere directive + .. blah: http://somewhere comment + + .. Note:: + + The comment text was intentionally made to look like a hyperlink + target. + + Origins: + + * Except for the colon (a delimiter necessary to allow for + phrase-links), hyperlink target ``.. _blah:`` comes from Setext. + * Comment syntax from Setext. + * Footnote syntax from StructuredText ("named links"). + * Directives and anonymous hyperlinks original to reStructuredText. + + Advantages: + + + Consistent explicit markup indicator: "..". + + Consistent hyperlink syntax: ".. _" & ":". + + Disadvantages: + + - Anonymous target markup is awkward: ".. __:". + - The explicit markup indicator ("..") is excessively overloaded? + - Comment text is limited (can't look like a footnote, hyperlink, + or directive). But this is probably not important. + +2. Alan Jaffray's proposed syntax #1:: + + __ _blah internal hyperlink target + __ blah: http://somewhere external hyperlink target + __ blah: blahblah_ indirect hyperlink target + __ anonymous internal target + __ http://somewhere anonymous external target + __ blahblah_ anonymous indirect target + __ [blah] http://somewhere footnote + .. blah:: http://somewhere directive + .. blah: http://somewhere comment + + The hyperlink-connoted underscores have become first-level syntax. + + Advantages: + + + Anonymous targets are simpler. + + All hyperlink targets are one character shorter. + + Disadvantages: + + - Inconsistent internal hyperlink targets. Unlike all other named + hyperlink targets, there's no colon. There's an extra leading + underscore, but we can't drop it because without it, "blah" looks + like a relative URI. Unless we restore the colon:: + + __ blah: internal hyperlink target + + - Obtrusive markup? + +3. Alan Jaffray's proposed syntax #2:: + + .. _blah internal hyperlink target + .. blah: http://somewhere external hyperlink target + .. blah: blahblah_ indirect hyperlink target + .. anonymous internal target + .. http://somewhere anonymous external target + .. blahblah_ anonymous indirect target + .. [blah] http://somewhere footnote + !! blah: http://somewhere directive + ## blah: http://somewhere comment + + Leading underscores have been (almost) replaced by "..", while + comments and directives have gained their own syntax. + + Advantages: + + + Anonymous hyperlinks are simpler. + + Unique syntax for comments. Connotation of "comment" from + some programming languages (including our favorite). + + Unique syntax for directives. Connotation of "action!". + + Disadvantages: + + - Inconsistent internal hyperlink targets. Again, unlike all other + named hyperlink targets, there's no colon. There's a leading + underscore, matching the trailing underscores of references, + which no other hyperlink targets have. We can't drop that one + leading underscore though: without it, "blah" looks like a + relative URI. Again, unless we restore the colon:: + + .. blah: internal hyperlink target + + - All (except for internal) hyperlink targets lack their leading + underscores, losing the "hyperlink" connotation. + + - Obtrusive syntax for comments. Alternatives:: + + ;; blah: http://somewhere + (also comment syntax in Lisp & others) + ,, blah: http://somewhere + ("comma comma": sounds like "comment"!) + + - Iffy syntax for directives. Alternatives? + +4. Tony Ibbs' proposed syntax:: + + .. _blah: internal hyperlink target + .. _blah: http://somewhere external hyperlink target + .. _blah: blahblah_ indirect hyperlink target + .. anonymous internal target + .. http://somewhere anonymous external target + .. blahblah_ anonymous indirect target + .. [blah] http://somewhere footnote + .. blah:: http://somewhere directive + .. blah: http://somewhere comment + + This is the same as the current syntax, except for anonymous + targets which drop their "__: ". + + Advantage: + + + Anonymous targets are simpler. + + Disadvantages: + + - Anonymous targets lack their leading underscores, losing the + "hyperlink" connotation. + - Anonymous targets are almost indistinguishable from comments. + (Better to know "up front".) + +5. David Goodger's proposed syntax: Perhaps going back to one of + Alan's earlier suggestions might be the best solution. How about + simply adding "__ " as a synonym for ".. __: " in the original + syntax? These would become equivalent:: + + .. __: anonymous internal target + .. __: http://somewhere anonymous external target + .. __: blahblah_ anonymous indirect target + + __ anonymous internal target + __ http://somewhere anonymous external target + __ blahblah_ anonymous indirect target + +Alternative 5 has been adopted. + + +Backquotes in Phrase-Links +========================== + +[From a 2001-06-05 Doc-SIG post in reply to questions from Doug +Hellmann.] + +The first draft of the spec, posted to the Doc-SIG in November 2000, +used square brackets for phrase-links. I changed my mind because: + +1. In the first draft, I had already decided on single-backquotes for + inline literal text. + +2. However, I wanted to minimize the necessity for backslash escapes, + for example when quoting Python repr-equivalent syntax that uses + backquotes. + +3. The processing of identifiers (function/method/attribute/module + etc. names) into hyperlinks is a useful feature. PyDoc recognizes + identifiers heuristically, but it doesn't take much imagination to + come up with counter-examples where PyDoc's heuristics would result + in embarassing failure. I wanted to do it deterministically, and + that called for syntax. I called this construct "interpreted + text". + +4. Leveraging off the ``*emphasis*/**strong**`` syntax, lead to the + idea of using double-backquotes as syntax. + +5. I worked out some rules for inline markup recognition. + +6. In combination with #5, double backquotes lent themselves to inline + literals, neatly satisfying #2, minimizing backslash escapes. In + fact, the spec says that no interpretation of any kind is done + within double-backquote inline literal text; backslashes do *no* + escaping within literal text. + +7. Single backquotes are then freed up for interpreted text. + +8. I already had square brackets required for footnote references. + +9. Since interpreted text will typically turn into hyperlinks, it was + a natural fit to use backquotes as the phrase-quoting syntax for + trailing-underscore hyperlinks. + +The original inspiration for the trailing underscore hyperlink syntax +was Setext. But for phrases Setext used a very cumbersome +``underscores_between_words_like_this_`` syntax. + +The underscores can be viewed as if they were right-pointing arrows: +``-->``. So ``hyperlink_`` points away from the reference, and +``.. _hyperlink:`` points toward the target. + + +Substitution Mechanism +====================== + +Substitutions arose out of a Doc-SIG thread begun on 2001-10-28 by +Alan Jaffray, "reStructuredText inline markup". It reminded me of a +missing piece of the reStructuredText puzzle, first referred to in my +contribution to "Documentation markup & processing / PEPs" (Doc-SIG +2001-06-21). + +Substitutions allow the power and flexibility of directives to be +shared by inline text. They are a way to allow arbitrarily complex +inline objects, while keeping the details out of the flow of text. +They are the equivalent of SGML/XML's named entities. For example, an +inline image (using reference syntax alternative 4d (vertical bars) +and definition alternative 3, the alternatives chosen for inclusion in +the spec):: + + The |biohazard| symbol must be used on containers used to dispose + of medical waste. + + .. |biohazard| image:: biohazard.png + [height=20 width=20] + +The ``|biohazard|`` substitution reference will be replaced in-line by +whatever the ``.. |biohazard|`` substitution definition generates (in +this case, an image). A substitution definition contains the +substitution text bracketed with vertical bars, followed by a an +embedded inline-compatible directive, such as "image". A transform is +required to complete the substitution. + +Syntax alternatives for the reference: + +1. Use the existing interpreted text syntax, with a predefined role + such as "sub":: + + The `biohazard`:sub: symbol... + + Advantages: existing syntax, explicit. Disadvantages: verbose, + obtrusive. + +2. Use a variant of the interpreted text syntax, with a new suffix + akin to the underscore in phrase-link references:: + + (a) `name`@ + (b) `name`# + (c) `name`& + (d) `name`/ + (e) `name`< + (f) `name`:: + (g) `name`: + + + Due to incompatibility with other constructs and ordinary text + usage, (f) and (g) are not possible. + +3. Use interpreted text syntax with a fixed internal format:: + + (a) `:name:` + (b) `name:` + (c) `name::` + (d) `::name::` + (e) `%name%` + (f) `#name#` + (g) `/name/` + (h) `&name&` + (i) `|name|` + (j) `[name]` + (k) `<name>` + (l) `&name;` + (m) `'name'` + + To avoid ML confusion (k) and (l) are definitely out. Square + brackets (j) won't work in the target (the substitution definition + would be indistinguishable from a footnote). + + The ```/name/``` syntax (g) is reminiscent of "s/find/sub" + substitution syntax in ed-like languages. However, it may have a + misleading association with regexps, and looks like an absolute + POSIX path. (i) is visually equivalent and lacking the + connotations. + + A disadvantage of all of these is that they limit interpreted text, + albeit only slightly. + +4. Use specialized syntax, something new:: + + (a) #name# + (b) @name@ + (c) /name/ + (d) |name| + (e) <<name>> + (f) //name// + (g) ||name|| + (h) ^name^ + (i) [[name]] + (j) ~name~ + (k) !name! + (l) =name= + (m) ?name? + (n) >name< + + "#" (a) and "@" (b) are obtrusive. "/" (c) without backquotes + looks just like a POSIX path; it is likely for such usage to appear + in text. + + "|" (d) and "^" (h) are feasible. + +5. Redefine the trailing underscore syntax. See definition syntax + alternative 4, below. + +Syntax alternatives for the definition: + +1. Use the existing directive syntax, with a predefined directive such + as "sub". It contains a further embedded directive resolving to an + inline-compatible object:: + + .. sub:: biohazard + .. image:: biohazard.png + [height=20 width=20] + + .. sub:: parrot + That bird wouldn't *voom* if you put 10,000,000 volts + through it! + + The advantages and disadvantages are the same as in inline + alternative 1. + +2. Use syntax as in #1, but with an embedded directivecompressed:: + + .. sub:: biohazard image:: biohazard.png + [height=20 width=20] + + This is a bit better than alternative 1, but still too much. + +3. Use a variant of directive syntax, incorporating the substitution + text, obviating the need for a special "sub" directive name. If we + assume reference alternative 4d (vertical bars), the matching + definition would look like this:: + + .. |biohazard| image:: biohazard.png + [height=20 width=20] + +4. (Suggested by Alan Jaffray on Doc-SIG from 2001-11-06.) + + Instead of adding new syntax, redefine the trailing underscore + syntax to mean "substitution reference" instead of "hyperlink + reference". Alan's example:: + + I had lunch with Jonathan_ today. We talked about Zope_. + + .. _Jonathan: lj [user=jhl] + .. _Zope: http://www.zope.org/ + + A problem with the proposed syntax is that URIs which look like + simple reference names (alphanum plus ".", "-", "_") would be + indistinguishable from substitution directive names. A more + consistent syntax would be:: + + I had lunch with Jonathan_ today. We talked about Zope_. + + .. _Jonathan: lj:: user=jhl + .. _Zope: http://www.zope.org/ + + (``::`` after ``.. _Jonathan: lj``.) + + The "Zope" target is a simple external hyperlink, but the + "Jonathan" target contains a directive. Alan proposed is that the + reference text be replaced by whatever the referenced directive + (the "directive target") produces. A directive reference becomes a + hyperlink reference if the contents of the directive target resolve + to a hyperlink. If the directive target resolves to an icon, the + reference is replaced by an inline icon. If the directive target + resolves to a hyperlink, the directive reference becomes a + hyperlink reference. + + This seems too indirect and complicated for easy comprehension. + + The reference in the text will sometimes become a link, sometimes + not. Sometimes the reference text will remain, sometimes not. We + don't know *at the reference*:: + + This is a `hyperlink reference`_; its text will remain. + This is an `inline icon`_; its text will disappear. + + That's a problem. + +The syntax that has been incorporated into the spec and parser is +reference alternative 4d with definition alternative 3:: + + The |biohazard| symbol... + + .. |biohazard| image:: biohazard.png + [height=20 width=20] + +We can also combine substitution references with hyperlink references, +by appending a "_" (named hyperlink reference) or "__" (anonymous +hyperlink reference) suffix to the substitution reference. This +allows us to click on an image-link:: + + The |biohazard|_ symbol... + + .. |biohazard| image:: biohazard.png + [height=20 width=20] + .. _biohazard: http://www.cdc.gov/ + +There have been several suggestions for the naming of these +constructs, originally called "substitution references" and +"substitutions". + +1. Candidate names for the reference construct: + + (a) substitution reference + (b) tagging reference + (c) inline directive reference + (d) directive reference + (e) indirect inline directive reference + (f) inline directive placeholder + (g) inline directive insertion reference + (h) directive insertion reference + (i) insertion reference + (j) directive macro reference + (k) macro reference + (l) substitution directive reference + +2. Candidate names for the definition construct: + + (a) substitution + (b) substitution directive + (c) tag + (d) tagged directive + (e) directive target + (f) inline directive + (g) inline directive definition + (h) referenced directive + (i) indirect directive + (j) indirect directive definition + (k) directive definition + (l) indirect inline directive + (m) named directive definition + (n) inline directive insertion definition + (o) directive insertion definition + (p) insertion definition + (q) insertion directive + (r) substitution definition + (s) directive macro definition + (t) macro definition + (u) substitution directive definition + (v) substitution definition + +"Inline directive reference" (1c) seems to be an appropriate term at +first, but the term "inline" is redundant in the case of the +reference. Its counterpart "inline directive definition" (2g) is +awkward, because the directive definition itself is not inline. + +"Directive reference" (1d) and "directive definition" (2k) are too +vague. "Directive definition" could be used to refer to any +directive, not just those used for inline substitutions. + +One meaning of the term "macro" (1k, 2s, 2t) is too +programming-language-specific. Also, macros are typically simple text +substitution mechanisms: the text is substituted first and evaluated +later. reStructuredText substitution definitions are evaluated in +place at parse time and substituted afterwards. + +"Insertion" (1h, 1i, 2n-2q) is almost right, but it implies that +something new is getting added rather than one construct being +replaced by another. + +Which brings us back to "substitution". The overall best names are +"substitution reference" (1a) and "substitution definition" (2v). A +long way to go to add one word! + + +Inline External Targets +======================= + +Currently reStructuredText has two hyperlink syntax variations: + +* Named hyperlinks:: + + This is a named reference_ of one word ("reference"). Here is + a `phrase reference`_. Phrase references may even cross `line + boundaries`_. + + .. _reference: http://www.example.org/reference/ + .. _phrase reference: http://www.example.org/phrase_reference/ + .. _line boundaries: http://www.example.org/line_boundaries/ + + + Advantages: + + - The plaintext is readable. + - Each target may be reused multiple times (e.g., just write + ``"reference_"`` again). + - No syncronized ordering of references and targets is necessary. + + + Disadvantages: + + - The reference text must be repeated as target names; could lead + to mistakes. + - The target URLs may be located far from the references, and hard + to find in the plaintext. + +* Anonymous hyperlinks (in current reStructuredText):: + + This is an anonymous reference__. Here is an anonymous + `phrase reference`__. Phrase references may even cross `line + boundaries`__. + + __ http://www.example.org/reference/ + __ http://www.example.org/phrase_reference/ + __ http://www.example.org/line_boundaries/ + + + Advantages: + + - The plaintext is readable. + - The reference text does not have to be repeated. + + + Disadvantages: + + - References and targets must be kept in sync. + - Targets cannot be reused. + - The target URLs may be located far from the references. + +For comparison and historical background, StructuredText also has two +syntaxes for hyperlinks: + +* First, ``"reference text":URL``:: + + This is a "reference":http://www.example.org/reference/ + of one word ("reference"). Here is a "phrase + reference":http://www.example.org/phrase_reference/. + +* Second, ``"reference text", http://example.com/absolute_URL``:: + + This is a "reference", http://www.example.org/reference/ + of one word ("reference"). Here is a "phrase reference", + http://www.example.org/phrase_reference/. + +Both syntaxes share advantages and disadvantages: + ++ Advantages: + + - The target is specified immediately adjacent to the reference. + ++ Disadvantages: + + - Poor plaintext readability. + - Targets cannot be reused. + - Both syntaxes use double quotes, common in ordinary text. + - In the first syntax, the URL and the last word are stuck + together, exacerbating the line wrap problem. + - The second syntax is too magical; text could easily be written + that way by accident (although only absolute URLs are recognized + here, perhaps because of the potential for ambiguity). + +A new type of "inline external hyperlink" has been proposed. + +1. On 2002-06-28, Simon Budig proposed__ a new syntax for + reStructuredText hyperlinks:: + + This is a reference_(http://www.example.org/reference/) of one + word ("reference"). Here is a `phrase + reference`_(http://www.example.org/phrase_reference/). Are + these examples, (single-underscore), named? If so, `anonymous + references`__(http://www.example.org/anonymous/) using two + underscores would probably be preferable. + + __ http://mail.python.org/pipermail/doc-sig/2002-June/002648.html + + The syntax, advantages, and disadvantages are similar to those of + StructuredText. + + + Advantages: + + - The target is specified immediately adjacent to the reference. + + + Disadvantages: + + - Poor plaintext readability. + - Targets cannot be reused (unless named, but the semantics are + unclear). + + + Problems: + + - The ``"`ref`_(URL)"`` syntax forces the last word of the + reference text to be joined to the URL, making a potentially + very long word that can't be wrapped (URLs can be very long). + The reference and the URL should be separate. This is a + symptom of the following point: + + - The syntax produces a single compound construct made up of two + equally important parts, *with syntax in the middle*, *between* + the reference and the target. This is unprecedented in + reStructuredText. + + - The "inline hyperlink" text is *not* a named reference (there's + no lookup by name), so it shouldn't look like one. + + - According to the IETF standards RFC 2396 and RFC 2732, + parentheses are legal URI characters and curly braces are legal + email characters, making their use prohibitively difficult. + + - The named/anonymous semantics are unclear. + +2. After an analysis__ of the syntax of (1) above, we came up with the + following compromise syntax:: + + This is an anonymous reference__ + __<http://www.example.org/reference/> of one word + ("reference"). Here is a `phrase reference`__ + __<http://www.example.org/phrase_reference/>. `Named + references`_ _<http://www.example.org/anonymous/> use single + underscores. + + __ http://mail.python.org/pipermail/doc-sig/2002-July/002670.html + + The syntax builds on that of the existing "inline internal + targets": ``an _`inline internal target`.`` + + + Advantages: + + - The target is specified immediately adjacent to the reference, + improving maintainability: + + - References and targets are easily kept in sync. + - The reference text does not have to be repeated. + + - The construct is executed in two parts: references identical to + existing references, and targets that are new but not too big a + stretch from current syntax. + + - There's overwhelming precedent for quoting URLs with angle + brackets [#]_. + + + Disadvantages: + + - Poor plaintext readability. + - Lots of "line noise". + - Targets cannot be reused (unless named; see below). + + To alleviate the readability issue slightly, we could allow the + target to appear later, such as after the end of the sentence:: + + This is a named reference__ of one word ("reference"). + __<http://www.example.org/reference/> Here is a `phrase + reference`__. __<http://www.example.org/phrase_reference/> + + Problem: this could only work for one reference at a time + (reference/target pairs must be proximate [refA trgA refB trgB], + not interleaved [refA refB trgA trgB] or nested [refA refB trgB + trgA]). This variation is too problematic; references and inline + external targets will have to be kept imediately adjacent (see (3) + below). + + The ``"reference__ __<target>"`` syntax is actually for "anonymous + inline external targets", emphasized by the double underscores. It + follows that single trailing and leading underscores would lead to + *implicitly named* inline external targets. This would allow the + reuse of targets by name. So after ``"reference_ _<target>"``, + another ``"reference_"`` would point to the same target. + + .. [#] + From RFC 2396 (URI syntax): + + The angle-bracket "<" and ">" and double-quote (") + characters are excluded [from URIs] because they are often + used as the delimiters around URI in text documents and + protocol fields. + + Using <> angle brackets around each URI is especially + recommended as a delimiting style for URI that contain + whitespace. + + From RFC 822 (email headers): + + Angle brackets ("<" and ">") are generally used to indicate + the presence of a one machine-usable reference (e.g., + delimiting mailboxes), possibly including source-routing to + the machine. + +3. If it is best for references and inline external targets to be + immediately adjacent, then they might as well be integrated. + Here's an alternative syntax embedding the target URL in the + reference:: + + This is an anonymous `reference <http://www.example.org + /reference/>`__ of one word ("reference"). Here is a `phrase + reference <http://www.example.org/phrase_reference/>`__. + + Advantages and disadvantages are similar to those in (2). + Readability is still an issue, but the syntax is a bit less + heavyweight (reduced line noise). Backquotes are required, even + for one-word references; the target URL is included within the + reference text, forcing a phrase context. + + We'll call this variant "embedded URIs". + + Problem: how to refer to a title like "HTML Anchors: <a>" (which + ends with an HTML/SGML/XML tag)? We could either require more + syntax on the target (like ``"`reference text + __<http://example.com/>`__"``), or require the odd conflicting + title to be escaped (like ``"`HTML Anchors: \<a>`__"``). The + latter seems preferable, and not too onerous. + + Similarly to (2) above, a single trailing underscore would convert + the reference & inline external target from anonymous to implicitly + named, allowing reuse of targets by name. + + I think this is the least objectionable of the syntax alternatives. + +Other syntax variations have been proposed (by Brett Cannon and Benja +Fallenstein):: + + `phrase reference`->http://www.example.com + + `phrase reference`@http://www.example.com + + `phrase reference`__ ->http://www.example.com + + `phrase reference` [-> http://www.example.com] + + `phrase reference`__ [-> http://www.example.com] + + `phrase reference` <http://www.example.com>_ + +None of these variations are clearly superior to #3 above. Some have +problems that exclude their use. + +With any kind of inline external target syntax it comes down to the +conflict between maintainability and plaintext readability. I don't +see a major problem with reStructuredText's maintainability, and I +don't want to sacrifice plaintext readability to "improve" it. + +The proponents of inline external targets want them for easily +maintainable web pages. The arguments go something like this: + +- Named hyperlinks are difficult to maintain because the reference + text is duplicated as the target name. + + To which I said, "So use anonymous hyperlinks." + +- Anonymous hyperlinks are difficult to maintain becuase the + references and targets have to be kept in sync. + + "So keep the targets close to the references, grouped after each + paragraph. Maintenance is trivial." + +- But targets grouped after paragraphs break the flow of text. + + "Surely less than URLs embedded in the text! And if the intent is + to produce web pages, not readable plaintext, then who cares about + the flow of text?" + +Many participants have voiced their objections to the proposed syntax: + + Garth Kidd: "I strongly prefer the current way of doing it. + Inline is spectactularly messy, IMHO." + + Tony Ibbs: "I vehemently agree... that the inline alternatives + being suggested look messy - there are/were good reasons they've + been taken out... I don't believe I would gain from the new + syntaxes." + + Paul Moore: "I agree as well. The proposed syntax is far too + punctuation-heavy, and any of the alternatives discussed are + ambiguous or too subtle." + +Others have voiced their support: + + fantasai: "I agree with Simon. In many cases, though certainly + not in all, I find parenthesizing the url in plain text flows + better than relegating it to a footnote." + + Ken Manheimer: "I'd like to weigh in requesting some kind of easy, + direct inline reference link." + +(Interesting that those *against* the proposal have been using +reStructuredText for a while, and those *for* the proposal are either +new to the list ["fantasai", background unknown] or longtime +StructuredText users [Ken Manheimer].) + +I was initially ambivalent/against the proposed "inline external +targets". I value reStructuredText's readability very highly, and +although the proposed syntax offers convenience, I don't know if the +convenience is worth the cost in ugliness. Does the proposed syntax +compromise readability too much, or should the choice be left up to +the author? Perhaps if the syntax is *allowed* but its use strongly +*discouraged*, for aesthetic/readability reasons? + +After a great deal of thought and much input from users, I've decided +that there are reasonable use cases for this construct. The +documentation should strongly caution against its use in most +situations, recommending independent block-level targets instead. +Syntax #3 above ("embedded URIs") will be used. + + +Doctree Representation of Transitions +===================================== + +(Although not reStructuredText-specific, this section fits best in +this document.) + +Having added the "horizontal rule" construct to the `reStructuredText +Markup Specification`_, a decision had to be made as to how to reflect +the construct in the implementation of the document tree. Given this +source:: + + Document + ======== + + Paragraph 1 + + -------- + + Paragraph 2 + +The horizontal rule indicates a "transition" (in prose terms) or the +start of a new "division". Before implementation, the parsed document +tree would be:: + + <document> + <section names="document"> + <title> + Document + <paragraph> + Paragraph 1 + -------- <--- error here + <paragraph> + Paragraph 2 + +There are several possibilities for the implementation: + +1. Implement horizontal rules as "divisions" or segments. A + "division" is a title-less, non-hierarchical section. The first + try at an implementation looked like this:: + + <document> + <section names="document"> + <title> + Document + <paragraph> + Paragraph 1 + <division> + <paragraph> + Paragraph 2 + + But the two paragraphs are really at the same level; they shouldn't + appear to be at different levels. There's really an invisible + "first division". The horizontal rule splits the document body + into two segments, which should be treated uniformly. + +2. Treating "divisions" uniformly brings us to the second + possibility:: + + <document> + <section names="document"> + <title> + Document + <division> + <paragraph> + Paragraph 1 + <division> + <paragraph> + Paragraph 2 + + With this change, documents and sections will directly contain + divisions and sections, but not body elements. Only divisions will + directly contain body elements. Even without a horizontal rule + anywhere, the body elements of a document or section would be + contained within a division element. This makes the document tree + deeper. This is similar to the way HTML_ treats document contents: + grouped within a ``<body>`` element. + +3. Implement them as "transitions", empty elements:: + + <document> + <section names="document"> + <title> + Document + <paragraph> + Paragraph 1 + <transition> + <paragraph> + Paragraph 2 + + A transition would be a "point element", not containing anything, + only identifying a point within the document structure. This keeps + the document tree flatter, but the idea of a "point element" like + "transition" smells bad. A transition isn't a thing itself, it's + the space between two divisions. However, transitions are a + practical solution. + +Solution 3 was chosen for incorporation into the document tree model. + +.. _HTML: http://www.w3.org/MarkUp/ + + +Syntax for Line Blocks +====================== + +* An early idea: How about a literal-block-like prefix, perhaps + "``;;``"? (It is, after all, a *semi-literal* literal block, no?) + Example:: + + Take it away, Eric the Orchestra Leader! ;; + + A one, two, a one two three four + + Half a bee, philosophically, + must, *ipso facto*, half not be. + But half the bee has got to be, + *vis a vis* its entity. D'you see? + + But can a bee be said to be + or not to be an entire bee, + when half the bee is not a bee, + due to some ancient injury? + + Singing... + + Kinda lame. + +* Another idea: in an ordinary paragraph, if the first line ends with + a backslash (escaping the newline), interpret the entire paragraph + as a verse block? For example:: + + Add just one backslash\ + And this paragraph becomes + An awful haiku + + (Awful, and arguably invalid, since in Japanese the word "haiku" + contains three syllables not two.) + + This idea was superceded by the rules for escaped whitespace, useful + for `character-level inline markup`_. + +* In a `2004-02-22 docutils-develop message`__, Jarno Elonen proposed + a "plain list" syntax (and also provided a patch):: + + | John Doe + | President, SuperDuper Corp. + | jdoe@example.org + + __ http://thread.gmane.org/gmane.text.docutils.devel/1187 + + This syntax is very natural. However, these "plain lists" seem very + similar to line blocks, and I see so little intrinsic "list-ness" + that I'm loathe to add a new object. I used the term "blurbs" to + remove the "list" connotation from the originally proposed name. + Perhaps line blocks could be refined to add the two properties they + currently lack: + + A) long lines wrap nicely + B) HTML output doesn't look like program code in non-CSS web + browsers + + (A) is an issue of all 3 aspects of Docutils: syntax (construct + behaviour), internal representation, and output. (B) is partly an + issue of internal representation but mostly of output. + +ReStructuredText will redefine line blocks with the "|"-quoting +syntax. The following is my current thinking. + + +Syntax +------ + +Perhaps line block syntax like this would do:: + + | M6: James Bond + | MIB: Mr. J. + | IMF: not decided yet, but probably one of the following: + | Ethan Hunt + | Jim Phelps + | Claire Phelps + | CIA: Felix Leiter + +Note that the "nested" list does not have nested syntax (the "|" are +not further indented); the leading whitespace would still be +significant somehow (more below). As for long lines in the input, +this could suffice:: + + | John Doe + | Founder, President, Chief Executive Officer, Cook, Bottle + Washer, and All-Round Great Guy + | SuperDuper Corp. + | jdoe@example.org + +The lack of "|" on the third line indicates that it's a continuation +of the second line, wrapped. + +I don't see much point in allowing arbitrary nested content. Multiple +paragraphs or bullet lists inside a "blurb" doesn't make sense to me. +Simple nested line blocks should suffice. + + +Internal Representation +----------------------- + +Line blocks are currently represented as text blobs as follows:: + + <!ELEMENT line_block %text.model;> + <!ATTLIST line_block + %basic.atts; + %fixedspace.att;> + +Instead, we could represent each line by a separate element:: + + <!ELEMENT line_block (line+)> + <!ATTLIST line_block %basic.atts;> + + <!ELEMENT line %text.model;> + <!ATTLIST line %basic.atts;> + +We'd keep the significance of the leading whitespace of each line +either by converting it to non-breaking spaces at output, or with a +per-line margin. Non-breaking spaces are simpler (for HTML, anyway) +but kludgey, and wouldn't support indented long lines that wrap. But +should inter-word whitespace (i.e., not leading whitespace) be +preserved? Currently it is preserved in line blocks. + +Representing a more complex line block may be tricky:: + + | But can a bee be said to be + | or not to be an entire bee, + | when half the bee is not a bee, + | due to some ancient injury? + +Perhaps the representation could allow for nested line blocks:: + + <!ELEMENT line_block (line | line_block)+> + +With this model, leading whitespace would no longer be significant. +Instead, left margins are implied by the nesting. The example above +could be represented as follows:: + + <line_block> + <line> + But can a bee be said to be + <line_block> + <line> + or not to be an entire bee, + <line_block> + <line> + when half the bee is not a bee, + <line_block> + <line> + due to some ancient injury? + +I wasn't sure what to do about even more complex line blocks:: + + | Indented + | Not indented + | Indented a bit + | A bit more + | Only one space + +How should that be parsed and nested? Should the first line have +the same nesting level (== indentation in the output) as the fourth +line, or the same as the last line? Mark Nodine suggested that such +line blocks be parsed similarly to complexly-nested block quotes, +which seems reasonable. In the example above, this would result in +the nesting of first line matching the last line's nesting. In +other words, the nesting would be relative to neighboring lines +only. + + +Output +------ + +In HTML, line blocks are currently output as "<pre>" blocks, which +gives us significant whitespace and line breaks, but doesn't allow +long lines to wrap and causes monospaced output without stylesheets. +Instead, we could output "<div>" elements parallelling the +representation above, where each nested <div class="line_block"> would +have an increased left margin (specified in the stylesheet). + +Jarno suggested the following HTML output:: + + <div class="line_block"> + <span class="line">First, top level line</span><br class="hidden"/> + <div class="line_block"><span class="hidden"> </span> + <span class="line">Second, once nested</span><br class="hidden"/> + <span class="line">Third, once nested</span><br class="hidden"/> + ... + </div> + ... + </div> + +The ``<br class="hidden" />`` and ``<span +class="hidden"> </span>`` are meant to support non-CSS and +non-graphical browsers. I understand the case for "br", but I'm not +so sure about hidden " ". I question how much effort should be +put toward supporting non-graphical and especially non-CSS browsers, +at least for html4css1.py output. + +Should the lines themselves be ``<span>`` or ``<div>``? I don't like +mixing inline and block-level elements. + + +Implementation Plan +------------------- + +We'll leave the old implementation in place (via the "line-block" +directive only) until all Writers have been updated to support the new +syntax & implementation. The "line-block" directive can then be +updated to use the new internal representation, and its documentation +will be updated to recommend the new syntax. + + +List-Driven Tables +================== + +The original idea came from Dylan Jay: + + ... to use a two level bulleted list with something to + indicate it should be rendered as a table ... + +It's an interesting idea. It could be implemented in as a directive +which transforms a uniform two-level list into a table. Using a +directive would allow the author to explicitly set the table's +orientation (by column or by row), the presence of row headers, etc. + +Alternatives: + +1. (Implemented in Docutils 0.3.8). + + Bullet-list-tables might look like this:: + + .. list-table:: + + * - Treat + - Quantity + - Description + * - Albatross! + - 299 + - On a stick! + * - Crunchy Frog! + - 1499 + - If we took the bones out, it wouldn't be crunchy, + now would it? + * - Gannet Ripple! + - 199 + - On a stick! + + This list must be written in two levels. This wouldn't work:: + + .. list-table:: + + * Treat + * Albatross! + * Gannet! + * Crunchy Frog! + + * Quantity + * 299 + * 199 + * 1499 + + * Description + * On a stick! + * On a stick! + * If we took the bones out... + + The above is a single list of 12 items. The blank lines are not + significant to the markup. We'd have to explicitly specify how + many columns or rows to use, which isn't a good idea. + +2. Beni Cherniavsky suggested a field list alternative. It could look + like this:: + + .. field-list-table:: + :headrows: 1 + + - :treat: Treat + :quantity: Quantity + :descr: Description + + - :treat: Albatross! + :quantity: 299 + :descr: On a stick! + + - :treat: Crunchy Frog! + :quantity: 1499 + :descr: If we took the bones out, it wouldn't be + crunchy, now would it? + + Column order is determined from the order of fields in the first + row. Field order in all other rows is ignored. As a side-effect, + this allows trivial re-arrangement of columns. By using named + fields, it becomes possible to omit fields in some rows without + losing track of things, which is important for spans. + +3. An alternative to two-level bullet lists would be to use enumerated + lists for the table cells:: + + .. list-table:: + + * 1. Treat + 2. Quantity + 3. Description + * 1. Albatross! + 2. 299 + 3. On a stick! + * 1. Crunchy Frog! + 2. 1499 + 3. If we took the bones out, it wouldn't be crunchy, + now would it? + + That provides better correspondence between cells in the same + column than does bullet-list syntax, but not as good as field list + syntax. I think that were only field-list-tables available, a lot + of users would use the equivalent degenerate case:: + + .. field-list-table:: + - :1: Treat + :2: Quantity + :3: Description + ... + +4. Another natural variant is to allow a description list with field + lists as descriptions:: + + .. list-table:: + :headrows: 1 + + Treat + :quantity: Quantity + :descr: Description + Albatross! + :quantity: 299 + :descr: On a stick! + Crunchy Frog! + :quantity: 1499 + :descr: If we took the bones out, it wouldn't be + crunchy, now would it? + + This would make the whole first column a header column ("stub"). + It's limited to a single column and a single paragraph fitting on + one source line. Also it wouldn't allow for empty cells or row + spans in the first column. But these are limitations that we could + live with, like those of simple tables. + +The List-driven table feature could be done in many ways. Each user +will have their preferred usage. Perhaps a single "list-table" +directive could handle them all, depending on which options and +content are present. + +Issues: + +* How to indicate that there's 1 header row? Perhaps two lists? :: + + .. list-table:: + + + - Treat + - Quantity + - Description + + * - Albatross! + - 299 + - On a stick! + + This is probably too subtle though. Better would be a directive + option, like ``:headrows: 1``. An early suggestion for the header + row(s) was to use a directive option:: + + .. field-list-table:: + :header: + - :treat: Treat + :quantity: Quantity + :descr: Description + - :treat: Albatross! + :quantity: 299 + :descr: On a stick! + + But the table data is at two levels and looks inconsistent. + + In general, we cannot extract the header row from field lists' field + names because field names cannot contain everything one might put in + a table cell. A separate header row also allows shorter field names + and doesn't force one to rewrite the whole table when the header + text changes. But for simpler cases, we can offer a ":header: + fields" option, which does extract header cells from field names:: + + .. field-list-table:: + :header: fields + + - :Treat: Albatross! + :Quantity: 299 + :Description: On a stick! + +* How to indicate the column widths? A directive option? :: + + .. list-table:: + :widths: 15 10 35 + + Automatic defaults from the text used? + +* How to handle row and/or column spans? + + In a field list, column-spans can be indicated by specifying the + first and last fields, separated by space-dash-space or ellipsis:: + + - :foo - baz: quuux + - :foo ... baz: quuux + + Commas were proposed for column spans:: + + - :foo, bar: quux + + But non-adjacent columns become problematic. Should we report an + error, or duplicate the value into each span of adjacent columns (as + was suggested)? The latter suggestion is appealing but may be too + clever. Best perhaps to simply specify the two ends. + + It was suggested that comma syntax should be allowed, too, in order + to allow the user to avoid trouble when changing the column order. + But changing the column order of a table with spans is not trivial; + we shouldn't make it easier to mess up. + + One possible syntax for row-spans is to simply treat any row where a + field is missing as a row-span from the last row where it appeared. + Leaving a field empty would still be possible by writing a field + with empty content. But this is too implicit. + + Another way would be to require an explicit continuation marker + (``...``/``-"-``/``"``?) in all but the first row of a spanned + field. Empty comments could work (".."). If implemented, the same + marker could also be supported in simple tables, which lack + row-spanning abilities. + + Explicit markup like ":rowspan:" and ":colspan:" was also suggested. + + Sometimes in a table, the first header row contains spans. It may + be necessary to provide a way to specify the column field names + independently of data rows. A directive option would do it. + +* We could specify "column-wise" or "row-wise" ordering, with the same + markup structure. For example, with definition data:: + + .. list-table:: + :column-wise: + + Treat + - Albatross! + - Crunchy Frog! + Quantity + - 299 + - 1499 + Description + - On a stick! + - If we took the bones out, it wouldn't be + crunchy, now would it? + +* A syntax for _`stubs in grid tables` is easy to imagine:: + + +------------------------++------------+----------+ + | Header row, column 1 || Header 2 | Header 3 | + +========================++============+==========+ + | body row 1, column 1 || column 2 | column 3 | + +------------------------++------------+----------+ + + Or this idea from Nick Moffitt:: + + +-----+---+---+ + | XOR # T | F | + +=====+===+===+ + | T # F | T | + +-----+---+---+ + | F # T | F | + +-----+---+---+ + + +Auto-Enumerated Lists +===================== + +Implemented 2005-03-24: combination of variation 1 & 2. + +The advantage of auto-numbered enumerated lists would be similar to +that of auto-numbered footnotes: lists could be written and rearranged +without having to manually renumber them. The disadvantages are also +the same: input and output wouldn't match exactly; the markup may be +ugly or confusing (depending on which alternative is chosen). + +1. Use the "#" symbol. Example:: + + #. Item 1. + #. Item 2. + #. Item 3. + + Advantages: simple, explicit. Disadvantage: enumeration sequence + cannot be specified (limited to arabic numerals); ugly. + +2. As a variation on #1, first initialize the enumeration sequence? + For example:: + + a) Item a. + #) Item b. + #) Item c. + + Advantages: simple, explicit, any enumeration sequence possible. + Disadvantages: ugly; perhaps confusing with mixed concrete/abstract + enumerators. + +3. Alternative suggested by Fred Bremmer, from experience with MoinMoin:: + + 1. Item 1. + 1. Item 2. + 1. Item 3. + + Advantages: enumeration sequence is explicit (could be multiple + "a." or "(I)" tokens). Disadvantages: perhaps confusing; otherwise + erroneous input (e.g., a duplicate item "1.") would pass silently, + either causing a problem later in the list (if no blank lines + between items) or creating two lists (with blanks). + + Take this input for example:: + + 1. Item 1. + + 1. Unintentional duplicate of item 1. + + 2. Item 2. + + Currently the parser will produce two list, "1" and "1,2" (no + warnings, because of the presence of blank lines). Using Fred's + notation, the current behavior is "1,1,2 -> 1 1,2" (without blank + lines between items, it would be "1,1,2 -> 1 [WARNING] 1,2"). What + should the behavior be with auto-numbering? + + Fred has produced a patch__, whose initial behavior is as follows:: + + 1,1,1 -> 1,2,3 + 1,2,2 -> 1,2,3 + 3,3,3 -> 3,4,5 + 1,2,2,3 -> 1,2,3 [WARNING] 3 + 1,1,2 -> 1,2 [WARNING] 2 + + (After the "[WARNING]", the "3" would begin a new list.) + + I have mixed feelings about adding this functionality to the spec & + parser. It would certainly be useful to some users (myself + included; I often have to renumber lists). Perhaps it's too + clever, asking the parser to guess too much. What if you *do* want + three one-item lists in a row, each beginning with "1."? You'd + have to use empty comments to force breaks. Also, I question + whether "1,2,2 -> 1,2,3" is optimal behavior. + + In response, Fred came up with "a stricter and more explicit rule + [which] would be to only auto-number silently if *all* the + enumerators of a list were identical". In that case:: + + 1,1,1 -> 1,2,3 + 1,2,2 -> 1,2 [WARNING] 2 + 3,3,3 -> 3,4,5 + 1,2,2,3 -> 1,2 [WARNING] 2,3 + 1,1,2 -> 1,2 [WARNING] 2 + + Should any start-value be allowed ("3,3,3"), or should + auto-numbered lists be limited to begin with ordinal-1 ("1", "A", + "a", "I", or "i")? + + __ http://sourceforge.net/tracker/index.php?func=detail&aid=548802 + &group_id=38414&atid=422032 + +4. Alternative proposed by Tony Ibbs:: + + #1. First item. + #3. Aha - I edited this in later. + #2. Second item. + + The initial proposal required unique enumerators within a list, but + this limits the convenience of a feature of already limited + applicability and convenience. Not a useful requirement; dropped. + + Instead, simply prepend a "#" to a standard list enumerator to + indicate auto-enumeration. The numbers (or letters) of the + enumerators themselves are not significant, except: + + - as a sequence indicator (arabic, roman, alphabetic; upper/lower), + + - and perhaps as a start value (first list item). + + Advantages: explicit, any enumeration sequence possible. + Disadvantages: a bit ugly. + + +----------------- + Not Implemented +----------------- + +Reworking Footnotes +=================== + +As a further wrinkle (see `Reworking Explicit Markup (Round 1)`_ +above), in the wee hours of 2002-02-28 I posted several ideas for +changes to footnote syntax: + + - Change footnote syntax from ``.. [1]`` to ``_[1]``? ... + - Differentiate (with new DTD elements) author-date "citations" + (``[GVR2002]``) from numbered footnotes? ... + - Render footnote references as superscripts without "[]"? ... + +These ideas are all related, and suggest changes in the +reStructuredText syntax as well as the docutils tree model. + +The footnote has been used for both true footnotes (asides expanding +on points or defining terms) and for citations (references to external +works). Rather than dealing with one amalgam construct, we could +separate the current footnote concept into strict footnotes and +citations. Citations could be interpreted and treated differently +from footnotes. Footnotes would be limited to numerical labels: +manual ("1") and auto-numbered (anonymous "#", named "#label"). + +The footnote is the only explicit markup construct (starts with ".. ") +that directly translates to a visible body element. I've always been +a little bit uncomfortable with the ".. " marker for footnotes because +of this; ".. " has a connotation of "special", but footnotes aren't +especially "special". Printed texts often put footnotes at the bottom +of the page where the reference occurs (thus "foot note"). Some HTML +designs would leave footnotes to be rendered the same positions where +they're defined. Other online and printed designs will gather +footnotes into a section near the end of the document, converting them +to "endnotes" (perhaps using a directive in our case); but this +"special processing" is not an intrinsic property of the footnote +itself, but a decision made by the document author or processing +system. + +Citations are almost invariably collected in a section at the end of a +document or section. Citations "disappear" from where they are +defined and are magically reinserted at some well-defined point. +There's more of a connection to the "special" connotation of the ".. " +syntax. The point at which the list of citations is inserted could be +defined manually by a directive (e.g., ".. citations::"), and/or have +default behavior (e.g., a section automatically inserted at the end of +the document) that might be influenced by options to the Writer. + +Syntax proposals: + ++ Footnotes: + + - Current syntax:: + + .. [1] Footnote 1 + .. [#] Auto-numbered footnote. + .. [#label] Auto-labeled footnote. + + - The syntax proposed in the original 2002-02-28 Doc-SIG post: + remove the ".. ", prefix a "_":: + + _[1] Footnote 1 + _[#] Auto-numbered footnote. + _[#label] Auto-labeled footnote. + + The leading underscore syntax (earlier dropped because + ``.. _[1]:`` was too verbose) is a useful reminder that footnotes + are hyperlink targets. + + - Minimal syntax: remove the ".. [" and "]", prefix a "_", and + suffix a ".":: + + _1. Footnote 1. + _#. Auto-numbered footnote. + _#label. Auto-labeled footnote. + + ``_1.``, ``_#.``, and ``_#label.`` are markers, + like list markers. + + Footnotes could be rendered something like this in HTML + + | 1. This is a footnote. The brackets could be dropped + | from the label, and a vertical bar could set them + | off from the rest of the document in the HTML. + + Two-way hyperlinks on the footnote marker ("1." above) would also + help to differentiate footnotes from enumerated lists. + + If converted to endnotes (by a directive/transform), a horizontal + half-line might be used instead. Page-oriented output formats + would typically use the horizontal line for true footnotes. + ++ Footnote references: + + - Current syntax:: + + [1]_, [#]_, [#label]_ + + - Minimal syntax to match the minimal footnote syntax above:: + + 1_, #_, #label_ + + As a consequence, pure-numeric hyperlink references would not be + possible; they'd be interpreted as footnote references. + ++ Citation references: no change is proposed from the current footnote + reference syntax:: + + [GVR2001]_ + ++ Citations: + + - Current syntax (footnote syntax):: + + .. [GVR2001] Python Documentation; van Rossum, Drake, et al.; + http://www.python.org/doc/ + + - Possible new syntax:: + + _[GVR2001] Python Documentation; van Rossum, Drake, et al.; + http://www.python.org/doc/ + + _[DJG2002] + Docutils: Python Documentation Utilities project; Goodger + et al.; http://docutils.sourceforge.net/ + + Without the ".. " marker, subsequent lines would either have to + align as in one of the above, or we'd have to allow loose + alignment (I'd rather not):: + + _[GVR2001] Python Documentation; van Rossum, Drake, et al.; + http://www.python.org/doc/ + +I proposed adopting the "minimal" syntax for footnotes and footnote +references, and adding citations and citation references to +reStructuredText's repertoire. The current footnote syntax for +citations is better than the alternatives given. + +From a reply by Tony Ibbs on 2002-03-01: + + However, I think easier with examples, so let's create one:: + + Fans of Terry Pratchett are perhaps more likely to use + footnotes [1]_ in their own writings than other people + [2]_. Of course, in *general*, one only sees footnotes + in academic or technical writing - it's use in fiction + and letter writing is not normally considered good + style [4]_, particularly in emails (not a medium that + lends itself to footnotes). + + .. [1] That is, little bits of referenced text at the + bottom of the page. + .. [2] Because Terry himself does, of course [3]_. + .. [3] Although he has the distinction of being + *funny* when he does it, and his fans don't always + achieve that aim. + .. [4] Presumably because it detracts from linear + reading of the text - this is, of course, the point. + + and look at it with the second syntax proposal:: + + Fans of Terry Pratchett are perhaps more likely to use + footnotes [1]_ in their own writings than other people + [2]_. Of course, in *general*, one only sees footnotes + in academic or technical writing - it's use in fiction + and letter writing is not normally considered good + style [4]_, particularly in emails (not a medium that + lends itself to footnotes). + + _[1] That is, little bits of referenced text at the + bottom of the page. + _[2] Because Terry himself does, of course [3]_. + _[3] Although he has the distinction of being + *funny* when he does it, and his fans don't always + achieve that aim. + _[4] Presumably because it detracts from linear + reading of the text - this is, of course, the point. + + (I note here that if I have gotten the indentation of the + footnotes themselves correct, this is clearly not as nice. And if + the indentation should be to the left margin instead, I like that + even less). + + and the third (new) proposal:: + + Fans of Terry Pratchett are perhaps more likely to use + footnotes 1_ in their own writings than other people + 2_. Of course, in *general*, one only sees footnotes + in academic or technical writing - it's use in fiction + and letter writing is not normally considered good + style 4_, particularly in emails (not a medium that + lends itself to footnotes). + + _1. That is, little bits of referenced text at the + bottom of the page. + _2. Because Terry himself does, of course 3_. + _3. Although he has the distinction of being + *funny* when he does it, and his fans don't always + achieve that aim. + _4. Presumably because it detracts from linear + reading of the text - this is, of course, the point. + + I think I don't, in practice, mind the targets too much (the use + of a dot after the number helps a lot here), but I do have a + problem with the body text, in that I don't naturally separate out + the footnotes as different than the rest of the text - instead I + keep wondering why there are numbers interspered in the text. The + use of brackets around the numbers ([ and ]) made me somehow parse + the footnote references as "odd" - i.e., not part of the body text + - and thus both easier to skip, and also (paradoxically) easier to + pick out so that I could follow them. + + Thus, for the moment (and as always susceptable to argument), I'd + say -1 on the new form of footnote reference (i.e., I much prefer + the existing ``[1]_`` over the proposed ``1_``), and ambivalent + over the proposed target change. + + That leaves David's problem of wanting to distinguish footnotes + and citations - and the only thing I can propose there is that + footnotes are numeric or # and citations are not (which, as a + human being, I can probably cope with!). + +From a reply by Paul Moore on 2002-03-01: + + I think the current footnote syntax ``[1]_`` is *exactly* the + right balance of distinctness vs unobtrusiveness. I very + definitely don't think this should change. + + On the target change, it doesn't matter much to me. + +From a further reply by Tony Ibbs on 2002-03-01, referring to the +"[1]" form and actual usage in email: + + Clearly this is a form people are used to, and thus we should + consider it strongly (in the same way that the usage of ``*..*`` + to mean emphasis was taken partly from email practise). + + Equally clearly, there is something "magical" for people in the + use of a similar form (i.e., ``[1]``) for both footnote reference + and footnote target - it seems natural to keep them similar. + + ... + + I think that this established plaintext usage leads me to strongly + believe we should retain square brackets at both ends of a + footnote. The markup of the reference end (a single trailing + underscore) seems about as minimal as we can get away with. The + markup of the target end depends on how one envisages the thing - + if ".." means "I am a target" (as I tend to see it), then that's + good, but one can also argue that the "_[1]" syntax has a neat + symmetry with the footnote reference itself, if one wishes (in + which case ".." presumably means "hidden/special" as David seems + to think, which is why one needs a ".." *and* a leading underline + for hyperlink targets. + +Given the persuading arguments voiced, we'll leave footnote & footnote +reference syntax alone. Except that these discussions gave rise to +the "auto-symbol footnote" concept, which has been added. Citations +and citation references have also been added. + + +Syntax for Questions & Answers +============================== + +Implement as a generic two-column marked list? As a standalone +(non-directive) construct? (Is the markup ambiguous?) Add support to +parts.contents? + +New elements would be required. Perhaps:: + + <!ELEMENT question_list (question_list_item+)> + <!ATTLIST question_list + numbering (none | local | global) + #IMPLIED + start NUMBER #IMPLIED> + <!ELEMENT question_list_item (question, answer*)> + <!ELEMENT question %text.model;> + <!ELEMENT answer (%body.elements;)+> + +Originally I thought of implementing a Q&A list with special syntax:: + + Q: What am I? + + A: You are a question-and-answer + list. + + Q: What are you? + + A: I am the omniscient "we". + +Where each "Q" and "A" could also be numbered (e.g., "Q1"). However, +a simple enumerated or bulleted list will do just fine for syntax. A +directive could treat the list specially; e.g. the first paragraph +could be treated as a question, the remainder as the answer (multiple +answers could be represented by nested lists). Without special +syntax, this directive becomes low priority. + +As described in the FAQ__, no special syntax or directive is needed +for this application. + +__ http://docutils.sf.net/FAQ.html + #how-can-i-mark-up-a-faq-or-other-list-of-questions-answers + + +-------- + Tabled +-------- + +Reworking Explicit Markup (Round 2) +=================================== + +See `Reworking Explicit Markup (Round 1)`_ for an earlier discussion. + +In April 2004, a new thread becan on docutils-develop: `Inconsistency +in RST markup`__. Several arguments were made; the first argument +begat later arguments. Below, the arguments are paraphrased "in +quotes", with responses. + +__ http://thread.gmane.org/gmane.text.docutils.devel/1386 + +1. References and targets take this form:: + + targetname_ + + .. _targetname: stuff + + But footnotes, "which generate links just like targets do", are + written as:: + + [1]_ + + .. [1] stuff + + "Footnotes should be written as":: + + [1]_ + + .. _[1]: stuff + + But they're not the same type of animal. That's not a "footnote + target", it's a *footnote*. Being a target is not a footnote's + primary purpose (an arguable point). It just happens to grow a + target automatically, for convenience. Just as a section title:: + + Title + ===== + + isn't a "title target", it's a *title*, which happens to grow a + target automatically. The consistency is there, it's just deeper + than at first glance. + + Also, ".. [1]" was chosen for footnote syntax because it closely + resembles one form of actual footnote rendering. ".. _[1]:" is too + verbose; excessive punctuation is required to get the job done. + + For more of the reasoning behind the syntax, see `Problems With + StructuredText (Hyperlinks) <problems.html#hyperlinks>`__ and + `Reworking Footnotes`_. + +2. "I expect directives to also look like ``.. this:`` [one colon] + because that also closely parallels the link and footnote target + markup." + + There are good reasons for the two-colon syntax: + + Two colons are used after the directive type for these reasons: + + - Two colons are distinctive, and unlikely to be used in common + text. + + - Two colons avoids clashes with common comment text like:: + + .. Danger: modify at your own risk! + + - If an implementation of reStructuredText does not recognize a + directive (i.e., the directive-handler is not installed), a + level-3 (error) system message is generated, and the entire + directive block (including the directive itself) will be + included as a literal block. Thus "::" is a natural choice. + + -- `restructuredtext.html#directives + <../../ref/rst/restructuredtext.html#directives>`__ + + The last reason is not particularly compelling; it's more of a + convenient coincidence or mnemonic. + +3. "Comments always seemed too easy. I almost never write comments. + I'd have no problem writing '.. comment:' in front of my comments. + In fact, it would probably be more readable, as comments *should* + be set off strongly, because they are very different from normal + text." + + Many people do use comments though, and some applications of + reStructuredText require it. For example, all reStructuredText + PEPs (and this document!) have an Emacs stanza at the bottom, in a + comment. Having to write ".. comment::" would be very obtrusive. + + Comments *should* be dirt-easy to do. It should be easy to + "comment out" a block of text. Comments in programming languages + and other markup languages are invariably easy. + + Any author is welcome to preface their comments with "Comment:" or + "Do Not Print" or "Note to Editor" or anything they like. A + "comment" directive could easily be implemented. It might be + confused with admonition directives, like "note" and "caution" + though. In unrelated (and unpublished and unfinished) work, adding + a "comment" directive as a true document element was considered:: + + If structure is necessary, we could use a "comment" directive + (to avoid nonsensical DTD changes, the "comment" directive + could produce an untitled topic element). + +4. "One of the goals of reStructuredText is to be *readable* by people + who don't know it. This construction violates that: it is not at + all obvious to the uninitiated that text marked by '..' is a + comment. On the other hand, '.. comment:' would be totally + transparent." + + Totally transparent, perhaps, but also very obtrusive. Another of + `reStructuredText's goals`_ is to be unobtrusive, and + ".. comment::" would violate that. The goals of reStructuredText + are many, and they conflict. Determining the right set of goals + and finding solutions that best fit is done on a case-by-case + basis. + + Even readability is has two aspects. Being readable without any + prior knowledge is one. Being as easily read in raw form as in + processed form is the other. ".." may not contribute to the former + aspect, but ".. comment::" would certainly detract from the latter. + + .. _author's note: + .. _reStructuredText's goals: ../../ref/rst/introduction.html#goals + +5. "Recently I sent someone an rst document, and they got confused; I + had to explain to them that '..' marks comments, *unless* it's a + directive, etc..." + + The explanation of directives *is* roundabout, defining comments in + terms of not being other things. That's definitely a wart. + +6. "Under the current system, a mistyped directive (with ':' instead + of '::') will be silently ignored. This is an error that could + easily go unnoticed." + + A parser option/setting like "--comments-on-stderr" would help. + +7. "I'd prefer to see double-dot-space / command / double-colon as the + standard Docutils markup-marker. It's unusual enough to avoid + being accidently used. Everything that starts with a double-dot + should end with a double-colon." + + That would increase the punctuation verbosity of some constructs + considerably. + +8. Edward Loper proposed the following plan for backwards + compatibility: + + 1. ".. foo" will generate a deprecation warning to stderr, and + nothing in the output (no system messages). + 2. ".. foo: bar" will be treated as a directive foo. If there + is no foo directive, then do the normal error output. + 3. ".. foo:: bar" will generate a deprecation warning to + stderr, and be treated as a directive. Or leave it valid? + + So some existing documents might start printing deprecation + warnings, but the only existing documents that would *break* + would be ones that say something like:: + + .. warning: this should be a comment + + instead of:: + + .. warning:: this should be a comment + + Here, we're trading fairly common a silent error (directive + falsely treated as a comment) for a fairly uncommon explicitly + flagged error (comment falsely treated as directive). To make + things even easier, we could add a sentence to the + unknown-directive error. Something like "If you intended to + create a comment, please use '.. comment:' instead". + +On one hand, I understand and sympathize with the points raised. On +the other hand, I think the current syntax strikes the right balance +(but I acknowledge a possible lack of objectivity). On the gripping +hand, the comment and directive syntax has become well established, so +even if it's a wart, it may be a wart we have to live with. + +Making any of these changes would cause a lot of breakage or at least +deprecation warnings. I'm not sure the benefit is worth the cost. + +For now, we'll treat this as an unresolved legacy issue. + + +------- + To Do +------- + +Nested Inline Markup +==================== + +These are collected notes on a long-discussed issue. The original +mailing list messages should be referred to for details. + +* In a 2001-10-31 discussion I wrote: + + Try, for example, `Ed Loper's 2001-03-21 post`_, which details + some rules for nested inline markup. I think the complexity is + prohibitive for the marginal benefit. (And if you can understand + that tree without going mad, you're a better man than I. ;-) + + Inline markup is already fragile. Allowing nested inline markup + would only be asking for trouble IMHO. If it proves absolutely + necessary, it can be added later. The rules for what can appear + inside what must be well thought out first though. + + .. _Ed Loper's 2001-03-21 post: + http://mail.python.org/pipermail/doc-sig/2001-March/001487.html + + -- http://mail.python.org/pipermail/doc-sig/2001-October/002354.html + +* In a 2001-11-09 Doc-SIG post, I wrote: + + The problem is that in the + what-you-see-is-more-or-less-what-you-get markup language that + is reStructuredText, the symbols used for inline markup ("*", + "**", "`", "``", etc.) may preclude nesting. + + I've rethought this position. Nested markup is not precluded, just + tricky. People and software parse "double and 'single' quotes" all + the time. Continuing, + + I've thought over how we might implement nested inline + markup. The first algorithm ("first identify the outer inline + markup as we do now, then recursively scan for nested inline + markup") won't work; counterexamples were given in my `last post + <http://mail.python.org/pipermail/doc-sig/2001-November/002363.html>`__. + + The second algorithm makes my head hurt:: + + while 1: + scan for start-string + if found: + push on stack + scan for start or end string + if new start string found: + recurse + elif matching end string found: + pop stack + elif non-matching end string found: + if its a markup error: + generate warning + elif the initial start-string was misinterpreted: + # e.g. in this case: ***strong** in emphasis* + restart with the other interpretation + # but it might be several layers back ... + ... + + This is similar to how the parser does section title + recognition, but sections are much more regular and + deterministic. + + Bottom line is, I don't think the benefits are worth the effort, + even if it is possible. I'm not going to try to write the code, + at least not now. If somebody codes up a consistent, working, + general solution, I'll be happy to consider it. + + -- http://mail.python.org/pipermail/doc-sig/2001-November/002388.html + +* In a `2003-05-06 Docutils-Users post`__ Paul Tremblay proposed a new + syntax to allow for easier nesting. It eventually evolved into + this:: + + :role:[inline text] + + The duplication with the existing interpreted text syntax is + problematic though. + + __ http://article.gmane.org/gmane.text.docutils.user/317 + +* Could the parser be extended to parse nested interpreted text? :: + + :emphasis:`Some emphasized text with :strong:`some more + emphasized text` in it and **perhaps** :reference:`a link`` + +* In a `2003-06-18 Docutils-Develop post`__, Mark Nodine reported on + his implementation of a form of nested inline markup in his + Perl-based parser (unpublished). He brought up some interesting + ideas. The implementation was flawed, however, by the change in + semantics required for backslash escapes. + + __ http://article.gmane.org/gmane.text.docutils.devel/795 + +* Docutils-develop threads between David Abrahams, David Goodger, and + Mark Nodine (beginning 2004-01-16__ and 2004-01-19__) hashed out + many of the details of a potentially successful implementation, as + described below. David Abrahams checked in code to the "nesting" + branch of CVS, awaiting thorough review. + + __ http://thread.gmane.org/gmane.text.docutils.devel/1102 + __ http://thread.gmane.org/gmane.text.docutils.devel/1125 + +It may be possible to accomplish nested inline markup in general with +a more powerful inline markup parser. There may be some issues, but +I'm not averse to the idea of nested inline markup in general. I just +don't have the time or inclination to write a new parser now. Of +course, a good patch would be welcome! + +I envisage something like this. Explicit-role interpreted text must +be nestable. Prefix-based is probably preferred, since suffix-based +will look like inline literals:: + + ``text`:role1:`:role2: + +But it can be disambiguated, so it ought to be left up to the author:: + + `\ `text`:role1:`:role2: + +In addition, other forms of inline markup may be nested if +unambiguous:: + + *emphasized ``literal`` and |substitution ref| and link_* + +IOW, the parser ought to be as permissive as possible. + + +Index Entries & Indexes +======================= + +Were I writing a book with an index, I guess I'd need two +different kinds of index targets: inline/implicit and +out-of-line/explicit. For example:: + + In this `paragraph`:index:, several words are being + `marked`:index: inline as implicit `index`:index: + entries. + + .. index:: markup + .. index:: syntax + + The explicit index directives above would refer to + this paragraph. It might also make sense to allow multiple + entries in an ``index`` directive: + + .. index:: + markup + syntax + +The words "paragraph", "marked", and "index" would become index +entries pointing at the words in the first paragraph. The index +entry words appear verbatim in the text. (Don't worry about the +ugly ":index:" part; if indexing is the only/main application of +interpreted text in your documents, it can be implicit and +omitted.) The two directives provide manual indexing, where the +index entry words ("markup" and "syntax") do not appear in the +main text. We could combine the two directives into one:: + + .. index:: markup; syntax + +Semicolons instead of commas because commas could *be* part of the +index target, like:: + + .. index:: van Rossum, Guido + +Another reason for index directives is because other inline markup +wouldn't be possible within inline index targets. + +Sometimes index entries have multiple levels. Given:: + + .. index:: statement syntax: expression statements + +In a hypothetical index, combined with other entries, it might +look like this:: + + statement syntax + expression statements ..... 56 + assignment ................ 57 + simple statements ......... 58 + compound statements ....... 60 + +Inline multi-level index targets could be done too. Perhaps +something like:: + + When dealing with `expression statements <statement syntax:>`, + we must remember ... + +The opposite sense could also be possible:: + + When dealing with `index entries <:multi-level>`, there are + many permutations to consider. + +Also "see / see also" index entries. + +Given:: + + Here's a paragraph. + + .. index:: paragraph + +(The "index" directive above actually targets the *preceding* +object.) The directive should produce something like this XML:: + + <paragraph> + <index_entry text="paragraph"/> + Here's a paragraph. + </paragraph> + +This kind of content model would also allow true inline +index-entries:: + + Here's a `paragraph`:index:. + +If the "index" role were the default for the application, it could be +dropped:: + + Here's a `paragraph`. + +Both of these would result in this XML:: + + <paragraph> + Here's a <index_entry>paragraph</index_entry>. + </paragraph> + + +from 2002-06-24 docutils-develop posts +-------------------------------------- + + If all of your index entries will appear verbatim in the text, + this should be sufficient. If not (e.g., if you want "Van Rossum, + Guido" in the index but "Guido van Rossum" in the text), we'll + have to figure out a supplemental mechanism, perhaps using + substitutions. + +I've thought a bit more on this, and I came up with two possibilities: + +1. Using interpreted text, embed the index entry text within the + interpreted text:: + + ... by `Guido van Rossum [Van Rossum, Guido]` ... + + The problem with this is obvious: the text becomes cluttered and + hard to read. The processed output would drop the text in + brackets, which goes against the spirit of interpreted text. + +2. Use substitutions:: + + ... by |Guido van Rossum| ... + + .. |Guido van Rossum| index:: Van Rossum, Guido + + A problem with this is that each substitution definition must have + a unique name. A subsequent ``.. |Guido van Rossum| index:: BDFL`` + would be illegal. Some kind of anonymous substitution definition + mechanism would be required, but I think that's going too far. + +Both of these alternatives are flawed. Any other ideas? + + +------------------- + ... Or Not To Do? +------------------- + +This is the realm of the possible but questionably probable. These +ideas are kept here as a record of what has been proposed, for +posterity and in case any of them prove to be useful. + + +Compound Enumerated Lists +========================= + +Allow for compound enumerators, such as "1.1." or "1.a." or "1(a)", to +allow for nested enumerated lists without indentation? + + +Indented Lists +============== + +Allow for variant styles by interpreting indented lists as if they +weren't indented? For example, currently the list below will be +parsed as a list within a block quote:: + + paragraph + + * list item 1 + * list item 2 + +But a lot of people seem to write that way, and HTML browsers make it +look as if that's the way it should be. The parser could check the +contents of block quotes, and if they contain only a single list, +remove the block quote wrapper. There would be two problems: + +1. What if we actually *do* want a list inside a block quote? + +2. What if such a list comes immediately after an indented construct, + such as a literal block? + +Both could be solved using empty comments (problem 2 already exists +for a block quote after a literal block). But that's a hack. + +Perhaps a runtime setting, allowing or disabling this convenience, +would be appropriate. But that raises issues too: + + User A, who writes lists indented (and their config file is set up + to allow it), sends a file to user B, who doesn't (and their + config file disables indented lists). The result of processing by + the two users will be different. + +It may seem minor, but it adds ambiguity to the parser, which is bad. + +See the `Doc-SIG discussion starting 2001-04-18`__ with Ed Loper's +"Structuring: a summary; and an attempt at EBNF", item 4 (and +follow-ups, here__ and here__). Also `docutils-users, 2003-02-17`__ +and `beginning 2003-08-04`__. + +__ http://mail.python.org/pipermail/doc-sig/2001-April/001776.html +__ http://mail.python.org/pipermail/doc-sig/2001-April/001789.html +__ http://mail.python.org/pipermail/doc-sig/2001-April/001793.html +__ http://sourceforge.net/mailarchive/message.php?msg_id=3838913 +__ http://sf.net/mailarchive/forum.php?thread_id=2957175&forum_id=11444 + + +Sloppy Indentation of List Items +================================ + +Perhaps the indentation shouldn't be so strict. Currently, this is +required:: + + 1. First line, + second line. + +Anything wrong with this? :: + + 1. First line, + second line. + +Problem? :: + + 1. First para. + + Block quote. (no good: requires some indent relative to first + para) + + Second Para. + + 2. Have to carefully define where the literal block ends:: + + Literal block + + Literal block? + +Hmm... Non-strict indentation isn't such a good idea. + + +Lazy Indentation of List Items +============================== + +Another approach: Going back to the first draft of reStructuredText +(2000-11-27 post to Doc-SIG):: + + - This is the fourth item of the main list (no blank line above). + The second line of this item is not indented relative to the + bullet, which precludes it from having a second paragraph. + +Change that to *require* a blank line above and below, to reduce +ambiguity. This "loosening" may be added later, once the parser's +been nailed down. However, a serious drawback of this approach is to +limit the content of each list item to a single paragraph. + + +David's Idea for Lazy Indentation +--------------------------------- + +Consider a paragraph in a word processor. It is a single logical line +of text which ends with a newline, soft-wrapped arbitrarily at the +right edge of the page or screen. We can think of a plaintext +paragraph in the same way, as a single logical line of text, ending +with two newlines (a blank line) instead of one, and which may contain +arbitrary line breaks (newlines) where it was accidentally +hard-wrapped by an application. We can compensate for the accidental +hard-wrapping by "unwrapping" every unindented second and subsequent +line. The indentation of the first line of a paragraph or list item +would determine the indentation for the entire element. Blank lines +would be required between list items when using lazy indentation. + +The following example shows the lazy indentation of multiple body +elements:: + + - This is the first paragraph + of the first list item. + + Here is the second paragraph + of the first list item. + + - This is the first paragraph + of the second list item. + + Here is the second paragraph + of the second list item. + +A more complex example shows the limitations of lazy indentation:: + + - This is the first paragraph + of the first list item. + + Next is a definition list item: + + Term + Definition. The indentation of the term is + required, as is the indentation of the definition's + first line. + + When the definition extends to more than + one line, lazy indentation may occur. (This is the second + paragraph of the definition.) + + - This is the first paragraph + of the second list item. + + - Here is the first paragraph of + the first item of a nested list. + + So this paragraph would be outside of the nested list, + but inside the second list item of the outer list. + + But this paragraph is not part of the list at all. + +And the ambiguity remains:: + + - Look at the hyphen at the beginning of the next line + - is it a second list item marker, or a dash in the text? + + Similarly, we may want to refer to numbers inside enumerated + lists: + + 1. How many socks in a pair? There are + 2. How many pants in a pair? Exactly + 1. Go figure. + +Literal blocks and block quotes would still require consistent +indentation for all their lines. For block quotes, we might be able +to get away with only requiring that the first line of each contained +element be indented. For example:: + + Here's a paragraph. + + This is a paragraph inside a block quote. + Second and subsequent lines need not be indented at all. + + - A bullet list inside + the block quote. + + Second paragraph of the + bullet list inside the block quote. + +Although feasible, this form of lazy indentation has problems. The +document structure and hierarchy is not obvious from the indentation, +making the source plaintext difficult to read. This will also make +keeping track of the indentation while writing difficult and +error-prone. However, these problems may be acceptable for Wikis and +email mode, where we may be able to rely on less complex structure +(few nested lists, for example). + + +Multiple Roles in Interpreted Text +================================== + +In reStructuredText, inline markup cannot be nested (yet; `see +above`__). This also applies to interpreted text. In order to +simultaneously combine multiple roles for a single piece of text, a +syntax extension would be necessary. Ideas: + +1. Initial idea:: + + `interpreted text`:role1,role2: + +2. Suggested by Jason Diamond:: + + `interpreted text`:role1:role2: + +If a document is so complex as to require nested inline markup, +perhaps another markup system should be considered. By design, +reStructuredText does not have the flexibility of XML. + +__ `Nested Inline Markup`_ + + +Parameterized Interpreted Text +============================== + +In some cases it may be expedient to pass parameters to interpreted +text, analogous to function calls. Ideas: + +1. Parameterize the interpreted text role itself (suggested by Jason + Diamond):: + + `interpreted text`:role1(foo=bar): + + Positional parameters could also be supported:: + + `CSS`:acronym(Cascading Style Sheets): is used for HTML, and + `CSS`:acronym(Content Scrambling System): is used for DVDs. + + Technical problem: current interpreted text syntax does not + recognize roles containing whitespace. Design problem: this smells + like programming language syntax, but reStructuredText is not a + programming language. + +2. Put the parameters inside the interpreted text:: + + `CSS (Cascading Style Sheets)`:acronym: is used for HTML, and + `CSS (Content Scrambling System)`:acronym: is used for DVDs. + + Although this could be defined on an individual basis (per role), + we ought to have a standard. Hyperlinks with embedded URIs already + use angle brackets; perhaps they could be used here too:: + + `CSS <Cascading Style Sheets>`:acronym: is used for HTML, and + `CSS <Content Scrambling System>`:acronym: is used for DVDs. + + Do angle brackets connote URLs too much for this to be acceptable? + How about the "tag" connotation -- does it save them or doom them? + +3. `Nested inline markup`_ could prove useful here:: + + `CSS :def:`Cascading Style Sheets``:acronym: is used for HTML, + and `CSS :def:`Content Scrambling System``:acronym: is used for + DVDs. + + Inline markup roles could even define the default roles of nested + inline markup, allowing this cleaner syntax:: + + `CSS `Cascading Style Sheets``:acronym: is used for HTML, and + `CSS `Content Scrambling System``:acronym: is used for DVDs. + +Does this push inline markup too far? Readability becomes a serious +issue. Substitutions may provide a better alternative (at the expense +of verbosity and duplication) by pulling the details out of the text +flow:: + + |CSS| is used for HTML, and |CSS-DVD| is used for DVDs. + + .. |CSS| acronym:: Cascading Style Sheets + .. |CSS-DVD| acronym:: Content Scrambling System + :text: CSS + +---------------------------------------------------------------------- + +This whole idea may be going beyond the scope of reStructuredText. +Documents requiring this functionality may be better off using XML or +another markup system. + +This argument comes up regularly when pushing the envelope of +reStructuredText syntax. I think it's a useful argument in that it +provides a check on creeping featurism. In many cases, the resulting +verbosity produces such unreadable plaintext that there's a natural +desire *not* to use it unless absolutely necessary. It's a matter of +finding the right balance. + + +Syntax for Interpreted Text Role Bindings +========================================= + +The following syntax (idea from Jeffrey C. Jacobs) could be used to +associate directives with roles:: + + .. :rewrite: class:: rewrite + + `She wore ribbons in her hair and it lay with streaks of + grey`:rewrite: + +The syntax is similar to that of substitution declarations, and the +directive/role association may resolve implementation issues. The +semantics, ramifications, and implementation details would need to be +worked out. + +The example above would implement the "rewrite" role as adding a +``class="rewrite"`` attribute to the interpreted text ("inline" +element). The stylesheet would then pick up on the "class" attribute +to do the actual formatting. + +The advantage of the new syntax would be flexibility. Uses other than +"class" may present themselves. The disadvantage is complexity: +having to implement new syntax for a relatively specialized operation, +and having new semantics in existing directives ("class::" would do +something different). + +The `"role" directive`__ has been implemented. + +__ ../../ref/rst/directives.html#role + + +Character Processing +==================== + +Several people have suggested adding some form of character processing +to reStructuredText: + +* Some sort of automated replacement of ASCII sequences: + + - ``--`` to em-dash (or ``--`` to en-dash, and ``---`` to em-dash). + - Convert quotes to curly quote entities. (Essentially impossible + for HTML? Unnecessary for TeX.) + - Various forms of ``:-)`` to smiley icons. + - ``"\ "`` to . Problem with line-wrapping though: it could + end up escaping the newline. + - Escaped newlines to <BR>. + - Escaped period or quote or dash as a disappearing catalyst to + allow character-level inline markup? + +* XML-style character entities, such as "©" for the copyright + symbol. + +Docutils has no need of a character entity subsystem. Supporting +Unicode and text encodings, character entities should be directly +represented in the text: a copyright symbol should be represented by +the copyright symbol character. If this is not possible in an +authoring environment, a pre-processing stage can be added, or a table +of substitution definitions can be devised. + +A "unicode" directive has been implemented to allow direct +specification of esoteric characters. In combination with the +substitution construct, "include" files defining common sets of +character entities can be defined and used. `A set of character +entity set definition files have been defined`__ (`tarball`__). +There's also `a description and instructions for use`__. + +__ http://docutils.sf.net/tmp/charents/ +__ http://docutils.sf.net/tmp/charents.tgz +__ http://docutils.sf.net/tmp/charents/README.html + +To allow for `character-level inline markup`_, a limited form of +character processing has been added to the spec and parser: escaped +whitespace characters are removed from the processed document. Any +further character processing will be of this functional type, rather +than of the character-encoding type. + +.. _character-level inline markup: + ../../ref/rst/restructuredtext.html#character-level-inline-markup + +* Directive idea:: + + .. text-replace:: "pattern" "replacement" + + - Support Unicode "U+XXXX" codes. + - Support regexps, perhaps with alternative "regexp-replace" + directive. + - Flags for regexps; ":flags:" option, or individuals. + - Specifically, should the default be case-sensistive or + -insensitive? + + +Page Or Line Breaks +=================== + +* Should ^L (or something else in reST) be defined to mean + force/suggest page breaks in whatever output we have? + + A "break" or "page-break" directive would be easy to add. A new + doctree element would be required though (perhaps "break"). The + final behavior would be up to the Writer. The directive argument + could be one of page/column/recto/verso for added flexibility. + + Currently ^L (Python's ``\f``) characters are treated as whitespace. + They're converted to single spaces, actually, as are vertical tabs + (^K, Python's ``\v``). It would be possible to recognize form feeds + as markup, but it requires some thought and discussion first. Are + there any downsides? Many editing environments do not allow the + insertion of control characters. Will it cause any harm? It would + be useful as a shorthand for the directive. + + It's common practice to use ^L before Emacs "Local Variables" + lists:: + + ^L + .. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: + + These are already present in many PEPs and Docutils project + documents. From the Emacs manual (info): + + A "local variables list" goes near the end of the file, in the + last page. (It is often best to put it on a page by itself.) + + It would be unfortunate if this construct caused a final blank page + to be generated (for those Writers that recognize the page breaks). + We'll have to add a transform that looks for a "break" plus zero or + more comments at the end of a document, and removes them. + + Probably a bad idea because there is no such thing as a page in a + generic document format. + +* Could the "break" concept above be extended to inline forms? + E.g. "^L" in the middle of a sentence could cause a line break. + Only recognize it at the end of a line (i.e., ``\f\n``)? + + Or is formfeed inappropriate? Perhaps vertical tab (``\v``), but + even that's a stretch. Can't use carriage returns, since they're + commonly used for line endings. + + Probably a bad idea as well because we do not want to use control + characters for well-readable and well-writable markup, and after all + we have the line block syntax for line breaks. + + +Superscript Markup +================== + +Add ``^superscript^`` inline markup? The only common non-markup uses +of "^" I can think of are as short hand for "superscript" itself and +for describing control characters ("^C to cancel"). The former +supports the proposed syntax, and it could be argued that the latter +ought to be literal text anyhow (e.g. "``^C`` to cancel"). + +However, superscripts are seldom needed, and new syntax would break +existing documents. When it's needed, the ``:superscript:`` +(``:sup:``) role can we used as well. + + +Code Execution +============== + +Add the following directives? + +- "exec": Execute Python code & insert the results. Call it + "python" to allow for other languages? + +- "system": Execute an ``os.system()`` call, and insert the results + (possibly as a literal block). Definitely dangerous! How to make + it safe? Perhaps such processing should be left outside of the + document, in the user's production system (a makefile or a script or + whatever). Or, the directive could be disabled by default and only + enabled with an explicit command-line option or config file setting. + Even then, an interactive prompt may be useful, such as: + + The file.txt document you are processing contains a "system" + directive requesting that the ``sudo rm -rf /`` command be + executed. Allow it to execute? (y/N) + +- "eval": Evaluate an expression & insert the text. At parse + time or at substitution time? Dangerous? Perhaps limit to canned + macros; see text.date_. + + .. _text.date: ../todo.html#text-date + +It's too dangerous (or too complicated in the case of "eval"). We do +not want to have such things in the core. + + +``encoding`` Directive +====================== + +Add an "encoding" directive to specify the character encoding of the +input data? Not a good idea for the following reasons: + +- When it sees the directive, the parser will already have read the + input data, and encoding determination will already have been done. + +- If a file with an "encoding" directive is edited and saved with + a different encoding, the directive may cause data corruption. + + +Support for Annotations +======================= + +Add an "annotation" role, as the equivalent of the HTML "title" +attribute? This is secondary information that may "pop up" when the +pointer hovers over the main text. A corresponding directive would be +required to associate annotations with the original text (by name, or +positionally as in anonymous targets?). + +There have not been many requests for such feature, though. Also, +cluttering WYSIWYG plaintext with annotations may not seem like a good +idea, and there is no "tool tip" in formats other than HTML. + + +``term`` Role +============= + +Add a "term" role for unfamiliar or specialized terminology? Probably +not; there is no real use case, and emphasis is enough for most cases. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/dev/rst/problems.txt b/docs/dev/rst/problems.txt new file mode 100644 index 000000000..bc0101cbf --- /dev/null +++ b/docs/dev/rst/problems.txt @@ -0,0 +1,872 @@ +============================== + Problems With StructuredText +============================== +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This document has been placed in the public domain. + +There are several problems, unresolved issues, and areas of +controversy within StructuredText_ (Classic and Next Generation). In +order to resolve all these issues, this analysis brings all of the +issues out into the open, enumerates all the alternatives, and +proposes solutions to be incorporated into the reStructuredText_ +specification. + + +.. contents:: + + +Formal Specification +==================== + +The description in the original StructuredText.py has been criticized +for being vague. For practical purposes, "the code *is* the spec." +Tony Ibbs has been working on deducing a `detailed description`_ from +the documentation and code of StructuredTextNG_. Edward Loper's +STMinus_ is another attempt to formalize a spec. + +For this kind of a project, the specification should always precede +the code. Otherwise, the markup is a moving target which can never be +adopted as a standard. Of course, a specification may be revised +during lifetime of the code, but without a spec there is no visible +control and thus no confidence. + + +Understanding and Extending the Code +==================================== + +The original StructuredText_ is a dense mass of sparsely commented +code and inscrutable regular expressions. It was not designed to be +extended and is very difficult to understand. StructuredTextNG_ has +been designed to allow input (syntax) and output extensions, but its +documentation (both internal [comments & docstrings], and external) is +inadequate for the complexity of the code itself. + +For reStructuredText to become truly useful, perhaps even part of +Python's standard library, it must have clear, understandable +documentation and implementation code. For the implementation of +reStructuredText to be taken seriously, it must be a sterling example +of the potential of docstrings; the implementation must practice what +the specification preaches. + + +Section Structure via Indentation +================================= + +Setext_ required that body text be indented by 2 spaces. The original +StructuredText_ and StructuredTextNG_ require that section structure +be indicated through indentation, as "inspired by Python". For +certain structures with a very limited, local extent (such as lists, +block quotes, and literal blocks), indentation naturally indicates +structure or hierarchy. For sections (which may have a very large +extent), structure via indentation is unnecessary, unnatural and +ambiguous. Rather, the syntax of the section title *itself* should +indicate that it is a section title. + +The original StructuredText states that "A single-line paragraph whose +immediately succeeding paragraphs are lower level is treated as a +header." Requiring indentation in this way is: + +- Unnecessary. The vast majority of docstrings and standalone + documents will have no more than one level of section structure. + Requiring indentation for such docstrings is unnecessary and + irritating. + +- Unnatural. Most published works use title style (type size, face, + weight, and position) and/or section/subsection numbering rather + than indentation to indicate hierarchy. This is a tradition with a + very long history. + +- Ambiguous. A StructuredText header is indistinguishable from a + one-line paragraph followed by a block quote (precluding the use of + block quotes). Enumerated section titles are ambiguous (is it a + header? is it a list item?). Some additional adornment must be + required to confirm the line's role as a title, both to a parser and + to the human reader of the source text. + +Python's use of significant whitespace is a wonderful (if not +original) innovation, however requiring indentation in ordinary +written text is hypergeneralization. + +reStructuredText_ indicates section structure through title adornment +style (as exemplified by this document). This is far more natural. +In fact, it is already in widespread use in plain text documents, +including in Python's standard distribution (such as the toplevel +README_ file). + + +Character Escaping Mechanism +============================ + +No matter what characters are chosen for markup, some day someone will +want to write documentation *about* that markup or using markup +characters in a non-markup context. Therefore, any complete markup +language must have an escaping or encoding mechanism. For a +lightweight markup system, encoding mechanisms like SGML/XML's '*' +are out. So an escaping mechanism is in. However, with carefully +chosen markup, it should be necessary to use the escaping mechanism +only infrequently. + +reStructuredText_ needs an escaping mechanism: a way to treat +markup-significant characters as the characters themselves. Currently +there is no such mechanism (although ZWiki uses '!'). What are the +candidates? + +1. ``!`` + (http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/NGEscaping) +2. ``\`` +3. ``~`` +4. doubling of characters + +The best choice for this is the backslash (``\``). It's "the single +most popular escaping character in the world!", therefore familiar and +unsurprising. Since characters only need to be escaped under special +circumstances, which are typically those explaining technical +programming issues, the use of the backslash is natural and +understandable. Python docstrings can be raw (prefixed with an 'r', +as in 'r""'), which would obviate the need for gratuitous doubling-up +of backslashes. + +(On 2001-03-29 on the Doc-SIG mailing list, GvR endorsed backslash +escapes, saying, "'nuff said. Backslash it is." Although neither +legally binding nor irrevocable nor any kind of guarantee of anything, +it is a good sign.) + +The rule would be: An unescaped backslash followed by any markup +character escapes the character. The escaped character represents the +character itself, and is prevented from playing a role in any markup +interpretation. The backslash is removed from the output. A literal +backslash is represented by an "escaped backslash," two backslashes in +a row. + +A carefully constructed set of recognition rules for inline markup +will obviate the need for backslash-escapes in almost all cases; see +`Delimitation of Inline Markup`_ below. + +When an expression (requiring backslashes and other characters used +for markup) becomes too complicated and therefore unreadable, a +literal block may be used instead. Inside literal blocks, no markup +is recognized, therefore backslashes (for the purpose of escaping +markup) become unnecessary. + +We could allow backslashes preceding non-markup characters to remain +in the output. This would make describing regular expressions and +other uses of backslashes easier. However, this would complicate the +markup rules and would be confusing. + + +Blank Lines in Lists +==================== + +Oft-requested in Doc-SIG (the earliest reference is dated 1996-08-13) +is the ability to write lists without requiring blank lines between +items. In docstrings, space is at a premium. Authors want to convey +their API or usage information in as compact a form as possible. +StructuredText_ requires blank lines between all body elements, +including list items, even when boundaries are obvious from the markup +itself. + +In reStructuredText, blank lines are optional between list items. +However, in order to eliminate ambiguity, a blank line is required +before the first list item and after the last. Nested lists also +require blank lines before the list start and after the list end. + + +Bullet List Markup +================== + +StructuredText_ includes 'o' as a bullet character. This is dangerous +and counter to the language-independent nature of the markup. There +are many languages in which 'o' is a word. For example, in Spanish:: + + Llamame a la casa + o al trabajo. + + (Call me at home or at work.) + +And in Japanese (when romanized):: + + Senshuu no doyoubi ni tegami + o kakimashita. + + ([I] wrote a letter on Saturday last week.) + +If a paragraph containing an 'o' word wraps such that the 'o' is the +first text on a line, or if a paragraph begins with such a word, it +could be misinterpreted as a bullet list. + +In reStructuredText_, 'o' is not used as a bullet character. '-', +'*', and '+' are the possible bullet characters. + + +Enumerated List Markup +====================== + +StructuredText enumerated lists are allowed to begin with numbers and +letters followed by a period or right-parenthesis, then whitespace. +This has surprising consequences for writing styles. For example, +this is recognized as an enumerated list item by StructuredText:: + + Mr. Creosote. + +People will write enumerated lists in all different ways. It is folly +to try to come up with the "perfect" format for an enumerated list, +and limit the docstring parser's recognition to that one format only. + +Rather, the parser should recognize a variety of enumerator styles. +It is also recommended that the enumerator of the first list item be +ordinal-1 ('1', 'A', 'a', 'I', or 'i'), as output formats may not be +able to begin a list at an arbitrary enumeration. + +An initial idea was to require two or more consistent enumerated list +items in a row. This idea proved impractical and was dropped. In +practice, the presence of a proper enumerator is enough to reliably +recognize an enumerated list item; any ambiguities are reported by the +parser. Here's the original idea for posterity: + + The parser should recognize a variety of enumerator styles, mark + each block as a potential enumerated list item (PELI), and + interpret the enumerators of adjacent PELIs to decide whether they + make up a consistent enumerated list. + + If a PELI is labeled with a "1.", and is immediately followed by a + PELI labeled with a "2.", we've got an enumerated list. Or "(A)" + followed by "(B)". Or "i)" followed by "ii)", etc. The chances + of accidentally recognizing two adjacent and consistently labeled + PELIs, are acceptably small. + + For an enumerated list to be recognized, the following must be + true: + + - the list must consist of multiple adjacent list items (2 or + more) + - the enumerators must all have the same format + - the enumerators must be sequential + + +Definition List Markup +====================== + +StructuredText uses ' -- ' (whitespace, two hyphens, whitespace) on +the first line of a paragraph to indicate a definition list item. The +' -- ' serves to separate the term (on the left) from the definition +(on the right). + +Many people use ' -- ' as an em-dash in their text, conflicting with +the StructuredText usage. Although the Chicago Manual of Style says +that spaces should not be used around an em-dash, Peter Funk pointed +out that this is standard usage in German (according to the Duden, the +official German reference), and possibly in other languages as well. +The widespread use of ' -- ' precludes its use for definition lists; +it would violate the "unsurprising" criterion. + +A simpler, and at least equally visually distinctive construct +(proposed by Guido van Rossum, who incidentally is a frequent user of +' -- ') would do just as well:: + + term 1 + Definition. + + term 2 + Definition 2, paragraph 1. + + Definition 2, paragraph 2. + +A reStructuredText definition list item consists of a term and a +definition. A term is a simple one-line paragraph. A definition is a +block indented relative to the term, and may contain multiple +paragraphs and other body elements. No blank line precedes a +definition (this distinguishes definition lists from block quotes). + + +Literal Blocks +============== + +The StructuredText_ specification has literal blocks indicated by +'example', 'examples', or '::' ending the preceding paragraph. STNG +only recognizes '::'; 'example'/'examples' are not implemented. This +is good; it fixes an unnecessary language dependency. The problem is +what to do with the sometimes- unwanted '::'. + +In reStructuredText_ '::' at the end of a paragraph indicates that +subsequent *indented* blocks are treated as literal text. No further +markup interpretation is done within literal blocks (not even +backslash-escapes). If the '::' is preceded by whitespace, '::' is +omitted from the output; if '::' was the sole content of a paragraph, +the entire paragraph is removed (no 'empty' paragraph remains). If +'::' is preceded by a non-whitespace character, '::' is replaced by +':' (i.e., the extra colon is removed). + +Thus, a section could begin with a literal block as follows:: + + Section Title + ------------- + + :: + + print "this is example literal" + + +Tables +====== + +The table markup scheme in classic StructuredText was horrible. Its +omission from StructuredTextNG is welcome, and its markup will not be +repeated here. However, tables themselves are useful in +documentation. Alternatives: + +1. This format is the most natural and obvious. It was independently + invented (no great feat of creation!), and later found to be the + format supported by the `Emacs table mode`_:: + + +------------+------------+------------+--------------+ + | Header 1 | Header 2 | Header 3 | Header 4 | + +============+============+============+==============+ + | Column 1 | Column 2 | Column 3 & 4 span (Row 1) | + +------------+------------+------------+--------------+ + | Column 1 & 2 span | Column 3 | - Column 4 | + +------------+------------+------------+ - Row 2 & 3 | + | 1 | 2 | 3 | - span | + +------------+------------+------------+--------------+ + + Tables are described with a visual outline made up of the + characters '-', '=', '|', and '+': + + - The hyphen ('-') is used for horizontal lines (row separators). + - The equals sign ('=') is optionally used as a header separator + (as of version 1.5.24, this is not supported by the Emacs table + mode). + - The vertical bar ('|') is used for for vertical lines (column + separators). + - The plus sign ('+') is used for intersections of horizontal and + vertical lines. + + Row and column spans are possible simply by omitting the column or + row separators, respectively. The header row separator must be + complete; in other words, a header cell may not span into the table + body. Each cell contains body elements, and may have multiple + paragraphs, lists, etc. Initial spaces for a left margin are + allowed; the first line of text in a cell determines its left + margin. + +2. Below is a simpler table structure. It may be better suited to + manual input than alternative #1, but there is no Emacs editing + mode available. One disadvantage is that it resembles section + titles; a one-column table would look exactly like section & + subsection titles. :: + + ============ ============ ============ ============== + Header 1 Header 2 Header 3 Header 4 + ============ ============ ============ ============== + Column 1 Column 2 Column 3 & 4 span (Row 1) + ------------ ------------ --------------------------- + Column 1 & 2 span Column 3 - Column 4 + ------------------------- ------------ - Row 2 & 3 + 1 2 3 - span + ============ ============ ============ ============== + + The table begins with a top border of equals signs with a space at + each column boundary (regardless of spans). Each row is + underlined. Internal row separators are underlines of '-', with + spaces at column boundaries. The last of the optional head rows is + underlined with '=', again with spaces at column boundaries. + Column spans have no spaces in their underline. Row spans simply + lack an underline at the row boundary. The bottom boundary of the + table consists of '=' underlines. A blank line is required + following a table. + +3. A minimalist alternative is as follows:: + + ==== ===== ======== ======== ======= ==== ===== ===== + Old State Input Action New State Notes + ----------- -------- ----------------- ----------- + ids types new type sys.msg. dupname ids types + ==== ===== ======== ======== ======= ==== ===== ===== + -- -- explicit -- -- new True + -- -- implicit -- -- new False + None False explicit -- -- new True + old False explicit implicit old new True + None True explicit explicit new None True + old True explicit explicit new,old None True [1] + None False implicit implicit new None False + old False implicit implicit new,old None False + None True implicit implicit new None True + old True implicit implicit new old True + ==== ===== ======== ======== ======= ==== ===== ===== + + The table begins with a top border of equals signs with one or more + spaces at each column boundary (regardless of spans). There must + be at least two columns in the table (to differentiate it from + section headers). Each line starts a new row. The rightmost + column is unbounded; text may continue past the edge of the table. + Each row/line must contain spaces at column boundaries, except for + explicit column spans. Underlines of '-' can be used to indicate + column spans, but should be used sparingly if at all. Lines + containing column span underlines may not contain any other text. + The last of the optional head rows is underlined with '=', again + with spaces at column boundaries. The bottom boundary of the table + consists of '=' underlines. A blank line is required following a + table. + + This table sums up the features. Using all the features in such a + small space is not pretty though:: + + ======== ======== ======== + Header 2 & 3 Span + ------------------ + Header 1 Header 2 Header 3 + ======== ======== ======== + Each line is a new row. + Each row consists of one line only. + Row spans are not possible. + The last column may spill over to the right. + Column spans are possible with an underline joining columns. + ---------------------------- + The span is limited to the row above the underline. + ======== ======== ======== + +4. As a variation of alternative 3, bullet list syntax in the first + column could be used to indicate row starts. Multi-line rows are + possible, but row spans are not. For example:: + + ===== ===== + col 1 col 2 + ===== ===== + - 1 Second column of row 1. + - 2 Second column of row 2. + Second line of paragraph. + - 3 Second column of row 3. + + Second paragraph of row 3, + column 2 + ===== ===== + + Column spans would be indicated on the line after the last line of + the row. To indicate a real bullet list within a first-column + cell, simply nest the bullets. + +5. In a further variation, we could simply assume that whitespace in + the first column implies a multi-line row; the text in other + columns is continuation text. For example:: + + ===== ===== + col 1 col 2 + ===== ===== + 1 Second column of row 1. + 2 Second column of row 2. + Second line of paragraph. + 3 Second column of row 3. + + Second paragraph of row 3, + column 2 + ===== ===== + + Limitations of this approach: + + - Cells in the first column are limited to one line of text. + + - Cells in the first column *must* contain some text; blank cells + would lead to a misinterpretation. An empty comment ("..") is + sufficient. + +6. Combining alternative 3 and 4, a bullet list in the first column + could mean multi-line rows, and no bullet list means single-line + rows only. + +Alternatives 1 and 5 has been adopted by reStructuredText. + + +Delimitation of Inline Markup +============================= + +StructuredText specifies that inline markup must begin with +whitespace, precluding such constructs as parenthesized or quoted +emphatic text:: + + "**What?**" she cried. (*exit stage left*) + +The `reStructuredText markup specification`_ allows for such +constructs and disambiguates inline markup through a set of +recognition rules. These recognition rules define the context of +markup start-strings and end-strings, allowing markup characters to be +used in most non-markup contexts without a problem (or a backslash). +So we can say, "Use asterisks (*) around words or phrases to +*emphasisze* them." The '(*)' will not be recognized as markup. This +reduces the need for markup escaping to the point where an escape +character is *almost* (but not quite!) unnecessary. + + +Underlining +=========== + +StructuredText uses '_text_' to indicate underlining. To quote David +Ascher in his 2000-01-21 Doc-SIG mailing list post, "Docstring +grammar: a very revised proposal": + + The tagging of underlined text with _'s is suboptimal. Underlines + shouldn't be used from a typographic perspective (underlines were + designed to be used in manuscripts to communicate to the + typesetter that the text should be italicized -- no well-typeset + book ever uses underlines), and conflict with double-underscored + Python variable names (__init__ and the like), which would get + truncated and underlined when that effect is not desired. Note + that while *complete* markup would prevent that truncation + ('__init__'), I think of docstring markups much like I think of + type annotations -- they should be optional and above all do no + harm. In this case the underline markup does harm. + +Underlining is not part of the reStructuredText specification. + + +Inline Literals +=============== + +StructuredText's markup for inline literals (text left as-is, +verbatim, usually in a monospaced font; as in HTML <TT>) is single +quotes ('literals'). The problem with single quotes is that they are +too often used for other purposes: + +- Apostrophes: "Don't blame me, 'cause it ain't mine, it's Chris'."; + +- Quoting text: + + First Bruce: "Well Bruce, I heard the prime minister use it. + 'S'hot enough to boil a monkey's bum in 'ere your Majesty,' he + said, and she smiled quietly to herself." + + In the UK, single quotes are used for dialogue in published works. + +- String literals: s = '' + +Alternatives:: + + 'text' \'text\' ''text'' "text" \"text\" ""text"" + #text# @text@ `text` ^text^ ``text'' ``text`` + +The examples below contain inline literals, quoted text, and +apostrophes. Each example should evaluate to the following HTML:: + + Some <TT>code</TT>, with a 'quote', "double", ain't it grand? + Does <TT>a[b] = 'c' + "d" + `2^3`</TT> work? + + 0. Some code, with a quote, double, ain't it grand? + Does a[b] = 'c' + "d" + `2^3` work? + 1. Some 'code', with a \'quote\', "double", ain\'t it grand? + Does 'a[b] = \'c\' + "d" + `2^3`' work? + 2. Some \'code\', with a 'quote', "double", ain't it grand? + Does \'a[b] = 'c' + "d" + `2^3`\' work? + 3. Some ''code'', with a 'quote', "double", ain't it grand? + Does ''a[b] = 'c' + "d" + `2^3`'' work? + 4. Some "code", with a 'quote', \"double\", ain't it grand? + Does "a[b] = 'c' + "d" + `2^3`" work? + 5. Some \"code\", with a 'quote', "double", ain't it grand? + Does \"a[b] = 'c' + "d" + `2^3`\" work? + 6. Some ""code"", with a 'quote', "double", ain't it grand? + Does ""a[b] = 'c' + "d" + `2^3`"" work? + 7. Some #code#, with a 'quote', "double", ain't it grand? + Does #a[b] = 'c' + "d" + `2^3`# work? + 8. Some @code@, with a 'quote', "double", ain't it grand? + Does @a[b] = 'c' + "d" + `2^3`@ work? + 9. Some `code`, with a 'quote', "double", ain't it grand? + Does `a[b] = 'c' + "d" + \`2^3\`` work? + 10. Some ^code^, with a 'quote', "double", ain't it grand? + Does ^a[b] = 'c' + "d" + `2\^3`^ work? + 11. Some ``code'', with a 'quote', "double", ain't it grand? + Does ``a[b] = 'c' + "d" + `2^3`'' work? + 12. Some ``code``, with a 'quote', "double", ain't it grand? + Does ``a[b] = 'c' + "d" + `2^3\``` work? + +Backquotes (#9 & #12) are the best choice. They are unobtrusive and +relatviely rarely used (more rarely than ' or ", anyhow). Backquotes +have the connotation of 'quotes', which other options (like carets, +#10) don't. + +Analogously with ``*emph*`` & ``**strong**``, double-backquotes (#12) +could be used for inline literals. If single-backquotes are used for +'interpreted text' (context-sensitive domain-specific descriptive +markup) such as function name hyperlinks in Python docstrings, then +double-backquotes could be used for absolute-literals, wherein no +processing whatsoever takes place. An advantage of double-backquotes +would be that backslash-escaping would no longer be necessary for +embedded single-backquotes; however, embedded double-backquotes (in an +end-string context) would be illegal. See `Backquotes in +Phrase-Links`__ in `Record of reStructuredText Syntax Alternatives`__. + +__ alternatives.html#backquotes-in-phrase-links +__ alternatives.html + +Alternative choices are carets (#10) and TeX-style quotes (#11). For +examples of TeX-style quoting, see +http://www.zope.org/Members/jim/StructuredTextWiki/CustomizingTheDocumentProcessor. + +Some existing uses of backquotes: + +1. As a synonym for repr() in Python. +2. For command-interpolation in shell scripts. +3. Used as open-quotes in TeX code (and carried over into plaintext + by TeXies). + +The inline markup start-string and end-string recognition rules +defined by the `reStructuredText markup specification`_ would allow +all of these cases inside inline literals, with very few exceptions. +As a fallback, literal blocks could handle all cases. + +Outside of inline literals, the above uses of backquotes would require +backslash-escaping. However, these are all prime examples of text +that should be marked up with inline literals. + +If either backquotes or straight single-quotes are used as markup, +TeX-quotes are too troublesome to support, so no special-casing of +TeX-quotes should be done (at least at first). If TeX-quotes have to +be used outside of literals, a single backslash-escaped would suffice: +\``TeX quote''. Ugly, true, but very infrequently used. + +Using literal blocks is a fallback option which removes the need for +backslash-escaping:: + + like this:: + + Here, we can do ``absolutely'' anything `'`'\|/|\ we like! + +No mechanism for inline literals is perfect, just as no escaping +mechanism is perfect. No matter what we use, complicated inline +expressions involving the inline literal quote and/or the backslash +will end up looking ugly. We can only choose the least often ugly +option. + +reStructuredText will use double backquotes for inline literals, and +single backqoutes for interpreted text. + + +Hyperlinks +========== + +There are three forms of hyperlink currently in StructuredText_: + +1. (Absolute & relative URIs.) Text enclosed by double quotes + followed by a colon, a URI, and concluded by punctuation plus white + space, or just white space, is treated as a hyperlink:: + + "Python":http://www.python.org/ + +2. (Absolute URIs only.) Text enclosed by double quotes followed by a + comma, one or more spaces, an absolute URI and concluded by + punctuation plus white space, or just white space, is treated as a + hyperlink:: + + "mail me", mailto:me@mail.com + +3. (Endnotes.) Text enclosed by brackets link to an endnote at the + end of the document: at the beginning of the line, two dots, a + space, and the same text in brackets, followed by the end note + itself:: + + Please refer to the fine manual [GVR2001]. + + .. [GVR2001] Python Documentation, Release 2.1, van Rossum, + Drake, et al., http://www.python.org/doc/ + +The problem with forms 1 and 2 is that they are neither intuitive nor +unobtrusive (they break design goals 5 & 2). They overload +double-quotes, which are too often used in ordinary text (potentially +breaking design goal 4). The brackets in form 3 are also too common +in ordinary text (such as [nested] asides and Python lists like [12]). + +Alternatives: + +1. Have no special markup for hyperlinks. + +2. A. Interpret and mark up hyperlinks as any contiguous text + containing '://' or ':...@' (absolute URI) or '@' (email + address) after an alphanumeric word. To de-emphasize the URI, + simply enclose it in parentheses: + + Python (http://www.python.org/) + + B. Leave special hyperlink markup as a domain-specific extension. + Hyperlinks in ordinary reStructuredText documents would be + required to be standalone (i.e. the URI text inline in the + document text). Processed hyperlinks (where the URI text is + hidden behind the link) are important enough to warrant syntax. + +3. The original Setext_ introduced a mechanism of indirect hyperlinks. + A source link word ('hot word') in the text was given a trailing + underscore:: + + Here is some text with a hyperlink_ built in. + + The hyperlink itself appeared at the end of the document on a line + by itself, beginning with two dots, a space, the link word with a + leading underscore, whitespace, and the URI itself:: + + .. _hyperlink http://www.123.xyz + + Setext used ``underscores_instead_of_spaces_`` for phrase links. + +With some modification, alternative 3 best satisfies the design goals. +It has the advantage of being readable and relatively unobtrusive. +Since each source link must match up to a target, the odd variable +ending in an underscore can be spared being marked up (although it +should generate a "no such link target" warning). The only +disadvantage is that phrase-links aren't possible without some +obtrusive syntax. + +We could achieve phrase-links if we enclose the link text: + +1. in double quotes:: + + "like this"_ + +2. in brackets:: + + [like this]_ + +3. or in backquotes:: + + `like this`_ + +Each gives us somewhat obtrusive markup, but that is unavoidable. The +bracketed syntax (#2) is reminiscent of links on many web pages +(intuitive), although it is somewhat obtrusive. Alternative #3 is +much less obtrusive, and is consistent with interpreted text: the +trailing underscore indicates the interpretation of the phrase, as a +hyperlink. #3 also disambiguates hyperlinks from footnote references. +Alternative #3 wins. + +The same trailing underscore markup can also be used for footnote and +citation references, removing the problem with ordinary bracketed text +and Python lists:: + + Please refer to the fine manual [GVR2000]_. + + .. [GVR2000] Python Documentation, van Rossum, Drake, et al., + http://www.python.org/doc/ + +The two-dots-and-a-space syntax was generalized by Setext for +comments, which are removed from the (visible) processed output. +reStructuredText uses this syntax for comments, footnotes, and link +target, collectively termed "explicit markup". For link targets, in +order to eliminate ambiguity with comments and footnotes, +reStructuredText specifies that a colon always follow the link target +word/phrase. The colon denotes 'maps to'. There is no reason to +restrict target links to the end of the document; they could just as +easily be interspersed. + +Internal hyperlinks (links from one point to another within a single +document) can be expressed by a source link as before, and a target +link with a colon but no URI. In effect, these targets 'map to' the +element immediately following. + +As an added bonus, we now have a perfect candidate for +reStructuredText directives, a simple extension mechanism: explicit +markup containing a single word followed by two colons and whitespace. +The interpretation of subsequent data on the directive line or +following is directive-dependent. + +To summarize:: + + .. This is a comment. + + .. The line below is an example of a directive. + .. version:: 1 + + This is a footnote [1]_. + + This internal hyperlink will take us to the footnotes_ area below. + + Here is a one-word_ external hyperlink. + + Here is `a hyperlink phrase`_. + + .. _footnotes: + .. [1] Footnote text goes here. + + .. external hyperlink target mappings: + .. _one-word: http://www.123.xyz + .. _a hyperlink phrase: http://www.123.xyz + +The presence or absence of a colon after the target link +differentiates an indirect hyperlink from a footnote, respectively. A +footnote requires brackets. Backquotes around a target link word or +phrase are required if the phrase contains a colon, optional +otherwise. + +Below are examples using no markup, the two StructuredText hypertext +styles, and the reStructuredText hypertext style. Each example +contains an indirect link, a direct link, a footnote/endnote, and +bracketed text. In HTML, each example should evaluate to:: + + <P>A <A HREF="http://spam.org">URI</A>, see <A HREF="#eggs2000"> + [eggs2000]</A> (in Bacon [Publisher]). Also see + <A HREF="http://eggs.org">http://eggs.org</A>.</P> + + <P><A NAME="eggs2000">[eggs2000]</A> "Spam, Spam, Spam, Eggs, + Bacon, and Spam"</P> + +1. No markup:: + + A URI http://spam.org, see eggs2000 (in Bacon [Publisher]). + Also see http://eggs.org. + + eggs2000 "Spam, Spam, Spam, Eggs, Bacon, and Spam" + +2. StructuredText absolute/relative URI syntax + ("text":http://www.url.org):: + + A "URI":http://spam.org, see [eggs2000] (in Bacon [Publisher]). + Also see "http://eggs.org":http://eggs.org. + + .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam" + + Note that StructuredText does not recognize standalone URIs, + forcing doubling up as shown in the second line of the example + above. + +3. StructuredText absolute-only URI syntax + ("text", mailto:you@your.com):: + + A "URI", http://spam.org, see [eggs2000] (in Bacon + [Publisher]). Also see "http://eggs.org", http://eggs.org. + + .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam" + +4. reStructuredText syntax:: + + 4. A URI_, see [eggs2000]_ (in Bacon [Publisher]). + Also see http://eggs.org. + + .. _URI: http:/spam.org + .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam" + +The bracketed text '[Publisher]' may be problematic with +StructuredText (syntax 2 & 3). + +reStructuredText's syntax (#4) is definitely the most readable. The +text is separated from the link URI and the footnote, resulting in +cleanly readable text. + +.. _StructuredText: + http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/FrontPage +.. _Setext: http://docutils.sourceforge.net/mirror/setext.html +.. _reStructuredText: http://docutils.sourceforge.net/rst.html +.. _detailed description: + http://homepage.ntlworld.com/tibsnjoan/docutils/STNG-format.html +.. _STMinus: http://www.cis.upenn.edu/~edloper/pydoc/stminus.html +.. _StructuredTextNG: + http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/StructuredTextNG +.. _README: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/ + python/python/dist/src/README +.. _Emacs table mode: http://table.sourceforge.net/ +.. _reStructuredText Markup Specification: + ../../ref/rst/restructuredtext.html + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/dev/semantics.txt b/docs/dev/semantics.txt new file mode 100644 index 000000000..cd20e15f6 --- /dev/null +++ b/docs/dev/semantics.txt @@ -0,0 +1,119 @@ +===================== + Docstring Semantics +===================== +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This document has been placed in the public domain. + +These are notes for a possible future PEP providing the final piece of +the Python docstring puzzle: docstring semantics or documentation +methodology. `PEP 257`_, Docstring Conventions, sketches out some +guidelines, but does not get into methodology details. + +I haven't explored documentation methodology more because, in my +opinion, it is a completely separate issue from syntax, and it's even +more controversial than syntax. Nobody wants to be told how to lay +out their documentation, a la JavaDoc_. I think the JavaDoc way is +butt-ugly, but it *is* an established standard for the Java world. +Any standard documentation methodology has to be formal enough to be +useful but remain light enough to be usable. If the methodology is +too strict, too heavy, or too ugly, many/most will not want to use it. + +I think a standard methodology could benefit the Python community, but +it would be a hard sell. A PEP would be the place to start. For most +human-readable documentation needs, the free-form text approach is +adequate. We'd only need a formal methodology if we want to extract +the parameters into a data dictionary, index, or summary of some kind. + + +PythonDoc +========= + +(Not to be confused with Daniel Larsson's pythondoc_ project.) + +A Python version of the JavaDoc_ semantics (not syntax). A set of +conventions which are understood by the Docutils. What JavaDoc has +done is to establish a syntax that enables a certain documentation +methodology, or standard *semantics*. JavaDoc is not just syntax; it +prescribes a methodology. + +- Use field lists or definition lists for "tagged blocks". By this I + mean that field lists can be used similarly to JavaDoc's ``@tag`` + syntax. That's actually one of the motivators behind field lists. + For example, we could have:: + + """ + :Parameters: + - `lines`: a list of one-line strings without newlines. + - `until_blank`: Stop collecting at the first blank line if + true (1). + - `strip_indent`: Strip common leading indent if true (1, + default). + + :Return: + - a list of indented lines with mininum indent removed; + - the amount of the indent; + - whether or not the block finished with a blank line or at + the end of `lines`. + """ + + This is taken straight out of docutils/statemachine.py, in which I + experimented with a simple documentation methodology. Another + variation I've thought of exploits the Grouch_-compatible + "classifier" element of definition lists. For example:: + + :Parameters: + `lines` : [string] + List of one-line strings without newlines. + `until_blank` : boolean + Stop collecting at the first blank line if true (1). + `strip_indent` : boolean + Strip common leading indent if true (1, default). + +- Field lists could even be used in a one-to-one correspondence with + JavaDoc ``@tags``, although I doubt if I'd recommend it. Several + ports of JavaDoc's ``@tag`` methodology exist in Python, most + recently Ed Loper's "epydoc_". + + +Other Ideas +=========== + +- Can we extract comments from parsed modules? Could be handy for + documenting function/method parameters:: + + def method(self, + source, # path of input file + dest # path of output file + ): + + This would save having to repeat parameter names in the docstring. + + Idea from Mark Hammond's 1998-06-23 Doc-SIG post, "Re: [Doc-SIG] + Documentation tool": + + it would be quite hard to add a new param to this method without + realising you should document it + +- Frederic Giacometti's `iPhrase Python documentation conventions`_ is + an attachment to his Doc-SIG post of 2001-05-30. + + +.. _PEP 257: http://www.python.org/peps/pep-0257.html +.. _JavaDoc: http://java.sun.com/j2se/javadoc/ +.. _pythondoc: http://starship.python.net/crew/danilo/pythondoc/ +.. _Grouch: http://www.mems-exchange.org/software/grouch/ +.. _epydoc: http://epydoc.sf.net/ +.. _iPhrase Python documentation conventions: + http://mail.python.org/pipermail/doc-sig/2001-May/001840.html + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/dev/testing.txt b/docs/dev/testing.txt new file mode 100644 index 000000000..bde54116f --- /dev/null +++ b/docs/dev/testing.txt @@ -0,0 +1,246 @@ +=================== + Docutils_ Testing +=================== + +:Author: Felix Wiemann +:Author: David Goodger +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This document has been placed in the public domain. + +.. _Docutils: http://docutils.sourceforge.net/ + +.. contents:: + +When adding new functionality (or fixing bugs), be sure to add test +cases to the test suite. Practise test-first programming; it's fun, +it's addictive, and it works! + +This document describes how to run the Docutils test suite, how the +tests are organized and how to add new tests or modify existing tests. + + +Running the Test Suite +====================== + +Before checking in any changes, run the entire Docutils test suite to +be sure that you haven't broken anything. From a shell:: + + cd docutils/test + ./alltests.py + + +Python Versions +=============== + +The Docutils 0.4 release supports Python 2.1 [#py21]_ or later, with +some features only working (and being tested) with Python 2.3+. +Therefore, you should actually have Pythons 2.1 [#py21]_, 2.2, 2.3, as +well as the latest Python installed and always run the tests on all of +them. (A good way to do that is to always run the test suite through +a short script that runs ``alltests.py`` under each version of +Python.) If you can't afford intalling 3 or more Python versions, the +edge cases (2.1 and 2.3) should cover most of it. + +.. [#py21] Python 2.1 may be used providing the compiler package is + installed. The compiler package can be found in the Tools/ + directory of Python 2.1's source distribution. + +Good resources covering the differences between Python versions: + +* `What's New in Python 2.2`__ +* `What's New in Python 2.3`__ +* `What's New in Python 2.4`__ +* `PEP 290 - Code Migration and Modernization`__ + +__ http://www.python.org/doc/2.2.3/whatsnew/whatsnew22.html +__ http://www.python.org/doc/2.3.5/whatsnew/whatsnew23.html +__ http://www.python.org/doc/2.4.1/whatsnew/whatsnew24.html +__ http://www.python.org/peps/pep-0290.html + +.. _Python Check-in Policies: http://www.python.org/dev/tools.html +.. _sandbox directory: + http://svn.berlios.de/viewcvs/docutils/trunk/sandbox/ +.. _nightly repository tarball: + http://svn.berlios.de/svndumps/docutils-repos.gz + + +Unit Tests +========== + +Unit tests test single functions or modules (i.e. whitebox testing). + +If you are implementing a new feature, be sure to write a test case +covering its functionality. It happens very frequently that your +implementation (or even only a part of it) doesn't work with an older +(or even newer) Python version, and the only reliable way to detect +those cases is using tests. + +Often, it's easier to write the test first and then implement the +functionality required to make the test pass. + + +Writing New Tests +----------------- + +When writing new tests, it very often helps to see how a similar test +is implemented. For example, the files in the +``test_parsers/test_rst/`` directory all look very similar. So when +adding a test, you don't have to reinvent the wheel. + +If there is no similar test, you can write a new test from scratch +using Python's ``unittest`` module. For an example, please have a +look at the following imaginary ``test_square.py``:: + + #! /usr/bin/env python + + # Author: your name + # Contact: your email address + # Revision: $Revision$ + # Date: $Date$ + # Copyright: This module has been placed in the public domain. + + """ + Test module for docutils.square. + """ + + import unittest + import docutils.square + + + class SquareTest(unittest.TestCase): + + def test_square(self): + self.assertEqual(docutils.square.square(0), 0) + self.assertEqual(docutils.square.square(5), 25) + self.assertEqual(docutils.square.square(7), 49) + + def test_square_root(self): + self.assertEqual(docutils.square.sqrt(49), 7) + self.assertEqual(docutils.square.sqrt(0), 0) + self.assertRaises(docutils.square.SquareRootError, + docutils.square.sqrt, 20) + + + if __name__ == '__main__': + unittest.main() + +For more details on how to write tests, please refer to the +documentation of the ``unittest`` module. + + +.. _functional: + +Functional Tests +================ + +The directory ``test/functional/`` contains data for functional tests. + +Performing functional testing means testing the Docutils system as a +whole (i.e. blackbox testing). + + +Directory Structure +------------------- + ++ ``functional/`` The main data directory. + + + ``input/`` The input files. + + - ``some_test.txt``, for example. + + + ``output/`` The actual output. + + - ``some_test.html``, for example. + + + ``expected/`` The expected output. + + - ``some_test.html``, for example. + + + ``tests/`` The config files for processing the input files. + + - ``some_test.py``, for example. + + - ``_default.py``, the `default configuration file`_. + + +The Testing Process +------------------- + +When running ``test_functional.py``, all config files in +``functional/tests/`` are processed. (Config files whose names begin +with an underscore are ignored.) The current working directory is +always Docutils' main test directory (``test/``). + +For example, ``functional/tests/some_test.py`` could read like this:: + + # Source and destination file names. + test_source = "some_test.txt" + test_destination = "some_test.html" + + # Keyword parameters passed to publish_file. + reader_name = "standalone" + parser_name = "rst" + writer_name = "html" + settings_overrides['output-encoding'] = 'utf-8' + # Relative to main ``test/`` directory. + settings_overrides['stylesheet_path'] = '../docutils/writers/html4css1/html4css1.css' + +The two variables ``test_source`` and ``test_destination`` contain the +input file name (relative to ``functional/input/``) and the output +file name (relative to ``functional/output/`` and +``functional/expected/``). Note that the file names can be chosen +arbitrarily. However, the file names in ``functional/output/`` *must* +match the file names in ``functional/expected/``. + +If defined, ``_test_more`` must be a function with the following +signature:: + + def _test_more(expected_dir, output_dir, test_case, parameters): + +This function is called from the test case to perform tests beyond the +simple comparison of expected and actual output files. + +``test_source`` and ``test_destination`` are removed from the +namespace, as are all variables whose names begin with an underscore +("_"). The remaining names are passed as keyword arguments to +``docutils.core.publish_file``, so you can set reader, parser, writer +and anything else you want to configure. Note that +``settings_overrides`` is already initialized as a dictionary *before* +the execution of the config file. + + +Creating New Tests +------------------ + +In order to create a new test, put the input test file into +``functional/input/``. Then create a config file in +``functional/tests/`` which sets at least input and output file names, +reader, parser and writer. + +Now run ``test_functional.py``. The test will fail, of course, +because you do not have an expected output yet. However, an output +file will have been generated in ``functional/output/``. Check this +output file for validity and correctness. Then copy the file to +``functional/expected/``. + +If you rerun ``test_functional.py`` now, it should pass. + +If you run ``test_functional.py`` later and the actual output doesn't +match the expected output anymore, the test will fail. + +If this is the case and you made an intentional change, check the +actual output for validity and correctness, copy it to +``functional/expected/`` (overwriting the old expected output), and +commit the change. + + +.. _default configuration file: + +The Default Configuration File +------------------------------ + +The file ``functional/tests/_default.py`` contains default settings. +It is executed just before the actual configuration files, which has +the same effect as if the contents of ``_default.py`` were prepended +to every configuration file. diff --git a/docs/dev/todo.txt b/docs/dev/todo.txt new file mode 100644 index 000000000..6f1c6291d --- /dev/null +++ b/docs/dev/todo.txt @@ -0,0 +1,1964 @@ +====================== + Docutils_ To Do List +====================== + +:Author: David Goodger (with input from many); open to all Docutils + developers +:Contact: goodger@python.org +:Date: $Date$ +:Revision: $Revision$ +:Copyright: This document has been placed in the public domain. + +.. _Docutils: http://docutils.sourceforge.net/ + +.. contents:: + + +Priority items are marked with "@" symbols. The more @s, the higher +the priority. Items in question form (containing "?") are ideas which +require more thought and debate; they are potential to-do's. + +Many of these items are awaiting champions. If you see something +you'd like to tackle, please do! If there's something you'd like to +see done but are unable to implement it yourself, please consider +donating to Docutils: |donate| + +.. |donate| image:: http://images.sourceforge.net/images/project-support.jpg + :target: http://sourceforge.net/donate/index.php?group_id=38414 + :align: middle + :width: 88 + :height: 32 + :alt: Support the Docutils project! + +Please see also the Bugs_ document for a list of bugs in Docutils. + +.. _bugs: ../../BUGS.html + + +Release 0.4 +=========== + +We should get Docutils 0.4 out soon, but we shouldn't just cut a +"frozen snapshot" release. Here's a list of features (achievable in +the short term) to include: + +* [DONE in rev. 3901] Move support files to docutils/writers/support. + +* [DONE in rev. 4163] Convert ``docutils/writers/support/*`` into + individual writer packages. + +* [DONE in rev. 3901] Remove docutils.transforms.html.StylesheetCheck + (no longer needed because of the above change). + +* [DONE in rev. 3962] Incorporate new branch policy into the docs. + ("Development strategy" thread on Docutils-develop) + +* [DONE in rev. 4152] Added East-Asian double-width character support. + +* [DONE in rev. 4156] Merge the S5 branch. + +Anything else? + +Once released, + +* Tag it and create a maintenance branch (perhaps "maint-0-4"). + +* Declare that: + + - Docutils 0.4.x is the last version that will support Python 2.1 + (and perhaps higher?) + + - Docutils 0.4.x is the last version that will support (make + compromises for) Netscape Navigator 4 + + +Minimum Requirements for Python Standard Library Candidacy +========================================================== + +Below are action items that must be added and issues that must be +addressed before Docutils can be considered suitable to be proposed +for inclusion in the Python standard library. + +* Support for `document splitting`_. May require some major code + rework. + +* Support for subdocuments (see `large documents`_). + +* `Object numbering and object references`_. + +* `Nested inline markup`_. + +* `Python Source Reader`_. + +* The HTML writer needs to be rewritten (or a second HTML writer + added) to allow for custom classes, and for arbitrary splitting + (stack-based?). + +* Documentation_ of the architecture. Other docs too. + +* Plugin support. + +* A LaTeX writer making use of (La)TeX's power, so that the rendering + of the resulting documents is more easily customizable. (Similar to + what you wrote about a new HTML Writer.) + +* Suitability for `Python module documentation + <http://docutils.sf.net/sandbox/README.html#documenting-python>`_. + + +General +======= + +* Allow different report levels for STDERR and system_messages inside + the document? + +* Change the docutils-update script (in sandbox/infrastructure), to + support arbitrary branch snapshots. + +* Add a generic "container" element, equivalent to "inline", to which + a "class" attribute can be attached. Will require a reST directive + also. + +* Move some general-interest sandboxes out of individuals' + directories, into subprojects? + +* Add option for file (and URL) access restriction to make Docutils + usable in Wikis and similar applications. + + 2005-03-21: added ``file_insertion_enabled`` & ``raw_enabled`` + settings. These partially solve the problem, allowing or disabling + **all** file accesses, but not limited access. + +* Configuration file handling needs discussion: + + - There should be some error checking on the contents of config + files. How much checking should be done? How loudly should + Docutils complain if it encounters an error/problem? + + - Docutils doesn't complain when it doesn't find a configuration + file supplied with the ``--config`` option. Should it? (If yes, + error or warning?) + +* Internationalization: + + - I18n needs refactoring, the language dictionaries are difficult to + maintain. Maybe have a look at gettext or similar tools. + + - Language modules: in accented languages it may be useful to have + both accented and unaccented entries in the + ``bibliographic_fields`` mapping for versatility. + + - Add a "--strict-language" option & setting: no English fallback + for language-dependent features. + + - Add internationalization to _`footer boilerplate text` (resulting + from "--generator", "--source-link", and "--date" etc.), allowing + translations. + +* Add validation? See http://pytrex.sourceforge.net, RELAX NG, pyRXP. + +* In ``docutils.readers.get_reader_class`` (& ``parsers`` & + ``writers`` too), should we be importing "standalone" or + "docutils.readers.standalone"? (This would avoid importing + top-level modules if the module name is not in docutils/readers. + Potential nastiness.) + +* Perhaps store a _`name-to-id mapping file`? This could be stored + permanently, read by subsequent processing runs, and updated with + new entries. ("Persistent ID mapping"?) + +* Perhaps the ``Component.supports`` method should deal with + individual features ("meta" etc.) instead of formats ("html" etc.)? + +* Add _`object numbering and object references` (tables & figures). + These would be the equivalent of DocBook's "formal" elements. + + We may need _`persistent sequences`, such as chapter numbers. See + `OpenOffice.org XML`_ "fields". Should the sequences be automatic + or manual (user-specifyable)? + + We need to name the objects: + + - "name" option for the "figure" directive? :: + + .. figure:: image.png + :name: image's name + + Same for the "table" directive:: + + .. table:: optional title here + :name: table's name + + ===== ===== + x not x + ===== ===== + True False + False True + ===== ===== + + This would also allow other options to be set, like border + styles. The same technique could be used for other objects. + + A preliminary "table" directive has been implemented, supporting + table titles. Perhaps the name should derive from the title. + + - The object could also be done this way:: + + .. _figure name: + + .. figure:: image.png + + This may be a more general solution, equally applicable to tables. + However, explicit naming using an option seems simpler to users. + + - Perhaps the figure name could be incorporated into the figure + definition, as an optional inline target part of the directive + argument:: + + .. figure:: _`figure name` image.png + + Maybe with a delimiter:: + + .. figure:: _`figure name`: image.png + + Or some other, simpler syntax. + + We'll also need syntax for object references. See `OpenOffice.org + XML`_ "reference fields": + + - Parameterized substitutions? For example:: + + See |figure (figure name)| on |page (figure name)|. + + .. |figure (name)| figure-ref:: (name) + .. |page (name)| page-ref:: (name) + + The result would be:: + + See figure 3.11 on page 157. + + But this would require substitution directives to be processed at + reference-time, not at definition-time as they are now. Or, + perhaps the directives could just leave ``pending`` elements + behind, and the transforms do the work? How to pass the data + through? Too complicated. + + - An interpreted text approach is simpler and better:: + + See :figure:`figure name` on :page:`figure name`. + + The "figure" and "page" roles could generate appropriate + boilerplate text. The position of the role (prefix or suffix) + could also be utilized. + + See `Interpreted Text`_ below. + + - We could leave the boilerplate text up to the document:: + + See Figure :fig:`figure name` on page :pg:`figure name`. + + - Reference boilerplate could be specified in the document + (defaulting to nothing):: + + .. fignum:: + :prefix-ref: "Figure " + :prefix-caption: "Fig. " + :suffix-caption: : + + .. _OpenOffice.org XML: http://xml.openoffice.org/ + +* Think about _`large documents` made up of multiple subdocument + files. Issues: continuity (`persistent sequences`_ above), + cross-references (`name-to-id mapping file`_ above and `targets in + other documents`_ below), splitting (`document splitting`_ below). + + When writing a book, the author probably wants to split it up into + files, perhaps one per chapter (but perhaps even more detailed). + However, we'd like to be able to have references from one chapter to + another, and have continuous numbering (pages and chapters, as + applicable). Of course, none of this is implemented yet. There has + been some thought put into some aspects; see `the "include" + directive`__ and the `Reference Merging`_ transform below. + + When I was working with SGML in Japan, we had a system where there + was a top-level coordinating file, book.sgml, which contained the + top-level structure of a book: the <book> element, containing the + book <title> and empty component elements (<preface>, <chapter>, + <appendix>, etc.), each with filename attributes pointing to the + actual source for the component. Something like this:: + + <book id="bk01"> + <title>Title of the Book</title> + <preface inrefid="pr01"></preface> + <chapter inrefid="ch01"></chapter> + <chapter inrefid="ch02"></chapter> + <chapter inrefid="ch03"></chapter> + <appendix inrefid="ap01"></appendix> + </book> + + (The "inrefid" attribute stood for "insertion reference ID".) + + The processing system would process each component separately, but + it would recognize and use the book file to coordinate chapter and + page numbering, and keep a persistent ID to (title, page number) + mapping database for cross-references. Docutils could use a similar + system for large-scale, multipart documents. + + __ ../ref/rst/directives.html#including-an-external-document-fragment + + Aahz's idea: + + First the ToC:: + + .. ToC-list:: + Introduction.txt + Objects.txt + Data.txt + Control.txt + + Then a sample use:: + + .. include:: ToC.txt + + As I said earlier in chapter :chapter:`Objects.txt`, the + reference count gets increased every time a binding is made. + + Which produces:: + + As I said earlier in chapter 2, the + reference count gets increased every time a binding is made. + + The ToC in this form doesn't even need to be references to actual + reST documents; I'm simply doing it that way for a minimum of + future-proofing, in case I do want to add the ability to pick up + references within external chapters. + + Perhaps, instead of ToC (which would overload the "contents" + directive concept already in use), we could use "manifest". A + "manifest" directive might associate local reference names with + files:: + + .. manifest:: + intro: Introduction.txt + objects: Objects.txt + data: Data.txt + control: Control.txt + + Then the sample becomes:: + + .. include:: manifest.txt + + As I said earlier in chapter :chapter:`objects`, the + reference count gets increased every time a binding is made. + +* Add support for _`multiple output files`. + +* Add testing for Docutils' front end tools? + +* Publisher: "Ordinary setup" shouldn't requre specific ordering; at + the very least, there ought to be error checking higher up in the + call chain. [Aahz] + + ``Publisher.get_settings`` requires that all components be set up + before it's called. Perhaps the I/O *objects* shouldn't be set, but + I/O *classes*. Then options are set up (``.set_options``), and + ``Publisher.set_io`` (or equivalent code) is called with source & + destination paths, creating the I/O objects. + + Perhaps I/O objects shouldn't be instantiated until required. For + split output, the Writer may be called multiple times, once for each + doctree, and each doctree should have a separate Output object (with + a different path). Is the "Builder" pattern applicable here? + +* Perhaps I/O objects should become full-fledged components (i.e. + subclasses of ``docutils.Component``, as are Readers, Parsers, and + Writers now), and thus have associated option/setting specs and + transforms. + +* Multiple file I/O suggestion from Michael Hudson: use a file-like + object or something you can iterate over to get file-like objects. + +* Add an "--input-language" option & setting? Specify a different + language module for input (bibliographic fields, directives) than + for output. The "--language" option would set both input & output + languages. + +* Auto-generate reference tables for language-dependent features? + Could be generated from the source modules. A special command-line + option could be added to Docutils front ends to do this. (Idea from + Engelbert Gruber.) + +* Enable feedback of some kind from internal decisions, such as + reporting the successful input encoding. Modify runtime settings? + System message? Simple stderr output? + +* Rationalize Writer settings (HTML/LaTeX/PEP) -- share settings. + +* Merge docs/user/latex.txt info into tools.txt and config.txt. + +* Add an "--include file" command-line option (config setting too?), + equivalent to ".. include:: file" as the first line of the doc text? + Especially useful for character entity sets, text transform specs, + boilerplate, etc. + +* Parameterize the Reporter object or class? See the `2004-02-18 + "rest checking and source path"`_ thread. + + .. _2004-02-18 "rest checking and source path": + http://thread.gmane.org/gmane.text.docutils.user/1112 + +* Add a "disable_transforms" setting? And a dummy Writer subclass + that does nothing when its .write() method is called? Would allow + for easy syntax checking. See the `2004-02-18 "rest checking and + source path"`_ thread. + +* Add a generic meta-stylesheet mechanism? An external file could + associate style names ("class" attributes) with specific elements. + Could be generalized to arbitrary output attributes; useful for HTML + & XMLs. Aahz implemented something like this in + sandbox/aahz/Effective/EffMap.py. + +* .. _classes for table cells: + + William Dode suggested that table cells be assigned "class" + attributes by columns, so that stylesheets can affect text + alignment. Unfortunately, there doesn't seem to be a way (in HTML + at least) to leverage the "colspec" elements (HTML "col" tags) by + adding classes to them. The resulting HTML is very verbose:: + + <td class="col1">111</td> + <td class="col2">222</td> + ... + + At the very least, it should be an option. People who don't use it + shouldn't be penalized by increases in their HTML file sizes. + + Table rows could also be assigned classes (like odd/even). That + would be easier to implement. + + How should it be implemented? + + * There could be writer options (column classes & row classes) with + standard values. + + * The table directive could grow some options. Something like + ":cell-classes: col1 col2 col3" (either must match the number of + columns, or repeat to fill?) and ":row-classes: odd even" (repeat + to fill; body rows only, or header rows too?). + + Probably per-table directive options are best. The "class" values + could be used by any writer, and applying such classes to all tables + in a document with writer options is too broad. + +* Add file-specific settings support to config files, like:: + + [file index.txt] + compact-lists: no + + Is this even possible? Should the criterion be the name of the + input file or the output file? + +* The "validator" support added to OptionParser is very similar to + "traits_" in SciPy_. Perhaps something could be done with them? + (Had I known about traits when I was implementing docutils.frontend, + I may have used them instead of rolling my own.) + + .. _traits: http://code.enthought.com/traits/ + .. _SciPy: http://www.scipy.org/ + +* tools/buildhtml.py: Extend the --prune option ("prune" config + setting) to accept file names (generic path) in addition to + directories (e.g. --prune=docs/user/rst/cheatsheet.txt, which should + *not* be converted to HTML). + +* Add support for _`plugins`. + +* _`Config directories`: Currently, ~/.docutils, ./docutils.conf/, & + /etc/docutils.conf are read as configuration files. Proposal: allow + ~/.docutils to be a a configuration *directory*, along with + /etc/docutils/ and ./docutils.conf/. Within these directories, + check for config.txt files. We can also have subdirectories here, + for plugins, S5 themes, components (readers/writers/parsers) etc. + + Docutils will continue to support configuration files for backwards + compatibility. + +* Add support for document decorations other than headers & footers? + For example, top/bottom/side navigation bars for web pages. Generic + decorations? + + Seems like a bad idea as long as it isn't independent from the ouput + format (for example, navigation bars are only useful for web pages). + +* docutils_update: Check for a ``Makefile`` in a directory, and run + ``make`` if found? This would allow for variant processing on + specific source files, such as running rst2s5.py instead of + rst2html.py. + +* Add a "disable table of contents" setting? The S5 writer could set + it as a default. Rationale: + + The ``contents`` (table of contents) directive must not be used + [in S5/HTML documents]. It changes the CSS class of headings + and they won't show up correctly in the screen presentation. + + -- `Easy Slide Shows With reStructuredText & S5 + <../user/slide-shows.html>`_ + + +Documentation +============= + +User Docs +--------- + +* Add a FAQ entry about using Docutils (with reStructuredText) on a + server and that it's terribly slow. See the first paragraphs in + <http://article.gmane.org/gmane.text.docutils.user/1584>. + +* Add document about what Docutils has previously been used for + (web/use-cases.txt?). + + +Developer Docs +-------------- + +* Complete `Docutils Runtime Settings <../api/runtime-settings.html>`_. + +* Improve the internal module documentation (docstrings in the code). + Specific deficiencies listed below. + + - docutils.parsers.rst.states.State.build_table: data structure + required (including StringList). + + - docutils.parsers.rst.states: more complete documentation of parser + internals. + +* docs/ref/doctree.txt: DTD element structural relationships, + semantics, and attributes. In progress; element descriptions to be + completed. + +* Document the ``pending`` elements, how they're generated and what + they do. + +* Document the transforms (perhaps in docstrings?): how they're used, + what they do, dependencies & order considerations. + +* Document the HTML classes used by html4css1.py. + +* Write an overview of the Docutils architecture, as an introduction + for developers. What connects to what, why, and how. Either update + PEP 258 (see PEPs_ below) or as a separate doc. + +* Give information about unit tests. Maybe as a howto? + +* Document the docutils.nodes APIs. + +* Complete the docs/api/publisher.txt docs. + + +How-Tos +------- + +* Creating Docutils Writers + +* Creating Docutils Readers + +* Creating Docutils Transforms + +* Creating Docutils Parsers + +* Using Docutils as a Library + + +PEPs +---- + +* Complete PEP 258 Docutils Design Specification. + + - Fill in the blanks in API details. + + - Specify the nodes.py internal data structure implementation? + + [Tibs:] Eventually we need to have direct documentation in + there on how it all hangs together - the DTD is not enough + (indeed, is it still meant to be correct? [Yes, it is. + --DG]). + +* Rework PEP 257, separating style from spec from tools, wrt Docutils? + See Doc-SIG from 2001-06-19/20. + + +Python Source Reader +==================== + +General: + +* Analyze Tony Ibbs' PySource code. + +* Analyze Doug Hellmann's HappyDoc project. + +* Investigate how POD handles literate programming. + +* Take the best ideas and integrate them into Docutils. + +Miscellaneous ideas: + +* Ask Python-dev for opinions (GvR for a pronouncement) on special + variables (__author__, __version__, etc.): convenience vs. namespace + pollution. Ask opinions on whether or not Docutils should recognize + & use them. + +* If we can detect that a comment block begins with ``##``, a la + JavaDoc, it might be useful to indicate interspersed section headers + & explanatory text in a module. For example:: + + """Module docstring.""" + + ## + # Constants + # ========= + + a = 1 + b = 2 + + ## + # Exception Classes + # ================= + + class MyException(Exception): pass + + # etc. + +* Should standalone strings also become (module/class) docstrings? + Under what conditions? We want to prevent arbitrary strings from + becomming docstrings of prior attribute assignments etc. Assume + that there must be no blank lines between attributes and attribute + docstrings? (Use lineno of NEWLINE token.) + + Triple-quotes are sometimes used for multi-line comments (such as + commenting out blocks of code). How to reconcile? + +* HappyDoc's idea of using comment blocks when there's no docstring + may be useful to get around the conflict between `additional + docstrings`_ and ``from __future__ import`` for module docstrings. + A module could begin like this:: + + #!/usr/bin/env python + # :Author: Me + # :Copyright: whatever + + """This is the public module docstring (``__doc__``).""" + + # More docs, in comments. + # All comments at the beginning of a module could be + # accumulated as docstrings. + # We can't have another docstring here, because of the + # ``__future__`` statement. + + from __future__ import division + + Using the JavaDoc convention of a doc-comment block beginning with + ``##`` is useful though. It allows doc-comments and implementation + comments. + + .. _additional docstrings: + ../peps/pep-0258.html#additional-docstrings + +* HappyDoc uses an initial comment block to set "parser configuration + values". Do the same thing for Docutils, to set runtime settings on + a per-module basis? I.e.:: + + # Docutils:setting=value + + Could be used to turn on/off function parameter comment recognition + & other marginal features. Could be used as a general mechanism to + augment config files and command-line options (but which takes + precedence?). + +* Multi-file output should be divisible at arbitrary level. + +* Support all forms of ``import`` statements: + + - ``import module``: listed as "module" + - ``import module as alias``: "alias (module)" + - ``from module import identifier``: "identifier (from module)" + - ``from module import identifier as alias``: "alias (identifier + from module)" + - ``from module import *``: "all identifiers (``*``) from module" + +* Have links to colorized Python source files from API docs? And + vice-versa: backlinks from the colorized source files to the API + docs! + +* In summaries, use the first *sentence* of a docstring if the first + line is not followed by a blank line. + + +reStructuredText Parser +======================= + +Also see the `... Or Not To Do?`__ list. + +__ rst/alternatives.html#or-not-to-do + +* Treat enumerated lists that are not arabic and consist of only one + item in a single line as ordinary paragraphs. See + <http://article.gmane.org/gmane.text.docutils.user/2635>. + +* The citation syntax could use some enhancements. See + <http://thread.gmane.org/gmane.text.docutils.user/2499> and + <http://thread.gmane.org/gmane.text.docutils.user/2443>. + +* The current list-recognition logic has too many false positives, as + in :: + + * Aorta + * V. cava superior + * V. cava inferior + + Here ``V.`` is recognized as an enumerator, which leads to + confusion. We need to find a solution that resolves such problems + without complicating the spec to much. + + See <http://thread.gmane.org/gmane.text.docutils.user/2524>. + +* Add indirect links via citation references & footnote references. + Example:: + + `Goodger (2005)`_ is helpful. + + .. _Goodger (2005): [goodger2005]_ + .. [goodger2005] citation text + + See <http://thread.gmane.org/gmane.text.docutils.user/2499>. + +* Allow multiple block quotes, only separated by attributions + (http://article.gmane.org/gmane.text.docutils.devel/2985), e.g.:: + + quote 1 + + ---Attrib 1 + + quote 2 + + ---Attrib 2 + +* Change the specification so that more punctuation is allowed + before/after inline markup start/end string + (http://article.gmane.org/gmane.text.docutils.cvs/3824). + +* Complain about bad URI characters + (http://article.gmane.org/gmane.text.docutils.user/2046) and + disallow internal whitespace + (http://article.gmane.org/gmane.text.docutils.user/2214). + +* Create ``info``-level system messages for unnecessarily + backslash-escaped characters (as in ``"\something"``, rendered as + "something") to allow checking for errors which silently slipped + through. + +* Add (functional) tests for untested roles. + +* Add test for ":figwidth: image" option of "figure" directive. (Test + code needs to check if PIL is available on the system.) + +* Add support for CJK double-width whitespace (indentation) & + punctuation characters (markup; e.g. double-width "*", "-", "+")? + +* Add motivation sections for constructs in spec. + +* Support generic hyperlink references to _`targets in other + documents`? Not in an HTML-centric way, though (it's trivial to say + ``http://www.example.com/doc#name``, and useless in non-HTML + contexts). XLink/XPointer? ``.. baseref::``? See Doc-SIG + 2001-08-10. + +* .. _adaptable file extensions: + + In target URLs, it would be useful to not explicitly specify the + file extension. If we're generating HTML, then ".html" is + appropriate; if PDF, then ".pdf"; etc. How about using ".*" to + indicate "choose the most appropriate filename extension"? For + example:: + + .. _Another Document: another.* + + What is to be done for output formats that don't *have* hyperlinks? + For example, LaTeX targeted at print. Hyperlinks may be "called + out", as footnotes with explicit URLs. + + But then there's also LaTeX targeted at PDFs, which *can* have + links. Perhaps a runtime setting for "*" could explicitly provide + the extension, defaulting to the output file's extension. + + Should the system check for existing files? No, not practical. + + Handle documents only, or objects (images, etc.) also? + + If this handles images also, how to differentiate between document + and image links? Element context (within "image")? Which image + extension to use for which document format? Again, a runtime + setting would suffice. + + This may not be just a parser issue; it may need framework support. + + Mailing list threads: `Images in both HTML and LaTeX`__ (especially + `this summary of Felix's objections`__), `more-universal links?`__, + `Output-format-sensitive link targets?`__ + + __ http://thread.gmane.org/gmane.text.docutils.user/1239 + __ http://article.gmane.org/gmane.text.docutils.user/1278 + __ http://thread.gmane.org/gmane.text.docutils.user/1915 + __ http://thread.gmane.org/gmane.text.docutils.user/2438 + +* Implement the header row separator modification to table.el. (Wrote + to Takaaki Ota & the table.el mailing list on 2001-08-12, suggesting + support for "=====" header rows. On 2001-08-17 he replied, saying + he'd put it on his to-do list, but "don't hold your breath".) + +* Fix the parser's indentation handling to conform with the stricter + definition in the spec. (Explicit markup blocks should be strict or + forgiving?) + + .. XXX What does this mean? Can you elaborate, David? + +* Make the parser modular. Allow syntax constructs to be added or + disabled at run-time. Subclassing is probably not enough because it + makes it difficult to apply multiple extensions. + +* Generalize the "doctest block" construct (which is overly + Python-centric) to other interactive sessions? "Doctest block" + could be renamed to "I/O block" or "interactive block", and each of + these could also be recognized as such by the parser: + + - Shell sessions:: + + $ cat example1.txt + A block beginning with a "$ " prompt is interpreted as a shell + session interactive block. As with Doctest blocks, the + interactive block ends with the first blank line, and wouldn't + have to be indented. + + - Root shell sessions:: + + # cat example2.txt + A block beginning with a "# " prompt is interpreted as a root + shell session (the user is or has to be logged in as root) + interactive block. Again, the block ends with a blank line. + + Other standard (and unambiguous) interactive session prompts could + easily be added (such as "> " for WinDOS). + + Tony Ibbs spoke out against this idea (2002-06-14 Doc-SIG thread + "docutils feedback"). + +* The "doctest" element should go away. The construct could simply be + a front-end to generic literal blocks. We could immediately (in + 0.4, or 0.5) remove the doctest node from the doctree, but leave the + syntax in reST. The reST parser could represent doctest blocks as + literal blocks with a class attribute. The syntax could be left in + reST for a set period of time. + +* Add support for pragma (syntax-altering) directives. + + Some pragma directives could be local-scope unless explicitly + specified as global/pragma using ":global:" options. + +* Support whitespace in angle-bracketed standalone URLs according to + Appendix E ("Recommendations for Delimiting URI in Context") of `RFC + 2396`_. + + .. _RFC 2396: http://www.rfc-editor.org/rfc/rfc2396.txt + +* Use the vertical spacing of the source text to determine the + corresponding vertical spacing of the output? + +* [From Mark Nodine] For cells in simple tables that comprise a + single line, the justification can be inferred according to the + following rules: + + 1. If the text begins at the leftmost column of the cell, + then left justification, ELSE + 2. If the text begins at the rightmost column of the cell, + then right justification, ELSE + 3. Center justification. + + The onus is on the author to make the text unambiguous by adding + blank columns as necessary. There should be a parser setting to + turn off justification-recognition (normally on would be fine). + + Decimal justification? + + All this shouldn't be done automatically. Only when it's requested + by the user, e.g. with something like this:: + + .. table:: + :auto-indent: + + (Table goes here.) + + Otherwise it will break existing documents. + +* Generate a warning or info message for paragraphs which should have + been lists, like this one:: + + 1. line one + 3. line two + +* Generalize the "target-notes" directive into a command-line option + somehow? See docutils-develop 2003-02-13. + +* Allow a "::"-only paragraph (first line, actually) to introduce a + _`literal block without a blank line`? (Idea from Paul Moore.) :: + + :: + This is a literal block + + Is indentation enough to make the separation between a paragraph + which contains just a ``::`` and the literal text unambiguous? + (There's one problem with this concession: If one wants a definition + list item which defines the term "::", we'd have to escape it.) It + would only be reasonable to apply it to "::"-only paragraphs though. + I think the blank line is visually necessary if there's text before + the "::":: + + The text in this paragraph needs separation + from the literal block following:: + This doesn't look right. + +* Add new syntax for _`nested inline markup`? Or extend the parser to + parse nested inline markup somehow? See the `collected notes + <rst/alternatives.html#nested-inline-markup>`__. + +* Drop the backticks from embedded URIs with omitted reference text? + Should the angle brackets be kept in the output or not? :: + + <file_name>_ + + Probably not worth the trouble. + +* Add _`math markup`. We should try for a general solution, that's + applicable to any output format. Using a standard, such as MathML_, + would be best. TeX (or itex_) would be acceptable as a *front-end* + to MathML. See `the culmination of a relevant discussion + <http://article.gmane.org/gmane.text.docutils.user/118>`__. + + Both a directive and an interpreted text role will be necessary (for + each markup). Directive example:: + + .. itex:: + \alpha_t(i) = P(O_1, O_2, \dots O_t, q_t = S_i \lambda) + + The same thing inline:: + + The equation in question is :itex:`\alpha_t(i) = P(O_1, O_2, + \dots O_t, q_t = S_i \lambda)`. + + .. _MathML: http://www.w3.org/TR/MathML2/ + .. _itex: http://pear.math.pitt.edu/mathzilla/itex2mmlItex.html + +* How about a syntax for alternative hyperlink behavior, such as "open + in a new window" (as in HTML's ``<a target="_blank">``)? Double + angle brackets might work for inline targets:: + + The `reference docs <<url>>`__ may be handy. + + But what about explicit targets? + + The MoinMoin wiki uses a caret ("^") at the beginning of the URL + ("^" is not a legal URI character). That could work for both inline + and explicit targets:: + + The `reference docs <^url>`__ may be handy. + + .. _name: ^url + + This may be too specific to HTML. It hasn't been requested very + often either. + +* Add an option to add URI schemes at runtime. + +* _`Segmented lists`:: + + : segment : segment : segment + : segment : segment : very long + segment + : segment : segment : segment + + The initial colon (":") can be thought of as a type of bullet + + We could even have segment titles:: + + :: title : title : title + : segment : segment : segment + : segment : segment : segment + + This would correspond well to DocBook's SegmentedList. Output could + be tabular or "name: value" pairs, as described in DocBook's docs. + +* Allow backslash-escaped colons in field names:: + + :Case Study\: Event Handling: This chapter will be dropped. + +* _`footnote spaces`: + + When supplying the command line options + --footnote-references=brackets and --use-latex-footnotes with the + LaTeX writer (which might very well happen when using configuration + files), the spaces in front of footnote references aren't trimmed. + +* Enable grid _`tables inside XML comments`, where "--" ends comments. + I see three implementation possibilities: + + 1. Make the table syntax characters into "table" directive options. + This is the most flexible but most difficult, and we probably + don't need that much flexibility. + + 2. Substitute "~" for "-" with a specialized directive option + (e.g. ":tildes:"). + + 3. Make the standard table syntax recognize "~" as well as "-", even + without a directive option. Individual tables would have to be + internally consistent. + + Directive options are preferable to configuration settings, because + tables are document-specific. A pragma directive would be another + approach, to set the syntax once for a whole document. + + In the meantime, the list-table_ directive is a good replacement for + grid tables inside XML comments. + + .. _list-table: ../ref/rst/directives.html#list-table + +* Generalize docinfo contents (bibliographic fields): remove specific + fields, and have only a single generic "field"? + + +Directives +---------- + +Directives below are often referred to as "module.directive", the +directive function. The "module." is not part of the directive name +when used in a document. + +* Make the _`directive interface` object-oriented + (http://article.gmane.org/gmane.text.docutils.user/1871). + +* Allow for field lists in list tables. See + <http://thread.gmane.org/gmane.text.docutils.devel/3392>. + +* .. _unify tables: + + Unify table implementations and unify options of table directives + (http://article.gmane.org/gmane.text.docutils.user/1857). + +* Allow directives to be added at run-time? + +* Use the language module for directive option names? + +* Add "substitution_only" and "substitution_ok" function attributes, + and automate context checking? + +* Change directive functions to directive classes? Superclass' + ``__init__()`` could handle all the bookkeeping. + +* Implement options or features on existing directives: + + - Add a "name" option to directives, to set an author-supplied + identifier? + + - All directives that produce titled elements should grow implicit + reference names based on the titles. + + - Allow the _`:trim:` option for all directives when they occur in a + substitution definition, not only the unicode_ directive. + + .. _unicode: ../ref/rst/directives.html#unicode-character-codes + + - _`images.figure`: "title" and "number", to indicate a formal + figure? + + - _`parts.sectnum`: "local"?, "refnum" + + A "local" option could enable numbering for sections from a + certain point down, and sections in the rest of the document are + not numbered. For example, a reference section of a manual might + be numbered, but not the rest. OTOH, an all-or-nothing approach + would probably be enough. + + The "sectnum" directive should be usable multiple times in a + single document. For example, in a long document with "chapter" + and "appendix" sections, there could be a second "sectnum" before + the first appendix, changing the sequence used (from 1,2,3... to + A,B,C...). This is where the "local" concept comes in. This part + of the implementation can be left for later. + + A "refnum" option (better name?) would insert reference names + (targets) consisting of the reference number. Then a URL could be + of the form ``http://host/document.html#2.5`` (or "2-5"?). Allow + internal references by number? Allow name-based *and* + number-based ids at the same time, or only one or the other (which + would the table of contents use)? Usage issue: altering the + section structure of a document could render hyperlinks invalid. + + - _`parts.contents`: Add a "suppress" or "prune" option? It would + suppress contents display for sections in a branch from that point + down. Or a new directive, like "prune-contents"? + + Add an option to include topics in the TOC? Another for sidebars? + The "topic" directive could have a "contents" option, or the + "contents" directive" could have an "include-topics" option. See + docutils-develop 2003-01-29. + + - _`parts.header` & _`parts.footer`: Support multiple, named headers + & footers? For example, separate headers & footers for odd, even, + and the first page of a document. + + This may be too specific to output formats which have a notion of + "pages". + + - _`misc.class`: + + - Add a ``:parent:`` option for setting the parent's class + (http://article.gmane.org/gmane.text.docutils.devel/3165). + + - _`misc.include`: + + - Option to select a range of lines? + + - Option to label lines? + + - How about an environment variable, say RSTINCLUDEPATH or + RSTPATH, for standard includes (as in ``.. include:: <name>``)? + This could be combined with a setting/option to allow + user-defined include directories. + + - Add support for inclusion by URL? :: + + .. include:: + :url: http://www.example.org/inclusion.txt + + - _`misc.raw`: add a "destination" option to the "raw" directive? :: + + .. raw:: html + :destination: head + + <link ...> + + It needs thought & discussion though, to come up with a consistent + set of destination labels and consistent behavior. + + And placing HTML code inside the <head> element of an HTML + document is rather the job of a templating system. + + - _`body.sidebar`: Allow internal section structure? Adornment + styles would be independent of the main document. + + That is really complicated, however, and the document model + greatly benefits from its simplicity. + +* Implement directives. Each of the list items below begins with an + identifier of the form, "module_name.directive_function_name". The + directive name itself could be the same as the + directive_function_name, or it could differ. + + - _`html.imagemap` + + It has the disadvantage that it's only easily implementable for + HTML, so it's specific to one output format. + + (For non-HTML writers, the imagemap would have to be replaced with + the image only.) + + - _`parts.endnotes` (or "footnotes"): See `Footnote & Citation Gathering`_. + + - _`parts.citations`: See `Footnote & Citation Gathering`_. + + - _`misc.language`: Specify (= change) the language of a document at + parse time. + + - _`misc.settings`: Set any(?) Docutils runtime setting from within + a document? Needs much thought and discussion. + + - _`misc.gather`: Gather (move, or copy) all instances of a specific + element. A generalization of the "endnotes" & "citations" ideas. + + - Add a custom "directive" directive, equivalent to "role"? For + example:: + + .. directive:: incr + + .. class:: incremental + + .. incr:: + + "``.. incr::``" above is equivalent to "``.. class:: incremental``". + + Another example:: + + .. directive:: printed-links + + .. topic:: Links + :class: print-block + + .. target-notes:: + :class: print-inline + + This acts like macros. The directive contents will have to be + evaluated when referenced, not when defined. + + * Needs a better name? "Macro", "substitution"? + * What to do with directive arguments & options when the + macro/directive is referenced? + + - .. _conditional directives: + + Docutils already has the ability to say "use this content for + Writer X" (via the "raw" directive), but it doesn't have the + ability to say "use this content for any Writer other than X". It + wouldn't be difficult to add this ability though. + + My first idea would be to add a set of conditional directives. + Let's call them "writer-is" and "writer-is-not" for discussion + purposes (don't worry about implemention details). We might + have:: + + .. writer-is:: text-only + + :: + + +----------+ + | SNMP | + +----------+ + | UDP | + +----------+ + | IP | + +----------+ + | Ethernet | + +----------+ + + .. writer-is:: pdf + + .. figure:: protocol_stack.eps + + .. writer-is-not:: text-only pdf + + .. figure:: protocol_stack.png + + This could be an interface to the Filter transform + (docutils.transforms.components.Filter). + + The ideas in `adaptable file extensions`_ above may also be + applicable here. + + SVG's "switch" statement may provide inspiration. + + Here's an example of a directive that could produce multiple + outputs (*both* raw troff pass-through *and* a GIF, for example) + and allow the Writer to select. :: + + .. eqn:: + + .EQ + delim %% + .EN + %sum from i=o to inf c sup i~=~lim from {m -> inf} + sum from i=0 to m sup i% + .EQ + delim off + .EN + + - _`body.example`: Examples; suggested by Simon Hefti. Semantics as + per Docbook's "example"; admonition-style, numbered, reference, + with a caption/title. + + - _`body.index`: Index targets. + + See `Index Entries & Indexes + <./rst/alternatives.html#index-entries-indexes>`__. + + - _`body.literal`: Literal block, possibly "formal" (see `object + numbering and object references`_ above). Possible options: + + - "highlight" a range of lines + + - include only a specified range of lines + + - "number" or "line-numbers" + + - "styled" could indicate that the directive should check for + style comments at the end of lines to indicate styling or + markup. + + Specific derivatives (i.e., a "python-interactive" directive) + could interpret style based on cues, like the ">>> " prompt and + "input()"/"raw_input()" calls. + + See docutils-users 2003-03-03. + + - _`body.listing`: Code listing with title (to be numbered + eventually), equivalent of "figure" and "table" directives. + + - _`colorize.python`: Colorize Python code. Fine for HTML output, + but what about other formats? Revert to a literal block? Do we + need some kind of "alternate" mechanism? Perhaps use a "pending" + transform, which could switch its output based on the "format" in + use. Use a factory function "transformFF()" which returns either + "HTMLTransform()" instance or "GenericTransform" instance? + + If we take a Python-to-HTML pretty-printer and make it output a + Docutils internal doctree (as per nodes.py) instead of HTML, then + each output format's stylesheet (or equivalent) mechanism could + take care of the rest. The pretty-printer code could turn this + doctree fragment:: + + <literal_block xml:space="preserve"> + print 'This is Python code.' + for i in range(10): + print i + </literal_block> + + into something like this ("</>" is end-tag shorthand):: + + <literal_block xml:space="preserve" class="python"> + <keyword>print</> <string>'This is Python code.'</> + <keyword>for</> <identifier>i</> <keyword + >in</> <expression>range(10)</>: + <keyword>print</> <expression>i</> + </literal_block> + + But I'm leaning toward adding a single new general-purpose + element, "phrase", equivalent to HTML's <span>. Here's the + example rewritten using the generic "phrase":: + + <literal_block xml:space="preserve" class="python"> + <phrase class="keyword">print</> <phrase + class="string">'This is Python code.'</> + <phrase class="keyword">for</> <phrase + class="identifier">i</> <phrase class="keyword">in</> <phrase + class="expression">range(10)</>: + <phrase class="keyword">print</> <phrase + class="expression">i</> + </literal_block> + + It's more verbose but more easily extensible and more appropriate + for the case at hand. It allows us to edit style sheets to add + support for new formats, not the Docutils code itself. + + Perhaps a single directive with a format parameter would be + better:: + + .. colorize:: python + + print 'This is Python code.' + for i in range(10): + print i + + But directives can have synonyms for convenience. "format:: + python" was suggested, but "format" seems too generic. + + - _`pysource.usage`: Extract a usage message from the program, + either by running it at the command line with a ``--help`` option + or through an exposed API. [Suggestion for Optik.] + + +Interpreted Text +---------------- + +Interpreted text is entirely a reStructuredText markup construct, a +way to get around built-in limitations of the medium. Some roles are +intended to introduce new doctree elements, such as "title-reference". +Others are merely convenience features, like "RFC". + +All supported interpreted text roles must already be known to the +Parser when they are encountered in a document. Whether pre-defined +in core/client code, or in the document, doesn't matter; the roles +just need to have already been declared. Adding a new role may +involve adding a new element to the DTD and may require extensive +support, therefore such additions should be well thought-out. There +should be a limited number of roles. + +The only place where no limit is placed on variation is at the start, +at the Reader/Parser interface. Transforms are inserted by the Reader +into the Transformer's queue, where non-standard elements are +converted. Once past the Transformer, no variation from the standard +Docutils doctree is possible. + +An example is the Python Source Reader, which will use interpreted +text extensively. The default role will be "Python identifier", which +will be further interpreted by namespace context into <class>, +<method>, <module>, <attribute>, etc. elements (see pysource.dtd), +which will be transformed into standard hyperlink references, which +will be processed by the various Writers. No Writer will need to have +any knowledge of the Python-Reader origin of these elements. + +* Add explicit interpreted text roles for the rest of the implicit + inline markup constructs: named-reference, anonymous-reference, + footnote-reference, citation-reference, substitution-reference, + target, uri-reference (& synonyms). + +* Add directives for each role as well? This would allow indirect + nested markup:: + + This text contains |nested inline markup|. + + .. |nested inline markup| emphasis:: + + nested ``inline`` markup + +* Implement roles: + + - "_`raw-wrapped`" (or "_`raw-wrap`"): Base role to wrap raw text + around role contents. + + For example, the following reStructuredText source ... :: + + .. role:: red(raw-formatting) + :prefix: + :html: <font color="red"> + :latex: {\color{red} + :suffix: + :html: </font> + :latex: } + + colored :red:`text` + + ... will yield the following document fragment:: + + <paragraph> + colored + <inline classes="red"> + <raw format="html"> + <font color="red"> + <raw format="latex"> + {\color{red} + <inline classes="red"> + text + <raw format="html"> + </font> + <raw format="latex"> + } + + Possibly without the intermediate "inline" node. + + - "acronym" and "abbreviation": Associate the full text with a short + form. Jason Diamond's description: + + I want to translate ```reST`:acronym:`` into ``<acronym + title='reStructuredText'>reST</acronym>``. The value of the + title attribute has to be defined out-of-band since you can't + parameterize interpreted text. Right now I have them in a + separate file but I'm experimenting with creating a directive + that will use some form of reST syntax to let you define them. + + Should Docutils complain about undefined acronyms or + abbreviations? + + What to do if there are multiple definitions? How to + differentiate between CSS (Content Scrambling System) and CSS + (Cascading Style Sheets) in a single document? David Priest + responds, + + The short answer is: you don't. Anyone who did such a thing + would be writing very poor documentation indeed. (Though I + note that `somewhere else in the docs`__, there's mention of + allowing replacement text to be associated with the + abbreviation. That takes care of the duplicate + acronyms/abbreviations problem, though a writer would be + foolish to ever need it.) + + __ `inline parameter syntax`_ + + How to define the full text? Possibilities: + + 1. With a directive and a definition list? :: + + .. acronyms:: + + reST + reStructuredText + DPS + Docstring Processing System + + Would this list remain in the document as a glossary, or would + it simply build an internal lookup table? A "glossary" + directive could be used to make the intention clear. + Acronyms/abbreviations and glossaries could work together. + + Then again, a glossary could be formed by gathering individual + definitions from around the document. + + 2. Some kind of `inline parameter syntax`_? :: + + `reST <reStructuredText>`:acronym: is `WYSIWYG <what you + see is what you get>`:acronym: plaintext markup. + + .. _inline parameter syntax: + rst/alternatives.html#parameterized-interpreted-text + + 3. A combination of 1 & 2? + + The multiple definitions issue could be handled by establishing + rules of priority. For example, directive-based lookup tables + have highest priority, followed by the first inline definition. + Multiple definitions in directive-based lookup tables would + trigger warnings, similar to the rules of `implicit hyperlink + targets`__. + + __ ../ref/rst/restructuredtext.html#implicit-hyperlink-targets + + 4. Using substitutions? :: + + .. |reST| acronym:: reST + :text: reStructuredText + + What do we do for other formats than HTML which do not support + tool tips? Put the full text in parentheses? + + - "figure", "table", "listing", "chapter", "page", etc: See `object + numbering and object references`_ above. + + - "glossary-term": This would establish a link to a glossary. It + would require an associated "glossary-entry" directive, whose + contents could be a definition list:: + + .. glossary-entry:: + + term1 + definition1 + term2 + definition2 + + This would allow entries to be defined anywhere in the document, + and collected (via a "glossary" directive perhaps) at one point. + + +Unimplemented Transforms +======================== + +* _`Footnote & Citation Gathering` + + Collect and move footnotes & citations to the end of a document. + (Separate transforms.) + +* _`Reference Merging` + + When merging two or more subdocuments (such as docstrings), + conflicting references may need to be resolved. There may be: + + * duplicate reference and/or substitution names that need to be made + unique; and/or + * duplicate footnote numbers that need to be renumbered. + + Should this be done before or after reference-resolving transforms + are applied? What about references from within one subdocument to + inside another? + +* _`Document Splitting` + + If the processed document is written to multiple files (possibly in + a directory tree), it will need to be split up. Internal references + will have to be adjusted. + + (HTML only? Initially, yes. Eventually, anything should be + splittable.) + + Ideas: + + - Insert a "destination" attribute into the root element of each + split-out document, containing the path/filename. The Output + object or Writer will recognize this attribute and split out the + files accordingly. Must allow for common headers & footers, + prev/next, breadcrumbs, etc. + + - Transform a single-root document into a document containing + multiple subdocuments, recursively. The content model of the + "document" element would have to change to:: + + <!ELEMENT document + ( (title, subtitle?)?, + decoration?, + (docinfo, transition?)?, + %structure.model;, + document* )> + + (I.e., add the last line -- 0 or more document elements.) + + Let's look at the case of hierarchical (directories and files) + HTML output. Each document element containing further document + elements would correspond to a directory (with an index.html file + for the content preceding the subdocuments). Each document + element containing no subdocuments (i.e., structure model elements + only) corresponds to a concrete file with no directory. + + The natural transform would be to map sections to subdocuments, + but possibly only a given number of levels deep. + +* _`Navigation` + + If a document is split up, each segment will need navigation links: + parent, children (small TOC), previous (preorder), next (preorder). + Part of `Document Splitting`_? + +* _`List of System Messages` + + The ``system_message`` elements are inserted into the document tree, + adjacent to the problems themselves where possible. Some (those + generated post-parse) are kept until later, in + ``document.messages``, and added as a special final section, + "Docutils System Messages". + + Docutils could be made to generate hyperlinks to all known + system_messages and add them to the document, perhaps to the end of + the "Docutils System Messages" section. + + Fred L. Drake, Jr. wrote: + + I'd like to propose that both parse- and transformation-time + messages are included in the "Docutils System Messages" section. + If there are no objections, I can make the change. + + The advantage of the current way of doing things is that parse-time + system messages don't require a transform; they're already in the + document. This is valuable for testing (unit tests, + tools/quicktest.py). So if we do decide to make a change, I think + the insertion of parse-time system messages ought to remain as-is + and the Messages transform ought to move all parse-time system + messages (remove from their originally inserted positions, insert in + System Messages section). + +* _`Index Generation` + + +HTML Writer +=========== + +* Add support for _`multiple stylesheets`. See + <http://thread.gmane.org/gmane.text.docutils.cvs/4336>. + +* Idea for field-list rendering: hanging indent:: + + Field name (bold): First paragraph of field body begins + with the field name inline. + + If the first item of a field body is not a paragraph, + it would begin on the following line. + +* Add more support for <link> elements, especially for navigation + bars. + + The framework does not have a notion of document relationships, so + probably raw.destination_ should be used. + + We'll have framework support for document relationships when support + for `multiple output files`_ is added. The HTML writer could + automatically generate <link> elements then. + + .. _raw.destination: misc.raw_ + +* Base list compaction on the spacing of source list? Would require + parser support. (Idea: fantasai, 16 Dec 2002, doc-sig.) + +* Add a tool tip ("title" attribute?) to footnote back-links + identifying them as such. Text in Docutils language module. + + +PEP/HTML Writer +=============== + +* Remove the generic style information (duplicated from html4css1.css) + from pep.css to avoid redundancy. + + We need support for `multiple stylesheets`_ first, though. + + +LaTeX writer +============ + +* Add an ``--embed-stylesheet`` (and ``--link-stylesheet``) option. + + +HTML SlideShow Writer +===================== + +Add a Writer for presentations, derivative of the HTML Writer. Given +an input document containing one section per slide, the output would +consist of a master document for the speaker, and a slide file (or set +of filess, one (or more) for each slide). Each slide would contain +the slide text (large, stylesheet-controlled) and images, plus "next" +and "previous" links in consistent places. The speaker's master +document would contain a small version of the slide text with +speaker's notes interspersed. The master document could use +``target="whatever"`` to direct links to a separate window on a second +monitor (e.g., a projector). + +Ideas: + +* Base the output on |S5|_. I discovered |S5| a few weeks before it + appeared on Slashdot, after writing most of this section. It turns + out that |S5| does most of what I wanted. + + Chris Liechti has `integrated S5 with the HTML writer + <http://homepage.hispeed.ch/py430/python/index.html#rst2s5>`__. + + .. |S5| replace:: S\ :sup:`5` + .. _S5: http://www.meyerweb.com/eric/tools/s5/ + +Below, "[S5]" indicates that |S5| already implements the feature or +may implement all or part of the feature. "[S5 1.1]" indicates that +|S5| version 1.1 implements the feature (a preview of the 1.1 beta is +available in the `S5 testbed`_). + +.. _S5 testbed: http://meyerweb.com/eric/tools/s5/testbed/ + +Features & issues: + +* [S5 1.1] Incremental slides, where each slide adds to the one before + (ticking off items in a list, delaying display of later items). The + speaker's master document would list each transition in the TOC and + provide links in the content. + + * Use transitions to separate stages. Problem with transitions is + that they can't be used everywhere -- not, for example, within a + list (see the example below). + + * Use a special directive to separate stages. Possible names: + pause, delay, break, cut, continue, suspend, hold, stay, stop. + Should the directive be available in all contexts (and ineffectual + in all but SlideShow context), or added at runtime by the + SlideShow Writer? Probably such a "pause" directive should only + be available for slide shows; slide shows are too much of a + special case to justify adding a directive (and node?) to the + core. + + The directive could accept text content, which would be rendered + while paused but would disappear when the slide is continued (the + text could also be a link to the next slide). In the speaker's + master document, the text "paused:" could appear, prefixed to the + directive text. + + * Use a special directive or class to declare incremental content. + This works best with the S5 ideas. For example:: + + Slide Title + =========== + + .. incremental:: + + * item one + * item two + * item three + + Add an option to make all bullet lists implicitly incremental? + +* Speaker's notes -- how to intersperse? Could use reST comments + (".."), but make them visible in the speaker's master document. If + structure is necessary, we could use a "comment" directive (to avoid + nonsensical DTD changes, the "comment" directive could produce an + untitled topic element). + + The speaker's notes could (should?) be separate from S5's handout + content. + +* The speaker's master document could use frames for easy navigation: + TOC on the left, content on the right. + + - It would be nice if clicking in the TOC frame simultaneously + linked to both the speaker's notes frame and to the slide window, + synchronizing both. Needs JavaScript? + + - TOC would have to be tightly formatted -- minimal indentation. + + - TOC auto-generated, as in the PEP Reader. (What if there already + is a "contents" directive in the document?) + + - There could be another frame on the left (top-left or bottom-left) + containing a single "Next" link, always pointing to the next slide + (synchronized, of course). Also "Previous" link? FF/Rew go to + the beginning of the next/current parent section? First/Last + also? Tape-player-style buttons like ``|<< << < > >> >>|``? + +* [S5] Need to support templating of some kind, for uniform slide + layout. S5 handles this via CSS. + + Build in support for limited features? E.g., top/bottom or + left/right banners, images on each page, background color and/or + image, etc. + +* [S5?] One layout for all slides, or allow some variation? + + While S5 seems to support only one style per HTML file, it's + pretty easy to split a presentation in different files and + insert a hyperlink to the last slide of the first part and load + the second part by a click on it. + + -- Chris Liechti + +* For nested sections, do we show the section's ancestry on each + slide? Optional? No -- leave the implementation to someone who + wants it. + +* [S5] Stylesheets for slides: + + - Tweaked for different resolutions, 1024x768 etc. + - Some layout elements have fixed positions. + - Text must be quite large. + - Allow 10 lines of text per slide? 15? + - Title styles vary by level, but not so much? + +* [not required with S5.] Need a transform to number slides for + output filenames?, and for hyperlinks? + +* Directive to begin a new, untitled (blank) slide? + +* Directive to begin a new slide, continuation, using the same title + as the previous slide? (Unnecessary?) + +* Have a timeout on incremental items, so the colour goes away after 1 + second. + +Here's an example that I was hoping to show at PyCon DC 2005:: + + ======================== + The Docutils SlideShow + ======================== + + Welcome To The Docutils SlideShow! + ================================== + + .. pause:: + + David Goodger + + goodger@python.org + + http://python.net/~goodger + + .. (introduce yourself) + + Hi, I'm David Goodger from Montreal, Canada. + + I've been working on Docutils since 2000. + Time flies! + + .. pause:: + + Docutils + + http://docutils.sourceforge.net + + .. I also volunteer as a Python Enhancement Proposal (or PEP) + editor. + + .. SlideShow is a new feature of Docutils. This presentation was + written using the Docutils SlideShow system. The slides you + are seeing are HTML, rendered by a standard Mozilla Firefox + browser. + + + The Docutils SlideShow System + ============================= + + .. The Docutils SlideShow System provides + + Easy and open presentations. + + + Features + ======== + + * reStructuredText-based input files. + + .. reStructuredText is a what-you-see-is-what-you-get + plaintext format. Easy to read & write, non-proprietary, + editable in your favourite text editor. + + .. Parsers for other markup languages can be added to Docutils. + In the future, I hope some are. + + .. pause:: ... + + * Stylesheet-driven HTML output. + + .. The format of all elements of the output slides are + controlled by CSS (cascading stylesheets). + + .. pause:: ... + + * Works with any modern browser. + + .. that supports CSS, frames, and JavaScript. + Tested with Mozilla Firefox. + + .. pause:: ... + + * Works on any OS. + + + Etc. + ==== + + That's as far as I got, but you get the idea... + + +Front-End Tools +=============== + +* What about if we don't know which Reader and/or Writer we are + going to use? If the Reader/Writer is specified on the + command-line? (Will this ever happen?) + + Perhaps have different types of front ends: + + a) _`Fully qualified`: Reader and Writer are hard-coded into the + front end (e.g. ``pep2html [options]``, ``pysource2pdf + [options]``). + + b) _`Partially qualified`: Reader is hard-coded, and the Writer is + specified a sub-command (e.g. ``pep2 html [options]``, + ``pysource2 pdf [options]``). The Writer is known before option + processing happens, allowing the OptionParser to be built + dynamically. Alternatively, the Writer could be hard-coded and + the Reader specified as a sub-command (e.g. ``htmlfrom pep + [options]``). + + c) _`Unqualified`: Reader and Writer are specified as subcommands + (e.g. ``publish pep html [options]``, ``publish pysource pdf + [options]``). A single front end would be sufficient, but + probably only useful for testing purposes. + + d) _`Dynamic`: Reader and/or Writer are specified by options, with + defaults if unspecified (e.g. ``publish --writer pdf + [options]``). Is this possible? The option parser would have + to be told about new options it needs to handle, on the fly. + Component-specific options would have to be specified *after* + the component-specifying option. + + Allow common options before subcommands, as in CVS? Or group all + options together? In the case of the `fully qualified`_ + front ends, all the options will have to be grouped together + anyway, so there's no advantage (we can't use it to avoid + conflicts) to splitting common and component-specific options + apart. + +* Parameterize help text & defaults somehow? Perhaps a callback? Or + initialize ``settings_spec`` in ``__init__`` or ``init_options``? + +* Disable common options that don't apply? + +* Add ``--section-numbering`` command line option. The "sectnum" + directive should override the ``--no-section-numbering`` command + line option then. + +* Create a single dynamic_ or unqualified_ front end that can be + installed? + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/dev/website.txt b/docs/dev/website.txt new file mode 100644 index 000000000..193e9c0f2 --- /dev/null +++ b/docs/dev/website.txt @@ -0,0 +1,46 @@ +=================== + Docutils Web Site +=================== + +:Author: David Goodger; open to all Docutils developers +:Contact: goodger@python.org +:Date: $Date$ +:Revision: $Revision$ +:Copyright: This document has been placed in the public domain. + +The Docutils web site, <http://docutils.sourceforge.net/>, is +maintained automatically by the ``docutils-update`` script, run as an +hourly cron job on shell.berlios.de (by user "felixwiemann"). The +script will process any .txt file which is newer than the +corresponding .html file in the project's web directory on +shell.berlios.de (``/home/groups/docutils/htdocs/aux/htdocs/``) and +upload the changes to the web site at SourceForge. For a new .txt +file, just SSH to ``<username>@shell.berlios.de`` and :: + + cd /home/groups/docutils/htdocs/aux/htdocs/ + touch filename.html + chmod g+w filename.html + sleep 1 + touch filename.txt + +The script will take care of the rest within an hour. Thereafter +whenever the .txt file is modified (checked in to SVN), the .html will +be regenerated automatically. + +After adding directories to SVN, allow the script to run once to +create the directories in the filesystem before preparing for HTML +processing as described above. + +The docutils-update__ script is located at +``sandbox/infrastructure/docutils-update``. + +__ http://docutils.sf.net/sandbox/infrastructure/docutils-update + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: |