diff options
author | (no author) <(no author)@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> | 2002-05-01 02:53:07 +0000 |
---|---|---|
committer | (no author) <(no author)@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> | 2002-05-01 02:53:07 +0000 |
commit | 9e0c8b3997b83945c58c5ede77278f205b97d03e (patch) | |
tree | 5fdea7c382ce0fdcc49bc2bc0f9e8faaa7f4effc /docutils/docs/dev | |
parent | 6a5511292419427c9a4c9258b71c47d072a02878 (diff) | |
download | docutils-tibs.tar.gz |
This commit was manufactured by cvs2svn to create branch 'tibs'.tibs
git-svn-id: http://svn.code.sf.net/p/docutils/code/branches/tibs@63 929543f6-e4f2-0310-98a6-ba3bd3dd1d04
Diffstat (limited to 'docutils/docs/dev')
-rw-r--r-- | docutils/docs/dev/pysource.dtd | 212 | ||||
-rw-r--r-- | docutils/docs/dev/pysource.txt | 173 | ||||
-rw-r--r-- | docutils/docs/dev/rst/alternatives.txt | 1334 | ||||
-rw-r--r-- | docutils/docs/dev/rst/problems.txt | 770 | ||||
-rw-r--r-- | docutils/docs/dev/semantics.txt | 47 | ||||
-rw-r--r-- | docutils/docs/dev/todo.txt | 388 |
6 files changed, 0 insertions, 2924 deletions
diff --git a/docutils/docs/dev/pysource.dtd b/docutils/docs/dev/pysource.dtd deleted file mode 100644 index 463844a68..000000000 --- a/docutils/docs/dev/pysource.dtd +++ /dev/null @@ -1,212 +0,0 @@ -<!-- -====================================================================== - Docutils Python Source DTD -====================================================================== -:Author: David Goodger -:Contact: goodger@users.sourceforge.net -:Revision: $Revision$ -:Date: $Date$ -:Copyright: This DTD has been placed in the public domain. -:Filename: pysource.dtd - -This DTD (document type definition) extends the Generic DTD (see -below). - -More information about this DTD and the Docutils project can be found -at http://docutils.sourceforge.net/. The latest version of this DTD -is available from http://docutils.sourceforge.net/spec/pysource.dtd. - -The proposed formal public identifier for this DTD is:: - - +//IDN python.org//DTD Docutils Python Source//EN//XML ---> - - -<!-- -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Parameter Entity Overrides -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ---> - -<!ENTITY % additional.structural.elements - " | package_section | module_section | class_section - | method_section | function_section | module_attribute_section - | class_attribute_section | instance_attribute_section "> - -<!ENTITY % additional.inline.elements - " | package | module | class | method | function - | variable | parameter | type - | module_attribute | class_attribute | instance_attribute - | exception_class | warning_class "> - - -<!-- -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Generic DTD -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -This DTD extends the Docutils Generic DTD, available from -http://docutils.sourceforge.net/spec/docutils.dtd. ---> - -<!ENTITY % docutils PUBLIC - "+//IDN python.org//DTD Docutils Generic//EN//XML" - "docutils.dtd"> -%docutils; - - -<!-- -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Additional Structural Elements -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ---> - -<!ELEMENT package_section (package, %structure.model;)> -<!ATTLIST package_section %basic.atts;> - -<!ELEMENT module_section (module, %structure.model;)> -<!ATTLIST module_section %basic.atts;> - -<!ELEMENT class_section - (class, inheritance_list?, parameter_list?, %structure.model;)> -<!ATTLIST class_section %basic.atts;> - -<!ELEMENT method_section (method, parameter_list?, %structure.model;)> -<!ATTLIST method_section %basic.atts;> - -<!ELEMENT function_section (function, parameter_list?, %structure.model;)> -<!ATTLIST function_section %basic.atts;> - -<!ELEMENT module_attribute_section - (module_attribute, initial_value?, %structure.model;)> -<!ATTLIST module_attribute_section %basic.atts;> - -<!ELEMENT class_attribute_section - (class_attribute, initial_value?, %structure.model;)> -<!ATTLIST class_attribute_section %basic.atts;> - -<!ELEMENT instance_attribute_section - (instance_attribute, initial_value?, %structure.model;)> -<!ATTLIST instance_attribute_section %basic.atts;> - -<!ELEMENT inheritance_list (class+)> -<!ATTLIST inheritance_list %basic.atts;> - -<!ELEMENT parameter_list - ((parameter_item+, optional_parameters*) | optional_parameters+)> -<!ATTLIST parameter_list %basic.atts;> - -<!ELEMENT parameter_item ((parameter | parameter_tuple), parameter_default?)> -<!ATTLIST parameter_item %basic.atts;> - -<!ELEMENT optional_parameters (parameter_item+, optional_parameters*)> -<!ATTLIST optional_parameters %basic.atts;> - -<!ELEMENT parameter_tuple (parameter | parameter_tuple)+> -<!ATTLIST parameter_tuple %basic.atts;> - -<!ELEMENT parameter_default (#PCDATA)> -<!ATTLIST parameter_default %basic.atts;> - -<!ELEMENT initial_value (#PCDATA)> -<!ATTLIST initial_value %basic.atts;> - - -<!-- -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Additional Inline Elements -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ---> - -<!-- Also used as the `package_section` identifier/title. --> -<!ELEMENT package (#PCDATA)> -<!ATTLIST package - %basic.atts; - %link.atts;> - -<!-- Also used as the `module_section` identifier/title. --> -<!ELEMENT module (#PCDATA)> -<!ATTLIST module - %basic.atts; - %link.atts;> - -<!-- -Also used as the `class_section` identifier/title, and in the -`inheritance` element. ---> -<!ELEMENT class (#PCDATA)> -<!ATTLIST class - %basic.atts; - %link.atts;> - -<!-- Also used as the `method_section` identifier/title. --> -<!ELEMENT method (#PCDATA)> -<!ATTLIST method - %basic.atts; - %link.atts;> - -<!-- Also used as the `function_section` identifier/title. --> -<!ELEMENT function (#PCDATA)> -<!ATTLIST function - %basic.atts; - %link.atts;> - -<!-- -Also used as the `module_attribute_section` identifier/title. A module -attribute is an exported module-level global variable. ---> -<!ELEMENT module_attribute (#PCDATA)> -<!ATTLIST module_attribute - %basic.atts; - %link.atts;> - -<!-- Also used as the `class_attribute_section` identifier/title. --> -<!ELEMENT class_attribute (#PCDATA)> -<!ATTLIST class_attribute - %basic.atts; - %link.atts;> - -<!-- -Also used as the `instance_attribute_section` identifier/title. ---> -<!ELEMENT instance_attribute (#PCDATA)> -<!ATTLIST instance_attribute - %basic.atts; - %link.atts;> - -<!ELEMENT variable (#PCDATA)> -<!ATTLIST variable - %basic.atts; - %link.atts;> - -<!-- Also used in `parameter_list`. --> -<!ELEMENT parameter (#PCDATA)> -<!ATTLIST parameter - %basic.atts; - %link.atts; - excess_positional %yesorno; #IMPLIED - excess_keyword %yesorno; #IMPLIED> - -<!ELEMENT type (#PCDATA)> -<!ATTLIST type - %basic.atts; - %link.atts;> - -<!ELEMENT exception_class (#PCDATA)> -<!ATTLIST exception_class - %basic.atts; - %link.atts;> - -<!ELEMENT warning_class (#PCDATA)> -<!ATTLIST warning_class - %basic.atts; - %link.atts;> - - -<!-- -Local Variables: -mode: sgml -indent-tabs-mode: nil -fill-column: 70 -End: ---> diff --git a/docutils/docs/dev/pysource.txt b/docutils/docs/dev/pysource.txt deleted file mode 100644 index a75fd1d4d..000000000 --- a/docutils/docs/dev/pysource.txt +++ /dev/null @@ -1,173 +0,0 @@ -====================== - Python Source Reader -====================== -:Author: David Goodger -:Contact: goodger@users.sourceforge.net -:Revision: $Revision$ -:Date: $Date$ - -This document explores issues around extracting and processing -docstrings from Python modules. - -For definitive element hierarchy details, see the "Python Plaintext -Document Interface DTD" XML document type definition, pysource.dtd_ -(which modifies the generic docutils.dtd_). Descriptions below list -'DTD elements' (XML 'generic identifiers' or tag names) corresponding -to syntax constructs. - - -.. contents:: - - -Python Source Reader -==================== - -The Python Source Reader ("PySource") model that's evolving in my mind -goes something like this: - -1. Extract the docstring/namespace [#]_ tree from the module(s) and/or - package(s). - - .. [#] See `Docstring Extractor`_ above. - -2. Run the parser on each docstring in turn, producing a forest of - doctrees (per nodes.py). - -3. Join the docstring trees together into a single tree, running - transforms: - - - merge hyperlinks - - merge namespaces - - create various sections like "Module Attributes", "Functions", - "Classes", "Class Attributes", etc.; see spec/ppdi.dtd - - convert the above special sections to ordinary doctree nodes - -4. Run transforms on the combined doctree. Examples: resolving - cross-references/hyperlinks (including interpreted text on Python - identifiers); footnote auto-numbering; first field list -> - bibliographic elements. - - (Or should step 4's transforms come before step 3?) - -5. Pass the resulting unified tree to the writer/builder. - -I've had trouble reconciling the roles of input parser and output -writer with the idea of modes ("readers" or "directors"). Does the -mode govern the tranformation of the input, the output, or both? -Perhaps the mode should be split into two. - -For example, say the source of our input is a Python module. Our -"input mode" should be the "Python Source Reader". It discovers (from -``__docformat__``) that the input parser is "reStructuredText". If we -want HTML, we'll specify the "HTML" output formatter. But there's a -piece missing. What *kind* or *style* of HTML output do we want? -PyDoc-style, LibRefMan style, etc. (many people will want to specify -and control their own style). Is the output style specific to a -particular output format (XML, HTML, etc.)? Is the style specific to -the input mode? Or can/should they be independent? - -I envision interaction between the input parser, an "input mode" , and -the output formatter. The same intermediate data format would be used -between each of these, being transformed as it progresses. - - -Docstring Extractor -=================== - -We need code that scans a parsed Python module, and returns an ordered -tree containing the names, docstrings (including attribute and -additional docstrings), and additional info (in parentheses below) of -all of the following objects: - -- packages -- modules -- module attributes (+ values) -- classes (+ inheritance) -- class attributes (+ values) -- instance attributes (+ values) -- methods (+ formal parameters & defaults) -- functions (+ formal parameters & defaults) - -(Extract comments too? For example, comments at the start of a module -would be a good place for bibliographic field lists.) - -In order to evaluate interpreted text cross-references, namespaces for -each of the above will also be required. - -See python-dev/docstring-develop thread "AST mining", started on -2001-08-14. - - -Interpreted Text -================ - -DTD elements: package, module, class, method, function, -module_attribute, class_attribute, instance_attribute, variable, -parameter, type, exception_class, warning_class. - -In Python docstrings, interpreted text is used to classify and mark up -program identifiers, such as the names of variables, functions, -classes, and modules. If the identifier alone is given, its role is -inferred implicitly according to the Python namespace lookup rules. -For functions and methods (even when dynamically assigned), -parentheses ('()') may be included:: - - This function uses `another()` to do its work. - -For class, instance and module attributes, dotted identifiers are used -when necessary:: - - class Keeper(Storer): - - """ - Extend `Storer`. Class attribute `instances` keeps track of - the number of `Keeper` objects instantiated. - """ - - instances = 0 - """How many `Keeper` objects are there?""" - - def __init__(self): - """ - Extend `Storer.__init__()` to keep track of instances. - - Keep count in `self.instances` and data in `self.data`. - """ - Storer.__init__(self) - self.instances += 1 - - self.data = [] - """Store data in a list, most recent last.""" - - def storedata(self, data): - """ - Extend `Storer.storedata()`; append new `data` to a list - (in `self.data`). - """ - self.data = data - -To classify identifiers explicitly, the role is given along with the -identifier in either prefix or suffix form:: - - Use :method:`Keeper.storedata` to store the object's data in - `Keeper.data`:instance_attribute:. - -The role may be one of 'package', 'module', 'class', 'method', -'function', 'module_attribute', 'class_attribute', -'instance_attribute', 'variable', 'parameter', 'type', -'exception_class', 'exception', 'warning_class', or 'warning'. Other -roles may be defined. - -.. _reStructuredText Markup Specification: - http://docutils.sourceforge.net/spec/rst/reStructuredText.html -.. _Docutils: http://docutils.sourceforge.net/ -.. _pysource.dtd: http://docutils.sourceforge.net/spec/pysource.dtd -.. _docutils.dtd: http://docutils.sourceforge.net/spec/docutils.dtd - - -.. - Local Variables: - mode: indented-text - indent-tabs-mode: nil - fill-column: 70 - End: diff --git a/docutils/docs/dev/rst/alternatives.txt b/docutils/docs/dev/rst/alternatives.txt deleted file mode 100644 index 4faae1f98..000000000 --- a/docutils/docs/dev/rst/alternatives.txt +++ /dev/null @@ -1,1334 +0,0 @@ -================================================== - A Record of reStructuredText Syntax Alternatives -================================================== -:Author: David Goodger -:Contact: goodger@users.sourceforge.net -:Revision: $Revision$ -:Date: $Date$ - -The following are ideas, alternatives, and justifications that were -considered for reStructuredText syntax, which did not originate with -Setext_ or StructuredText_. For an analysis of constructs which *did* -originate with StructuredText or Setext, please see `Problems With -StructuredText`_. See the `reStructuredText Markup Specification`_ -for full details of the established syntax. - -.. _Setext: http://docutils.sourceforge.net/mirror/setext.html -.. _StructuredText: - http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage -.. _Problems with StructuredText: problems.html -.. _reStructuredText Markup Specification: reStructuredText.html - - -.. contents:: - - -... Or Not To Do? -================= - -This is the realm of the possible but questionably probable. These -ideas are kept here as a record of what has been proposed, for -posterity and in case any of them prove to be useful. - - -Compound Enumerated Lists -------------------------- - -Allow for compound enumerators, such as "1.1." or "1.a." or "1(a)", to -allow for nested enumerated lists without indentation? - - -Sloppy Indentation of List Items --------------------------------- - -Perhaps the indentation shouldn't be so strict. Currently, this is -required:: - - 1. First line, - second line. - -Anything wrong with this? :: - - 1. First line, - second line. - -Problem? :: - - 1. First para. - - Block quote. (no good: requires some indent relative to first - para) - - Second Para. - - 2. Have to carefully define where the literal block ends:: - - Literal block - - Literal block? - -Hmm... Non-strict indentation isn't such a good idea. - - -Lazy Indentation of List Items ------------------------------- - -Another approach: Going back to the first draft of reStructuredText -(2000-11-27 post to Doc-SIG):: - - - This is the fourth item of the main list (no blank line above). - The second line of this item is not indented relative to the - bullet, which precludes it from having a second paragraph. - -Change that to *require* a blank line above and below, to reduce -ambiguity. This "loosening" may be added later, once the parser's -been nailed down. However, a serious drawback of this approach is to -limit the content of each list item to a single paragraph. - - -David's Idea for Lazy Indentation -````````````````````````````````` - -Consider a paragraph in a word processor. It is a single logical line -of text which ends with a newline, soft-wrapped arbitrarily at the -right edge of the page or screen. We can think of a plaintext -paragraph in the same way, as a single logical line of text, ending -with two newlines (a blank line) instead of one, and which may contain -arbitrary line breaks (newlines) where it was accidentally -hard-wrapped by an application. We can compensate for the accidental -hard-wrapping by "unwrapping" every unindented second and subsequent -line. The indentation of the first line of a paragraph or list item -would determine the indentation for the entire element. Blank lines -would be required between list items when using lazy indentation. - -The following example shows the lazy indentation of multiple body -elements:: - - - This is the first paragraph - of the first list item. - - Here is the second paragraph - of the first list item. - - - This is the first paragraph - of the second list item. - - Here is the second paragraph - of the second list item. - -A more complex example shows the limitations of lazy indentation:: - - - This is the first paragraph - of the first list item. - - Next is a definition list item: - - Term - Definition. The indentation of the term is - required, as is the indentation of the definition's - first line. - - When the definition extends to more than - one line, lazy indentation may occur. (This is the second - paragraph of the definition.) - - - This is the first paragraph - of the second list item. - - - Here is the first paragraph of - the first item of a nested list. - - So this paragraph would be outside of the nested list, - but inside the second list item of the outer list. - - But this paragraph is not part of the list at all. - -And the ambiguity remains:: - - - Look at the hyphen at the beginning of the next line - - is it a second list item marker, or a dash in the text? - - Similarly, we may want to refer to numbers inside enumerated - lists: - - 1. How many socks in a pair? There are - 2. How many pants in a pair? Exactly - 1. Go figure. - -Literal blocks and block quotes would still require consistent -indentation for all their lines. For block quotes, we might be able -to get away with only requiring that the first line of each contained -element be indented. For example:: - - Here's a paragraph. - - This is a paragraph inside a block quote. - Second and subsequent lines need not be indented at all. - - - A bullet list inside - the block quote. - - Second paragraph of the - bullet list inside the block quote. - -Although feasible, this form of lazy indentation has problems. The -document structure and hierarchy is not obvious from the indentation, -making the source plaintext difficult to read. This will also make -keeping track of the indentation while writing difficult and -error-prone. However, these problems may be acceptable for Wikis and -email mode, where we may be able to rely on less complex structure -(few nested lists, for example). - - -Field Lists -=========== - -Prior to the syntax for field lists being finalized, several -alternatives were proposed. - -1. Unadorned RFC822_ everywhere:: - - Author: Me - Version: 1 - - Advantages: clean, precedent (RFC822-compliant). Disadvantage: - ambiguous (these paragraphs are a prime example). - - Conclusion: rejected. - -2. Special case: use unadorned RFC822_ for the very first or very last - text block of a document:: - - """ - Author: Me - Version: 1 - - The rest of the document... - """ - - Advantages: clean, precedent (RFC822-compliant). Disadvantages: - special case, flat (unnested) field lists only, still ambiguous:: - - """ - Usage: cmdname [options] arg1 arg2 ... - - We obviously *don't* want the like above to be interpreted as a - field list item. Or do we? - """ - - Conclusion: rejected for the general case, accepted for specific - contexts (PEPs, email). - -3. Use a directive:: - - .. fields:: - - Author: Me - Version: 1 - - Advantages: explicit and unambiguous, RFC822-compliant. - Disadvantage: cumbersome. - - Conclusion: rejected for the general case (but such a directive - could certainly be written). - -4. Use Javadoc-style:: - - @Author: Me - @Version: 1 - @param a: integer - - Advantages: unambiguous, precedent, flexible. Disadvantages: - non-intuitive, ugly, not RFC822-compliant. - - Conclusion: rejected. - -5. Use leading colons:: - - :Author: Me - :Version: 1 - - Advantages: unambiguous, obvious (*almost* RFC822-compliant), - flexible, perhaps even elegant. Disadvantages: no precedent, not - quite RFC822-compliant. - - Conclusion: accepted! - -6. Use double colons:: - - Author:: Me - Version:: 1 - - Advantages: unambiguous, obvious? (*almost* RFC822-compliant), - flexible, similar to syntax already used for literal blocks and - directives. Disadvantages: no precedent, not quite - RFC822-compliant, similar to syntax already used for literal blocks - and directives. - - Conclusion: rejected because of the syntax similarity & conflicts. - -Why is RFC822 compliance important? It's a universal Internet -standard, and super obvious. Also, I'd like to support the PEP format -(ulterior motive: get PEPs to use reStructuredText as their standard). -But it *would* be easy to get used to an alternative (easy even to -convert PEPs; probably harder to convert python-deviants ;-). - -Unfortunately, without well-defined context (such as in email headers: -RFC822 only applies before any blank lines), the RFC822 format is -ambiguous. It is very common in ordinary text. To implement field -lists unambiguously, we need explicit syntax. - -The following question was posed in a footnote: - - Should "bibliographic field lists" be defined at the parser level, - or at the DPS transformation level? In other words, are they - reStructuredText-specific, or would they also be applicable to - another (many/every other?) syntax? - -The answer is that bibliographic fields are a -reStructuredText-specific markup convention. Other syntaxes may -implement the bibliographic elements explicitly. For example, there -would be no need for such a transformation for an XML-based markup -syntax. - -.. _RFC822: http://www.rfc-editor.org/rfc/rfc822.txt - - -Interpreted Text "Roles" -======================== - -The original purpose of interpreted text was as a mechanism for -descriptive markup, to describe the nature or role of a word or -phrase. For example, in XML we could say "<function>len</function>" -to mark up "len" as a function. It is envisaged that within Python -docstrings (inline documentation in Python module source files, the -primary market for reStructuredText) the role of a piece of -interpreted text can be inferred implicitly from the context of the -docstring within the program source. For other applications, however, -the role may have to be indicated explicitly. - -Interpreted text is enclosed in single backquotes (`). - -1. Initially, it was proposed that an explicit role could be indicated - as a word or phrase within the enclosing backquotes: - - - As a prefix, separated by a colon and whitespace:: - - `role: interpreted text` - - - As a suffix, separated by whitespace and a colon:: - - `interpreted text :role` - - There are problems with the initial approach: - - - There could be ambiguity with interpreted text containing colons. - For example, an index entry of "Mission: Impossible" would - require a backslash-escaped colon. - - - The explicit role is descriptive markup, not content, and will - not be visible in the processed output. Putting it inside the - backquotes doesn't feel right; the *role* isn't being quoted. - -2. Tony Ibbs suggested that the role be placed outside the - backquotes:: - - role:`prefix` or `suffix`:role - - This removes the embedded-colons ambiguity, but limits the role - identifier to be a single word (whitespace would be illegal). - Since roles are not meant to be visible after processing, the lack - of whitespace support is not important. - - The suggested syntax remains ambiguous with respect to ratios and - some writing styles. For example, suppose there is a "signal" - identifier, and we write:: - - ...calculate the `signal`:noise ratio. - - "noise" looks like a role. - -3. As an improvement on #2, we can bracket the role with colons:: - - :role:`prefix` or `suffix`:role: - - This syntax is similar to that of field lists, which is fine since - both are doing similar things: describing. - - This is the syntax chosen for reStructuredText. - -4. Another alternative is two colons instead of one:: - - role::`prefix` or `suffix`::role - - But this is used for analogies ("A:B::C:D": "A is to B as C is to - D"). - - Both alternative #2 and #4 lack delimiters on both sides of the - role, making it difficult to parse (by the reader). - -5. Some kind of bracketing could be used: - - - Parentheses:: - - (role)`prefix` or `suffix`(role) - - - Braces:: - - {role}`prefix` or `suffix`{role} - - - Square brackets:: - - [role]`prefix` or `suffix`[role] - - - Angle brackets:: - - <role>`prefix` or `suffix`<role> - - (The overlap of \*ML tags with angle brackets would be too - confusing and precludes their use.) - -Syntax #3 was chosen for reStructuredText. - - -Comments -======== - -A problem with comments (actually, with all indented constructs) is -that they cannot be followed by an indented block -- a block quote -- -without swallowing it up. - -I thought that perhaps comments should be one-liners only. But would -this mean that footnotes, hyperlink targets, and directives must then -also be one-liners? Not a good solution. - -Tony Ibbs suggested a "comment" directive. I added that we could -limit a comment to a single text block, and that a "multi-block -comment" could use "comment-start" and "comment-end" directives. This -would remove the indentation incompatibility. A "comment" directive -automatically suggests "footnote" and (hyperlink) "target" directives -as well. This could go on forever! Bad choice. - -Garth Kidd suggested that an "empty comment", a ".." explicit markup -start with nothing on the first line (except possibly whitespace) and -a blank line immediately following, could serve as an "unindent". An -empty comment does **not** swallow up indented blocks following it, -so block quotes are safe. "A tiny but practical wart." Accepted. - - -Anonymous Hyperlinks -==================== - -Alan Jaffray came up with this idea, along with the following syntax:: - - Search the `Python DOC-SIG mailing list archives`{}_. - - .. _: http://mail.python.org/pipermail/doc-sig/ - -The idea is sound and useful. I suggested a "double underscore" -syntax:: - - Search the `Python DOC-SIG mailing list archives`__. - - .. __: http://mail.python.org/pipermail/doc-sig/ - -But perhaps single underscores are okay? The syntax looks better, but -the hyperlink itself doesn't explicitly say "anonymous":: - - Search the `Python DOC-SIG mailing list archives`_. - - .. _: http://mail.python.org/pipermail/doc-sig/ - -Mixing anonymous and named hyperlinks becomes confusing. The order of -targets is not significant for named hyperlinks, but it is for -anonymous hyperlinks:: - - Hyperlinks: anonymous_, named_, and another anonymous_. - - .. _named: named - .. _: anonymous1 - .. _: anonymous2 - -Without the extra syntax of double underscores, determining which -hyperlink references are anonymous may be difficult. We'd have to -check which references don't have corresponding targets, and match -those up with anonymous targets. Keeping to a simple consistent -ordering (as with auto-numbered footnotes) seems simplest. - -reStructuredText will use the explicit double-underscore syntax for -anonymous hyperlinks. An alternative (see `Reworking Explicit -Markup`_ below) for the somewhat awkward ".. __:" syntax is "__":: - - An anonymous__ reference. - - __ http://anonymous - - -Reworking Explicit Markup -========================= - -Alan Jaffray came up with the idea of `anonymous hyperlinks`_, added -to reStructuredText. Subsequently it was asserted that hyperlinks -(especially anonymous hyperlinks) would play an increasingly important -role in reStructuredText documents, and therefore they require a -simpler and more concise syntax. This prompted a review of the -current and proposed explicit markup syntaxes with regards to -improving usability. - -1. Original syntax:: - - .. _blah: internal hyperlink target - .. _blah: http://somewhere external hyperlink target - .. _blah: blahblah_ indirect hyperlink target - .. __: anonymous internal target - .. __: http://somewhere anonymous external target - .. __: blahblah_ anonymous indirect target - .. [blah] http://somewhere footnote - .. blah:: http://somewhere directive - .. blah: http://somewhere comment - - .. Note:: - - The comment text was intentionally made to look like a hyperlink - target. - - Origins: - - * Except for the colon (a delimiter necessary to allow for - phrase-links), hyperlink target ``.. _blah:`` comes from Setext. - * Comment syntax from Setext. - * Footnote syntax from StructuredText ("named links"). - * Directives and anonymous hyperlinks original to reStructuredText. - - Advantages: - - + Consistent explicit markup indicator: "..". - + Consistent hyperlink syntax: ".. _" & ":". - - Disadvantages: - - - Anonymous target markup is awkward: ".. __:". - - The explicit markup indicator ("..") is excessively overloaded? - - Comment text is limited (can't look like a footnote, hyperlink, - or directive). But this is probably not important. - -2. Alan Jaffray's proposed syntax #1:: - - __ _blah internal hyperlink target - __ blah: http://somewhere external hyperlink target - __ blah: blahblah_ indirect hyperlink target - __ anonymous internal target - __ http://somewhere anonymous external target - __ blahblah_ anonymous indirect target - __ [blah] http://somewhere footnote - .. blah:: http://somewhere directive - .. blah: http://somewhere comment - - The hyperlink-connoted underscores have become first-level syntax. - - Advantages: - - + Anonymous targets are simpler. - + All hyperlink targets are one character shorter. - - Disadvantages: - - - Inconsistent internal hyperlink targets. Unlike all other named - hyperlink targets, there's no colon. There's an extra leading - underscore, but we can't drop it because without it, "blah" looks - like a relative URI. Unless we restore the colon:: - - __ blah: internal hyperlink target - - - Obtrusive markup? - -3. Alan Jaffray's proposed syntax #2:: - - .. _blah internal hyperlink target - .. blah: http://somewhere external hyperlink target - .. blah: blahblah_ indirect hyperlink target - .. anonymous internal target - .. http://somewhere anonymous external target - .. blahblah_ anonymous indirect target - .. [blah] http://somewhere footnote - !! blah: http://somewhere directive - ## blah: http://somewhere comment - - Leading underscores have been (almost) replaced by "..", while - comments and directives have gained their own syntax. - - Advantages: - - + Anonymous hyperlinks are simpler. - + Unique syntax for comments. Connotation of "comment" from - some programming languages (including our favorite). - + Unique syntax for directives. Connotation of "action!". - - Disadvantages: - - - Inconsistent internal hyperlink targets. Again, unlike all other - named hyperlink targets, there's no colon. There's a leading - underscore, matching the trailing underscores of references, - which no other hyperlink targets have. We can't drop that one - leading underscore though: without it, "blah" looks like a - relative URI. Again, unless we restore the colon:: - - .. blah: internal hyperlink target - - - All (except for internal) hyperlink targets lack their leading - underscores, losing the "hyperlink" connotation. - - - Obtrusive syntax for comments. Alternatives:: - - ;; blah: http://somewhere - (also comment syntax in Lisp & others) - ,, blah: http://somewhere - ("comma comma": sounds like "comment"!) - - - Iffy syntax for directives. Alternatives? - -4. Tony Ibbs' proposed syntax:: - - .. _blah: internal hyperlink target - .. _blah: http://somewhere external hyperlink target - .. _blah: blahblah_ indirect hyperlink target - .. anonymous internal target - .. http://somewhere anonymous external target - .. blahblah_ anonymous indirect target - .. [blah] http://somewhere footnote - .. blah:: http://somewhere directive - .. blah: http://somewhere comment - - This is the same as the current syntax, except for anonymous - targets which drop their "__: ". - - Advantage: - - + Anonymous targets are simpler. - - Disadvantages: - - - Anonymous targets lack their leading underscores, losing the - "hyperlink" connotation. - - Anonymous targets are almost indistinguishable from comments. - (Better to know "up front".) - -5. David Goodger's proposed syntax: Perhaps going back to one of - Alan's earlier suggestions might be the best solution. How about - simply adding "__ " as a synonym for ".. __: " in the original - syntax? These would become equivalent:: - - .. __: anonymous internal target - .. __: http://somewhere anonymous external target - .. __: blahblah_ anonymous indirect target - - __ anonymous internal target - __ http://somewhere anonymous external target - __ blahblah_ anonymous indirect target - -Alternative 5 has been adopted. - - -Backquotes in Phrase-Links -========================== - -[From a 2001-06-05 Doc-SIG post in reply to questions from Doug -Hellmann.] - -The first draft of the spec, posted to the Doc-SIG in November 2000, -used square brackets for phrase-links. I changed my mind because: - -1. In the first draft, I had already decided on single-backquotes for - inline literal text. - -2. However, I wanted to minimize the necessity for backslash escapes, - for example when quoting Python repr-equivalent syntax that uses - backquotes. - -3. The processing of identifiers (funtion/method/attribute/module/etc. - names) into hyperlinks is a useful feature. PyDoc recognizes - identifiers heuristically, but it doesn't take much imagination to - come up with counter-examples where PyDoc's heuristics would result - in embarassing failure. I wanted to do it deterministically, and - that called for syntax. I called this construct "interpreted - text". - -4. Leveraging off the ``*emphasis*/**strong**`` syntax, lead to the - idea of using double-backquotes as syntax. - -5. I worked out some rules for inline markup recognition. - -6. In combination with #5, double backquotes lent themselves to inline - literals, neatly satisfying #2, minimizing backslash escapes. In - fact, the spec says that no interpretation of any kind is done - within double-backquote inline literal text; backslashes do *no* - escaping within literal text. - -7. Single backquotes are then freed up for interpreted text. - -8. I already had square brackets required for footnote references. - -9. Since interpreted text will typically turn into hyperlinks, it was - a natural fit to use backquotes as the phrase-quoting syntax for - trailing-underscore hyperlinks. - -The original inspiration for the trailing underscore hyperlink syntax -was Setext. But for phrases Setext used a very cumbersome -``underscores_between_words_like_this_`` syntax. - -The underscores can be viewed as if they were right-pointing arrows: -``-->``. So ``hyperlink_`` points away from the reference, and -``.. _hyperlink:`` points toward the target. - - -Substitution Mechanism -====================== - -Substitutions arose out of a Doc-SIG thread begun on 2001-10-28 by -Alan Jaffray, "reStructuredText inline markup". It reminded me of a -missing piece of the reStructuredText puzzle, first referred to in my -contribution to "Documentation markup & processing / PEPs" (Doc-SIG -2001-06-21). - -Substitutions allow the power and flexibility of directives to be -shared by inline text. They are a way to allow arbitrarily complex -inline objects, while keeping the details out of the flow of text. -They are the equivalent of SGML/XML's named entities. For example, an -inline image (using reference syntax alternative 4d (vertical bars) -and definition alternative 3, the alternatives chosen for inclusion in -the spec):: - - The |biohazard| symbol must be used on containers used to dispose - of medical waste. - - .. |biohazard| image:: biohazard.png - [height=20 width=20] - -The ``|biohazard|`` substitution reference will be replaced in-line by -whatever the ``.. |biohazard|`` substitution definition generates (in -this case, an image). A substitution definition contains the -substitution text bracketed with vertical bars, followed by a an -embedded inline-compatible directive, such as "image". A transform is -required to complete the substitution. - -Syntax alternatives for the reference: - -1. Use the existing interpreted text syntax, with a predefined role - such as "sub":: - - The `biohazard`:sub: symbol... - - Advantages: existing syntax, explicit. Disadvantages: verbose, - obtrusive. - -2. Use a variant of the interpreted text syntax, with a new suffix - akin to the underscore in phrase-link references:: - - (a) `name`@ - (b) `name`# - (c) `name`& - (d) `name`/ - (e) `name`< - (f) `name`:: - (g) `name`: - - - Due to incompatibility with other constructs and ordinary text - usage, (f) and (g) are not possible. - -3. Use interpreted text syntax with a fixed internal format:: - - (a) `:name:` - (b) `name:` - (c) `name::` - (d) `::name::` - (e) `%name%` - (f) `#name#` - (g) `/name/` - (h) `&name&` - (i) `|name|` - (j) `[name]` - (k) `<name>` - (l) `&name;` - (m) `'name'` - - To avoid ML confusion (k) and (l) are definitely out. Square - brackets (j) won't work in the target (the substitution definition - would be indistinguishable from a footnote). - - The ```/name/``` syntax (g) is reminiscent of "s/find/sub" - substitution syntax in ed-like languages. However, it may have a - misleading association with regexps, and looks like an absolute - POSIX path. (i) is visually equivalent and lacking the - connotations. - - A disadvantage of all of these is that they limit interpreted text, - albeit only slightly. - -4. Use specialized syntax, something new:: - - (a) #name# - (b) @name@ - (c) /name/ - (d) |name| - (e) <<name>> - (f) //name// - (g) ||name|| - (h) ^name^ - (i) [[name]] - (j) ~name~ - (k) !name! - (l) =name= - (m) ?name? - (n) >name< - - "#" (a) and "@" (b) are obtrusive. "/" (c) without backquotes - looks just like a POSIX path; it is likely for such usage to appear - in text. - - "|" (d) and "^" (h) are feasible. - -5. Redefine the trailing underscore syntax. See definition syntax - alternative 4, below. - -Syntax alternatives for the definition: - -1. Use the existing directive syntax, with a predefined directive such - as "sub". It contains a further embedded directive resolving to an - inline-compatible object:: - - .. sub:: biohazard - .. image:: biohazard.png - [height=20 width=20] - - .. sub:: parrot - That bird wouldn't *voom* if you put 10,000,000 volts - through it! - - The advantages and disadvantages are the same as in inline - alternative 1. - -2. Use syntax as in #1, but with an embedded directivecompressed:: - - .. sub:: biohazard image:: biohazard.png - [height=20 width=20] - - This is a bit better than alternative 1, but still too much. - -3. Use a variant of directive syntax, incorporating the substitution - text, obviating the need for a special "sub" directive name. If we - assume reference alternative 4d (vertical bars), the matching - definition would look like this:: - - .. |biohazard| image:: biohazard.png - [height=20 width=20] - -4. (Suggested by Alan Jaffray on Doc-SIG from 2001-11-06.) - - Instead of adding new syntax, redefine the trailing underscore - syntax to mean "substitution reference" instead of "hyperlink - reference". Alan's example:: - - I had lunch with Jonathan_ today. We talked about Zope_. - - .. _Jonathan: lj [user=jhl] - .. _Zope: http://www.zope.org/ - - A problem with the proposed syntax is that URIs which look like - simple reference names (alphanum plus ".", "-", "_") would be - indistinguishable from substitution directive names. A more - consistent syntax would be:: - - I had lunch with Jonathan_ today. We talked about Zope_. - - .. _Jonathan: lj:: user=jhl - .. _Zope: http://www.zope.org/ - - (``::`` after ``.. _Jonathan: lj``.) - - The "Zope" target is a simple external hyperlink, but the - "Jonathan" target contains a directive. Alan proposed is that the - reference text be replaced by whatever the referenced directive - (the "directive target") produces. A directive reference becomes a - hyperlink reference if the contents of the directive target resolve - to a hyperlink. If the directive target resolves to an icon, the - reference is replaced by an inline icon. If the directive target - resolves to a hyperlink, the directive reference becomes a - hyperlink reference. - - This seems too indirect and complicated for easy comprehension. - - The reference in the text will sometimes become a link, sometimes - not. Sometimes the reference text will remain, sometimes not. We - don't know *at the reference*:: - - This is a `hyperlink reference`_; its text will remain. - This is an `inline icon`_; its text will disappear. - - That's a problem. - -The syntax that has been incorporated into the spec and parser is -reference alternative 4d with definition alternative 3:: - - The |biohazard| symbol... - - .. |biohazard| image:: biohazard.png - [height=20 width=20] - -We can also combine substitution references with hyperlink references, -by appending a "_" (named hyperlink reference) or "__" (anonymous -hyperlink reference) suffix to the substitution reference. This -allows us to click on an image-link:: - - The |biohazard|_ symbol... - - .. |biohazard| image:: biohazard.png - [height=20 width=20] - .. _biohazard: http://www.cdc.gov/ - -There have been several suggestions for the naming of these -constructs, originally called "substitution references" and -"substitutions". - -1. Candidate names for the reference construct: - - (a) substitution reference - (b) tagging reference - (c) inline directive reference - (d) directive reference - (e) indirect inline directive reference - (f) inline directive placeholder - (g) inline directive insertion reference - (h) directive insertion reference - (i) insertion reference - (j) directive macro reference - (k) macro reference - (l) substitution directive reference - -2. Candidate names for the definition construct: - - (a) substitution - (b) substitution directive - (c) tag - (d) tagged directive - (e) directive target - (f) inline directive - (g) inline directive definition - (h) referenced directive - (i) indirect directive - (j) indirect directive definition - (k) directive definition - (l) indirect inline directive - (m) named directive definition - (n) inline directive insertion definition - (o) directive insertion definition - (p) insertion definition - (q) insertion directive - (r) substitution definition - (s) directive macro definition - (t) macro definition - (u) substitution directive definition - (v) substitution definition - -"Inline directive reference" (1c) seems to be an appropriate term at -first, but the term "inline" is redundant in the case of the -reference. Its counterpart "inline directive definition" (2g) is -awkward, because the directive definition itself is not inline. - -"Directive reference" (1d) and "directive definition" (2k) are too -vague. "Directive definition" could be used to refer to any -directive, not just those used for inline substitutions. - -One meaning of the term "macro" (1k, 2s, 2t) is too -programming-language-specific. Also, macros are typically simple text -substitution mechanisms: the text is substituted first and evaluated -later. reStructuredText substitution definitions are evaluated in -place at parse time and substituted afterwards. - -"Insertion" (1h, 1i, 2n-2q) is almost right, but it implies that -something new is getting added rather than one construct being -replaced by another. - -Which brings us back to "substitution". The overall best names are -"substitution reference" (1a) and "substitution definition" (2v). A -long way to go to add one word! - - -Reworking Footnotes -=================== - -As a further wrinkle (see `Reworking Explicit Markup`_ above), in the -wee hours of 2002-02-28 I posted several ideas for changes to footnote -syntax: - - - Change footnote syntax from ``.. [1]`` to ``_[1]``? ... - - Differentiate (with new DTD elements) author-date "citations" - (``[GVR2002]``) from numbered footnotes? ... - - Render footnote references as superscripts without "[]"? ... - -These ideas are all related, and suggest changes in the -reStructuredText syntax as well as the docutils tree model. - -The footnote has been used for both true footnotes (asides expanding -on points or defining terms) and for citations (references to external -works). Rather than dealing with one amalgam construct, we could -separate the current footnote concept into strict footnotes and -citations. Citations could be interpreted and treated differently -from footnotes. Footnotes would be limited to numerical labels: -manual ("1") and auto-numbered (anonymous "#", named "#label"). - -The footnote is the only explicit markup construct (starts with ".. ") -that directly translates to a visible body element. I've always been -a little bit uncomfortable with the ".. " marker for footnotes because -of this; ".. " has a connotation of "special", but footnotes aren't -especially "special". Printed texts often put footnotes at the bottom -of the page where the reference occurs (thus "foot note"). Some HTML -designs would leave footnotes to be rendered the same positions where -they're defined. Other online and printed designs will gather -footnotes into a section near the end of the document, converting them -to "endnotes" (perhaps using a directive in our case); but this -"special processing" is not an intrinsic property of the footnote -itself, but a decision made by the document author or processing -system. - -Citations are almost invariably collected in a section at the end of a -document or section. Citations "disappear" from where they are -defined and are magically reinserted at some well-defined point. -There's more of a connection to the "special" connotation of the ".. " -syntax. The point at which the list of citations is inserted could be -defined manually by a directive (e.g., ".. citations::"), and/or have -default behavior (e.g., a section automatically inserted at the end of -the document) that might be influenced by options to the Writer. - -Syntax proposals: - -+ Footnotes: - - - Current syntax:: - - .. [1] Footnote 1 - .. [#] Auto-numbered footnote. - .. [#label] Auto-labeled footnote. - - - The syntax proposed in the original 2002-02-28 Doc-SIG post: - remove the ".. ", prefix a "_":: - - _[1] Footnote 1 - _[#] Auto-numbered footnote. - _[#label] Auto-labeled footnote. - - The leading underscore syntax (earlier dropped because - ``.. _[1]:`` was too verbose) is a useful reminder that footnotes - are hyperlink targets. - - - Minimal syntax: remove the ".. [" and "]", prefix a "_", and - suffix a ".":: - - _1. Footnote 1. - _#. Auto-numbered footnote. - _#label. Auto-labeled footnote. - - ``_1.``, ``_#.``, and ``_#label.`` are markers, - like list markers. - - Footnotes could be rendered something like this in HTML - - | 1. This is a footnote. The brackets could be dropped - | from the label, and a vertical bar could set them - | off from the rest of the document in the HTML. - - Two-way hyperlinks on the footnote marker ("1." above) would also - help to differentiate footnotes from enumerated lists. - - If converted to endnotes (by a directive/transform), a horizontal - half-line might be used instead. Page-oriented output formats - would typically use the horizontal line for true footnotes. - -+ Footnote references: - - - Current syntax:: - - [1]_, [#]_, [#label]_ - - - Minimal syntax to match the minimal footnote syntax above:: - - 1_, #_, #label_ - - As a consequence, pure-numeric hyperlink references would not be - possible; they'd be interpreted as footnote references. - -+ Citation references: no change is proposed from the current footnote - reference syntax:: - - [GVR2001]_ - -+ Citations: - - - Current syntax (footnote syntax):: - - .. [GVR2001] Python Documentation; van Rossum, Drake, et al.; - http://www.python.org/doc/ - - - Possible new syntax:: - - _[GVR2001] Python Documentation; van Rossum, Drake, et al.; - http://www.python.org/doc/ - - _[DJG2002] - Docutils: Python Documentation Utilities project; Goodger - et al.; http://docutils.sourceforge.net/ - - Without the ".. " marker, subsequent lines would either have to - align as in one of the above, or we'd have to allow loose - alignment (I'd rather not):: - - _[GVR2001] Python Documentation; van Rossum, Drake, et al.; - http://www.python.org/doc/ - -I proposed adopting the "minimal" syntax for footnotes and footnote -references, and adding citations and citation references to -reStructuredText's repertoire. The current footnote syntax for -citations is better than the alternatives given. - -From a reply by Tony Ibbs on 2002-03-01: - - However, I think easier with examples, so let's create one:: - - Fans of Terry Pratchett are perhaps more likely to use - footnotes [1]_ in their own writings than other people - [2]_. Of course, in *general*, one only sees footnotes - in academic or technical writing - it's use in fiction - and letter writing is not normally considered good - style [4]_, particularly in emails (not a medium that - lends itself to footnotes). - - .. [1] That is, little bits of referenced text at the - bottom of the page. - .. [2] Because Terry himself does, of course [3]_. - .. [3] Although he has the distinction of being - *funny* when he does it, and his fans don't always - achieve that aim. - .. [4] Presumably because it detracts from linear - reading of the text - this is, of course, the point. - - and look at it with the second syntax proposal:: - - Fans of Terry Pratchett are perhaps more likely to use - footnotes [1]_ in their own writings than other people - [2]_. Of course, in *general*, one only sees footnotes - in academic or technical writing - it's use in fiction - and letter writing is not normally considered good - style [4]_, particularly in emails (not a medium that - lends itself to footnotes). - - _[1] That is, little bits of referenced text at the - bottom of the page. - _[2] Because Terry himself does, of course [3]_. - _[3] Although he has the distinction of being - *funny* when he does it, and his fans don't always - achieve that aim. - _[4] Presumably because it detracts from linear - reading of the text - this is, of course, the point. - - (I note here that if I have gotten the indentation of the - footnotes themselves correct, this is clearly not as nice. And if - the indentation should be to the left margin instead, I like that - even less). - - and the third (new) proposal:: - - Fans of Terry Pratchett are perhaps more likely to use - footnotes 1_ in their own writings than other people - 2_. Of course, in *general*, one only sees footnotes - in academic or technical writing - it's use in fiction - and letter writing is not normally considered good - style 4_, particularly in emails (not a medium that - lends itself to footnotes). - - _1. That is, little bits of referenced text at the - bottom of the page. - _2. Because Terry himself does, of course 3_. - _3. Although he has the distinction of being - *funny* when he does it, and his fans don't always - achieve that aim. - _4. Presumably because it detracts from linear - reading of the text - this is, of course, the point. - - I think I don't, in practice, mind the targets too much (the use - of a dot after the number helps a lot here), but I do have a - problem with the body text, in that I don't naturally separate out - the footnotes as different than the rest of the text - instead I - keep wondering why there are numbers interspered in the text. The - use of brackets around the numbers ([ and ]) made me somehow parse - the footnote references as "odd" - i.e., not part of the body text - - and thus both easier to skip, and also (paradoxically) easier to - pick out so that I could follow them. - - Thus, for the moment (and as always susceptable to argument), I'd - say -1 on the new form of footnote reference (i.e., I much prefer - the existing ``[1]_`` over the proposed ``1_``), and ambivalent - over the proposed target change. - - That leaves David's problem of wanting to distinguish footnotes - and citations - and the only thing I can propose there is that - footnotes are numeric or # and citations are not (which, as a - human being, I can probably cope with!). - -From a reply by Paul Moore on 2002-03-01: - - I think the current footnote syntax ``[1]_`` is *exactly* the - right balance of distinctness vs unobtrusiveness. I very - definitely don't think this should change. - - On the target change, it doesn't matter much to me. - -From a further reply by Tony Ibbs on 2002-03-01, referring to the -"[1]" form and actual usage in email: - - Clearly this is a form people are used to, and thus we should - consider it strongly (in the same way that the usage of ``*..*`` - to mean emphasis was taken partly from email practise). - - Equally clearly, there is something "magical" for people in the - use of a similar form (i.e., ``[1]``) for both footnote reference - and footnote target - it seems natural to keep them similar. - - ... - - I think that this established plaintext usage leads me to strongly - believe we should retain square brackets at both ends of a - footnote. The markup of the reference end (a single trailing - underscore) seems about as minimal as we can get away with. The - markup of the target end depends on how one envisages the thing - - if ".." means "I am a target" (as I tend to see it), then that's - good, but one can also argue that the "_[1]" syntax has a neat - symmetry with the footnote reference itself, if one wishes (in - which case ".." presumably means "hidden/special" as David seems - to think, which is why one needs a ".." *and* a leading underline - for hyperlink targets. - -Given the persuading arguments voiced, we'll leave footnote & footnote -reference syntax alone. Except that these discussions gave rise to -the "auto-symbol footnote" concept, which has been added. Citations -and citation references have also been added. - - -Auto-Enumerated Lists -===================== - -The advantage of auto-numbered enumerated lists would be similar to -that of auto-numbered footnotes: lists could be written and rearranged -without having to manually renumber them. The disadvantages are also -the same: input and output wouldn't match exactly; the markup may be -ugly or confusing (depending on which alternative is chosen). - -1. Use the "#" symbol. Example:: - - #. Item 1. - #. Item 2. - #. Item 3. - - Advantages: simple, explicit. Disadvantage: enumeration sequence - cannot be specified (limited to arabic numerals); ugly. - -2. As a variation on #1, first initialize the enumeration sequence? - For example:: - - a) Item a. - #) Item b. - #) Item c. - - Advantages: simple, explicit, any enumeration sequence possible. - Disadvantages: ugly; perhaps confusing with mixed concrete/abstract - enumerators. - -3. Alternative suggested by Fred Bremmer, from experience with MoinMoin:: - - 1. Item 1. - 1. Item 2. - 1. Item 3. - - Advantages: enumeration sequence is explicit (could be multiple - "a." or "(I)" tokens). Disadvantages: perhaps confusing; otherwise - erroneous input (e.g., a duplicate item "1.") would pass silently, - either causing a problem later in the list (if no blank lines - between items) or creating two lists (with blanks). - - Take this input for example:: - - 1. Item 1. - - 1. Unintentional duplicate of item 1. - - 2. Item 2. - - Currently the parser will produce two list, "1" and "1,2" (no - warnings, because of the presence of blank lines). Using Fred's - notation, the current behavior is "1,1,2 -> 1 1,2" (without blank - lines between items, it would be "1,1,2 -> 1 [WARNING] 1,2"). What - should the behavior be with auto-numbering? - - Fred has produced a patch__, whose initial behavior is as follows:: - - 1,1,1 -> 1,2,3 - 1,2,2 -> 1,2,3 - 3,3,3 -> 3,4,5 - 1,2,2,3 -> 1,2,3 [WARNING] 3 - 1,1,2 -> 1,2 [WARNING] 2 - - (After the "[WARNING]", the "3" would begin a new list.) - - I have mixed feelings about adding this functionality to the spec & - parser. It would certainly be useful to some users (myself - included; I often have to renumber lists). Perhaps it's too - clever, asking the parser to guess too much. What if you *do* want - three one-item lists in a row, each beginning with "1."? You'd - have to use empty comments to force breaks. Also, I question - whether "1,2,2 -> 1,2,3" is optimal behavior. - - In response, Fred came up with "a stricter and more explicit rule - [which] would be to only auto-number silently if *all* the - enumerators of a list were identical". In that case:: - - 1,1,1 -> 1,2,3 - 1,2,2 -> 1,2 [WARNING] 2 - 3,3,3 -> 3,4,5 - 1,2,2,3 -> 1,2 [WARNING] 2,3 - 1,1,2 -> 1,2 [WARNING] 2 - - Should any start-value be allowed ("3,3,3"), or should - auto-numbered lists be limited to begin with ordinal-1 ("1", "A", - "a", "I", or "i")? - - __ http://sourceforge.net/tracker/index.php?func=detail&aid=548802 - &group_id=38414&atid=422032 - -4. Alternative proposed by Tony Ibbs:: - - #1. First item. - #3. Aha - I edited this in later. - #2. Second item. - - The initial proposal required unique enumerators within a list, but - this limits the convenience of a feature of already limited - applicability and convenience. Not a useful requirement; dropped. - - Instead, simply prepend a "#" to a standard list enumerator to - indicate auto-enumeration. The numbers (or letters) of the - enumerators themselves are not significant, except: - - - as a sequence indicator (arabic, roman, alphabetic; upper/lower), - - - and perhaps as a start value (first list item). - - Advantages: explicit, any enumeration sequence possible. - Disadvantages: a bit ugly. - - -.. - Local Variables: - mode: indented-text - indent-tabs-mode: nil - sentence-end-double-space: t - fill-column: 70 - End: diff --git a/docutils/docs/dev/rst/problems.txt b/docutils/docs/dev/rst/problems.txt deleted file mode 100644 index e97887a01..000000000 --- a/docutils/docs/dev/rst/problems.txt +++ /dev/null @@ -1,770 +0,0 @@ -============================== - Problems With StructuredText -============================== -:Author: David Goodger -:Contact: goodger@users.sourceforge.net -:Revision: $Revision$ -:Date: $Date$ - -There are several problems, unresolved issues, and areas of -controversy within StructuredText_ (Classic and Next Generation). In -order to resolve all these issues, this analysis brings all of the -issues out into the open, enumerates all the alternatives, and -proposes solutions to be incorporated into the reStructuredText_ -specification. - - -.. contents:: - - -Formal Specification -==================== - -The description in the original StructuredText.py has been criticized -for being vague. For practical purposes, "the code *is* the spec." -Tony Ibbs has been working on deducing a `detailed description`_ from -the documentation and code of StructuredTextNG_. Edward Loper's -STMinus_ is another attempt to formalize a spec. - -For this kind of a project, the specification should always precede -the code. Otherwise, the markup is a moving target which can never be -adopted as a standard. Of course, a specification may be revised -during lifetime of the code, but without a spec there is no visible -control and thus no confidence. - - -Understanding and Extending the Code -==================================== - -The original StructuredText_ is a dense mass of sparsely commented -code and inscrutable regular expressions. It was not designed to be -extended and is very difficult to understand. StructuredTextNG_ has -been designed to allow input (syntax) and output extensions, but its -documentation (both internal [comments & docstrings], and external) is -inadequate for the complexity of the code itself. - -For reStructuredText to become truly useful, perhaps even part of -Python's standard library, it must have clear, understandable -documentation and implementation code. For the implementation of -reStructuredText to be taken seriously, it must be a sterling example -of the potential of docstrings; the implementation must practice what -the specification preaches. - - -Section Structure via Indentation -================================= - -Setext_ required that body text be indented by 2 spaces. The original -StructuredText_ and StructuredTextNG_ require that section structure -be indicated through indentation, as "inspired by Python". For -certain structures with a very limited, local extent (such as lists, -block quotes, and literal blocks), indentation naturally indicates -structure or hierarchy. For sections (which may have a very large -extent), structure via indentation is unnecessary, unnatural and -ambiguous. Rather, the syntax of the section title *itself* should -indicate that it is a section title. - -The original StructuredText states that "A single-line paragraph whose -immediately succeeding paragraphs are lower level is treated as a -header." Requiring indentation in this way is: - -- Unnecessary. The vast majority of docstrings and standalone - documents will have no more than one level of section structure. - Requiring indentation for such docstrings is unnecessary and - irritating. - -- Unnatural. Most published works use title style (type size, face, - weight, and position) and/or section/subsection numbering rather - than indentation to indicate hierarchy. This is a tradition with a - very long history. - -- Ambiguous. A StructuredText header is indistinguishable from a - one-line paragraph followed by a block quote (precluding the use of - block quotes). Enumerated section titles are ambiguous (is it a - header? is it a list item?). Some additional adornment must be - required to confirm the line's role as a title, both to a parser and - to the human reader of the source text. - -Python's use of significant whitespace is a wonderful (if not -original) innovation, however requiring indentation in ordinary -written text is hypergeneralization. - -reStructuredText_ indicates section structure through title adornment -style (as exemplified by this document). This is far more natural. -In fact, it is already in widespread use in plain text documents, -including in Python's standard distribution (such as the toplevel -README_ file). - - -Character Escaping Mechanism -============================ - -No matter what characters are chosen for markup, some day someone will -want to write documentation *about* that markup or using markup -characters in a non-markup context. Therefore, any complete markup -language must have an escaping or encoding mechanism. For a -lightweight markup system, encoding mechanisms like SGML/XML's '*' -are out. So an escaping mechanism is in. However, with carefully -chosen markup, it should be necessary to use the escaping mechanism -only infrequently. - -reStructuredText_ needs an escaping mechanism: a way to treat -markup-significant characters as the characters themselves. Currently -there is no such mechanism (although ZWiki uses '!'). What are the -candidates? - -1. ``!`` (http://dev.zope.org/Members/jim/StructuredTextWiki/NGEscaping) -2. ``\`` -3. ``~`` -4. doubling of characters - -The best choice for this is the backslash (``\``). It's "the single -most popular escaping character in the world!", therefore familiar and -unsurprising. Since characters only need to be escaped under special -circumstances, which are typically those explaining technical -programming issues, the use of the backslash is natural and -understandable. Python docstrings can be raw (prefixed with an 'r', -as in 'r""'), which would obviate the need for gratuitous doubling-up -of backslashes. - -(On 2001-03-29 on the Doc-SIG mailing list, GvR endorsed backslash -escapes, saying, "'nuff said. Backslash it is." Although neither -legally binding nor irrevocable nor any kind of guarantee of anything, -it is a good sign.) - -The rule would be: An unescaped backslash followed by any markup -character escapes the character. The escaped character represents the -character itself, and is prevented from playing a role in any markup -interpretation. The backslash is removed from the output. A literal -backslash is represented by an "escaped backslash," two backslashes in -a row. - -A carefully constructed set of recognition rules for inline markup -will obviate the need for backslash-escapes in almost all cases; see -`Delimitation of Inline Markup`_ below. - -When an expression (requiring backslashes and other characters used -for markup) becomes too complicated and therefore unreadable, a -literal block may be used instead. Inside literal blocks, no markup -is recognized, therefore backslashes (for the purpose of escaping -markup) become unnecessary. - -We could allow backslashes preceding non-markup characters to remain -in the output. This would make describing regular expressions and -other uses of backslashes easier. However, this would complicate the -markup rules and would be confusing. - - -Blank Lines in Lists -==================== - -Oft-requested in Doc-SIG (the earliest reference is dated 1996-08-13) -is the ability to write lists without requiring blank lines between -items. In docstrings, space is at a premium. Authors want to convey -their API or usage information in as compact a form as possible. -StructuredText_ requires blank lines between all body elements, -including list items, even when boundaries are obvious from the markup -itself. - -In reStructuredText, blank lines are optional between list items. -However, in order to eliminate ambiguity, a blank line is required -before the first list item and after the last. Nested lists also -require blank lines before the list start and after the list end. - - -Bullet List Markup -================== - -StructuredText_ includes 'o' as a bullet character. This is dangerous -and counter to the language-independent nature of the markup. There -are many languages in which 'o' is a word. For example, in Spanish:: - - Llamame a la casa - o al trabajo. - - (Call me at home or at work.) - -And in Japanese (when romanized):: - - Senshuu no doyoubi ni tegami - o kakimashita. - - ([I] wrote a letter on Saturday last week.) - -If a paragraph containing an 'o' word wraps such that the 'o' is the -first text on a line, or if a paragraph begins with such a word, it -could be misinterpreted as a bullet list. - -In reStructuredText_, 'o' is not used as a bullet character. '-', -'*', and '+' are the possible bullet characters. - - -Enumerated List Markup -====================== - -StructuredText enumerated lists are allowed to begin with numbers and -letters followed by a period or right-parenthesis, then whitespace. -This has surprising consequences for writing styles. For example, -this is recognized as an enumerated list item by StructuredText:: - - Mr. Creosote. - -People will write enumerated lists in all different ways. It is folly -to try to come up with the "perfect" format for an enumerated list, -and limit the docstring parser's recognition to that one format only. - -Rather, the parser should recognize a variety of enumerator styles. -It is also recommended that the enumerator of the first list item be -ordinal-1 ('1', 'A', 'a', 'I', or 'i'), as output formats may not be -able to begin a list at an arbitrary enumeration. - -An initial idea was to require two or more consistent enumerated list -items in a row. This idea proved impractical and was dropped. In -practice, the presence of a proper enumerator is enough to reliably -recognize an enumerated list item; any ambiguities are reported by the -parser. Here's the original idea for posterity: - - The parser should recognize a variety of enumerator styles, mark - each block as a potential enumerated list item (PELI), and - interpret the enumerators of adjacent PELIs to decide whether they - make up a consistent enumerated list. - - If a PELI is labeled with a "1.", and is immediately followed by a - PELI labeled with a "2.", we've got an enumerated list. Or "(A)" - followed by "(B)". Or "i)" followed by "ii)", etc. The chances - of accidentally recognizing two adjacent and consistently labeled - PELIs, are acceptably small. - - For an enumerated list to be recognized, the following must be - true: - - - the list must consist of multiple adjacent list items (2 or - more) - - the enumerators must all have the same format - - the enumerators must be sequential - - -Definition List Markup -====================== - -StructuredText uses ' -- ' (whitespace, two hyphens, whitespace) on -the first line of a paragraph to indicate a definition list item. The -' -- ' serves to separate the term (on the left) from the definition -(on the right). - -Many people use ' -- ' as an em-dash in their text, conflicting with -the StructuredText usage. Although the Chicago Manual of Style says -that spaces should not be used around an em-dash, Peter Funk pointed -out that this is standard usage in German (according to the Duden, the -official German reference), and possibly in other languages as well. -The widespread use of ' -- ' precludes its use for definition lists; -it would violate the "unsurprising" criterion. - -A simpler, and at least equally visually distinctive construct -(proposed by Guido van Rossum, who incidentally is a frequent user of -' -- ') would do just as well:: - - term 1 - Definition. - - term 2 - Definition 2, paragraph 1. - - Definition 2, paragraph 2. - -A reStructuredText definition list item consists of a term and a -definition. A term is a simple one-line paragraph. A definition is a -block indented relative to the term, and may contain multiple -paragraphs and other body elements. No blank line precedes a -definition (this distinguishes definition lists from block quotes). - - -Literal Blocks -============== -The StructuredText_ specification has literal blocks indicated by -'example', 'examples', or '::' ending the preceding paragraph. STNG -only recognizes '::'; 'example'/'examples' are not implemented. This -is good; it fixes an unnecessary language dependency. The problem is -what to do with the sometimes- unwanted '::'. - -In reStructuredText_ '::' at the end of a paragraph indicates that -subsequent *indented* blocks are treated as literal text. No further -markup interpretation is done within literal blocks (not even -backslash-escapes). If the '::' is preceded by whitespace, '::' is -omitted from the output; if '::' was the sole content of a paragraph, -the entire paragraph is removed (no 'empty' paragraph remains). If -'::' is preceded by a non-whitespace character, '::' is replaced by -':' (i.e., the extra colon is removed). - -Thus, a section could begin with a literal block as follows:: - - Section Title - ------------- - - :: - - print "this is example literal" - - -Tables -====== - -The table markup scheme in classic StructuredText was horrible. Its -omission from StructuredTextNG is welcome, and its markup will not be -repeated here. However, tables themselves are useful in -documentation. Alternatives: - -1. This format is the most natural and obvious. It was independently - invented (no great feat of creation!), and later found to be the - format supported by the `Emacs table mode`_:: - - +------------+------------+------------+--------------+ - | Header 1 | Header 2 | Header 3 | Header 4 | - +============+============+============+==============+ - | Column 1 | Column 2 | Column 3 & 4 span (Row 1) | - +------------+------------+------------+--------------+ - | Column 1 & 2 span | Column 3 | - Column 4 | - +------------+------------+------------+ - Row 2 & 3 | - | 1 | 2 | 3 | - span | - +------------+------------+------------+--------------+ - - Tables are described with a visual outline made up of the - characters '-', '=', '|', and '+': - - - The hyphen ('-') is used for horizontal lines (row separators). - - The equals sign ('=') is optionally used as a header separator - (as of version 1.5.24, this is not supported by the Emacs table - mode). - - The vertical bar ('|') is used for for vertical lines (column - separators). - - The plus sign ('+') is used for intersections of horizontal and - vertical lines. - - Row and column spans are possible simply by omitting the column or - row separators, respectively. The header row separator must be - complete; in other words, a header cell may not span into the table - body. Each cell contains body elements, and may have multiple - paragraphs, lists, etc. Initial spaces for a left margin are - allowed; the first line of text in a cell determines its left - margin. - -2. Below is a minimalist possibility. It may be better suited to - manual input than alternative #1, but there is no Emacs editing - mode available. One disadvantage is that it resembles section - titles; a one-column table would look exactly like section & - subsection titles. :: - - ============ ============ ============ ============== - Header 1 Header 2 Header 3 Header 4 - ============ ============ ============ ============== - Column 1 Column 2 Column 3 & 4 span (Row 1) - ------------ ------------ --------------------------- - Column 1 & 2 span Column 3 - Column 4 - ------------------------- ------------ - Row 2 & 3 - 1 2 3 - span - ============ ============ ============ ============== - - The table begins with a top border of equals signs with a space at - each column boundary (regardless of spans). Each row is - underlined. Internal row separators are underlines of '-', with - spaces at column boundaries. The last of the optional head rows is - underlined with '=', again with spaces at column boundaries. - Column spans have no spaces in their underline. Row spans simply - lack an underline at the row boundary. The bottom boundary of the - table consists of '=' underlines. A blank line is required - following a table. - -Alternative #1 is the choice adopted by reStructuredText. - - -Delimitation of Inline Markup -============================= - -StructuredText specifies that inline markup must begin with -whitespace, precluding such constructs as parenthesized or quoted -emphatic text:: - - "**What?**" she cried. (*exit stage left*) - -The `reStructuredText markup specification`_ allows for such -constructs and disambiguates inline markup through a set of -recognition rules. These recognition rules define the context of -markup start-strings and end-strings, allowing markup characters to be -used in most non-markup contexts without a problem (or a backslash). -So we can say, "Use asterisks (*) around words or phrases to -*emphasisze* them." The '(*)' will not be recognized as markup. This -reduces the need for markup escaping to the point where an escape -character is *almost* (but not quite!) unnecessary. - - -Underlining -=========== - -StructuredText uses '_text_' to indicate underlining. To quote David -Ascher in his 2000-01-21 Doc-SIG mailing list post, "Docstring -grammar: a very revised proposal": - - The tagging of underlined text with _'s is suboptimal. Underlines - shouldn't be used from a typographic perspective (underlines were - designed to be used in manuscripts to communicate to the - typesetter that the text should be italicized -- no well-typeset - book ever uses underlines), and conflict with double-underscored - Python variable names (__init__ and the like), which would get - truncated and underlined when that effect is not desired. Note - that while *complete* markup would prevent that truncation - ('__init__'), I think of docstring markups much like I think of - type annotations -- they should be optional and above all do no - harm. In this case the underline markup does harm. - -Underlining is not part of the reStructuredText specification. - - -Inline Literals -=============== - -StructuredText's markup for inline literals (text left as-is, -verbatim, usually in a monospaced font; as in HTML <TT>) is single -quotes ('literals'). The problem with single quotes is that they are -too often used for other purposes: - -- Apostrophes: "Don't blame me, 'cause it ain't mine, it's Chris'."; - -- Quoting text: - - First Bruce: "Well Bruce, I heard the prime minister use it. - 'S'hot enough to boil a monkey's bum in 'ere your Majesty,' he - said, and she smiled quietly to herself." - - In the UK, single quotes are used for dialogue in published works. - -- String literals: s = '' - -Alternatives:: - - 'text' \'text\' ''text'' "text" \"text\" ""text"" - #text# @text@ `text` ^text^ ``text'' ``text`` - -The examples below contain inline literals, quoted text, and -apostrophes. Each example should evaluate to the following HTML:: - - Some <TT>code</TT>, with a 'quote', "double", ain't it grand? - Does <TT>a[b] = 'c' + "d" + `2^3`</TT> work? - - 0. Some code, with a quote, double, ain't it grand? - Does a[b] = 'c' + "d" + `2^3` work? - 1. Some 'code', with a \'quote\', "double", ain\'t it grand? - Does 'a[b] = \'c\' + "d" + `2^3`' work? - 2. Some \'code\', with a 'quote', "double", ain't it grand? - Does \'a[b] = 'c' + "d" + `2^3`\' work? - 3. Some ''code'', with a 'quote', "double", ain't it grand? - Does ''a[b] = 'c' + "d" + `2^3`'' work? - 4. Some "code", with a 'quote', \"double\", ain't it grand? - Does "a[b] = 'c' + "d" + `2^3`" work? - 5. Some \"code\", with a 'quote', "double", ain't it grand? - Does \"a[b] = 'c' + "d" + `2^3`\" work? - 6. Some ""code"", with a 'quote', "double", ain't it grand? - Does ""a[b] = 'c' + "d" + `2^3`"" work? - 7. Some #code#, with a 'quote', "double", ain't it grand? - Does #a[b] = 'c' + "d" + `2^3`# work? - 8. Some @code@, with a 'quote', "double", ain't it grand? - Does @a[b] = 'c' + "d" + `2^3`@ work? - 9. Some `code`, with a 'quote', "double", ain't it grand? - Does `a[b] = 'c' + "d" + \`2^3\`` work? - 10. Some ^code^, with a 'quote', "double", ain't it grand? - Does ^a[b] = 'c' + "d" + `2\^3`^ work? - 11. Some ``code'', with a 'quote', "double", ain't it grand? - Does ``a[b] = 'c' + "d" + `2^3`'' work? - 12. Some ``code``, with a 'quote', "double", ain't it grand? - Does ``a[b] = 'c' + "d" + `2^3\``` work? - -Backquotes (#9 & #12) are the best choice. They are unobtrusive and -relatviely rarely used (more rarely than ' or ", anyhow). Backquotes -have the connotation of 'quotes', which other options (like carets, -#10) don't. - -Analogously with ``*emph*`` & ``**strong**``, double-backquotes (#12) -could be used for inline literals. If single-backquotes are used for -'interpreted text' (context-sensitive domain-specific descriptive -markup) such as function name hyperlinks in Python docstrings, then -double-backquotes could be used for absolute-literals, wherein no -processing whatsoever takes place. An advantage of double-backquotes -would be that backslash-escaping would no longer be necessary for -embedded single-backquotes; however, embedded double-backquotes (in an -end-string context) would be illegal. See `Backquotes in -Phrase-Links`__ in `Record of reStructuredText Syntax Alternatives`__. - -__ alternatives.html#backquotes-in-phrase-links -__ alternatives.html - -Alternative choices are carets (#10) and TeX-style quotes (#11). For -examples of TeX-style quoting, see -http://www.zope.org/Members/jim/StructuredTextWiki/CustomizingTheDocumentProcessor. - -Some existing uses of backquotes: - -1. As a synonym for repr() in Python. -2. For command-interpolation in shell scripts. -3. Used as open-quotes in TeX code (and carried over into plaintext - by TeXies). - -The inline markup start-string and end-string recognition rules -defined by the `reStructuredText markup specification`_ would allow -all of these cases inside inline literals, with very few exceptions. -As a fallback, literal blocks could handle all cases. - -Outside of inline literals, the above uses of backquotes would require -backslash-escaping. However, these are all prime examples of text -that should be marked up with inline literals. - -If either backquotes or straight single-quotes are used as markup, -TeX-quotes are too troublesome to support, so no special-casing of -TeX-quotes should be done (at least at first). If TeX-quotes have to -be used outside of literals, a single backslash-escaped would suffice: -\``TeX quote''. Ugly, true, but very infrequently used. - -Using literal blocks is a fallback option which removes the need for -backslash-escaping:: - - like this:: - - Here, we can do ``absolutely'' anything `'`'\|/|\ we like! - -No mechanism for inline literals is perfect, just as no escaping -mechanism is perfect. No matter what we use, complicated inline -expressions involving the inline literal quote and/or the backslash -will end up looking ugly. We can only choose the least often ugly -option. - -reStructuredText will use double backquotes for inline literals, and -single backqoutes for interpreted text. - - -Hyperlinks -========== - -There are three forms of hyperlink currently in StructuredText_: - -1. (Absolute & relative URIs.) Text enclosed by double quotes - followed by a colon, a URI, and concluded by punctuation plus white - space, or just white space, is treated as a hyperlink:: - - "Python":http://www.python.org/ - -2. (Absolute URIs only.) Text enclosed by double quotes followed by a - comma, one or more spaces, an absolute URI and concluded by - punctuation plus white space, or just white space, is treated as a - hyperlink:: - - "mail me", mailto:me@mail.com - -3. (Endnotes.) Text enclosed by brackets link to an endnote at the - end of the document: at the beginning of the line, two dots, a - space, and the same text in brackets, followed by the end note - itself:: - - Please refer to the fine manual [GVR2001]. - - .. [GVR2001] Python Documentation, Release 2.1, van Rossum, - Drake, et al., http://www.python.org/doc/ - -The problem with forms 1 and 2 is that they are neither intuitive nor -unobtrusive (they break design goals 5 & 2). They overload -double-quotes, which are too often used in ordinary text (potentially -breaking design goal 4). The brackets in form 3 are also too common -in ordinary text (such as [nested] asides and Python lists like [12]). - -Alternatives: - -1. Have no special markup for hyperlinks. - -2. A. Interpret and mark up hyperlinks as any contiguous text - containing '://' or ':...@' (absolute URI) or '@' (email - address) after an alphanumeric word. To de-emphasize the URI, - simply enclose it in parentheses: - - Python (http://www.python.org/) - - B. Leave special hyperlink markup as a domain-specific extension. - Hyperlinks in ordinary reStructuredText documents would be - required to be standalone (i.e. the URI text inline in the - document text). Processed hyperlinks (where the URI text is - hidden behind the link) are important enough to warrant syntax. - -3. The original Setext_ introduced a mechanism of indirect hyperlinks. - A source link word ('hot word') in the text was given a trailing - underscore:: - - Here is some text with a hyperlink_ built in. - - The hyperlink itself appeared at the end of the document on a line - by itself, beginning with two dots, a space, the link word with a - leading underscore, whitespace, and the URI itself:: - - .. _hyperlink http://www.123.xyz - - Setext used ``underscores_instead_of_spaces_`` for phrase links. - -With some modification, alternative 3 best satisfies the design goals. -It has the advantage of being readable and relatively unobtrusive. -Since each source link must match up to a target, the odd variable -ending in an underscore can be spared being marked up (although it -should generate a "no such link target" warning). The only -disadvantage is that phrase-links aren't possible without some -obtrusive syntax. - -We could achieve phrase-links if we enclose the link text: - -1. in double quotes:: - - "like this"_ - -2. in brackets:: - - [like this]_ - -3. or in backquotes:: - - `like this`_ - -Each gives us somewhat obtrusive markup, but that is unavoidable. The -bracketed syntax (#2) is reminiscent of links on many web pages -(intuitive), although it is somewhat obtrusive. Alternative #3 is -much less obtrusive, and is consistent with interpreted text: the -trailing underscore indicates the interpretation of the phrase, as a -hyperlink. #3 also disambiguates hyperlinks from footnote references. -Alternative #3 wins. - -The same trailing underscore markup can also be used for footnote and -citation references, removing the problem with ordinary bracketed text -and Python lists:: - - Please refer to the fine manual [GVR2000]_. - - .. [GVR2000] Python Documentation, van Rossum, Drake, et al., - http://www.python.org/doc/ - -The two-dots-and-a-space syntax was generalized by Setext for -comments, which are removed from the (visible) processed output. -reStructuredText uses this syntax for comments, footnotes, and link -target, collectively termed "explicit markup". For link targets, in -order to eliminate ambiguity with comments and footnotes, -reStructuredText specifies that a colon always follow the link target -word/phrase. The colon denotes 'maps to'. There is no reason to -restrict target links to the end of the document; they could just as -easily be interspersed. - -Internal hyperlinks (links from one point to another within a single -document) can be expressed by a source link as before, and a target -link with a colon but no URI. In effect, these targets 'map to' the -element immediately following. - -As an added bonus, we now have a perfect candidate for -reStructuredText directives, a simple extension mechanism: explicit -markup containing a single word followed by two colons and whitespace. -The interpretation of subsequent data on the directive line or -following is directive-dependent. - -To summarize:: - - .. This is a comment. - - .. The line below is an example of a directive. - .. version:: 1 - - This is a footnote [1]_. - - This internal hyperlink will take us to the footnotes_ area below. - - Here is a one-word_ external hyperlink. - - Here is `a hyperlink phrase`_. - - .. _footnotes: - .. [1] Footnote text goes here. - - .. external hyperlink target mappings: - .. _one-word: http://www.123.xyz - .. _a hyperlink phrase: http://www.123.xyz - -The presence or absence of a colon after the target link -differentiates an indirect hyperlink from a footnote, respectively. A -footnote requires brackets. Backquotes around a target link word or -phrase are required if the phrase contains a colon, optional -otherwise. - -Below are examples using no markup, the two StructuredText hypertext -styles, and the reStructuredText hypertext style. Each example -contains an indirect link, a direct link, a footnote/endnote, and -bracketed text. In HTML, each example should evaluate to:: - - <P>A <A HREF="http://spam.org">URI</A>, see <A HREF="#eggs2000"> - [eggs2000]</A> (in Bacon [Publisher]). Also see - <A HREF="http://eggs.org">http://eggs.org</A>.</P> - - <P><A NAME="eggs2000">[eggs2000]</A> "Spam, Spam, Spam, Eggs, - Bacon, and Spam"</P> - -1. No markup:: - - A URI http://spam.org, see eggs2000 (in Bacon [Publisher]). - Also see http://eggs.org. - - eggs2000 "Spam, Spam, Spam, Eggs, Bacon, and Spam" - -2. StructuredText absolute/relative URI syntax - ("text":http://www.url.org):: - - A "URI":http://spam.org, see [eggs2000] (in Bacon [Publisher]). - Also see "http://eggs.org":http://eggs.org. - - .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam" - - Note that StructuredText does not recognize standalone URIs, - forcing doubling up as shown in the second line of the example - above. - -3. StructuredText absolute-only URI syntax - ("text", mailto:you@your.com):: - - A "URI", http://spam.org, see [eggs2000] (in Bacon - [Publisher]). Also see "http://eggs.org", http://eggs.org. - - .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam" - -4. reStructuredText syntax:: - - 4. A URI_, see [eggs2000]_ (in Bacon [Publisher]). - Also see http://eggs.org. - - .. _URI: http:/spam.org - .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam" - -The bracketed text '[Publisher]' may be problematic with -StructuredText (syntax 2 & 3). - -reStructuredText's syntax (#4) is definitely the most readable. The -text is separated from the link URI and the footnote, resulting in -cleanly readable text. - -.. _StructuredText: - http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage -.. _Setext: http://docutils.sourceforge.net/mirror/setext.html -.. _reStructuredText: http://docutils.sourceforge.net/rst.html -.. _detailed description: - http://www.tibsnjoan.demon.co.uk/STNG-format.html -.. _STMinus: http://www.cis.upenn.edu/~edloper/pydoc/stminus.html -.. _StructuredTextNG: - http://dev.zope.org/Members/jim/StructuredTextWiki/StructuredTextNG -.. _README: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/ - python/python/dist/src/README -.. _Emacs table mode: http://table.sourceforge.net/ -.. _reStructuredText Markup Specification: reStructuredText.html - - -.. - Local Variables: - mode: indented-text - indent-tabs-mode: nil - sentence-end-double-space: t - fill-column: 70 - End: diff --git a/docutils/docs/dev/semantics.txt b/docutils/docs/dev/semantics.txt deleted file mode 100644 index f6ec09ebd..000000000 --- a/docutils/docs/dev/semantics.txt +++ /dev/null @@ -1,47 +0,0 @@ -===================== - Docstring Semantics -===================== -:Author: David Goodger -:Contact: goodger@users.sourceforge.net -:Revision: $Revision$ -:Date: $Date$ - -These are notes for a possible future PEP providing the final piece of -the Python docstring puzzle. - -PythonDoc -========= - -A Python version of the JavaDoc semantics (not syntax). A set of -conventions which are understood by the Docutils. - -- Can we extract comments from parsed modules? Could be handy for - documenting function/method parameters:: - - def method(self, - source, # path of input file - dest # path of output file - ): - - This would save having to repeat parameter names in the docstring. - - Idea from Mark Hammond's 1998-06-23 Doc-SIG post, "Re: [Doc-SIG] - Documentation tool": - - it would be quite hard to add a new param to this method without - realising you should document it - -- Use field lists or definition lists for "tagged blocks". - -- Frederic Giacometti's "iPhrase Python documentation conventions" is - an attachment to his Doc-SIG post of 2001-05-30 - (http://mail.python.org/pipermail/doc-sig/2001-May/001840.html). - - -.. - Local Variables: - mode: indented-text - indent-tabs-mode: nil - sentence-end-double-space: t - fill-column: 70 - End: diff --git a/docutils/docs/dev/todo.txt b/docutils/docs/dev/todo.txt deleted file mode 100644 index 5c87d280d..000000000 --- a/docutils/docs/dev/todo.txt +++ /dev/null @@ -1,388 +0,0 @@ -================ - Docutils Notes -================ -:Date: $Date$ -:Revision: $Revision$ - -.. contents:: - -To Do -===== - -General -------- - -- Document! - - - Internal module documentation. - - - User docs. - - - Doctree nodes (DTD element) semantics: - - - External (public) attributes (node.attributes). - - Internal attributes (node.*). - - Linking mechanism. - -- Refactor - - - Rename methods & variables according to the `coding conventions`_ - below. - - - The name->id conversion and hyperlink resolution code needs to be - checked for correctness and refactored. I'm afraid it's a bit of - a spaghetti mess now. - - - Change application error exceptions to use "StandardError" as - their base class instead of "Exception". - -- Add validation? See http://pytrex.sourceforge.net, RELAX NG. - -- Ask Python-dev for opinions (GvR for a pronouncement) on special - variables (__author__, __version__, etc.): convenience vs. namespace - pollution. Ask opinions on whether or not Docutils should recognize - & use them. - -- Provide a mechanism to pass options to Readers, Writers, and Parsers - through docutils.core.publish/Publisher? Or create custom - Reader/Writer/Parser objects first, and pass *them* to - publish/Publisher? - -- In reader.get_reader_class (& parser & writer too), should we be - importing 'standalone' or 'docutils.readers.standalone'? (This would - avoid importing top-level modules if the module name is not in - docutils/readers. Potential nastiness.) - -- Perhaps store a name->id mapping file? This could be stored - permanently, read by subsequent processing runs, and updated with - new entries. ("Persistent ID mapping"?) - -- The "Docutils System Messages" section appears even if no actual - system messages are there. They must be below the threshold. The - transform should be fixed. - -- TOC transform: use alt-text for inline images. - -- Implement a PEP reader. - -- Add support for character set encodings on input & output, Unicode - internally. Need a Unicode -> HTML entities codec for HTML writer? - - -Specification -------------- - -- Complete PEP 258 Docutils Design Specification. - - - Fill in the blanks in API details. - - - Specify the nodes.py internal data structure implementation. - - [Tibs:] Eventually we need to have direct documentation in - there on how it all hangs together - the DTD is not enough - (indeed, is it still meant to be correct? [Yes, it is.]). - -- Rework PEP 257, separating style from spec from tools, wrt Docutils? - See Doc-SIG from 2001-06-19/20. - -- Add layout component to framework? Or part of the formatter? - -- Once doctree.txt is fleshed out, how about breaking (most of) it up - and putting it into nodes.py as docstrings? - - -reStructuredText Parser ------------------------ - -- Add motivation sections for constructs in spec. - -- Allow very long titles (on two or more lines)? - -- And for the sake of completeness, should definition list terms be - allowed to be very long (two or more lines) also? - -- Allow hyperlink references to targets in other documents? Not in an - HTML-centric way, though (it's trivial to say - ``http://www.whatever.com/doc#name``, and useless in non-HTML - contexts). XLink/XPointer? ``.. baseref::``? See Doc-SIG - 2001-08-10. - -- Add character processing? For example: - - - ``--`` -> em-dash (or ``--`` -> en-dash, and ``---`` -> em-dash). - (Look for pre-existing conventions.) - - Convert quotes to curly quote entities. (Essentially impossible - for HTML? Unnecessary for TeX. An output issue?) - - Various forms of ``:-)`` to smiley icons. - - ``"\ "`` -> . - - Escaped newlines -> <BR>. - - Escaped period or quote as a disappearing catalyst to allow - character-level inline markup? - - Others? - - How to represent character entities in the text though? Probably as - Unicode. - - Which component is responsible for this, the parser, the reader, or - the writer? - -- Implement the header row separator modification to table.el. (Wrote - to Takaaki Ota & the table.el mailing list on 2001-08-12, suggesting - support for '=====' header rows. On 2001-08-17 he replied, saying - he'd put it on his to-do list, but "don't hold your breath".) - -- Tony says inline markup rule 7 could do with a *little* more - exposition in the spec, to make clear what is going on for people - with head colds. - -- Alan Jaffray suggested (and I agree) that it would be sensible to: - - - have a directive to specify a default role for interpreted text - - allow the reST processor to take an argument for the default role - - issue a warning when processing documents with no default role - which contain interpreted text with no explicitly specified role - -- Fix the parser's indentation handling to conform with the stricter - definition in the spec. (Explicit markup blocks should be strict or - forgiving?) - -- Tighten up the spec for indentation of "constructs using complex - markers": field lists and option lists? Bodies may begin on the - same line as the marker or on a subsequent line (with blank lines - optional). Require that for bodies beginning on the same line as - the marker, all lines be in strict alignment. Currently, this is - acceptable:: - - :Field-name-of-medium-length: Field body beginning on the same - line as the field name. - - This proposal would make the above example illegal, instead - requiring strict alignment. A field body may either begin on the - same line:: - - :Field-name-of-medium-length: Field body beginning on the same - line as the field name. - - Or it may begin on a subsequent line:: - - :Field-name-of-medium-length: - Field body beginning on a line subsequent to that of the - field name. - - This would be especially relevant in degenerate cases like this:: - - :Number-of-African-swallows-requried-to-carry-a-coconut: - It would be very difficult to align the field body with - the left edge of the first line if it began on the same - line as the field name. - -- Allow syntax constructs to be added or disabled at run-time. - -- Make footnotes two-way, GNU-style? What if there are multiple - references to a single footnote? - -- Add RFC-2822 header parsing (for PEP, email Readers). - -- Change ``.. meta::`` to use a "pending" element, only activated for - HTML writers (done). Add reader/writer/parser attributes to pending - elements (done). Figure out a way for transforms to know what - reader/writer is in control (@@@ in docutils.transforms.components). - -- Allow for variant styles by interpreting indented lists as if they - weren't indented? For example, currently the list below will be - parsed as a list within a block quote:: - - paragraph - - * list item 1 - * list item 2 - - But a lot of people seem to write that way, and HTML browsers make - it look as if that's the way it should be. The parser could check - the contents of block quotes, and if they contain only a single - list, remove the block quote wrapper. There would be two problems: - - 1. What if we actually *do* want a list inside a block quote? - - 2. What if such a list comes immediately after an indented - construct, such as a literal block? - - Both could be solved using empty comments (problem 2 already exists - for a block quote after a literal block). But that's a hack. - - See the Doc-SIG discussion starting 2001-04-18 with Ed Loper's - "Structuring: a summary; and an attempt at EBNF", item 4. - -- Make the parser modular. Be able to turn on/off constructs, add - them at run time. Or is subclassing enough? - - -Directives -`````````` - -- Allow directives to be added at run-time. - -- Use the language module for directive attribute names? - -- Add more attributes to the image directive: align, border? - -- Implement directives: - - - html.imagemap - - - components.endnotes, .citations, .topic, .sectnum (section - numbering; add support to .contents; could be cmdline option also) - - - misc.raw - - - misc.include: ``#include`` one file in another. But how to - parse wrt sections, reference names, conflicts? - - - misc.exec: Execute Python code & insert the results. Perhaps - dangerous? - - - misc.eval: Evaluate an expression & insert the text. At parse - time or at substitution time? - - - block.qa: Questions & Answers. Implement as a generic two-column - marked list? Or as a standalone construct? - - - block.columns: Multi-column table/list, with number of columns as - argument. - - - block.verse: Paragraphs with linebreaks preserved. A directive - would be easy; what about a literal-block-like prefix, perhaps - ';;'? E.g.:: - - Take it away, Eric the orchestra leader! ;; - - Half a bee, - Philosophically, - Must ipso-facto - Half not be. - You see? - - ... - - - colorize.python: Colorize Python code. Fine for HTML output, but - what about other formats? Revert to a literal block? Do we need - some kind of "alternate" mechanism? Perhaps use a "pending" - transform, which could switch its output based on the "format" in - use. Use a factory function "transformFF()" which returns either - "HTMLTransform()" instance or "GenericTransform" instance? - - - text.date: Datestamp. For substitutions. - - - Combined with misc.include, implement canned macros? - - -Unimplemented Transforms ------------------------- - -- Footnote Gathering - - Collect and move footnotes to the end of a document. - -- Hyperlink Target Gathering - - It probably comes in two phases, because in a Python context we need - to *resolve* them on a per-docstring basis [do we? --DG], but if the - user is trying to do the callout form of presentation, they would - then want to group them all at the end of the document. - -- Reference Merging - - When merging two or more subdocuments (such as docstrings), - conflicting references may need to be resolved. There may be: - - - duplicate reference and/or substitution names that need to be made - unique; and/or - - duplicate footnote numbers that need to be renumbered. - - Should this be done before or after reference-resolving transforms - are applied? What about references from within one subdocument to - inside another? - -- Document Splitting - - If the processed document is written to multiple files (possibly in - a directory tree), it will need to be split up. References will - have to be adjusted. - - (HTML only?) - -- Navigation - - If a document is split up, each segment will need navigation links: - parent, children (small TOC), previous (preorder), next (preorder). - -- Index - - -HTML Writer ------------ - -- Considerations for an HTML Writer [#]_: - - - Boolean attributes. ``<element boolean>`` is good, ``<element - boolean="boolean">`` is bad. Use a special value in attribute - mappings, such as ``None``? - - - Escape double-dashes inside comments. - - - Put the language code into an appropriate element's LANG - attribute (<HTML>?). - - - Docutils identifiers (the "class" and "id" attributes) will - conform to the regular expression ``[a-z][-a-z0-9]*``. See - ``docutils.utils.id()``. - - .. [#] Source: `HTML 4.0 in Netscape and Explorer`__. - __ http://www.webreference.com/dev/html4nsie/index.html - -- Allow for style sheet info to be passed in, either as a <LINK>, or - as embedded style info. - -- Construct a templating system, as in ht2html/yaptu, using directives - and substitutions for dynamic stuff. - -- Improve the granularity of document parts in the HTML writer, so - that one could just grab the parts needed. - -- Return a string instead of writing to a file? Leave all I/O up to - the client? - -- Use lowercase element & attribute names for XHTML compatibility? - - -Coding Conventions -================== - -This project shall follow the generic coding conventions as specified -in the `Style Guide for Python Code`__ and `Docstring Conventions`__ -PEPs, with the following clarifications: - -- 4 spaces per indentation level. No tabs. -- No one-liner compound statements (i.e., no ``if x: return``: use two - lines & indentation), except for degenerate class or method - definitions (i.e., ``class X: pass`` is O.K.). -- Lines should be no more than 78 or 79 characters long. -- "CamelCase" shall be used for class names. -- Use "lowercase" or "lowercase_with_underscores" for function, - method, and variable names. For short names, maximum two joined - words, use lowercase (e.g. 'tagname'). For long names with three or - more joined words, or where it's hard to parse the split between two - words, use lowercase_with_underscores (e.g., 'note_explicit_target', - 'explicit_target'). - -__ http://www.python.org/peps/pep-0008.html -__ http://www.python.org/peps/pep-0257.html - - -.. - Local Variables: - mode: indented-text - indent-tabs-mode: nil - sentence-end-double-space: t - fill-column: 70 - End: |