diff options
Diffstat (limited to 'docs/dev/rst')
-rw-r--r-- | docs/dev/rst/alternatives.txt | 3129 | ||||
-rw-r--r-- | docs/dev/rst/problems.txt | 872 |
2 files changed, 4001 insertions, 0 deletions
diff --git a/docs/dev/rst/alternatives.txt b/docs/dev/rst/alternatives.txt new file mode 100644 index 000000000..12874c5fb --- /dev/null +++ b/docs/dev/rst/alternatives.txt @@ -0,0 +1,3129 @@ +================================================== + A Record of reStructuredText Syntax Alternatives +================================================== + +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This document has been placed in the public domain. + +The following are ideas, alternatives, and justifications that were +considered for reStructuredText syntax, which did not originate with +Setext_ or StructuredText_. For an analysis of constructs which *did* +originate with StructuredText or Setext, please see `Problems With +StructuredText`_. See the `reStructuredText Markup Specification`_ +for full details of the established syntax. + +The ideas are divided into sections: + +* Implemented_: already done. The issues and alternatives are + recorded here for posterity. + +* `Not Implemented`_: these ideas won't be implemented. + +* Tabled_: these ideas should be revisited in the future. + +* `To Do`_: these ideas should be implemented. They're just waiting + for a champion to resolve issues and get them done. + +* `... Or Not To Do?`_: possible but questionable. These probably + won't be implemented, but you never know. + +.. _Setext: http://docutils.sourceforge.net/mirror/setext.html +.. _StructuredText: + http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/FrontPage +.. _Problems with StructuredText: problems.html +.. _reStructuredText Markup Specification: + ../../ref/rst/restructuredtext.html + + +.. contents:: + +------------- + Implemented +------------- + +Field Lists +=========== + +Prior to the syntax for field lists being finalized, several +alternatives were proposed. + +1. Unadorned RFC822_ everywhere:: + + Author: Me + Version: 1 + + Advantages: clean, precedent (RFC822-compliant). Disadvantage: + ambiguous (these paragraphs are a prime example). + + Conclusion: rejected. + +2. Special case: use unadorned RFC822_ for the very first or very last + text block of a document:: + + """ + Author: Me + Version: 1 + + The rest of the document... + """ + + Advantages: clean, precedent (RFC822-compliant). Disadvantages: + special case, flat (unnested) field lists only, still ambiguous:: + + """ + Usage: cmdname [options] arg1 arg2 ... + + We obviously *don't* want the like above to be interpreted as a + field list item. Or do we? + """ + + Conclusion: rejected for the general case, accepted for specific + contexts (PEPs, email). + +3. Use a directive:: + + .. fields:: + + Author: Me + Version: 1 + + Advantages: explicit and unambiguous, RFC822-compliant. + Disadvantage: cumbersome. + + Conclusion: rejected for the general case (but such a directive + could certainly be written). + +4. Use Javadoc-style:: + + @Author: Me + @Version: 1 + @param a: integer + + Advantages: unambiguous, precedent, flexible. Disadvantages: + non-intuitive, ugly, not RFC822-compliant. + + Conclusion: rejected. + +5. Use leading colons:: + + :Author: Me + :Version: 1 + + Advantages: unambiguous, obvious (*almost* RFC822-compliant), + flexible, perhaps even elegant. Disadvantages: no precedent, not + quite RFC822-compliant. + + Conclusion: accepted! + +6. Use double colons:: + + Author:: Me + Version:: 1 + + Advantages: unambiguous, obvious? (*almost* RFC822-compliant), + flexible, similar to syntax already used for literal blocks and + directives. Disadvantages: no precedent, not quite + RFC822-compliant, similar to syntax already used for literal blocks + and directives. + + Conclusion: rejected because of the syntax similarity & conflicts. + +Why is RFC822 compliance important? It's a universal Internet +standard, and super obvious. Also, I'd like to support the PEP format +(ulterior motive: get PEPs to use reStructuredText as their standard). +But it *would* be easy to get used to an alternative (easy even to +convert PEPs; probably harder to convert python-deviants ;-). + +Unfortunately, without well-defined context (such as in email headers: +RFC822 only applies before any blank lines), the RFC822 format is +ambiguous. It is very common in ordinary text. To implement field +lists unambiguously, we need explicit syntax. + +The following question was posed in a footnote: + + Should "bibliographic field lists" be defined at the parser level, + or at the DPS transformation level? In other words, are they + reStructuredText-specific, or would they also be applicable to + another (many/every other?) syntax? + +The answer is that bibliographic fields are a +reStructuredText-specific markup convention. Other syntaxes may +implement the bibliographic elements explicitly. For example, there +would be no need for such a transformation for an XML-based markup +syntax. + +.. _RFC822: http://www.rfc-editor.org/rfc/rfc822.txt + + +Interpreted Text "Roles" +======================== + +The original purpose of interpreted text was as a mechanism for +descriptive markup, to describe the nature or role of a word or +phrase. For example, in XML we could say "<function>len</function>" +to mark up "len" as a function. It is envisaged that within Python +docstrings (inline documentation in Python module source files, the +primary market for reStructuredText) the role of a piece of +interpreted text can be inferred implicitly from the context of the +docstring within the program source. For other applications, however, +the role may have to be indicated explicitly. + +Interpreted text is enclosed in single backquotes (`). + +1. Initially, it was proposed that an explicit role could be indicated + as a word or phrase within the enclosing backquotes: + + - As a prefix, separated by a colon and whitespace:: + + `role: interpreted text` + + - As a suffix, separated by whitespace and a colon:: + + `interpreted text :role` + + There are problems with the initial approach: + + - There could be ambiguity with interpreted text containing colons. + For example, an index entry of "Mission: Impossible" would + require a backslash-escaped colon. + + - The explicit role is descriptive markup, not content, and will + not be visible in the processed output. Putting it inside the + backquotes doesn't feel right; the *role* isn't being quoted. + +2. Tony Ibbs suggested that the role be placed outside the + backquotes:: + + role:`prefix` or `suffix`:role + + This removes the embedded-colons ambiguity, but limits the role + identifier to be a single word (whitespace would be illegal). + Since roles are not meant to be visible after processing, the lack + of whitespace support is not important. + + The suggested syntax remains ambiguous with respect to ratios and + some writing styles. For example, suppose there is a "signal" + identifier, and we write:: + + ...calculate the `signal`:noise ratio. + + "noise" looks like a role. + +3. As an improvement on #2, we can bracket the role with colons:: + + :role:`prefix` or `suffix`:role: + + This syntax is similar to that of field lists, which is fine since + both are doing similar things: describing. + + This is the syntax chosen for reStructuredText. + +4. Another alternative is two colons instead of one:: + + role::`prefix` or `suffix`::role + + But this is used for analogies ("A:B::C:D": "A is to B as C is to + D"). + + Both alternative #2 and #4 lack delimiters on both sides of the + role, making it difficult to parse (by the reader). + +5. Some kind of bracketing could be used: + + - Parentheses:: + + (role)`prefix` or `suffix`(role) + + - Braces:: + + {role}`prefix` or `suffix`{role} + + - Square brackets:: + + [role]`prefix` or `suffix`[role] + + - Angle brackets:: + + <role>`prefix` or `suffix`<role> + + (The overlap of \*ML tags with angle brackets would be too + confusing and precludes their use.) + +Syntax #3 was chosen for reStructuredText. + + +Comments +======== + +A problem with comments (actually, with all indented constructs) is +that they cannot be followed by an indented block -- a block quote -- +without swallowing it up. + +I thought that perhaps comments should be one-liners only. But would +this mean that footnotes, hyperlink targets, and directives must then +also be one-liners? Not a good solution. + +Tony Ibbs suggested a "comment" directive. I added that we could +limit a comment to a single text block, and that a "multi-block +comment" could use "comment-start" and "comment-end" directives. This +would remove the indentation incompatibility. A "comment" directive +automatically suggests "footnote" and (hyperlink) "target" directives +as well. This could go on forever! Bad choice. + +Garth Kidd suggested that an "empty comment", a ".." explicit markup +start with nothing on the first line (except possibly whitespace) and +a blank line immediately following, could serve as an "unindent". An +empty comment does **not** swallow up indented blocks following it, +so block quotes are safe. "A tiny but practical wart." Accepted. + + +Anonymous Hyperlinks +==================== + +Alan Jaffray came up with this idea, along with the following syntax:: + + Search the `Python DOC-SIG mailing list archives`{}_. + + .. _: http://mail.python.org/pipermail/doc-sig/ + +The idea is sound and useful. I suggested a "double underscore" +syntax:: + + Search the `Python DOC-SIG mailing list archives`__. + + .. __: http://mail.python.org/pipermail/doc-sig/ + +But perhaps single underscores are okay? The syntax looks better, but +the hyperlink itself doesn't explicitly say "anonymous":: + + Search the `Python DOC-SIG mailing list archives`_. + + .. _: http://mail.python.org/pipermail/doc-sig/ + +Mixing anonymous and named hyperlinks becomes confusing. The order of +targets is not significant for named hyperlinks, but it is for +anonymous hyperlinks:: + + Hyperlinks: anonymous_, named_, and another anonymous_. + + .. _named: named + .. _: anonymous1 + .. _: anonymous2 + +Without the extra syntax of double underscores, determining which +hyperlink references are anonymous may be difficult. We'd have to +check which references don't have corresponding targets, and match +those up with anonymous targets. Keeping to a simple consistent +ordering (as with auto-numbered footnotes) seems simplest. + +reStructuredText will use the explicit double-underscore syntax for +anonymous hyperlinks. An alternative (see `Reworking Explicit Markup +(Round 1)`_ below) for the somewhat awkward ".. __:" syntax is "__":: + + An anonymous__ reference. + + __ http://anonymous + + +Reworking Explicit Markup (Round 1) +=================================== + +Alan Jaffray came up with the idea of `anonymous hyperlinks`_, added +to reStructuredText. Subsequently it was asserted that hyperlinks +(especially anonymous hyperlinks) would play an increasingly important +role in reStructuredText documents, and therefore they require a +simpler and more concise syntax. This prompted a review of the +current and proposed explicit markup syntaxes with regards to +improving usability. + +1. Original syntax:: + + .. _blah: internal hyperlink target + .. _blah: http://somewhere external hyperlink target + .. _blah: blahblah_ indirect hyperlink target + .. __: anonymous internal target + .. __: http://somewhere anonymous external target + .. __: blahblah_ anonymous indirect target + .. [blah] http://somewhere footnote + .. blah:: http://somewhere directive + .. blah: http://somewhere comment + + .. Note:: + + The comment text was intentionally made to look like a hyperlink + target. + + Origins: + + * Except for the colon (a delimiter necessary to allow for + phrase-links), hyperlink target ``.. _blah:`` comes from Setext. + * Comment syntax from Setext. + * Footnote syntax from StructuredText ("named links"). + * Directives and anonymous hyperlinks original to reStructuredText. + + Advantages: + + + Consistent explicit markup indicator: "..". + + Consistent hyperlink syntax: ".. _" & ":". + + Disadvantages: + + - Anonymous target markup is awkward: ".. __:". + - The explicit markup indicator ("..") is excessively overloaded? + - Comment text is limited (can't look like a footnote, hyperlink, + or directive). But this is probably not important. + +2. Alan Jaffray's proposed syntax #1:: + + __ _blah internal hyperlink target + __ blah: http://somewhere external hyperlink target + __ blah: blahblah_ indirect hyperlink target + __ anonymous internal target + __ http://somewhere anonymous external target + __ blahblah_ anonymous indirect target + __ [blah] http://somewhere footnote + .. blah:: http://somewhere directive + .. blah: http://somewhere comment + + The hyperlink-connoted underscores have become first-level syntax. + + Advantages: + + + Anonymous targets are simpler. + + All hyperlink targets are one character shorter. + + Disadvantages: + + - Inconsistent internal hyperlink targets. Unlike all other named + hyperlink targets, there's no colon. There's an extra leading + underscore, but we can't drop it because without it, "blah" looks + like a relative URI. Unless we restore the colon:: + + __ blah: internal hyperlink target + + - Obtrusive markup? + +3. Alan Jaffray's proposed syntax #2:: + + .. _blah internal hyperlink target + .. blah: http://somewhere external hyperlink target + .. blah: blahblah_ indirect hyperlink target + .. anonymous internal target + .. http://somewhere anonymous external target + .. blahblah_ anonymous indirect target + .. [blah] http://somewhere footnote + !! blah: http://somewhere directive + ## blah: http://somewhere comment + + Leading underscores have been (almost) replaced by "..", while + comments and directives have gained their own syntax. + + Advantages: + + + Anonymous hyperlinks are simpler. + + Unique syntax for comments. Connotation of "comment" from + some programming languages (including our favorite). + + Unique syntax for directives. Connotation of "action!". + + Disadvantages: + + - Inconsistent internal hyperlink targets. Again, unlike all other + named hyperlink targets, there's no colon. There's a leading + underscore, matching the trailing underscores of references, + which no other hyperlink targets have. We can't drop that one + leading underscore though: without it, "blah" looks like a + relative URI. Again, unless we restore the colon:: + + .. blah: internal hyperlink target + + - All (except for internal) hyperlink targets lack their leading + underscores, losing the "hyperlink" connotation. + + - Obtrusive syntax for comments. Alternatives:: + + ;; blah: http://somewhere + (also comment syntax in Lisp & others) + ,, blah: http://somewhere + ("comma comma": sounds like "comment"!) + + - Iffy syntax for directives. Alternatives? + +4. Tony Ibbs' proposed syntax:: + + .. _blah: internal hyperlink target + .. _blah: http://somewhere external hyperlink target + .. _blah: blahblah_ indirect hyperlink target + .. anonymous internal target + .. http://somewhere anonymous external target + .. blahblah_ anonymous indirect target + .. [blah] http://somewhere footnote + .. blah:: http://somewhere directive + .. blah: http://somewhere comment + + This is the same as the current syntax, except for anonymous + targets which drop their "__: ". + + Advantage: + + + Anonymous targets are simpler. + + Disadvantages: + + - Anonymous targets lack their leading underscores, losing the + "hyperlink" connotation. + - Anonymous targets are almost indistinguishable from comments. + (Better to know "up front".) + +5. David Goodger's proposed syntax: Perhaps going back to one of + Alan's earlier suggestions might be the best solution. How about + simply adding "__ " as a synonym for ".. __: " in the original + syntax? These would become equivalent:: + + .. __: anonymous internal target + .. __: http://somewhere anonymous external target + .. __: blahblah_ anonymous indirect target + + __ anonymous internal target + __ http://somewhere anonymous external target + __ blahblah_ anonymous indirect target + +Alternative 5 has been adopted. + + +Backquotes in Phrase-Links +========================== + +[From a 2001-06-05 Doc-SIG post in reply to questions from Doug +Hellmann.] + +The first draft of the spec, posted to the Doc-SIG in November 2000, +used square brackets for phrase-links. I changed my mind because: + +1. In the first draft, I had already decided on single-backquotes for + inline literal text. + +2. However, I wanted to minimize the necessity for backslash escapes, + for example when quoting Python repr-equivalent syntax that uses + backquotes. + +3. The processing of identifiers (function/method/attribute/module + etc. names) into hyperlinks is a useful feature. PyDoc recognizes + identifiers heuristically, but it doesn't take much imagination to + come up with counter-examples where PyDoc's heuristics would result + in embarassing failure. I wanted to do it deterministically, and + that called for syntax. I called this construct "interpreted + text". + +4. Leveraging off the ``*emphasis*/**strong**`` syntax, lead to the + idea of using double-backquotes as syntax. + +5. I worked out some rules for inline markup recognition. + +6. In combination with #5, double backquotes lent themselves to inline + literals, neatly satisfying #2, minimizing backslash escapes. In + fact, the spec says that no interpretation of any kind is done + within double-backquote inline literal text; backslashes do *no* + escaping within literal text. + +7. Single backquotes are then freed up for interpreted text. + +8. I already had square brackets required for footnote references. + +9. Since interpreted text will typically turn into hyperlinks, it was + a natural fit to use backquotes as the phrase-quoting syntax for + trailing-underscore hyperlinks. + +The original inspiration for the trailing underscore hyperlink syntax +was Setext. But for phrases Setext used a very cumbersome +``underscores_between_words_like_this_`` syntax. + +The underscores can be viewed as if they were right-pointing arrows: +``-->``. So ``hyperlink_`` points away from the reference, and +``.. _hyperlink:`` points toward the target. + + +Substitution Mechanism +====================== + +Substitutions arose out of a Doc-SIG thread begun on 2001-10-28 by +Alan Jaffray, "reStructuredText inline markup". It reminded me of a +missing piece of the reStructuredText puzzle, first referred to in my +contribution to "Documentation markup & processing / PEPs" (Doc-SIG +2001-06-21). + +Substitutions allow the power and flexibility of directives to be +shared by inline text. They are a way to allow arbitrarily complex +inline objects, while keeping the details out of the flow of text. +They are the equivalent of SGML/XML's named entities. For example, an +inline image (using reference syntax alternative 4d (vertical bars) +and definition alternative 3, the alternatives chosen for inclusion in +the spec):: + + The |biohazard| symbol must be used on containers used to dispose + of medical waste. + + .. |biohazard| image:: biohazard.png + [height=20 width=20] + +The ``|biohazard|`` substitution reference will be replaced in-line by +whatever the ``.. |biohazard|`` substitution definition generates (in +this case, an image). A substitution definition contains the +substitution text bracketed with vertical bars, followed by a an +embedded inline-compatible directive, such as "image". A transform is +required to complete the substitution. + +Syntax alternatives for the reference: + +1. Use the existing interpreted text syntax, with a predefined role + such as "sub":: + + The `biohazard`:sub: symbol... + + Advantages: existing syntax, explicit. Disadvantages: verbose, + obtrusive. + +2. Use a variant of the interpreted text syntax, with a new suffix + akin to the underscore in phrase-link references:: + + (a) `name`@ + (b) `name`# + (c) `name`& + (d) `name`/ + (e) `name`< + (f) `name`:: + (g) `name`: + + + Due to incompatibility with other constructs and ordinary text + usage, (f) and (g) are not possible. + +3. Use interpreted text syntax with a fixed internal format:: + + (a) `:name:` + (b) `name:` + (c) `name::` + (d) `::name::` + (e) `%name%` + (f) `#name#` + (g) `/name/` + (h) `&name&` + (i) `|name|` + (j) `[name]` + (k) `<name>` + (l) `&name;` + (m) `'name'` + + To avoid ML confusion (k) and (l) are definitely out. Square + brackets (j) won't work in the target (the substitution definition + would be indistinguishable from a footnote). + + The ```/name/``` syntax (g) is reminiscent of "s/find/sub" + substitution syntax in ed-like languages. However, it may have a + misleading association with regexps, and looks like an absolute + POSIX path. (i) is visually equivalent and lacking the + connotations. + + A disadvantage of all of these is that they limit interpreted text, + albeit only slightly. + +4. Use specialized syntax, something new:: + + (a) #name# + (b) @name@ + (c) /name/ + (d) |name| + (e) <<name>> + (f) //name// + (g) ||name|| + (h) ^name^ + (i) [[name]] + (j) ~name~ + (k) !name! + (l) =name= + (m) ?name? + (n) >name< + + "#" (a) and "@" (b) are obtrusive. "/" (c) without backquotes + looks just like a POSIX path; it is likely for such usage to appear + in text. + + "|" (d) and "^" (h) are feasible. + +5. Redefine the trailing underscore syntax. See definition syntax + alternative 4, below. + +Syntax alternatives for the definition: + +1. Use the existing directive syntax, with a predefined directive such + as "sub". It contains a further embedded directive resolving to an + inline-compatible object:: + + .. sub:: biohazard + .. image:: biohazard.png + [height=20 width=20] + + .. sub:: parrot + That bird wouldn't *voom* if you put 10,000,000 volts + through it! + + The advantages and disadvantages are the same as in inline + alternative 1. + +2. Use syntax as in #1, but with an embedded directivecompressed:: + + .. sub:: biohazard image:: biohazard.png + [height=20 width=20] + + This is a bit better than alternative 1, but still too much. + +3. Use a variant of directive syntax, incorporating the substitution + text, obviating the need for a special "sub" directive name. If we + assume reference alternative 4d (vertical bars), the matching + definition would look like this:: + + .. |biohazard| image:: biohazard.png + [height=20 width=20] + +4. (Suggested by Alan Jaffray on Doc-SIG from 2001-11-06.) + + Instead of adding new syntax, redefine the trailing underscore + syntax to mean "substitution reference" instead of "hyperlink + reference". Alan's example:: + + I had lunch with Jonathan_ today. We talked about Zope_. + + .. _Jonathan: lj [user=jhl] + .. _Zope: http://www.zope.org/ + + A problem with the proposed syntax is that URIs which look like + simple reference names (alphanum plus ".", "-", "_") would be + indistinguishable from substitution directive names. A more + consistent syntax would be:: + + I had lunch with Jonathan_ today. We talked about Zope_. + + .. _Jonathan: lj:: user=jhl + .. _Zope: http://www.zope.org/ + + (``::`` after ``.. _Jonathan: lj``.) + + The "Zope" target is a simple external hyperlink, but the + "Jonathan" target contains a directive. Alan proposed is that the + reference text be replaced by whatever the referenced directive + (the "directive target") produces. A directive reference becomes a + hyperlink reference if the contents of the directive target resolve + to a hyperlink. If the directive target resolves to an icon, the + reference is replaced by an inline icon. If the directive target + resolves to a hyperlink, the directive reference becomes a + hyperlink reference. + + This seems too indirect and complicated for easy comprehension. + + The reference in the text will sometimes become a link, sometimes + not. Sometimes the reference text will remain, sometimes not. We + don't know *at the reference*:: + + This is a `hyperlink reference`_; its text will remain. + This is an `inline icon`_; its text will disappear. + + That's a problem. + +The syntax that has been incorporated into the spec and parser is +reference alternative 4d with definition alternative 3:: + + The |biohazard| symbol... + + .. |biohazard| image:: biohazard.png + [height=20 width=20] + +We can also combine substitution references with hyperlink references, +by appending a "_" (named hyperlink reference) or "__" (anonymous +hyperlink reference) suffix to the substitution reference. This +allows us to click on an image-link:: + + The |biohazard|_ symbol... + + .. |biohazard| image:: biohazard.png + [height=20 width=20] + .. _biohazard: http://www.cdc.gov/ + +There have been several suggestions for the naming of these +constructs, originally called "substitution references" and +"substitutions". + +1. Candidate names for the reference construct: + + (a) substitution reference + (b) tagging reference + (c) inline directive reference + (d) directive reference + (e) indirect inline directive reference + (f) inline directive placeholder + (g) inline directive insertion reference + (h) directive insertion reference + (i) insertion reference + (j) directive macro reference + (k) macro reference + (l) substitution directive reference + +2. Candidate names for the definition construct: + + (a) substitution + (b) substitution directive + (c) tag + (d) tagged directive + (e) directive target + (f) inline directive + (g) inline directive definition + (h) referenced directive + (i) indirect directive + (j) indirect directive definition + (k) directive definition + (l) indirect inline directive + (m) named directive definition + (n) inline directive insertion definition + (o) directive insertion definition + (p) insertion definition + (q) insertion directive + (r) substitution definition + (s) directive macro definition + (t) macro definition + (u) substitution directive definition + (v) substitution definition + +"Inline directive reference" (1c) seems to be an appropriate term at +first, but the term "inline" is redundant in the case of the +reference. Its counterpart "inline directive definition" (2g) is +awkward, because the directive definition itself is not inline. + +"Directive reference" (1d) and "directive definition" (2k) are too +vague. "Directive definition" could be used to refer to any +directive, not just those used for inline substitutions. + +One meaning of the term "macro" (1k, 2s, 2t) is too +programming-language-specific. Also, macros are typically simple text +substitution mechanisms: the text is substituted first and evaluated +later. reStructuredText substitution definitions are evaluated in +place at parse time and substituted afterwards. + +"Insertion" (1h, 1i, 2n-2q) is almost right, but it implies that +something new is getting added rather than one construct being +replaced by another. + +Which brings us back to "substitution". The overall best names are +"substitution reference" (1a) and "substitution definition" (2v). A +long way to go to add one word! + + +Inline External Targets +======================= + +Currently reStructuredText has two hyperlink syntax variations: + +* Named hyperlinks:: + + This is a named reference_ of one word ("reference"). Here is + a `phrase reference`_. Phrase references may even cross `line + boundaries`_. + + .. _reference: http://www.example.org/reference/ + .. _phrase reference: http://www.example.org/phrase_reference/ + .. _line boundaries: http://www.example.org/line_boundaries/ + + + Advantages: + + - The plaintext is readable. + - Each target may be reused multiple times (e.g., just write + ``"reference_"`` again). + - No syncronized ordering of references and targets is necessary. + + + Disadvantages: + + - The reference text must be repeated as target names; could lead + to mistakes. + - The target URLs may be located far from the references, and hard + to find in the plaintext. + +* Anonymous hyperlinks (in current reStructuredText):: + + This is an anonymous reference__. Here is an anonymous + `phrase reference`__. Phrase references may even cross `line + boundaries`__. + + __ http://www.example.org/reference/ + __ http://www.example.org/phrase_reference/ + __ http://www.example.org/line_boundaries/ + + + Advantages: + + - The plaintext is readable. + - The reference text does not have to be repeated. + + + Disadvantages: + + - References and targets must be kept in sync. + - Targets cannot be reused. + - The target URLs may be located far from the references. + +For comparison and historical background, StructuredText also has two +syntaxes for hyperlinks: + +* First, ``"reference text":URL``:: + + This is a "reference":http://www.example.org/reference/ + of one word ("reference"). Here is a "phrase + reference":http://www.example.org/phrase_reference/. + +* Second, ``"reference text", http://example.com/absolute_URL``:: + + This is a "reference", http://www.example.org/reference/ + of one word ("reference"). Here is a "phrase reference", + http://www.example.org/phrase_reference/. + +Both syntaxes share advantages and disadvantages: + ++ Advantages: + + - The target is specified immediately adjacent to the reference. + ++ Disadvantages: + + - Poor plaintext readability. + - Targets cannot be reused. + - Both syntaxes use double quotes, common in ordinary text. + - In the first syntax, the URL and the last word are stuck + together, exacerbating the line wrap problem. + - The second syntax is too magical; text could easily be written + that way by accident (although only absolute URLs are recognized + here, perhaps because of the potential for ambiguity). + +A new type of "inline external hyperlink" has been proposed. + +1. On 2002-06-28, Simon Budig proposed__ a new syntax for + reStructuredText hyperlinks:: + + This is a reference_(http://www.example.org/reference/) of one + word ("reference"). Here is a `phrase + reference`_(http://www.example.org/phrase_reference/). Are + these examples, (single-underscore), named? If so, `anonymous + references`__(http://www.example.org/anonymous/) using two + underscores would probably be preferable. + + __ http://mail.python.org/pipermail/doc-sig/2002-June/002648.html + + The syntax, advantages, and disadvantages are similar to those of + StructuredText. + + + Advantages: + + - The target is specified immediately adjacent to the reference. + + + Disadvantages: + + - Poor plaintext readability. + - Targets cannot be reused (unless named, but the semantics are + unclear). + + + Problems: + + - The ``"`ref`_(URL)"`` syntax forces the last word of the + reference text to be joined to the URL, making a potentially + very long word that can't be wrapped (URLs can be very long). + The reference and the URL should be separate. This is a + symptom of the following point: + + - The syntax produces a single compound construct made up of two + equally important parts, *with syntax in the middle*, *between* + the reference and the target. This is unprecedented in + reStructuredText. + + - The "inline hyperlink" text is *not* a named reference (there's + no lookup by name), so it shouldn't look like one. + + - According to the IETF standards RFC 2396 and RFC 2732, + parentheses are legal URI characters and curly braces are legal + email characters, making their use prohibitively difficult. + + - The named/anonymous semantics are unclear. + +2. After an analysis__ of the syntax of (1) above, we came up with the + following compromise syntax:: + + This is an anonymous reference__ + __<http://www.example.org/reference/> of one word + ("reference"). Here is a `phrase reference`__ + __<http://www.example.org/phrase_reference/>. `Named + references`_ _<http://www.example.org/anonymous/> use single + underscores. + + __ http://mail.python.org/pipermail/doc-sig/2002-July/002670.html + + The syntax builds on that of the existing "inline internal + targets": ``an _`inline internal target`.`` + + + Advantages: + + - The target is specified immediately adjacent to the reference, + improving maintainability: + + - References and targets are easily kept in sync. + - The reference text does not have to be repeated. + + - The construct is executed in two parts: references identical to + existing references, and targets that are new but not too big a + stretch from current syntax. + + - There's overwhelming precedent for quoting URLs with angle + brackets [#]_. + + + Disadvantages: + + - Poor plaintext readability. + - Lots of "line noise". + - Targets cannot be reused (unless named; see below). + + To alleviate the readability issue slightly, we could allow the + target to appear later, such as after the end of the sentence:: + + This is a named reference__ of one word ("reference"). + __<http://www.example.org/reference/> Here is a `phrase + reference`__. __<http://www.example.org/phrase_reference/> + + Problem: this could only work for one reference at a time + (reference/target pairs must be proximate [refA trgA refB trgB], + not interleaved [refA refB trgA trgB] or nested [refA refB trgB + trgA]). This variation is too problematic; references and inline + external targets will have to be kept imediately adjacent (see (3) + below). + + The ``"reference__ __<target>"`` syntax is actually for "anonymous + inline external targets", emphasized by the double underscores. It + follows that single trailing and leading underscores would lead to + *implicitly named* inline external targets. This would allow the + reuse of targets by name. So after ``"reference_ _<target>"``, + another ``"reference_"`` would point to the same target. + + .. [#] + From RFC 2396 (URI syntax): + + The angle-bracket "<" and ">" and double-quote (") + characters are excluded [from URIs] because they are often + used as the delimiters around URI in text documents and + protocol fields. + + Using <> angle brackets around each URI is especially + recommended as a delimiting style for URI that contain + whitespace. + + From RFC 822 (email headers): + + Angle brackets ("<" and ">") are generally used to indicate + the presence of a one machine-usable reference (e.g., + delimiting mailboxes), possibly including source-routing to + the machine. + +3. If it is best for references and inline external targets to be + immediately adjacent, then they might as well be integrated. + Here's an alternative syntax embedding the target URL in the + reference:: + + This is an anonymous `reference <http://www.example.org + /reference/>`__ of one word ("reference"). Here is a `phrase + reference <http://www.example.org/phrase_reference/>`__. + + Advantages and disadvantages are similar to those in (2). + Readability is still an issue, but the syntax is a bit less + heavyweight (reduced line noise). Backquotes are required, even + for one-word references; the target URL is included within the + reference text, forcing a phrase context. + + We'll call this variant "embedded URIs". + + Problem: how to refer to a title like "HTML Anchors: <a>" (which + ends with an HTML/SGML/XML tag)? We could either require more + syntax on the target (like ``"`reference text + __<http://example.com/>`__"``), or require the odd conflicting + title to be escaped (like ``"`HTML Anchors: \<a>`__"``). The + latter seems preferable, and not too onerous. + + Similarly to (2) above, a single trailing underscore would convert + the reference & inline external target from anonymous to implicitly + named, allowing reuse of targets by name. + + I think this is the least objectionable of the syntax alternatives. + +Other syntax variations have been proposed (by Brett Cannon and Benja +Fallenstein):: + + `phrase reference`->http://www.example.com + + `phrase reference`@http://www.example.com + + `phrase reference`__ ->http://www.example.com + + `phrase reference` [-> http://www.example.com] + + `phrase reference`__ [-> http://www.example.com] + + `phrase reference` <http://www.example.com>_ + +None of these variations are clearly superior to #3 above. Some have +problems that exclude their use. + +With any kind of inline external target syntax it comes down to the +conflict between maintainability and plaintext readability. I don't +see a major problem with reStructuredText's maintainability, and I +don't want to sacrifice plaintext readability to "improve" it. + +The proponents of inline external targets want them for easily +maintainable web pages. The arguments go something like this: + +- Named hyperlinks are difficult to maintain because the reference + text is duplicated as the target name. + + To which I said, "So use anonymous hyperlinks." + +- Anonymous hyperlinks are difficult to maintain becuase the + references and targets have to be kept in sync. + + "So keep the targets close to the references, grouped after each + paragraph. Maintenance is trivial." + +- But targets grouped after paragraphs break the flow of text. + + "Surely less than URLs embedded in the text! And if the intent is + to produce web pages, not readable plaintext, then who cares about + the flow of text?" + +Many participants have voiced their objections to the proposed syntax: + + Garth Kidd: "I strongly prefer the current way of doing it. + Inline is spectactularly messy, IMHO." + + Tony Ibbs: "I vehemently agree... that the inline alternatives + being suggested look messy - there are/were good reasons they've + been taken out... I don't believe I would gain from the new + syntaxes." + + Paul Moore: "I agree as well. The proposed syntax is far too + punctuation-heavy, and any of the alternatives discussed are + ambiguous or too subtle." + +Others have voiced their support: + + fantasai: "I agree with Simon. In many cases, though certainly + not in all, I find parenthesizing the url in plain text flows + better than relegating it to a footnote." + + Ken Manheimer: "I'd like to weigh in requesting some kind of easy, + direct inline reference link." + +(Interesting that those *against* the proposal have been using +reStructuredText for a while, and those *for* the proposal are either +new to the list ["fantasai", background unknown] or longtime +StructuredText users [Ken Manheimer].) + +I was initially ambivalent/against the proposed "inline external +targets". I value reStructuredText's readability very highly, and +although the proposed syntax offers convenience, I don't know if the +convenience is worth the cost in ugliness. Does the proposed syntax +compromise readability too much, or should the choice be left up to +the author? Perhaps if the syntax is *allowed* but its use strongly +*discouraged*, for aesthetic/readability reasons? + +After a great deal of thought and much input from users, I've decided +that there are reasonable use cases for this construct. The +documentation should strongly caution against its use in most +situations, recommending independent block-level targets instead. +Syntax #3 above ("embedded URIs") will be used. + + +Doctree Representation of Transitions +===================================== + +(Although not reStructuredText-specific, this section fits best in +this document.) + +Having added the "horizontal rule" construct to the `reStructuredText +Markup Specification`_, a decision had to be made as to how to reflect +the construct in the implementation of the document tree. Given this +source:: + + Document + ======== + + Paragraph 1 + + -------- + + Paragraph 2 + +The horizontal rule indicates a "transition" (in prose terms) or the +start of a new "division". Before implementation, the parsed document +tree would be:: + + <document> + <section names="document"> + <title> + Document + <paragraph> + Paragraph 1 + -------- <--- error here + <paragraph> + Paragraph 2 + +There are several possibilities for the implementation: + +1. Implement horizontal rules as "divisions" or segments. A + "division" is a title-less, non-hierarchical section. The first + try at an implementation looked like this:: + + <document> + <section names="document"> + <title> + Document + <paragraph> + Paragraph 1 + <division> + <paragraph> + Paragraph 2 + + But the two paragraphs are really at the same level; they shouldn't + appear to be at different levels. There's really an invisible + "first division". The horizontal rule splits the document body + into two segments, which should be treated uniformly. + +2. Treating "divisions" uniformly brings us to the second + possibility:: + + <document> + <section names="document"> + <title> + Document + <division> + <paragraph> + Paragraph 1 + <division> + <paragraph> + Paragraph 2 + + With this change, documents and sections will directly contain + divisions and sections, but not body elements. Only divisions will + directly contain body elements. Even without a horizontal rule + anywhere, the body elements of a document or section would be + contained within a division element. This makes the document tree + deeper. This is similar to the way HTML_ treats document contents: + grouped within a ``<body>`` element. + +3. Implement them as "transitions", empty elements:: + + <document> + <section names="document"> + <title> + Document + <paragraph> + Paragraph 1 + <transition> + <paragraph> + Paragraph 2 + + A transition would be a "point element", not containing anything, + only identifying a point within the document structure. This keeps + the document tree flatter, but the idea of a "point element" like + "transition" smells bad. A transition isn't a thing itself, it's + the space between two divisions. However, transitions are a + practical solution. + +Solution 3 was chosen for incorporation into the document tree model. + +.. _HTML: http://www.w3.org/MarkUp/ + + +Syntax for Line Blocks +====================== + +* An early idea: How about a literal-block-like prefix, perhaps + "``;;``"? (It is, after all, a *semi-literal* literal block, no?) + Example:: + + Take it away, Eric the Orchestra Leader! ;; + + A one, two, a one two three four + + Half a bee, philosophically, + must, *ipso facto*, half not be. + But half the bee has got to be, + *vis a vis* its entity. D'you see? + + But can a bee be said to be + or not to be an entire bee, + when half the bee is not a bee, + due to some ancient injury? + + Singing... + + Kinda lame. + +* Another idea: in an ordinary paragraph, if the first line ends with + a backslash (escaping the newline), interpret the entire paragraph + as a verse block? For example:: + + Add just one backslash\ + And this paragraph becomes + An awful haiku + + (Awful, and arguably invalid, since in Japanese the word "haiku" + contains three syllables not two.) + + This idea was superceded by the rules for escaped whitespace, useful + for `character-level inline markup`_. + +* In a `2004-02-22 docutils-develop message`__, Jarno Elonen proposed + a "plain list" syntax (and also provided a patch):: + + | John Doe + | President, SuperDuper Corp. + | jdoe@example.org + + __ http://thread.gmane.org/gmane.text.docutils.devel/1187 + + This syntax is very natural. However, these "plain lists" seem very + similar to line blocks, and I see so little intrinsic "list-ness" + that I'm loathe to add a new object. I used the term "blurbs" to + remove the "list" connotation from the originally proposed name. + Perhaps line blocks could be refined to add the two properties they + currently lack: + + A) long lines wrap nicely + B) HTML output doesn't look like program code in non-CSS web + browsers + + (A) is an issue of all 3 aspects of Docutils: syntax (construct + behaviour), internal representation, and output. (B) is partly an + issue of internal representation but mostly of output. + +ReStructuredText will redefine line blocks with the "|"-quoting +syntax. The following is my current thinking. + + +Syntax +------ + +Perhaps line block syntax like this would do:: + + | M6: James Bond + | MIB: Mr. J. + | IMF: not decided yet, but probably one of the following: + | Ethan Hunt + | Jim Phelps + | Claire Phelps + | CIA: Felix Leiter + +Note that the "nested" list does not have nested syntax (the "|" are +not further indented); the leading whitespace would still be +significant somehow (more below). As for long lines in the input, +this could suffice:: + + | John Doe + | Founder, President, Chief Executive Officer, Cook, Bottle + Washer, and All-Round Great Guy + | SuperDuper Corp. + | jdoe@example.org + +The lack of "|" on the third line indicates that it's a continuation +of the second line, wrapped. + +I don't see much point in allowing arbitrary nested content. Multiple +paragraphs or bullet lists inside a "blurb" doesn't make sense to me. +Simple nested line blocks should suffice. + + +Internal Representation +----------------------- + +Line blocks are currently represented as text blobs as follows:: + + <!ELEMENT line_block %text.model;> + <!ATTLIST line_block + %basic.atts; + %fixedspace.att;> + +Instead, we could represent each line by a separate element:: + + <!ELEMENT line_block (line+)> + <!ATTLIST line_block %basic.atts;> + + <!ELEMENT line %text.model;> + <!ATTLIST line %basic.atts;> + +We'd keep the significance of the leading whitespace of each line +either by converting it to non-breaking spaces at output, or with a +per-line margin. Non-breaking spaces are simpler (for HTML, anyway) +but kludgey, and wouldn't support indented long lines that wrap. But +should inter-word whitespace (i.e., not leading whitespace) be +preserved? Currently it is preserved in line blocks. + +Representing a more complex line block may be tricky:: + + | But can a bee be said to be + | or not to be an entire bee, + | when half the bee is not a bee, + | due to some ancient injury? + +Perhaps the representation could allow for nested line blocks:: + + <!ELEMENT line_block (line | line_block)+> + +With this model, leading whitespace would no longer be significant. +Instead, left margins are implied by the nesting. The example above +could be represented as follows:: + + <line_block> + <line> + But can a bee be said to be + <line_block> + <line> + or not to be an entire bee, + <line_block> + <line> + when half the bee is not a bee, + <line_block> + <line> + due to some ancient injury? + +I wasn't sure what to do about even more complex line blocks:: + + | Indented + | Not indented + | Indented a bit + | A bit more + | Only one space + +How should that be parsed and nested? Should the first line have +the same nesting level (== indentation in the output) as the fourth +line, or the same as the last line? Mark Nodine suggested that such +line blocks be parsed similarly to complexly-nested block quotes, +which seems reasonable. In the example above, this would result in +the nesting of first line matching the last line's nesting. In +other words, the nesting would be relative to neighboring lines +only. + + +Output +------ + +In HTML, line blocks are currently output as "<pre>" blocks, which +gives us significant whitespace and line breaks, but doesn't allow +long lines to wrap and causes monospaced output without stylesheets. +Instead, we could output "<div>" elements parallelling the +representation above, where each nested <div class="line_block"> would +have an increased left margin (specified in the stylesheet). + +Jarno suggested the following HTML output:: + + <div class="line_block"> + <span class="line">First, top level line</span><br class="hidden"/> + <div class="line_block"><span class="hidden"> </span> + <span class="line">Second, once nested</span><br class="hidden"/> + <span class="line">Third, once nested</span><br class="hidden"/> + ... + </div> + ... + </div> + +The ``<br class="hidden" />`` and ``<span +class="hidden"> </span>`` are meant to support non-CSS and +non-graphical browsers. I understand the case for "br", but I'm not +so sure about hidden " ". I question how much effort should be +put toward supporting non-graphical and especially non-CSS browsers, +at least for html4css1.py output. + +Should the lines themselves be ``<span>`` or ``<div>``? I don't like +mixing inline and block-level elements. + + +Implementation Plan +------------------- + +We'll leave the old implementation in place (via the "line-block" +directive only) until all Writers have been updated to support the new +syntax & implementation. The "line-block" directive can then be +updated to use the new internal representation, and its documentation +will be updated to recommend the new syntax. + + +List-Driven Tables +================== + +The original idea came from Dylan Jay: + + ... to use a two level bulleted list with something to + indicate it should be rendered as a table ... + +It's an interesting idea. It could be implemented in as a directive +which transforms a uniform two-level list into a table. Using a +directive would allow the author to explicitly set the table's +orientation (by column or by row), the presence of row headers, etc. + +Alternatives: + +1. (Implemented in Docutils 0.3.8). + + Bullet-list-tables might look like this:: + + .. list-table:: + + * - Treat + - Quantity + - Description + * - Albatross! + - 299 + - On a stick! + * - Crunchy Frog! + - 1499 + - If we took the bones out, it wouldn't be crunchy, + now would it? + * - Gannet Ripple! + - 199 + - On a stick! + + This list must be written in two levels. This wouldn't work:: + + .. list-table:: + + * Treat + * Albatross! + * Gannet! + * Crunchy Frog! + + * Quantity + * 299 + * 199 + * 1499 + + * Description + * On a stick! + * On a stick! + * If we took the bones out... + + The above is a single list of 12 items. The blank lines are not + significant to the markup. We'd have to explicitly specify how + many columns or rows to use, which isn't a good idea. + +2. Beni Cherniavsky suggested a field list alternative. It could look + like this:: + + .. field-list-table:: + :headrows: 1 + + - :treat: Treat + :quantity: Quantity + :descr: Description + + - :treat: Albatross! + :quantity: 299 + :descr: On a stick! + + - :treat: Crunchy Frog! + :quantity: 1499 + :descr: If we took the bones out, it wouldn't be + crunchy, now would it? + + Column order is determined from the order of fields in the first + row. Field order in all other rows is ignored. As a side-effect, + this allows trivial re-arrangement of columns. By using named + fields, it becomes possible to omit fields in some rows without + losing track of things, which is important for spans. + +3. An alternative to two-level bullet lists would be to use enumerated + lists for the table cells:: + + .. list-table:: + + * 1. Treat + 2. Quantity + 3. Description + * 1. Albatross! + 2. 299 + 3. On a stick! + * 1. Crunchy Frog! + 2. 1499 + 3. If we took the bones out, it wouldn't be crunchy, + now would it? + + That provides better correspondence between cells in the same + column than does bullet-list syntax, but not as good as field list + syntax. I think that were only field-list-tables available, a lot + of users would use the equivalent degenerate case:: + + .. field-list-table:: + - :1: Treat + :2: Quantity + :3: Description + ... + +4. Another natural variant is to allow a description list with field + lists as descriptions:: + + .. list-table:: + :headrows: 1 + + Treat + :quantity: Quantity + :descr: Description + Albatross! + :quantity: 299 + :descr: On a stick! + Crunchy Frog! + :quantity: 1499 + :descr: If we took the bones out, it wouldn't be + crunchy, now would it? + + This would make the whole first column a header column ("stub"). + It's limited to a single column and a single paragraph fitting on + one source line. Also it wouldn't allow for empty cells or row + spans in the first column. But these are limitations that we could + live with, like those of simple tables. + +The List-driven table feature could be done in many ways. Each user +will have their preferred usage. Perhaps a single "list-table" +directive could handle them all, depending on which options and +content are present. + +Issues: + +* How to indicate that there's 1 header row? Perhaps two lists? :: + + .. list-table:: + + + - Treat + - Quantity + - Description + + * - Albatross! + - 299 + - On a stick! + + This is probably too subtle though. Better would be a directive + option, like ``:headrows: 1``. An early suggestion for the header + row(s) was to use a directive option:: + + .. field-list-table:: + :header: + - :treat: Treat + :quantity: Quantity + :descr: Description + - :treat: Albatross! + :quantity: 299 + :descr: On a stick! + + But the table data is at two levels and looks inconsistent. + + In general, we cannot extract the header row from field lists' field + names because field names cannot contain everything one might put in + a table cell. A separate header row also allows shorter field names + and doesn't force one to rewrite the whole table when the header + text changes. But for simpler cases, we can offer a ":header: + fields" option, which does extract header cells from field names:: + + .. field-list-table:: + :header: fields + + - :Treat: Albatross! + :Quantity: 299 + :Description: On a stick! + +* How to indicate the column widths? A directive option? :: + + .. list-table:: + :widths: 15 10 35 + + Automatic defaults from the text used? + +* How to handle row and/or column spans? + + In a field list, column-spans can be indicated by specifying the + first and last fields, separated by space-dash-space or ellipsis:: + + - :foo - baz: quuux + - :foo ... baz: quuux + + Commas were proposed for column spans:: + + - :foo, bar: quux + + But non-adjacent columns become problematic. Should we report an + error, or duplicate the value into each span of adjacent columns (as + was suggested)? The latter suggestion is appealing but may be too + clever. Best perhaps to simply specify the two ends. + + It was suggested that comma syntax should be allowed, too, in order + to allow the user to avoid trouble when changing the column order. + But changing the column order of a table with spans is not trivial; + we shouldn't make it easier to mess up. + + One possible syntax for row-spans is to simply treat any row where a + field is missing as a row-span from the last row where it appeared. + Leaving a field empty would still be possible by writing a field + with empty content. But this is too implicit. + + Another way would be to require an explicit continuation marker + (``...``/``-"-``/``"``?) in all but the first row of a spanned + field. Empty comments could work (".."). If implemented, the same + marker could also be supported in simple tables, which lack + row-spanning abilities. + + Explicit markup like ":rowspan:" and ":colspan:" was also suggested. + + Sometimes in a table, the first header row contains spans. It may + be necessary to provide a way to specify the column field names + independently of data rows. A directive option would do it. + +* We could specify "column-wise" or "row-wise" ordering, with the same + markup structure. For example, with definition data:: + + .. list-table:: + :column-wise: + + Treat + - Albatross! + - Crunchy Frog! + Quantity + - 299 + - 1499 + Description + - On a stick! + - If we took the bones out, it wouldn't be + crunchy, now would it? + +* A syntax for _`stubs in grid tables` is easy to imagine:: + + +------------------------++------------+----------+ + | Header row, column 1 || Header 2 | Header 3 | + +========================++============+==========+ + | body row 1, column 1 || column 2 | column 3 | + +------------------------++------------+----------+ + + Or this idea from Nick Moffitt:: + + +-----+---+---+ + | XOR # T | F | + +=====+===+===+ + | T # F | T | + +-----+---+---+ + | F # T | F | + +-----+---+---+ + + +Auto-Enumerated Lists +===================== + +Implemented 2005-03-24: combination of variation 1 & 2. + +The advantage of auto-numbered enumerated lists would be similar to +that of auto-numbered footnotes: lists could be written and rearranged +without having to manually renumber them. The disadvantages are also +the same: input and output wouldn't match exactly; the markup may be +ugly or confusing (depending on which alternative is chosen). + +1. Use the "#" symbol. Example:: + + #. Item 1. + #. Item 2. + #. Item 3. + + Advantages: simple, explicit. Disadvantage: enumeration sequence + cannot be specified (limited to arabic numerals); ugly. + +2. As a variation on #1, first initialize the enumeration sequence? + For example:: + + a) Item a. + #) Item b. + #) Item c. + + Advantages: simple, explicit, any enumeration sequence possible. + Disadvantages: ugly; perhaps confusing with mixed concrete/abstract + enumerators. + +3. Alternative suggested by Fred Bremmer, from experience with MoinMoin:: + + 1. Item 1. + 1. Item 2. + 1. Item 3. + + Advantages: enumeration sequence is explicit (could be multiple + "a." or "(I)" tokens). Disadvantages: perhaps confusing; otherwise + erroneous input (e.g., a duplicate item "1.") would pass silently, + either causing a problem later in the list (if no blank lines + between items) or creating two lists (with blanks). + + Take this input for example:: + + 1. Item 1. + + 1. Unintentional duplicate of item 1. + + 2. Item 2. + + Currently the parser will produce two list, "1" and "1,2" (no + warnings, because of the presence of blank lines). Using Fred's + notation, the current behavior is "1,1,2 -> 1 1,2" (without blank + lines between items, it would be "1,1,2 -> 1 [WARNING] 1,2"). What + should the behavior be with auto-numbering? + + Fred has produced a patch__, whose initial behavior is as follows:: + + 1,1,1 -> 1,2,3 + 1,2,2 -> 1,2,3 + 3,3,3 -> 3,4,5 + 1,2,2,3 -> 1,2,3 [WARNING] 3 + 1,1,2 -> 1,2 [WARNING] 2 + + (After the "[WARNING]", the "3" would begin a new list.) + + I have mixed feelings about adding this functionality to the spec & + parser. It would certainly be useful to some users (myself + included; I often have to renumber lists). Perhaps it's too + clever, asking the parser to guess too much. What if you *do* want + three one-item lists in a row, each beginning with "1."? You'd + have to use empty comments to force breaks. Also, I question + whether "1,2,2 -> 1,2,3" is optimal behavior. + + In response, Fred came up with "a stricter and more explicit rule + [which] would be to only auto-number silently if *all* the + enumerators of a list were identical". In that case:: + + 1,1,1 -> 1,2,3 + 1,2,2 -> 1,2 [WARNING] 2 + 3,3,3 -> 3,4,5 + 1,2,2,3 -> 1,2 [WARNING] 2,3 + 1,1,2 -> 1,2 [WARNING] 2 + + Should any start-value be allowed ("3,3,3"), or should + auto-numbered lists be limited to begin with ordinal-1 ("1", "A", + "a", "I", or "i")? + + __ http://sourceforge.net/tracker/index.php?func=detail&aid=548802 + &group_id=38414&atid=422032 + +4. Alternative proposed by Tony Ibbs:: + + #1. First item. + #3. Aha - I edited this in later. + #2. Second item. + + The initial proposal required unique enumerators within a list, but + this limits the convenience of a feature of already limited + applicability and convenience. Not a useful requirement; dropped. + + Instead, simply prepend a "#" to a standard list enumerator to + indicate auto-enumeration. The numbers (or letters) of the + enumerators themselves are not significant, except: + + - as a sequence indicator (arabic, roman, alphabetic; upper/lower), + + - and perhaps as a start value (first list item). + + Advantages: explicit, any enumeration sequence possible. + Disadvantages: a bit ugly. + + +----------------- + Not Implemented +----------------- + +Reworking Footnotes +=================== + +As a further wrinkle (see `Reworking Explicit Markup (Round 1)`_ +above), in the wee hours of 2002-02-28 I posted several ideas for +changes to footnote syntax: + + - Change footnote syntax from ``.. [1]`` to ``_[1]``? ... + - Differentiate (with new DTD elements) author-date "citations" + (``[GVR2002]``) from numbered footnotes? ... + - Render footnote references as superscripts without "[]"? ... + +These ideas are all related, and suggest changes in the +reStructuredText syntax as well as the docutils tree model. + +The footnote has been used for both true footnotes (asides expanding +on points or defining terms) and for citations (references to external +works). Rather than dealing with one amalgam construct, we could +separate the current footnote concept into strict footnotes and +citations. Citations could be interpreted and treated differently +from footnotes. Footnotes would be limited to numerical labels: +manual ("1") and auto-numbered (anonymous "#", named "#label"). + +The footnote is the only explicit markup construct (starts with ".. ") +that directly translates to a visible body element. I've always been +a little bit uncomfortable with the ".. " marker for footnotes because +of this; ".. " has a connotation of "special", but footnotes aren't +especially "special". Printed texts often put footnotes at the bottom +of the page where the reference occurs (thus "foot note"). Some HTML +designs would leave footnotes to be rendered the same positions where +they're defined. Other online and printed designs will gather +footnotes into a section near the end of the document, converting them +to "endnotes" (perhaps using a directive in our case); but this +"special processing" is not an intrinsic property of the footnote +itself, but a decision made by the document author or processing +system. + +Citations are almost invariably collected in a section at the end of a +document or section. Citations "disappear" from where they are +defined and are magically reinserted at some well-defined point. +There's more of a connection to the "special" connotation of the ".. " +syntax. The point at which the list of citations is inserted could be +defined manually by a directive (e.g., ".. citations::"), and/or have +default behavior (e.g., a section automatically inserted at the end of +the document) that might be influenced by options to the Writer. + +Syntax proposals: + ++ Footnotes: + + - Current syntax:: + + .. [1] Footnote 1 + .. [#] Auto-numbered footnote. + .. [#label] Auto-labeled footnote. + + - The syntax proposed in the original 2002-02-28 Doc-SIG post: + remove the ".. ", prefix a "_":: + + _[1] Footnote 1 + _[#] Auto-numbered footnote. + _[#label] Auto-labeled footnote. + + The leading underscore syntax (earlier dropped because + ``.. _[1]:`` was too verbose) is a useful reminder that footnotes + are hyperlink targets. + + - Minimal syntax: remove the ".. [" and "]", prefix a "_", and + suffix a ".":: + + _1. Footnote 1. + _#. Auto-numbered footnote. + _#label. Auto-labeled footnote. + + ``_1.``, ``_#.``, and ``_#label.`` are markers, + like list markers. + + Footnotes could be rendered something like this in HTML + + | 1. This is a footnote. The brackets could be dropped + | from the label, and a vertical bar could set them + | off from the rest of the document in the HTML. + + Two-way hyperlinks on the footnote marker ("1." above) would also + help to differentiate footnotes from enumerated lists. + + If converted to endnotes (by a directive/transform), a horizontal + half-line might be used instead. Page-oriented output formats + would typically use the horizontal line for true footnotes. + ++ Footnote references: + + - Current syntax:: + + [1]_, [#]_, [#label]_ + + - Minimal syntax to match the minimal footnote syntax above:: + + 1_, #_, #label_ + + As a consequence, pure-numeric hyperlink references would not be + possible; they'd be interpreted as footnote references. + ++ Citation references: no change is proposed from the current footnote + reference syntax:: + + [GVR2001]_ + ++ Citations: + + - Current syntax (footnote syntax):: + + .. [GVR2001] Python Documentation; van Rossum, Drake, et al.; + http://www.python.org/doc/ + + - Possible new syntax:: + + _[GVR2001] Python Documentation; van Rossum, Drake, et al.; + http://www.python.org/doc/ + + _[DJG2002] + Docutils: Python Documentation Utilities project; Goodger + et al.; http://docutils.sourceforge.net/ + + Without the ".. " marker, subsequent lines would either have to + align as in one of the above, or we'd have to allow loose + alignment (I'd rather not):: + + _[GVR2001] Python Documentation; van Rossum, Drake, et al.; + http://www.python.org/doc/ + +I proposed adopting the "minimal" syntax for footnotes and footnote +references, and adding citations and citation references to +reStructuredText's repertoire. The current footnote syntax for +citations is better than the alternatives given. + +From a reply by Tony Ibbs on 2002-03-01: + + However, I think easier with examples, so let's create one:: + + Fans of Terry Pratchett are perhaps more likely to use + footnotes [1]_ in their own writings than other people + [2]_. Of course, in *general*, one only sees footnotes + in academic or technical writing - it's use in fiction + and letter writing is not normally considered good + style [4]_, particularly in emails (not a medium that + lends itself to footnotes). + + .. [1] That is, little bits of referenced text at the + bottom of the page. + .. [2] Because Terry himself does, of course [3]_. + .. [3] Although he has the distinction of being + *funny* when he does it, and his fans don't always + achieve that aim. + .. [4] Presumably because it detracts from linear + reading of the text - this is, of course, the point. + + and look at it with the second syntax proposal:: + + Fans of Terry Pratchett are perhaps more likely to use + footnotes [1]_ in their own writings than other people + [2]_. Of course, in *general*, one only sees footnotes + in academic or technical writing - it's use in fiction + and letter writing is not normally considered good + style [4]_, particularly in emails (not a medium that + lends itself to footnotes). + + _[1] That is, little bits of referenced text at the + bottom of the page. + _[2] Because Terry himself does, of course [3]_. + _[3] Although he has the distinction of being + *funny* when he does it, and his fans don't always + achieve that aim. + _[4] Presumably because it detracts from linear + reading of the text - this is, of course, the point. + + (I note here that if I have gotten the indentation of the + footnotes themselves correct, this is clearly not as nice. And if + the indentation should be to the left margin instead, I like that + even less). + + and the third (new) proposal:: + + Fans of Terry Pratchett are perhaps more likely to use + footnotes 1_ in their own writings than other people + 2_. Of course, in *general*, one only sees footnotes + in academic or technical writing - it's use in fiction + and letter writing is not normally considered good + style 4_, particularly in emails (not a medium that + lends itself to footnotes). + + _1. That is, little bits of referenced text at the + bottom of the page. + _2. Because Terry himself does, of course 3_. + _3. Although he has the distinction of being + *funny* when he does it, and his fans don't always + achieve that aim. + _4. Presumably because it detracts from linear + reading of the text - this is, of course, the point. + + I think I don't, in practice, mind the targets too much (the use + of a dot after the number helps a lot here), but I do have a + problem with the body text, in that I don't naturally separate out + the footnotes as different than the rest of the text - instead I + keep wondering why there are numbers interspered in the text. The + use of brackets around the numbers ([ and ]) made me somehow parse + the footnote references as "odd" - i.e., not part of the body text + - and thus both easier to skip, and also (paradoxically) easier to + pick out so that I could follow them. + + Thus, for the moment (and as always susceptable to argument), I'd + say -1 on the new form of footnote reference (i.e., I much prefer + the existing ``[1]_`` over the proposed ``1_``), and ambivalent + over the proposed target change. + + That leaves David's problem of wanting to distinguish footnotes + and citations - and the only thing I can propose there is that + footnotes are numeric or # and citations are not (which, as a + human being, I can probably cope with!). + +From a reply by Paul Moore on 2002-03-01: + + I think the current footnote syntax ``[1]_`` is *exactly* the + right balance of distinctness vs unobtrusiveness. I very + definitely don't think this should change. + + On the target change, it doesn't matter much to me. + +From a further reply by Tony Ibbs on 2002-03-01, referring to the +"[1]" form and actual usage in email: + + Clearly this is a form people are used to, and thus we should + consider it strongly (in the same way that the usage of ``*..*`` + to mean emphasis was taken partly from email practise). + + Equally clearly, there is something "magical" for people in the + use of a similar form (i.e., ``[1]``) for both footnote reference + and footnote target - it seems natural to keep them similar. + + ... + + I think that this established plaintext usage leads me to strongly + believe we should retain square brackets at both ends of a + footnote. The markup of the reference end (a single trailing + underscore) seems about as minimal as we can get away with. The + markup of the target end depends on how one envisages the thing - + if ".." means "I am a target" (as I tend to see it), then that's + good, but one can also argue that the "_[1]" syntax has a neat + symmetry with the footnote reference itself, if one wishes (in + which case ".." presumably means "hidden/special" as David seems + to think, which is why one needs a ".." *and* a leading underline + for hyperlink targets. + +Given the persuading arguments voiced, we'll leave footnote & footnote +reference syntax alone. Except that these discussions gave rise to +the "auto-symbol footnote" concept, which has been added. Citations +and citation references have also been added. + + +Syntax for Questions & Answers +============================== + +Implement as a generic two-column marked list? As a standalone +(non-directive) construct? (Is the markup ambiguous?) Add support to +parts.contents? + +New elements would be required. Perhaps:: + + <!ELEMENT question_list (question_list_item+)> + <!ATTLIST question_list + numbering (none | local | global) + #IMPLIED + start NUMBER #IMPLIED> + <!ELEMENT question_list_item (question, answer*)> + <!ELEMENT question %text.model;> + <!ELEMENT answer (%body.elements;)+> + +Originally I thought of implementing a Q&A list with special syntax:: + + Q: What am I? + + A: You are a question-and-answer + list. + + Q: What are you? + + A: I am the omniscient "we". + +Where each "Q" and "A" could also be numbered (e.g., "Q1"). However, +a simple enumerated or bulleted list will do just fine for syntax. A +directive could treat the list specially; e.g. the first paragraph +could be treated as a question, the remainder as the answer (multiple +answers could be represented by nested lists). Without special +syntax, this directive becomes low priority. + +As described in the FAQ__, no special syntax or directive is needed +for this application. + +__ http://docutils.sf.net/FAQ.html + #how-can-i-mark-up-a-faq-or-other-list-of-questions-answers + + +-------- + Tabled +-------- + +Reworking Explicit Markup (Round 2) +=================================== + +See `Reworking Explicit Markup (Round 1)`_ for an earlier discussion. + +In April 2004, a new thread becan on docutils-develop: `Inconsistency +in RST markup`__. Several arguments were made; the first argument +begat later arguments. Below, the arguments are paraphrased "in +quotes", with responses. + +__ http://thread.gmane.org/gmane.text.docutils.devel/1386 + +1. References and targets take this form:: + + targetname_ + + .. _targetname: stuff + + But footnotes, "which generate links just like targets do", are + written as:: + + [1]_ + + .. [1] stuff + + "Footnotes should be written as":: + + [1]_ + + .. _[1]: stuff + + But they're not the same type of animal. That's not a "footnote + target", it's a *footnote*. Being a target is not a footnote's + primary purpose (an arguable point). It just happens to grow a + target automatically, for convenience. Just as a section title:: + + Title + ===== + + isn't a "title target", it's a *title*, which happens to grow a + target automatically. The consistency is there, it's just deeper + than at first glance. + + Also, ".. [1]" was chosen for footnote syntax because it closely + resembles one form of actual footnote rendering. ".. _[1]:" is too + verbose; excessive punctuation is required to get the job done. + + For more of the reasoning behind the syntax, see `Problems With + StructuredText (Hyperlinks) <problems.html#hyperlinks>`__ and + `Reworking Footnotes`_. + +2. "I expect directives to also look like ``.. this:`` [one colon] + because that also closely parallels the link and footnote target + markup." + + There are good reasons for the two-colon syntax: + + Two colons are used after the directive type for these reasons: + + - Two colons are distinctive, and unlikely to be used in common + text. + + - Two colons avoids clashes with common comment text like:: + + .. Danger: modify at your own risk! + + - If an implementation of reStructuredText does not recognize a + directive (i.e., the directive-handler is not installed), a + level-3 (error) system message is generated, and the entire + directive block (including the directive itself) will be + included as a literal block. Thus "::" is a natural choice. + + -- `restructuredtext.html#directives + <../../ref/rst/restructuredtext.html#directives>`__ + + The last reason is not particularly compelling; it's more of a + convenient coincidence or mnemonic. + +3. "Comments always seemed too easy. I almost never write comments. + I'd have no problem writing '.. comment:' in front of my comments. + In fact, it would probably be more readable, as comments *should* + be set off strongly, because they are very different from normal + text." + + Many people do use comments though, and some applications of + reStructuredText require it. For example, all reStructuredText + PEPs (and this document!) have an Emacs stanza at the bottom, in a + comment. Having to write ".. comment::" would be very obtrusive. + + Comments *should* be dirt-easy to do. It should be easy to + "comment out" a block of text. Comments in programming languages + and other markup languages are invariably easy. + + Any author is welcome to preface their comments with "Comment:" or + "Do Not Print" or "Note to Editor" or anything they like. A + "comment" directive could easily be implemented. It might be + confused with admonition directives, like "note" and "caution" + though. In unrelated (and unpublished and unfinished) work, adding + a "comment" directive as a true document element was considered:: + + If structure is necessary, we could use a "comment" directive + (to avoid nonsensical DTD changes, the "comment" directive + could produce an untitled topic element). + +4. "One of the goals of reStructuredText is to be *readable* by people + who don't know it. This construction violates that: it is not at + all obvious to the uninitiated that text marked by '..' is a + comment. On the other hand, '.. comment:' would be totally + transparent." + + Totally transparent, perhaps, but also very obtrusive. Another of + `reStructuredText's goals`_ is to be unobtrusive, and + ".. comment::" would violate that. The goals of reStructuredText + are many, and they conflict. Determining the right set of goals + and finding solutions that best fit is done on a case-by-case + basis. + + Even readability is has two aspects. Being readable without any + prior knowledge is one. Being as easily read in raw form as in + processed form is the other. ".." may not contribute to the former + aspect, but ".. comment::" would certainly detract from the latter. + + .. _author's note: + .. _reStructuredText's goals: ../../ref/rst/introduction.html#goals + +5. "Recently I sent someone an rst document, and they got confused; I + had to explain to them that '..' marks comments, *unless* it's a + directive, etc..." + + The explanation of directives *is* roundabout, defining comments in + terms of not being other things. That's definitely a wart. + +6. "Under the current system, a mistyped directive (with ':' instead + of '::') will be silently ignored. This is an error that could + easily go unnoticed." + + A parser option/setting like "--comments-on-stderr" would help. + +7. "I'd prefer to see double-dot-space / command / double-colon as the + standard Docutils markup-marker. It's unusual enough to avoid + being accidently used. Everything that starts with a double-dot + should end with a double-colon." + + That would increase the punctuation verbosity of some constructs + considerably. + +8. Edward Loper proposed the following plan for backwards + compatibility: + + 1. ".. foo" will generate a deprecation warning to stderr, and + nothing in the output (no system messages). + 2. ".. foo: bar" will be treated as a directive foo. If there + is no foo directive, then do the normal error output. + 3. ".. foo:: bar" will generate a deprecation warning to + stderr, and be treated as a directive. Or leave it valid? + + So some existing documents might start printing deprecation + warnings, but the only existing documents that would *break* + would be ones that say something like:: + + .. warning: this should be a comment + + instead of:: + + .. warning:: this should be a comment + + Here, we're trading fairly common a silent error (directive + falsely treated as a comment) for a fairly uncommon explicitly + flagged error (comment falsely treated as directive). To make + things even easier, we could add a sentence to the + unknown-directive error. Something like "If you intended to + create a comment, please use '.. comment:' instead". + +On one hand, I understand and sympathize with the points raised. On +the other hand, I think the current syntax strikes the right balance +(but I acknowledge a possible lack of objectivity). On the gripping +hand, the comment and directive syntax has become well established, so +even if it's a wart, it may be a wart we have to live with. + +Making any of these changes would cause a lot of breakage or at least +deprecation warnings. I'm not sure the benefit is worth the cost. + +For now, we'll treat this as an unresolved legacy issue. + + +------- + To Do +------- + +Nested Inline Markup +==================== + +These are collected notes on a long-discussed issue. The original +mailing list messages should be referred to for details. + +* In a 2001-10-31 discussion I wrote: + + Try, for example, `Ed Loper's 2001-03-21 post`_, which details + some rules for nested inline markup. I think the complexity is + prohibitive for the marginal benefit. (And if you can understand + that tree without going mad, you're a better man than I. ;-) + + Inline markup is already fragile. Allowing nested inline markup + would only be asking for trouble IMHO. If it proves absolutely + necessary, it can be added later. The rules for what can appear + inside what must be well thought out first though. + + .. _Ed Loper's 2001-03-21 post: + http://mail.python.org/pipermail/doc-sig/2001-March/001487.html + + -- http://mail.python.org/pipermail/doc-sig/2001-October/002354.html + +* In a 2001-11-09 Doc-SIG post, I wrote: + + The problem is that in the + what-you-see-is-more-or-less-what-you-get markup language that + is reStructuredText, the symbols used for inline markup ("*", + "**", "`", "``", etc.) may preclude nesting. + + I've rethought this position. Nested markup is not precluded, just + tricky. People and software parse "double and 'single' quotes" all + the time. Continuing, + + I've thought over how we might implement nested inline + markup. The first algorithm ("first identify the outer inline + markup as we do now, then recursively scan for nested inline + markup") won't work; counterexamples were given in my `last post + <http://mail.python.org/pipermail/doc-sig/2001-November/002363.html>`__. + + The second algorithm makes my head hurt:: + + while 1: + scan for start-string + if found: + push on stack + scan for start or end string + if new start string found: + recurse + elif matching end string found: + pop stack + elif non-matching end string found: + if its a markup error: + generate warning + elif the initial start-string was misinterpreted: + # e.g. in this case: ***strong** in emphasis* + restart with the other interpretation + # but it might be several layers back ... + ... + + This is similar to how the parser does section title + recognition, but sections are much more regular and + deterministic. + + Bottom line is, I don't think the benefits are worth the effort, + even if it is possible. I'm not going to try to write the code, + at least not now. If somebody codes up a consistent, working, + general solution, I'll be happy to consider it. + + -- http://mail.python.org/pipermail/doc-sig/2001-November/002388.html + +* In a `2003-05-06 Docutils-Users post`__ Paul Tremblay proposed a new + syntax to allow for easier nesting. It eventually evolved into + this:: + + :role:[inline text] + + The duplication with the existing interpreted text syntax is + problematic though. + + __ http://article.gmane.org/gmane.text.docutils.user/317 + +* Could the parser be extended to parse nested interpreted text? :: + + :emphasis:`Some emphasized text with :strong:`some more + emphasized text` in it and **perhaps** :reference:`a link`` + +* In a `2003-06-18 Docutils-Develop post`__, Mark Nodine reported on + his implementation of a form of nested inline markup in his + Perl-based parser (unpublished). He brought up some interesting + ideas. The implementation was flawed, however, by the change in + semantics required for backslash escapes. + + __ http://article.gmane.org/gmane.text.docutils.devel/795 + +* Docutils-develop threads between David Abrahams, David Goodger, and + Mark Nodine (beginning 2004-01-16__ and 2004-01-19__) hashed out + many of the details of a potentially successful implementation, as + described below. David Abrahams checked in code to the "nesting" + branch of CVS, awaiting thorough review. + + __ http://thread.gmane.org/gmane.text.docutils.devel/1102 + __ http://thread.gmane.org/gmane.text.docutils.devel/1125 + +It may be possible to accomplish nested inline markup in general with +a more powerful inline markup parser. There may be some issues, but +I'm not averse to the idea of nested inline markup in general. I just +don't have the time or inclination to write a new parser now. Of +course, a good patch would be welcome! + +I envisage something like this. Explicit-role interpreted text must +be nestable. Prefix-based is probably preferred, since suffix-based +will look like inline literals:: + + ``text`:role1:`:role2: + +But it can be disambiguated, so it ought to be left up to the author:: + + `\ `text`:role1:`:role2: + +In addition, other forms of inline markup may be nested if +unambiguous:: + + *emphasized ``literal`` and |substitution ref| and link_* + +IOW, the parser ought to be as permissive as possible. + + +Index Entries & Indexes +======================= + +Were I writing a book with an index, I guess I'd need two +different kinds of index targets: inline/implicit and +out-of-line/explicit. For example:: + + In this `paragraph`:index:, several words are being + `marked`:index: inline as implicit `index`:index: + entries. + + .. index:: markup + .. index:: syntax + + The explicit index directives above would refer to + this paragraph. It might also make sense to allow multiple + entries in an ``index`` directive: + + .. index:: + markup + syntax + +The words "paragraph", "marked", and "index" would become index +entries pointing at the words in the first paragraph. The index +entry words appear verbatim in the text. (Don't worry about the +ugly ":index:" part; if indexing is the only/main application of +interpreted text in your documents, it can be implicit and +omitted.) The two directives provide manual indexing, where the +index entry words ("markup" and "syntax") do not appear in the +main text. We could combine the two directives into one:: + + .. index:: markup; syntax + +Semicolons instead of commas because commas could *be* part of the +index target, like:: + + .. index:: van Rossum, Guido + +Another reason for index directives is because other inline markup +wouldn't be possible within inline index targets. + +Sometimes index entries have multiple levels. Given:: + + .. index:: statement syntax: expression statements + +In a hypothetical index, combined with other entries, it might +look like this:: + + statement syntax + expression statements ..... 56 + assignment ................ 57 + simple statements ......... 58 + compound statements ....... 60 + +Inline multi-level index targets could be done too. Perhaps +something like:: + + When dealing with `expression statements <statement syntax:>`, + we must remember ... + +The opposite sense could also be possible:: + + When dealing with `index entries <:multi-level>`, there are + many permutations to consider. + +Also "see / see also" index entries. + +Given:: + + Here's a paragraph. + + .. index:: paragraph + +(The "index" directive above actually targets the *preceding* +object.) The directive should produce something like this XML:: + + <paragraph> + <index_entry text="paragraph"/> + Here's a paragraph. + </paragraph> + +This kind of content model would also allow true inline +index-entries:: + + Here's a `paragraph`:index:. + +If the "index" role were the default for the application, it could be +dropped:: + + Here's a `paragraph`. + +Both of these would result in this XML:: + + <paragraph> + Here's a <index_entry>paragraph</index_entry>. + </paragraph> + + +from 2002-06-24 docutils-develop posts +-------------------------------------- + + If all of your index entries will appear verbatim in the text, + this should be sufficient. If not (e.g., if you want "Van Rossum, + Guido" in the index but "Guido van Rossum" in the text), we'll + have to figure out a supplemental mechanism, perhaps using + substitutions. + +I've thought a bit more on this, and I came up with two possibilities: + +1. Using interpreted text, embed the index entry text within the + interpreted text:: + + ... by `Guido van Rossum [Van Rossum, Guido]` ... + + The problem with this is obvious: the text becomes cluttered and + hard to read. The processed output would drop the text in + brackets, which goes against the spirit of interpreted text. + +2. Use substitutions:: + + ... by |Guido van Rossum| ... + + .. |Guido van Rossum| index:: Van Rossum, Guido + + A problem with this is that each substitution definition must have + a unique name. A subsequent ``.. |Guido van Rossum| index:: BDFL`` + would be illegal. Some kind of anonymous substitution definition + mechanism would be required, but I think that's going too far. + +Both of these alternatives are flawed. Any other ideas? + + +------------------- + ... Or Not To Do? +------------------- + +This is the realm of the possible but questionably probable. These +ideas are kept here as a record of what has been proposed, for +posterity and in case any of them prove to be useful. + + +Compound Enumerated Lists +========================= + +Allow for compound enumerators, such as "1.1." or "1.a." or "1(a)", to +allow for nested enumerated lists without indentation? + + +Indented Lists +============== + +Allow for variant styles by interpreting indented lists as if they +weren't indented? For example, currently the list below will be +parsed as a list within a block quote:: + + paragraph + + * list item 1 + * list item 2 + +But a lot of people seem to write that way, and HTML browsers make it +look as if that's the way it should be. The parser could check the +contents of block quotes, and if they contain only a single list, +remove the block quote wrapper. There would be two problems: + +1. What if we actually *do* want a list inside a block quote? + +2. What if such a list comes immediately after an indented construct, + such as a literal block? + +Both could be solved using empty comments (problem 2 already exists +for a block quote after a literal block). But that's a hack. + +Perhaps a runtime setting, allowing or disabling this convenience, +would be appropriate. But that raises issues too: + + User A, who writes lists indented (and their config file is set up + to allow it), sends a file to user B, who doesn't (and their + config file disables indented lists). The result of processing by + the two users will be different. + +It may seem minor, but it adds ambiguity to the parser, which is bad. + +See the `Doc-SIG discussion starting 2001-04-18`__ with Ed Loper's +"Structuring: a summary; and an attempt at EBNF", item 4 (and +follow-ups, here__ and here__). Also `docutils-users, 2003-02-17`__ +and `beginning 2003-08-04`__. + +__ http://mail.python.org/pipermail/doc-sig/2001-April/001776.html +__ http://mail.python.org/pipermail/doc-sig/2001-April/001789.html +__ http://mail.python.org/pipermail/doc-sig/2001-April/001793.html +__ http://sourceforge.net/mailarchive/message.php?msg_id=3838913 +__ http://sf.net/mailarchive/forum.php?thread_id=2957175&forum_id=11444 + + +Sloppy Indentation of List Items +================================ + +Perhaps the indentation shouldn't be so strict. Currently, this is +required:: + + 1. First line, + second line. + +Anything wrong with this? :: + + 1. First line, + second line. + +Problem? :: + + 1. First para. + + Block quote. (no good: requires some indent relative to first + para) + + Second Para. + + 2. Have to carefully define where the literal block ends:: + + Literal block + + Literal block? + +Hmm... Non-strict indentation isn't such a good idea. + + +Lazy Indentation of List Items +============================== + +Another approach: Going back to the first draft of reStructuredText +(2000-11-27 post to Doc-SIG):: + + - This is the fourth item of the main list (no blank line above). + The second line of this item is not indented relative to the + bullet, which precludes it from having a second paragraph. + +Change that to *require* a blank line above and below, to reduce +ambiguity. This "loosening" may be added later, once the parser's +been nailed down. However, a serious drawback of this approach is to +limit the content of each list item to a single paragraph. + + +David's Idea for Lazy Indentation +--------------------------------- + +Consider a paragraph in a word processor. It is a single logical line +of text which ends with a newline, soft-wrapped arbitrarily at the +right edge of the page or screen. We can think of a plaintext +paragraph in the same way, as a single logical line of text, ending +with two newlines (a blank line) instead of one, and which may contain +arbitrary line breaks (newlines) where it was accidentally +hard-wrapped by an application. We can compensate for the accidental +hard-wrapping by "unwrapping" every unindented second and subsequent +line. The indentation of the first line of a paragraph or list item +would determine the indentation for the entire element. Blank lines +would be required between list items when using lazy indentation. + +The following example shows the lazy indentation of multiple body +elements:: + + - This is the first paragraph + of the first list item. + + Here is the second paragraph + of the first list item. + + - This is the first paragraph + of the second list item. + + Here is the second paragraph + of the second list item. + +A more complex example shows the limitations of lazy indentation:: + + - This is the first paragraph + of the first list item. + + Next is a definition list item: + + Term + Definition. The indentation of the term is + required, as is the indentation of the definition's + first line. + + When the definition extends to more than + one line, lazy indentation may occur. (This is the second + paragraph of the definition.) + + - This is the first paragraph + of the second list item. + + - Here is the first paragraph of + the first item of a nested list. + + So this paragraph would be outside of the nested list, + but inside the second list item of the outer list. + + But this paragraph is not part of the list at all. + +And the ambiguity remains:: + + - Look at the hyphen at the beginning of the next line + - is it a second list item marker, or a dash in the text? + + Similarly, we may want to refer to numbers inside enumerated + lists: + + 1. How many socks in a pair? There are + 2. How many pants in a pair? Exactly + 1. Go figure. + +Literal blocks and block quotes would still require consistent +indentation for all their lines. For block quotes, we might be able +to get away with only requiring that the first line of each contained +element be indented. For example:: + + Here's a paragraph. + + This is a paragraph inside a block quote. + Second and subsequent lines need not be indented at all. + + - A bullet list inside + the block quote. + + Second paragraph of the + bullet list inside the block quote. + +Although feasible, this form of lazy indentation has problems. The +document structure and hierarchy is not obvious from the indentation, +making the source plaintext difficult to read. This will also make +keeping track of the indentation while writing difficult and +error-prone. However, these problems may be acceptable for Wikis and +email mode, where we may be able to rely on less complex structure +(few nested lists, for example). + + +Multiple Roles in Interpreted Text +================================== + +In reStructuredText, inline markup cannot be nested (yet; `see +above`__). This also applies to interpreted text. In order to +simultaneously combine multiple roles for a single piece of text, a +syntax extension would be necessary. Ideas: + +1. Initial idea:: + + `interpreted text`:role1,role2: + +2. Suggested by Jason Diamond:: + + `interpreted text`:role1:role2: + +If a document is so complex as to require nested inline markup, +perhaps another markup system should be considered. By design, +reStructuredText does not have the flexibility of XML. + +__ `Nested Inline Markup`_ + + +Parameterized Interpreted Text +============================== + +In some cases it may be expedient to pass parameters to interpreted +text, analogous to function calls. Ideas: + +1. Parameterize the interpreted text role itself (suggested by Jason + Diamond):: + + `interpreted text`:role1(foo=bar): + + Positional parameters could also be supported:: + + `CSS`:acronym(Cascading Style Sheets): is used for HTML, and + `CSS`:acronym(Content Scrambling System): is used for DVDs. + + Technical problem: current interpreted text syntax does not + recognize roles containing whitespace. Design problem: this smells + like programming language syntax, but reStructuredText is not a + programming language. + +2. Put the parameters inside the interpreted text:: + + `CSS (Cascading Style Sheets)`:acronym: is used for HTML, and + `CSS (Content Scrambling System)`:acronym: is used for DVDs. + + Although this could be defined on an individual basis (per role), + we ought to have a standard. Hyperlinks with embedded URIs already + use angle brackets; perhaps they could be used here too:: + + `CSS <Cascading Style Sheets>`:acronym: is used for HTML, and + `CSS <Content Scrambling System>`:acronym: is used for DVDs. + + Do angle brackets connote URLs too much for this to be acceptable? + How about the "tag" connotation -- does it save them or doom them? + +3. `Nested inline markup`_ could prove useful here:: + + `CSS :def:`Cascading Style Sheets``:acronym: is used for HTML, + and `CSS :def:`Content Scrambling System``:acronym: is used for + DVDs. + + Inline markup roles could even define the default roles of nested + inline markup, allowing this cleaner syntax:: + + `CSS `Cascading Style Sheets``:acronym: is used for HTML, and + `CSS `Content Scrambling System``:acronym: is used for DVDs. + +Does this push inline markup too far? Readability becomes a serious +issue. Substitutions may provide a better alternative (at the expense +of verbosity and duplication) by pulling the details out of the text +flow:: + + |CSS| is used for HTML, and |CSS-DVD| is used for DVDs. + + .. |CSS| acronym:: Cascading Style Sheets + .. |CSS-DVD| acronym:: Content Scrambling System + :text: CSS + +---------------------------------------------------------------------- + +This whole idea may be going beyond the scope of reStructuredText. +Documents requiring this functionality may be better off using XML or +another markup system. + +This argument comes up regularly when pushing the envelope of +reStructuredText syntax. I think it's a useful argument in that it +provides a check on creeping featurism. In many cases, the resulting +verbosity produces such unreadable plaintext that there's a natural +desire *not* to use it unless absolutely necessary. It's a matter of +finding the right balance. + + +Syntax for Interpreted Text Role Bindings +========================================= + +The following syntax (idea from Jeffrey C. Jacobs) could be used to +associate directives with roles:: + + .. :rewrite: class:: rewrite + + `She wore ribbons in her hair and it lay with streaks of + grey`:rewrite: + +The syntax is similar to that of substitution declarations, and the +directive/role association may resolve implementation issues. The +semantics, ramifications, and implementation details would need to be +worked out. + +The example above would implement the "rewrite" role as adding a +``class="rewrite"`` attribute to the interpreted text ("inline" +element). The stylesheet would then pick up on the "class" attribute +to do the actual formatting. + +The advantage of the new syntax would be flexibility. Uses other than +"class" may present themselves. The disadvantage is complexity: +having to implement new syntax for a relatively specialized operation, +and having new semantics in existing directives ("class::" would do +something different). + +The `"role" directive`__ has been implemented. + +__ ../../ref/rst/directives.html#role + + +Character Processing +==================== + +Several people have suggested adding some form of character processing +to reStructuredText: + +* Some sort of automated replacement of ASCII sequences: + + - ``--`` to em-dash (or ``--`` to en-dash, and ``---`` to em-dash). + - Convert quotes to curly quote entities. (Essentially impossible + for HTML? Unnecessary for TeX.) + - Various forms of ``:-)`` to smiley icons. + - ``"\ "`` to . Problem with line-wrapping though: it could + end up escaping the newline. + - Escaped newlines to <BR>. + - Escaped period or quote or dash as a disappearing catalyst to + allow character-level inline markup? + +* XML-style character entities, such as "©" for the copyright + symbol. + +Docutils has no need of a character entity subsystem. Supporting +Unicode and text encodings, character entities should be directly +represented in the text: a copyright symbol should be represented by +the copyright symbol character. If this is not possible in an +authoring environment, a pre-processing stage can be added, or a table +of substitution definitions can be devised. + +A "unicode" directive has been implemented to allow direct +specification of esoteric characters. In combination with the +substitution construct, "include" files defining common sets of +character entities can be defined and used. `A set of character +entity set definition files have been defined`__ (`tarball`__). +There's also `a description and instructions for use`__. + +__ http://docutils.sf.net/tmp/charents/ +__ http://docutils.sf.net/tmp/charents.tgz +__ http://docutils.sf.net/tmp/charents/README.html + +To allow for `character-level inline markup`_, a limited form of +character processing has been added to the spec and parser: escaped +whitespace characters are removed from the processed document. Any +further character processing will be of this functional type, rather +than of the character-encoding type. + +.. _character-level inline markup: + ../../ref/rst/restructuredtext.html#character-level-inline-markup + +* Directive idea:: + + .. text-replace:: "pattern" "replacement" + + - Support Unicode "U+XXXX" codes. + - Support regexps, perhaps with alternative "regexp-replace" + directive. + - Flags for regexps; ":flags:" option, or individuals. + - Specifically, should the default be case-sensistive or + -insensitive? + + +Page Or Line Breaks +=================== + +* Should ^L (or something else in reST) be defined to mean + force/suggest page breaks in whatever output we have? + + A "break" or "page-break" directive would be easy to add. A new + doctree element would be required though (perhaps "break"). The + final behavior would be up to the Writer. The directive argument + could be one of page/column/recto/verso for added flexibility. + + Currently ^L (Python's ``\f``) characters are treated as whitespace. + They're converted to single spaces, actually, as are vertical tabs + (^K, Python's ``\v``). It would be possible to recognize form feeds + as markup, but it requires some thought and discussion first. Are + there any downsides? Many editing environments do not allow the + insertion of control characters. Will it cause any harm? It would + be useful as a shorthand for the directive. + + It's common practice to use ^L before Emacs "Local Variables" + lists:: + + ^L + .. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: + + These are already present in many PEPs and Docutils project + documents. From the Emacs manual (info): + + A "local variables list" goes near the end of the file, in the + last page. (It is often best to put it on a page by itself.) + + It would be unfortunate if this construct caused a final blank page + to be generated (for those Writers that recognize the page breaks). + We'll have to add a transform that looks for a "break" plus zero or + more comments at the end of a document, and removes them. + + Probably a bad idea because there is no such thing as a page in a + generic document format. + +* Could the "break" concept above be extended to inline forms? + E.g. "^L" in the middle of a sentence could cause a line break. + Only recognize it at the end of a line (i.e., ``\f\n``)? + + Or is formfeed inappropriate? Perhaps vertical tab (``\v``), but + even that's a stretch. Can't use carriage returns, since they're + commonly used for line endings. + + Probably a bad idea as well because we do not want to use control + characters for well-readable and well-writable markup, and after all + we have the line block syntax for line breaks. + + +Superscript Markup +================== + +Add ``^superscript^`` inline markup? The only common non-markup uses +of "^" I can think of are as short hand for "superscript" itself and +for describing control characters ("^C to cancel"). The former +supports the proposed syntax, and it could be argued that the latter +ought to be literal text anyhow (e.g. "``^C`` to cancel"). + +However, superscripts are seldom needed, and new syntax would break +existing documents. When it's needed, the ``:superscript:`` +(``:sup:``) role can we used as well. + + +Code Execution +============== + +Add the following directives? + +- "exec": Execute Python code & insert the results. Call it + "python" to allow for other languages? + +- "system": Execute an ``os.system()`` call, and insert the results + (possibly as a literal block). Definitely dangerous! How to make + it safe? Perhaps such processing should be left outside of the + document, in the user's production system (a makefile or a script or + whatever). Or, the directive could be disabled by default and only + enabled with an explicit command-line option or config file setting. + Even then, an interactive prompt may be useful, such as: + + The file.txt document you are processing contains a "system" + directive requesting that the ``sudo rm -rf /`` command be + executed. Allow it to execute? (y/N) + +- "eval": Evaluate an expression & insert the text. At parse + time or at substitution time? Dangerous? Perhaps limit to canned + macros; see text.date_. + + .. _text.date: ../todo.html#text-date + +It's too dangerous (or too complicated in the case of "eval"). We do +not want to have such things in the core. + + +``encoding`` Directive +====================== + +Add an "encoding" directive to specify the character encoding of the +input data? Not a good idea for the following reasons: + +- When it sees the directive, the parser will already have read the + input data, and encoding determination will already have been done. + +- If a file with an "encoding" directive is edited and saved with + a different encoding, the directive may cause data corruption. + + +Support for Annotations +======================= + +Add an "annotation" role, as the equivalent of the HTML "title" +attribute? This is secondary information that may "pop up" when the +pointer hovers over the main text. A corresponding directive would be +required to associate annotations with the original text (by name, or +positionally as in anonymous targets?). + +There have not been many requests for such feature, though. Also, +cluttering WYSIWYG plaintext with annotations may not seem like a good +idea, and there is no "tool tip" in formats other than HTML. + + +``term`` Role +============= + +Add a "term" role for unfamiliar or specialized terminology? Probably +not; there is no real use case, and emphasis is enough for most cases. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/docs/dev/rst/problems.txt b/docs/dev/rst/problems.txt new file mode 100644 index 000000000..bc0101cbf --- /dev/null +++ b/docs/dev/rst/problems.txt @@ -0,0 +1,872 @@ +============================== + Problems With StructuredText +============================== +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:Revision: $Revision$ +:Date: $Date$ +:Copyright: This document has been placed in the public domain. + +There are several problems, unresolved issues, and areas of +controversy within StructuredText_ (Classic and Next Generation). In +order to resolve all these issues, this analysis brings all of the +issues out into the open, enumerates all the alternatives, and +proposes solutions to be incorporated into the reStructuredText_ +specification. + + +.. contents:: + + +Formal Specification +==================== + +The description in the original StructuredText.py has been criticized +for being vague. For practical purposes, "the code *is* the spec." +Tony Ibbs has been working on deducing a `detailed description`_ from +the documentation and code of StructuredTextNG_. Edward Loper's +STMinus_ is another attempt to formalize a spec. + +For this kind of a project, the specification should always precede +the code. Otherwise, the markup is a moving target which can never be +adopted as a standard. Of course, a specification may be revised +during lifetime of the code, but without a spec there is no visible +control and thus no confidence. + + +Understanding and Extending the Code +==================================== + +The original StructuredText_ is a dense mass of sparsely commented +code and inscrutable regular expressions. It was not designed to be +extended and is very difficult to understand. StructuredTextNG_ has +been designed to allow input (syntax) and output extensions, but its +documentation (both internal [comments & docstrings], and external) is +inadequate for the complexity of the code itself. + +For reStructuredText to become truly useful, perhaps even part of +Python's standard library, it must have clear, understandable +documentation and implementation code. For the implementation of +reStructuredText to be taken seriously, it must be a sterling example +of the potential of docstrings; the implementation must practice what +the specification preaches. + + +Section Structure via Indentation +================================= + +Setext_ required that body text be indented by 2 spaces. The original +StructuredText_ and StructuredTextNG_ require that section structure +be indicated through indentation, as "inspired by Python". For +certain structures with a very limited, local extent (such as lists, +block quotes, and literal blocks), indentation naturally indicates +structure or hierarchy. For sections (which may have a very large +extent), structure via indentation is unnecessary, unnatural and +ambiguous. Rather, the syntax of the section title *itself* should +indicate that it is a section title. + +The original StructuredText states that "A single-line paragraph whose +immediately succeeding paragraphs are lower level is treated as a +header." Requiring indentation in this way is: + +- Unnecessary. The vast majority of docstrings and standalone + documents will have no more than one level of section structure. + Requiring indentation for such docstrings is unnecessary and + irritating. + +- Unnatural. Most published works use title style (type size, face, + weight, and position) and/or section/subsection numbering rather + than indentation to indicate hierarchy. This is a tradition with a + very long history. + +- Ambiguous. A StructuredText header is indistinguishable from a + one-line paragraph followed by a block quote (precluding the use of + block quotes). Enumerated section titles are ambiguous (is it a + header? is it a list item?). Some additional adornment must be + required to confirm the line's role as a title, both to a parser and + to the human reader of the source text. + +Python's use of significant whitespace is a wonderful (if not +original) innovation, however requiring indentation in ordinary +written text is hypergeneralization. + +reStructuredText_ indicates section structure through title adornment +style (as exemplified by this document). This is far more natural. +In fact, it is already in widespread use in plain text documents, +including in Python's standard distribution (such as the toplevel +README_ file). + + +Character Escaping Mechanism +============================ + +No matter what characters are chosen for markup, some day someone will +want to write documentation *about* that markup or using markup +characters in a non-markup context. Therefore, any complete markup +language must have an escaping or encoding mechanism. For a +lightweight markup system, encoding mechanisms like SGML/XML's '*' +are out. So an escaping mechanism is in. However, with carefully +chosen markup, it should be necessary to use the escaping mechanism +only infrequently. + +reStructuredText_ needs an escaping mechanism: a way to treat +markup-significant characters as the characters themselves. Currently +there is no such mechanism (although ZWiki uses '!'). What are the +candidates? + +1. ``!`` + (http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/NGEscaping) +2. ``\`` +3. ``~`` +4. doubling of characters + +The best choice for this is the backslash (``\``). It's "the single +most popular escaping character in the world!", therefore familiar and +unsurprising. Since characters only need to be escaped under special +circumstances, which are typically those explaining technical +programming issues, the use of the backslash is natural and +understandable. Python docstrings can be raw (prefixed with an 'r', +as in 'r""'), which would obviate the need for gratuitous doubling-up +of backslashes. + +(On 2001-03-29 on the Doc-SIG mailing list, GvR endorsed backslash +escapes, saying, "'nuff said. Backslash it is." Although neither +legally binding nor irrevocable nor any kind of guarantee of anything, +it is a good sign.) + +The rule would be: An unescaped backslash followed by any markup +character escapes the character. The escaped character represents the +character itself, and is prevented from playing a role in any markup +interpretation. The backslash is removed from the output. A literal +backslash is represented by an "escaped backslash," two backslashes in +a row. + +A carefully constructed set of recognition rules for inline markup +will obviate the need for backslash-escapes in almost all cases; see +`Delimitation of Inline Markup`_ below. + +When an expression (requiring backslashes and other characters used +for markup) becomes too complicated and therefore unreadable, a +literal block may be used instead. Inside literal blocks, no markup +is recognized, therefore backslashes (for the purpose of escaping +markup) become unnecessary. + +We could allow backslashes preceding non-markup characters to remain +in the output. This would make describing regular expressions and +other uses of backslashes easier. However, this would complicate the +markup rules and would be confusing. + + +Blank Lines in Lists +==================== + +Oft-requested in Doc-SIG (the earliest reference is dated 1996-08-13) +is the ability to write lists without requiring blank lines between +items. In docstrings, space is at a premium. Authors want to convey +their API or usage information in as compact a form as possible. +StructuredText_ requires blank lines between all body elements, +including list items, even when boundaries are obvious from the markup +itself. + +In reStructuredText, blank lines are optional between list items. +However, in order to eliminate ambiguity, a blank line is required +before the first list item and after the last. Nested lists also +require blank lines before the list start and after the list end. + + +Bullet List Markup +================== + +StructuredText_ includes 'o' as a bullet character. This is dangerous +and counter to the language-independent nature of the markup. There +are many languages in which 'o' is a word. For example, in Spanish:: + + Llamame a la casa + o al trabajo. + + (Call me at home or at work.) + +And in Japanese (when romanized):: + + Senshuu no doyoubi ni tegami + o kakimashita. + + ([I] wrote a letter on Saturday last week.) + +If a paragraph containing an 'o' word wraps such that the 'o' is the +first text on a line, or if a paragraph begins with such a word, it +could be misinterpreted as a bullet list. + +In reStructuredText_, 'o' is not used as a bullet character. '-', +'*', and '+' are the possible bullet characters. + + +Enumerated List Markup +====================== + +StructuredText enumerated lists are allowed to begin with numbers and +letters followed by a period or right-parenthesis, then whitespace. +This has surprising consequences for writing styles. For example, +this is recognized as an enumerated list item by StructuredText:: + + Mr. Creosote. + +People will write enumerated lists in all different ways. It is folly +to try to come up with the "perfect" format for an enumerated list, +and limit the docstring parser's recognition to that one format only. + +Rather, the parser should recognize a variety of enumerator styles. +It is also recommended that the enumerator of the first list item be +ordinal-1 ('1', 'A', 'a', 'I', or 'i'), as output formats may not be +able to begin a list at an arbitrary enumeration. + +An initial idea was to require two or more consistent enumerated list +items in a row. This idea proved impractical and was dropped. In +practice, the presence of a proper enumerator is enough to reliably +recognize an enumerated list item; any ambiguities are reported by the +parser. Here's the original idea for posterity: + + The parser should recognize a variety of enumerator styles, mark + each block as a potential enumerated list item (PELI), and + interpret the enumerators of adjacent PELIs to decide whether they + make up a consistent enumerated list. + + If a PELI is labeled with a "1.", and is immediately followed by a + PELI labeled with a "2.", we've got an enumerated list. Or "(A)" + followed by "(B)". Or "i)" followed by "ii)", etc. The chances + of accidentally recognizing two adjacent and consistently labeled + PELIs, are acceptably small. + + For an enumerated list to be recognized, the following must be + true: + + - the list must consist of multiple adjacent list items (2 or + more) + - the enumerators must all have the same format + - the enumerators must be sequential + + +Definition List Markup +====================== + +StructuredText uses ' -- ' (whitespace, two hyphens, whitespace) on +the first line of a paragraph to indicate a definition list item. The +' -- ' serves to separate the term (on the left) from the definition +(on the right). + +Many people use ' -- ' as an em-dash in their text, conflicting with +the StructuredText usage. Although the Chicago Manual of Style says +that spaces should not be used around an em-dash, Peter Funk pointed +out that this is standard usage in German (according to the Duden, the +official German reference), and possibly in other languages as well. +The widespread use of ' -- ' precludes its use for definition lists; +it would violate the "unsurprising" criterion. + +A simpler, and at least equally visually distinctive construct +(proposed by Guido van Rossum, who incidentally is a frequent user of +' -- ') would do just as well:: + + term 1 + Definition. + + term 2 + Definition 2, paragraph 1. + + Definition 2, paragraph 2. + +A reStructuredText definition list item consists of a term and a +definition. A term is a simple one-line paragraph. A definition is a +block indented relative to the term, and may contain multiple +paragraphs and other body elements. No blank line precedes a +definition (this distinguishes definition lists from block quotes). + + +Literal Blocks +============== + +The StructuredText_ specification has literal blocks indicated by +'example', 'examples', or '::' ending the preceding paragraph. STNG +only recognizes '::'; 'example'/'examples' are not implemented. This +is good; it fixes an unnecessary language dependency. The problem is +what to do with the sometimes- unwanted '::'. + +In reStructuredText_ '::' at the end of a paragraph indicates that +subsequent *indented* blocks are treated as literal text. No further +markup interpretation is done within literal blocks (not even +backslash-escapes). If the '::' is preceded by whitespace, '::' is +omitted from the output; if '::' was the sole content of a paragraph, +the entire paragraph is removed (no 'empty' paragraph remains). If +'::' is preceded by a non-whitespace character, '::' is replaced by +':' (i.e., the extra colon is removed). + +Thus, a section could begin with a literal block as follows:: + + Section Title + ------------- + + :: + + print "this is example literal" + + +Tables +====== + +The table markup scheme in classic StructuredText was horrible. Its +omission from StructuredTextNG is welcome, and its markup will not be +repeated here. However, tables themselves are useful in +documentation. Alternatives: + +1. This format is the most natural and obvious. It was independently + invented (no great feat of creation!), and later found to be the + format supported by the `Emacs table mode`_:: + + +------------+------------+------------+--------------+ + | Header 1 | Header 2 | Header 3 | Header 4 | + +============+============+============+==============+ + | Column 1 | Column 2 | Column 3 & 4 span (Row 1) | + +------------+------------+------------+--------------+ + | Column 1 & 2 span | Column 3 | - Column 4 | + +------------+------------+------------+ - Row 2 & 3 | + | 1 | 2 | 3 | - span | + +------------+------------+------------+--------------+ + + Tables are described with a visual outline made up of the + characters '-', '=', '|', and '+': + + - The hyphen ('-') is used for horizontal lines (row separators). + - The equals sign ('=') is optionally used as a header separator + (as of version 1.5.24, this is not supported by the Emacs table + mode). + - The vertical bar ('|') is used for for vertical lines (column + separators). + - The plus sign ('+') is used for intersections of horizontal and + vertical lines. + + Row and column spans are possible simply by omitting the column or + row separators, respectively. The header row separator must be + complete; in other words, a header cell may not span into the table + body. Each cell contains body elements, and may have multiple + paragraphs, lists, etc. Initial spaces for a left margin are + allowed; the first line of text in a cell determines its left + margin. + +2. Below is a simpler table structure. It may be better suited to + manual input than alternative #1, but there is no Emacs editing + mode available. One disadvantage is that it resembles section + titles; a one-column table would look exactly like section & + subsection titles. :: + + ============ ============ ============ ============== + Header 1 Header 2 Header 3 Header 4 + ============ ============ ============ ============== + Column 1 Column 2 Column 3 & 4 span (Row 1) + ------------ ------------ --------------------------- + Column 1 & 2 span Column 3 - Column 4 + ------------------------- ------------ - Row 2 & 3 + 1 2 3 - span + ============ ============ ============ ============== + + The table begins with a top border of equals signs with a space at + each column boundary (regardless of spans). Each row is + underlined. Internal row separators are underlines of '-', with + spaces at column boundaries. The last of the optional head rows is + underlined with '=', again with spaces at column boundaries. + Column spans have no spaces in their underline. Row spans simply + lack an underline at the row boundary. The bottom boundary of the + table consists of '=' underlines. A blank line is required + following a table. + +3. A minimalist alternative is as follows:: + + ==== ===== ======== ======== ======= ==== ===== ===== + Old State Input Action New State Notes + ----------- -------- ----------------- ----------- + ids types new type sys.msg. dupname ids types + ==== ===== ======== ======== ======= ==== ===== ===== + -- -- explicit -- -- new True + -- -- implicit -- -- new False + None False explicit -- -- new True + old False explicit implicit old new True + None True explicit explicit new None True + old True explicit explicit new,old None True [1] + None False implicit implicit new None False + old False implicit implicit new,old None False + None True implicit implicit new None True + old True implicit implicit new old True + ==== ===== ======== ======== ======= ==== ===== ===== + + The table begins with a top border of equals signs with one or more + spaces at each column boundary (regardless of spans). There must + be at least two columns in the table (to differentiate it from + section headers). Each line starts a new row. The rightmost + column is unbounded; text may continue past the edge of the table. + Each row/line must contain spaces at column boundaries, except for + explicit column spans. Underlines of '-' can be used to indicate + column spans, but should be used sparingly if at all. Lines + containing column span underlines may not contain any other text. + The last of the optional head rows is underlined with '=', again + with spaces at column boundaries. The bottom boundary of the table + consists of '=' underlines. A blank line is required following a + table. + + This table sums up the features. Using all the features in such a + small space is not pretty though:: + + ======== ======== ======== + Header 2 & 3 Span + ------------------ + Header 1 Header 2 Header 3 + ======== ======== ======== + Each line is a new row. + Each row consists of one line only. + Row spans are not possible. + The last column may spill over to the right. + Column spans are possible with an underline joining columns. + ---------------------------- + The span is limited to the row above the underline. + ======== ======== ======== + +4. As a variation of alternative 3, bullet list syntax in the first + column could be used to indicate row starts. Multi-line rows are + possible, but row spans are not. For example:: + + ===== ===== + col 1 col 2 + ===== ===== + - 1 Second column of row 1. + - 2 Second column of row 2. + Second line of paragraph. + - 3 Second column of row 3. + + Second paragraph of row 3, + column 2 + ===== ===== + + Column spans would be indicated on the line after the last line of + the row. To indicate a real bullet list within a first-column + cell, simply nest the bullets. + +5. In a further variation, we could simply assume that whitespace in + the first column implies a multi-line row; the text in other + columns is continuation text. For example:: + + ===== ===== + col 1 col 2 + ===== ===== + 1 Second column of row 1. + 2 Second column of row 2. + Second line of paragraph. + 3 Second column of row 3. + + Second paragraph of row 3, + column 2 + ===== ===== + + Limitations of this approach: + + - Cells in the first column are limited to one line of text. + + - Cells in the first column *must* contain some text; blank cells + would lead to a misinterpretation. An empty comment ("..") is + sufficient. + +6. Combining alternative 3 and 4, a bullet list in the first column + could mean multi-line rows, and no bullet list means single-line + rows only. + +Alternatives 1 and 5 has been adopted by reStructuredText. + + +Delimitation of Inline Markup +============================= + +StructuredText specifies that inline markup must begin with +whitespace, precluding such constructs as parenthesized or quoted +emphatic text:: + + "**What?**" she cried. (*exit stage left*) + +The `reStructuredText markup specification`_ allows for such +constructs and disambiguates inline markup through a set of +recognition rules. These recognition rules define the context of +markup start-strings and end-strings, allowing markup characters to be +used in most non-markup contexts without a problem (or a backslash). +So we can say, "Use asterisks (*) around words or phrases to +*emphasisze* them." The '(*)' will not be recognized as markup. This +reduces the need for markup escaping to the point where an escape +character is *almost* (but not quite!) unnecessary. + + +Underlining +=========== + +StructuredText uses '_text_' to indicate underlining. To quote David +Ascher in his 2000-01-21 Doc-SIG mailing list post, "Docstring +grammar: a very revised proposal": + + The tagging of underlined text with _'s is suboptimal. Underlines + shouldn't be used from a typographic perspective (underlines were + designed to be used in manuscripts to communicate to the + typesetter that the text should be italicized -- no well-typeset + book ever uses underlines), and conflict with double-underscored + Python variable names (__init__ and the like), which would get + truncated and underlined when that effect is not desired. Note + that while *complete* markup would prevent that truncation + ('__init__'), I think of docstring markups much like I think of + type annotations -- they should be optional and above all do no + harm. In this case the underline markup does harm. + +Underlining is not part of the reStructuredText specification. + + +Inline Literals +=============== + +StructuredText's markup for inline literals (text left as-is, +verbatim, usually in a monospaced font; as in HTML <TT>) is single +quotes ('literals'). The problem with single quotes is that they are +too often used for other purposes: + +- Apostrophes: "Don't blame me, 'cause it ain't mine, it's Chris'."; + +- Quoting text: + + First Bruce: "Well Bruce, I heard the prime minister use it. + 'S'hot enough to boil a monkey's bum in 'ere your Majesty,' he + said, and she smiled quietly to herself." + + In the UK, single quotes are used for dialogue in published works. + +- String literals: s = '' + +Alternatives:: + + 'text' \'text\' ''text'' "text" \"text\" ""text"" + #text# @text@ `text` ^text^ ``text'' ``text`` + +The examples below contain inline literals, quoted text, and +apostrophes. Each example should evaluate to the following HTML:: + + Some <TT>code</TT>, with a 'quote', "double", ain't it grand? + Does <TT>a[b] = 'c' + "d" + `2^3`</TT> work? + + 0. Some code, with a quote, double, ain't it grand? + Does a[b] = 'c' + "d" + `2^3` work? + 1. Some 'code', with a \'quote\', "double", ain\'t it grand? + Does 'a[b] = \'c\' + "d" + `2^3`' work? + 2. Some \'code\', with a 'quote', "double", ain't it grand? + Does \'a[b] = 'c' + "d" + `2^3`\' work? + 3. Some ''code'', with a 'quote', "double", ain't it grand? + Does ''a[b] = 'c' + "d" + `2^3`'' work? + 4. Some "code", with a 'quote', \"double\", ain't it grand? + Does "a[b] = 'c' + "d" + `2^3`" work? + 5. Some \"code\", with a 'quote', "double", ain't it grand? + Does \"a[b] = 'c' + "d" + `2^3`\" work? + 6. Some ""code"", with a 'quote', "double", ain't it grand? + Does ""a[b] = 'c' + "d" + `2^3`"" work? + 7. Some #code#, with a 'quote', "double", ain't it grand? + Does #a[b] = 'c' + "d" + `2^3`# work? + 8. Some @code@, with a 'quote', "double", ain't it grand? + Does @a[b] = 'c' + "d" + `2^3`@ work? + 9. Some `code`, with a 'quote', "double", ain't it grand? + Does `a[b] = 'c' + "d" + \`2^3\`` work? + 10. Some ^code^, with a 'quote', "double", ain't it grand? + Does ^a[b] = 'c' + "d" + `2\^3`^ work? + 11. Some ``code'', with a 'quote', "double", ain't it grand? + Does ``a[b] = 'c' + "d" + `2^3`'' work? + 12. Some ``code``, with a 'quote', "double", ain't it grand? + Does ``a[b] = 'c' + "d" + `2^3\``` work? + +Backquotes (#9 & #12) are the best choice. They are unobtrusive and +relatviely rarely used (more rarely than ' or ", anyhow). Backquotes +have the connotation of 'quotes', which other options (like carets, +#10) don't. + +Analogously with ``*emph*`` & ``**strong**``, double-backquotes (#12) +could be used for inline literals. If single-backquotes are used for +'interpreted text' (context-sensitive domain-specific descriptive +markup) such as function name hyperlinks in Python docstrings, then +double-backquotes could be used for absolute-literals, wherein no +processing whatsoever takes place. An advantage of double-backquotes +would be that backslash-escaping would no longer be necessary for +embedded single-backquotes; however, embedded double-backquotes (in an +end-string context) would be illegal. See `Backquotes in +Phrase-Links`__ in `Record of reStructuredText Syntax Alternatives`__. + +__ alternatives.html#backquotes-in-phrase-links +__ alternatives.html + +Alternative choices are carets (#10) and TeX-style quotes (#11). For +examples of TeX-style quoting, see +http://www.zope.org/Members/jim/StructuredTextWiki/CustomizingTheDocumentProcessor. + +Some existing uses of backquotes: + +1. As a synonym for repr() in Python. +2. For command-interpolation in shell scripts. +3. Used as open-quotes in TeX code (and carried over into plaintext + by TeXies). + +The inline markup start-string and end-string recognition rules +defined by the `reStructuredText markup specification`_ would allow +all of these cases inside inline literals, with very few exceptions. +As a fallback, literal blocks could handle all cases. + +Outside of inline literals, the above uses of backquotes would require +backslash-escaping. However, these are all prime examples of text +that should be marked up with inline literals. + +If either backquotes or straight single-quotes are used as markup, +TeX-quotes are too troublesome to support, so no special-casing of +TeX-quotes should be done (at least at first). If TeX-quotes have to +be used outside of literals, a single backslash-escaped would suffice: +\``TeX quote''. Ugly, true, but very infrequently used. + +Using literal blocks is a fallback option which removes the need for +backslash-escaping:: + + like this:: + + Here, we can do ``absolutely'' anything `'`'\|/|\ we like! + +No mechanism for inline literals is perfect, just as no escaping +mechanism is perfect. No matter what we use, complicated inline +expressions involving the inline literal quote and/or the backslash +will end up looking ugly. We can only choose the least often ugly +option. + +reStructuredText will use double backquotes for inline literals, and +single backqoutes for interpreted text. + + +Hyperlinks +========== + +There are three forms of hyperlink currently in StructuredText_: + +1. (Absolute & relative URIs.) Text enclosed by double quotes + followed by a colon, a URI, and concluded by punctuation plus white + space, or just white space, is treated as a hyperlink:: + + "Python":http://www.python.org/ + +2. (Absolute URIs only.) Text enclosed by double quotes followed by a + comma, one or more spaces, an absolute URI and concluded by + punctuation plus white space, or just white space, is treated as a + hyperlink:: + + "mail me", mailto:me@mail.com + +3. (Endnotes.) Text enclosed by brackets link to an endnote at the + end of the document: at the beginning of the line, two dots, a + space, and the same text in brackets, followed by the end note + itself:: + + Please refer to the fine manual [GVR2001]. + + .. [GVR2001] Python Documentation, Release 2.1, van Rossum, + Drake, et al., http://www.python.org/doc/ + +The problem with forms 1 and 2 is that they are neither intuitive nor +unobtrusive (they break design goals 5 & 2). They overload +double-quotes, which are too often used in ordinary text (potentially +breaking design goal 4). The brackets in form 3 are also too common +in ordinary text (such as [nested] asides and Python lists like [12]). + +Alternatives: + +1. Have no special markup for hyperlinks. + +2. A. Interpret and mark up hyperlinks as any contiguous text + containing '://' or ':...@' (absolute URI) or '@' (email + address) after an alphanumeric word. To de-emphasize the URI, + simply enclose it in parentheses: + + Python (http://www.python.org/) + + B. Leave special hyperlink markup as a domain-specific extension. + Hyperlinks in ordinary reStructuredText documents would be + required to be standalone (i.e. the URI text inline in the + document text). Processed hyperlinks (where the URI text is + hidden behind the link) are important enough to warrant syntax. + +3. The original Setext_ introduced a mechanism of indirect hyperlinks. + A source link word ('hot word') in the text was given a trailing + underscore:: + + Here is some text with a hyperlink_ built in. + + The hyperlink itself appeared at the end of the document on a line + by itself, beginning with two dots, a space, the link word with a + leading underscore, whitespace, and the URI itself:: + + .. _hyperlink http://www.123.xyz + + Setext used ``underscores_instead_of_spaces_`` for phrase links. + +With some modification, alternative 3 best satisfies the design goals. +It has the advantage of being readable and relatively unobtrusive. +Since each source link must match up to a target, the odd variable +ending in an underscore can be spared being marked up (although it +should generate a "no such link target" warning). The only +disadvantage is that phrase-links aren't possible without some +obtrusive syntax. + +We could achieve phrase-links if we enclose the link text: + +1. in double quotes:: + + "like this"_ + +2. in brackets:: + + [like this]_ + +3. or in backquotes:: + + `like this`_ + +Each gives us somewhat obtrusive markup, but that is unavoidable. The +bracketed syntax (#2) is reminiscent of links on many web pages +(intuitive), although it is somewhat obtrusive. Alternative #3 is +much less obtrusive, and is consistent with interpreted text: the +trailing underscore indicates the interpretation of the phrase, as a +hyperlink. #3 also disambiguates hyperlinks from footnote references. +Alternative #3 wins. + +The same trailing underscore markup can also be used for footnote and +citation references, removing the problem with ordinary bracketed text +and Python lists:: + + Please refer to the fine manual [GVR2000]_. + + .. [GVR2000] Python Documentation, van Rossum, Drake, et al., + http://www.python.org/doc/ + +The two-dots-and-a-space syntax was generalized by Setext for +comments, which are removed from the (visible) processed output. +reStructuredText uses this syntax for comments, footnotes, and link +target, collectively termed "explicit markup". For link targets, in +order to eliminate ambiguity with comments and footnotes, +reStructuredText specifies that a colon always follow the link target +word/phrase. The colon denotes 'maps to'. There is no reason to +restrict target links to the end of the document; they could just as +easily be interspersed. + +Internal hyperlinks (links from one point to another within a single +document) can be expressed by a source link as before, and a target +link with a colon but no URI. In effect, these targets 'map to' the +element immediately following. + +As an added bonus, we now have a perfect candidate for +reStructuredText directives, a simple extension mechanism: explicit +markup containing a single word followed by two colons and whitespace. +The interpretation of subsequent data on the directive line or +following is directive-dependent. + +To summarize:: + + .. This is a comment. + + .. The line below is an example of a directive. + .. version:: 1 + + This is a footnote [1]_. + + This internal hyperlink will take us to the footnotes_ area below. + + Here is a one-word_ external hyperlink. + + Here is `a hyperlink phrase`_. + + .. _footnotes: + .. [1] Footnote text goes here. + + .. external hyperlink target mappings: + .. _one-word: http://www.123.xyz + .. _a hyperlink phrase: http://www.123.xyz + +The presence or absence of a colon after the target link +differentiates an indirect hyperlink from a footnote, respectively. A +footnote requires brackets. Backquotes around a target link word or +phrase are required if the phrase contains a colon, optional +otherwise. + +Below are examples using no markup, the two StructuredText hypertext +styles, and the reStructuredText hypertext style. Each example +contains an indirect link, a direct link, a footnote/endnote, and +bracketed text. In HTML, each example should evaluate to:: + + <P>A <A HREF="http://spam.org">URI</A>, see <A HREF="#eggs2000"> + [eggs2000]</A> (in Bacon [Publisher]). Also see + <A HREF="http://eggs.org">http://eggs.org</A>.</P> + + <P><A NAME="eggs2000">[eggs2000]</A> "Spam, Spam, Spam, Eggs, + Bacon, and Spam"</P> + +1. No markup:: + + A URI http://spam.org, see eggs2000 (in Bacon [Publisher]). + Also see http://eggs.org. + + eggs2000 "Spam, Spam, Spam, Eggs, Bacon, and Spam" + +2. StructuredText absolute/relative URI syntax + ("text":http://www.url.org):: + + A "URI":http://spam.org, see [eggs2000] (in Bacon [Publisher]). + Also see "http://eggs.org":http://eggs.org. + + .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam" + + Note that StructuredText does not recognize standalone URIs, + forcing doubling up as shown in the second line of the example + above. + +3. StructuredText absolute-only URI syntax + ("text", mailto:you@your.com):: + + A "URI", http://spam.org, see [eggs2000] (in Bacon + [Publisher]). Also see "http://eggs.org", http://eggs.org. + + .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam" + +4. reStructuredText syntax:: + + 4. A URI_, see [eggs2000]_ (in Bacon [Publisher]). + Also see http://eggs.org. + + .. _URI: http:/spam.org + .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam" + +The bracketed text '[Publisher]' may be problematic with +StructuredText (syntax 2 & 3). + +reStructuredText's syntax (#4) is definitely the most readable. The +text is separated from the link URI and the footnote, resulting in +cleanly readable text. + +.. _StructuredText: + http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/FrontPage +.. _Setext: http://docutils.sourceforge.net/mirror/setext.html +.. _reStructuredText: http://docutils.sourceforge.net/rst.html +.. _detailed description: + http://homepage.ntlworld.com/tibsnjoan/docutils/STNG-format.html +.. _STMinus: http://www.cis.upenn.edu/~edloper/pydoc/stminus.html +.. _StructuredTextNG: + http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/StructuredTextNG +.. _README: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/ + python/python/dist/src/README +.. _Emacs table mode: http://table.sourceforge.net/ +.. _reStructuredText Markup Specification: + ../../ref/rst/restructuredtext.html + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: |