================
 Docutils Notes
================
:Date: $Date$
:Revision: $Revision$

.. contents::

To Do
=====

General
-------

- Document!

  - Internal module documentation.
  - User docs.
  - Doctree nodes (DTD element) semantics:

    - External (public) attributes (node.attributes).
    - Internal attributes (node.*).
    - Linking mechanism.

- Refactor

  - Rename methods & variables according to the `coding conventions`_
    below.
  - The name->id conversion and hyperlink resolution code needs to be
    checked for correctness and refactored. I'm afraid it's a bit of
    a spaghetti mess now.

- Add validation? See http://pytrex.sourceforge.net, RELAX NG.

- Ask Python-dev for opinions (GvR for a pronouncement) on special
  variables (__author__, __version__, etc.): convenience vs. namespace
  pollution. Ask opinions on whether or not Docutils should recognize
  & use them.

- Provide a mechanism to pass options to Readers, Writers, and Parsers
  through docutils.core.publish/Publisher? Or create custom
  Reader/Writer/Parser objects first, and pass *them* to
  publish/Publisher?

- In reader.get_reader_class (& parser & writer too), should we be
  importing 'standalone' or 'docutils.readers.standalone'? (This would
  avoid importing top-level modules if the module name is not in
  docutils/readers. Potential nastiness.)

- Perhaps store a name->id mapping file? This could be stored
  permanently, read by subsequent processing runs, and updated with
  new entries. ("Persistent ID mapping"?)

- The "Docutils System Messages" section appears even if no actual
  system messages are there. They must be below the threshold. The
  transform should be fixed.

- TOC transform: use alt-text for inline images.

Specification
-------------

- Complete PEP 258 Docutils Design Specification.

  - Fill in the blanks in API details.

  - Specify the nodes.py internal data structure implementation.

        [Tibs:] Eventually we need to have direct documentation in
        there on how it all hangs together - the DTD is not enough
        (indeed, is it still meant to be correct? [Yes, it is.]).

- Rework PEP 257, separating style from spec from tools, wrt Docutils?
  See Doc-SIG from 2001-06-19/20.

- Add layout component to framework? Or part of the formatter?

- Once doctree.txt is fleshed out, how about breaking (most of) it up
  and putting it into nodes.py as docstrings?

reStructuredText Parser
-----------------------

- Add motivation sections for constructs in spec.

- Allow very long titles (on two or more lines)?

- And for the sake of completeness, should definition list terms be
  allowed to be very long (two or more lines) also?

- Allow hyperlink references to targets in other documents? Not in an
  HTML-centric way, though (it's trivial to say
  ``http://www.whatever.com/doc#name``, and useless in non-HTML
  contexts). XLink/XPointer? ``.. baseref::``? See Doc-SIG
  2001-08-10.

- Add character processing? For example:

  - ``--`` -> em-dash (or ``--`` -> en-dash, and ``---`` -> em-dash).
    (Look for pre-existing conventions.)
  - Convert quotes to curly quote entities. (Essentially impossible
    for HTML? Unnecessary for TeX. An output issue?)
  - Various forms of ``:-)`` to smiley icons.
  - ``"\ "`` -> ``&nbsp;``.
  - Escaped newlines -> ``<BR>``.
  - Escaped period or quote as a disappearing catalyst to allow
    character-level inline markup?
  - Others?

  How to represent character entities in the text though? Probably as
  Unicode.

  Which component is responsible for this, the parser, the reader, or
  the writer?

- Implement the header row separator modification to table.el. (Wrote
  to Takaaki Ota & the table.el mailing list on 2001-08-12, suggesting
  support for '=====' header rows. On 2001-08-17 he replied, saying
  he'd put it on his to-do list, but "don't hold your breath".)

- Tony says inline markup rule 7 could do with a *little* more
  exposition in the spec, to make clear what is going on for people
  with head colds.

- Alan Jaffray suggested (and I agree) that it would be sensible to:

  - have a directive to specify a default role for interpreted text
  - allow the reST processor to take an argument for the default role
  - issue a warning when processing documents with no default role
    which contain interpreted text with no explicitly specified role

- Fix the parser's indentation handling to conform with the stricter
  definition in the spec. (Explicit markup blocks should be strict or
  forgiving?)

- Tighten up the spec for indentation of "constructs using complex
  markers": field lists and option lists? Bodies may begin on the
  same line as the marker or on a subsequent line (with blank lines
  optional). Require that for bodies beginning on the same line as
  the marker, all lines be in strict alignment. Currently, this is
  acceptable::

      :Field-name-of-medium-length: Field body beginning on the same
          line as the field name.

  This proposal would make the above example illegal, instead
  requiring strict alignment. A field body may either begin on the
  same line::

      :Field-name-of-medium-length: Field body beginning on the same
                                    line as the field name.

  Or it may begin on a subsequent line::

      :Field-name-of-medium-length:
          Field body beginning on a line subsequent to that of the
          field name.


  This would be especially relevant in degenerate cases like this::

      :Number-of-African-swallows-required-to-carry-a-coconut:
          It would be very difficult to align the field body with
          the left edge of the first line if it began on the same
          line as the field name.

- Allow syntax constructs to be added or disabled at run-time.

- Make footnotes two-way, GNU-style? What if there are multiple
  references to a single footnote?

- Add RFC-2822 header parsing (for PEP, email Readers).

- Change ``.. meta::`` to use a "pending" element, only activated for
  HTML writers.

- Allow for variant styles by interpreting indented lists as if they
  weren't indented? For example, currently the list below will be
  parsed as a list within a block quote::

      paragraph

        * list item 1
        * list item 2

  But a lot of people seem to write that way, and HTML browsers make
  it look as if that's the way it should be. The parser could check
  the contents of block quotes, and if they contain only a single
  list, remove the block quote wrapper. There would be two problems:

  1. What if we actually *do* want a list inside a block quote?

  2. What if such a list comes immediately after an indented
     construct, such as a literal block?

  Both could be solved using empty comments (problem 2 already exists
  for a block quote after a literal block). But that's a hack.

  See the Doc-SIG discussion starting 2001-04-18 with Ed Loper's
  "Structuring: a summary; and an attempt at EBNF", item 4.

- Produce a better system message when a list ends abruptly. Input::

      -1  Option "1"
      -2

  Produces::

      Reporter: WARNING (2) Unindent without blank line at line 2.

  But it should produce::

      Reporter: WARNING (2) List ends without blank line at line 2.

Directives
``````````

- Allow directives to be added at run-time.

- Use the language module for directive attribute names?

- Add more attributes to the image directive: align, border?

- Implement directives:

  - html.imagemap

  - components.endnotes, .citations, .topic, .sectnum (section
    numbering; add support to .contents; could be cmdline option also)

  - misc.raw

  - misc.include: ``#include`` one file in another. But how to
    parse wrt sections, reference names, conflicts?

  - misc.exec: Execute Python code & insert the results. Perhaps
    dangerous?

  - misc.eval: Evaluate an expression & insert the text. At parse
    time or at substitution time?

  - block.qa: Questions & Answers. Implement as a generic two-column
    marked list? Or as a standalone construct?

  - block.columns: Multi-column table/list, with number of columns as
    argument.

  - block.verse: Paragraphs with linebreaks preserved. A directive
    would be easy; what about a literal-block-like prefix, perhaps
    ';;'? E.g.::

        Take it away, Eric the orchestra leader!  ;;

            Half a bee,
            Philosophically,
            Must ipso-facto
            Half not be.
            You see?

        ...

  - colorize.python: Colorize Python code. Fine for HTML output, but
    what about other formats? Revert to a literal block? Do we need
    some kind of "alternate" mechanism? Perhaps use a "pending"
    transform, which could switch its output based on the "format" in
    use. Use a factory function "transformFF()" which returns either
    "HTMLTransform()" instance or "GenericTransform" instance?

  - text.date: Datestamp. For substitutions.

  - Combined with misc.include, implement canned macros?

Unimplemented Transforms
------------------------

- Footnote Gathering

  Collect and move footnotes to the end of a document.

- Hyperlink Target Gathering

  It probably comes in two phases, because in a Python context we need
  to *resolve* them on a per-docstring basis [do we? --DG], but if the
  user is trying to do the callout form of presentation, they would
  then want to group them all at the end of the document.

- Reference Merging

  When merging two or more subdocuments (such as docstrings),
  conflicting references may need to be resolved.
  There may be:

  - duplicate reference and/or substitution names that need to be made
    unique; and/or
  - duplicate footnote numbers that need to be renumbered.

  Should this be done before or after reference-resolving transforms
  are applied? What about references from within one subdocument to
  inside another?

- Document Splitting

  If the processed document is written to multiple files (possibly in
  a directory tree), it will need to be split up. References will
  have to be adjusted.

  (HTML only?)

- Navigation

  If a document is split up, each segment will need navigation links:
  parent, children (small TOC), previous (preorder), next (preorder).

- Index

HTML Writer
-----------

- Considerations for an HTML Writer [#]_:

  - Boolean attributes. ``<element boolean>`` is good,
    ``<element boolean="boolean">`` is bad. Use a special value in
    attribute mappings, such as ``None``?

  - Escape double-dashes inside comments.

  - Put the language code into an appropriate element's LANG
    attribute (?).

  - Docutils identifiers (the "class" and "id" attributes) will
    conform to the regular expression ``[a-z][-a-z0-9]*``. See
    ``docutils.utils.id()``.

  .. _HTML 4.01 spec: http://www.w3.org/TR/html401
  .. _CSS1 spec: http://www.w3.org/TR/REC-CSS1
  .. [#] Source: `HTML 4.0 in Netscape and Explorer`__.

  __ http://www.webreference.com/dev/html4nsie/index.html

- Allow for style sheet info to be passed in, either as a ``<LINK>``,
  or as embedded style info.

- Construct a templating system, as in ht2html/yaptu, using directives
  and substitutions for dynamic stuff.

- Improve the granularity of document parts in the HTML writer, so
  that one could just grab the parts needed.

Coding Conventions
==================

This project shall follow the generic coding conventions as specified
in the `Style Guide for Python Code`__ and `Docstring Conventions`__
PEPs, with the following clarifications:

- 4 spaces per indentation level. No tabs.

- No one-liner compound statements (i.e., no ``if x: return``: use two
  lines & indentation), except for degenerate class or method
  definitions (i.e., ``class X: pass`` is O.K.).

- Lines should be no more than 78 or 79 characters long.

- "CamelCase" shall be used for class names.

- Use "lowercase" or "lowercase_with_underscores" for function,
  method, and variable names. For short names, maximum two joined
  words, use lowercase (e.g. 'tagname'). For long names with three or
  more joined words, or where it's hard to parse the split between two
  words, use lowercase_with_underscores (e.g., 'note_explicit_target',
  'explicit_target').

__ http://www.python.org/peps/pep-0008.html
__ http://www.python.org/peps/pep-0257.html


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: