author milde <milde@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> 2023-04-08 21:09:08 +0000
committer milde <milde@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> 2023-04-08 21:09:08 +0000
commit 814f72615050102f1c3b24de16ae21bc653371ac
tree 2e7ca6243b7930dd0fa9dad5059412a4f5178141
parent 3514794f41613e1624affaa99fe1898e3d149f6c
Update "Publisher" documentation.
git-svn-id: https://svn.code.sf.net/p/docutils/code/trunk@9340 929543f6-e4f2-0310-98a6-ba3bd3dd1d04
-rw-r--r-- docutils/RELEASE-NOTES.txt 6
-rw-r--r-- docutils/docs/api/publisher.txt 280
-rw-r--r-- docutils/docs/index.txt 9
3 files changed, 170 insertions, 125 deletions
diff --git a/docutils/RELEASE-NOTES.txt b/docutils/RELEASE-NOTES.txt
index 148a24b04..d084965bf 100644
--- a/docutils/RELEASE-NOTES.txt
+++ b/docutils/RELEASE-NOTES.txt
@@ -100,7 +100,7 @@ Drop support for Python 3.7 and 3.8 in Docutils 0.21.
- Change the default input encoding from ``None`` (auto-detect) to
"utf-8" in Docutils 0.22.
- - Remove the input encoding auto-detection code in Docutils 1.0.
+ - Remove the input encoding auto-detection code in Docutils 1.0 or later.
* "html5" writer:
@@ -194,8 +194,8 @@ Release 0.20 (unpublished)
* The new function argument `auto_encode` for `core.publish_string()` and
`core.publish_programmatically()` selects whether the output document is
- encoded and returned as `bytes` instance. The default will change to
- ``False`` in Docutils 0.22.
+ encoded and returned as `bytes` instance. The default is ``True`` (for
+ backwards compatibility) and will change to ``False`` in Docutils 0.22.
* Bugfixes and improvements (see HISTORY_).
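
The release note above says that ``auto_encode`` selects between `bytes` and `str` output, and the publisher.txt changes in this commit describe the `str` result as a `str` sub-class that stores the output encoding and error handler as "encoding" and "errors" attributes. The following is a minimal standard-library sketch of that idea, not Docutils' own class: the name ``EncodedText`` and its constructor signature are assumptions made for illustration.

```python
class EncodedText(str):
    """Hypothetical str sub-class that remembers how it should be encoded."""

    def __new__(cls, text, encoding="utf-8", errors="strict"):
        self = super().__new__(cls, text)
        # Stored as plain attributes, mirroring the "encoding" and
        # "errors" attributes mentioned in the documentation footnote.
        self.encoding = encoding
        self.errors = errors
        return self

    def __bytes__(self):
        # bytes(obj) calls __bytes__() when it is defined, so no explicit
        # encoding argument is needed at the call site.
        return self.encode(self.encoding, self.errors)


text = EncodedText("café", encoding="latin-1")
data = bytes(text)  # encoded with the stored encoding
```

With such an object, a caller that needs `bytes` can wrap the result in ``bytes(...)``, while a caller that wants text can use it as an ordinary `str`.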
diff --git a/docutils/docs/api/publisher.txt b/docutils/docs/api/publisher.txt
index 312aed739..c4dd44d73 100644
--- a/docutils/docs/api/publisher.txt
+++ b/docutils/docs/api/publisher.txt
@@ -17,7 +17,7 @@ The ``docutils.core.Publisher`` class is the core of Docutils,
managing all the processing and relationships between components. See
`PEP 258`_ for an overview of Docutils components.
-The ``docutils.core.publish_*`` convenience functions are the normal
+The ``docutils.core.publish_*()`` convenience functions are the normal
entry points for using Docutils as a library.
See `Inside A Docutils Command-Line Front-End Tool`_ for an overview
@@ -31,157 +31,102 @@ class is used.
Publisher Convenience Functions
===============================
-Each of these functions set up a ``docutils.core.Publisher`` object,
-then call its ``publish`` method. ``docutils.core.Publisher.publish``
+Each of these functions sets up a `docutils.core.Publisher` object,
+then calls its ``publish()`` method. ``docutils.core.Publisher.publish()``
handles everything else. There are several convenience functions in
the ``docutils.core`` module:
-:_`publish_cmdline()`: for command-line front-end tools, like
- ``rst2html.py``. There are several examples in the ``tools/``
- directory. A detailed analysis of one such tool is in `Inside A
- Docutils Command-Line Front-End Tool`_
-:_`publish_file()`: for programmatic use with file-like I/O. In
- addition to writing the encoded output to a file, also returns the
- encoded output as a `bytes` instance.
+publish_cmdline()
+-----------------
-:_`publish_string()`: for programmatic use with `string I/O`_. Returns
- the encoded output as a string [#string-output]_.
-
-:_`publish_parts()`: for programmatic use with string input [#string-input]_;
- returns a dictionary of document parts. Dictionary keys are the names of
- parts, and values are `str` instances; encoding is up to the client.
- Useful when only portions of the processed document are desired.
- See `publish_parts() Details`_ below.
-
- There are usage examples in the `docutils/examples.py`_ module.
-
-:_`publish_doctree()`: for programmatic use with string input [#string-input]_;
- returns a Docutils document tree data structure (doctree).
- The doctree can be modified, pickled & unpickled, etc., and then
- reprocessed with `publish_from_doctree()`_.
-
-:_`publish_from_doctree()`: for programmatic use to render from an
- existing document tree data structure (doctree); returns the encoded
- output as a string [#string-output]_.
-
-:_`publish_programmatically()`: for custom programmatic use. This
- function implements common code and is used by ``publish_file``,
- ``publish_string``, and ``publish_parts``. It returns a 2-tuple:
- the encoded string output [#string-output]_ and the Publisher object.
+Function for command-line front-end tools, like ``rst2html.py``. There are
+several examples in the ``tools/`` directory. A detailed analysis of one
+such tool is in `Inside A Docutils Command-Line Front-End Tool`_.
.. _Inside A Docutils Command-Line Front-End Tool: ../howto/cmdline-tool.html
-.. _docutils/examples.py: ../../docutils/examples.py
-.. _String I/O:
-.. [#string-input] Input can be a `str` or `bytes` instance.
- `bytes` are decoded with input_encoding_.
-.. [#string-output] Output is a `bytes` instance unless
- output_encoding_ is set to the special value ``"unicode"``.
+publish_file()
+--------------
+For programmatic use with file-like I/O.
+In addition to writing the output document to a file, also returns it as
+a `bytes` instance.
-Configuration
--------------
-To pass application-specific setting defaults to the Publisher
-convenience functions, use the ``settings_overrides`` parameter. Pass
-a dictionary of setting names & values, like this::
+publish_string()
+----------------
- overrides = {'input_encoding': 'ascii',
- 'output_encoding': 'latin-1'}
- output = publish_string(..., settings_overrides=overrides)
+For programmatic use with _`string I/O`:
-Settings from command-line options override configuration file
-settings, and they override application defaults. For details, see
-`Docutils Runtime Settings`_. See `Docutils Configuration`_ for
-details about individual settings.
-
-.. _Docutils Runtime Settings: ./runtime-settings.html
-.. _Docutils Configuration: ../user/config.html
-
-
-Encodings
----------
-
-The default **input encoding** is UTF-8 (codec 'utf-8-sig').
-A different encoding can be specified with the `input_encoding`_ setting
-or an `explicit encoding declaration`_ (BOM or special comment).
-If the encoding is unspecified and decoding with UTF-8 fails,
-the `preferred encoding`_ is used as a fallback
-(if it maps to a valid codec and differs from UTF-8).
+Input
+ can be a `str` or `bytes` instance.
+ `bytes` are decoded with input_encoding_.
-The default behaviour differs from Python's `open()`:
+Output
+ is a memory object:
-- The UTF-8 encoding is tried before the `preferred encoding`_.
- (This is almost sure to fail if the actual source encoding differs.)
-- An `explicit encoding declaration`_ in the source takes precedence
- over the `preferred encoding`_.
-- An optional BOM_ is removed from UTF-8 encoded sources.
+ * a `str` instance [#]_, if the "auto_encode" function argument is
+ ``False`` or output_encoding_ is set to the special value
+ ``"unicode"``.
-The default **output encoding** of Docutils is UTF-8.
-A different encoding can be specified with the `output_encoding`_ setting.
-Docutils may introduce some non-ASCII text if you use
-`auto-symbol footnotes`_ or the `"contents" directive`_.
+ * a `bytes` instance, if the "auto_encode" argument is ``True`` and
+ output_encoding_ is set to an encoding registered with
+ Python's "codecs_" module (default: "utf-8").
-Explicit encoding declaration
-`````````````````````````````
+ Calling ``output = bytes(publish_string(…))`` ensures that
+ ``output`` is a `bytes` instance encoded with output_encoding_.
-A `Unicode byte order mark` (BOM_) in the source is interpreted as
-encoding declaration.
+.. [#] Actually an instance of a `str` sub-class with the
+ output_encoding_ and output_encoding_error_handler_ configuration
+ settings stored as "encoding" and "errors" attributes.
-The encoding of a reStructuredText source file can also be given by a
-"magic comment" similar to :PEP:`263`.
-This makes the input encoding both *visible* and *changeable*
-on a per-source file basis.
+.. _codecs: https://docs.python.org/3/library/codecs.html
-To declare the input encoding, a comment like ::
- .. text encoding: <encoding name>
+publish_doctree()
+-----------------
-must be placed into the source file either as first or second line.
+Parse string input (cf. `string I/O`_) into a `Docutils document tree`_ data
+structure (doctree). The doctree can be modified, pickled & unpickled,
+etc., and then reprocessed with `publish_from_doctree()`_.
-Examples: (using formats recognized by popular editors) ::
+.. _Docutils document tree: ../ref/doctree.html
- .. -*- mode: rst -*-
- -*- coding: latin1 -*-
-or::
+publish_from_doctree()
+----------------------
- .. vim: set fileencoding=cp737 :
+Render from an existing document tree data structure (doctree).
+Returns the output document as a memory object (cf. `string I/O`_).
-More precisely, the first and second line are searched for the following
-regular expression::
- coding[:=]\s*([-\w.]+)
+publish_programmatically()
+--------------------------
-The first group of this expression is then interpreted as encoding name.
-If the first line matches the second line is ignored.
+This function implements common code and is used by `publish_file()`_,
+`publish_string()`_, and `publish_parts()`_.
+It returns a 2-tuple: the output document as memory object (cf. `string
+I/O`_) and the Publisher object.
-.. _input_encoding: ../user/config.html#input-encoding
-.. _preferred encoding:
- https://docs.python.org/3/library/locale.html#locale.getpreferredencoding
-.. _BOM: https://docs.python.org/3/library/codecs.html#codecs.BOM
-.. _output_encoding: ../user/config.html#output-encoding
-.. _auto-symbol footnotes:
- ../ref/rst/restructuredtext.html#auto-symbol-footnotes
-.. _"contents" directive:
- ../ref/rst/directives.html#table-of-contents
+publish_parts()
+---------------
-``publish_parts()`` Details
-===========================
+For programmatic use with string input (cf. `string I/O`_).
+Returns a dictionary of document parts. Dictionary keys are the names of
+parts, and values are `str` instances; encoding is up to the client.
+Useful when only portions of the processed document are desired.
-The ``docutils.core.publish_parts()`` convenience function returns a
-dictionary of document parts. Dictionary keys are the names of parts,
-and values are `str` instances.
+There are usage examples in the `docutils/examples.py`_ module.
Each Writer component may publish a different set of document parts,
described below. Not all writers implement all parts.
Parts Provided By All Writers
------------------------------
+`````````````````````````````
_`encoding`
The output encoding setting.
@@ -194,10 +139,10 @@ _`whole`
Parts Provided By the HTML Writers
-----------------------------------
+``````````````````````````````````
HTML4 Writer
-````````````
+^^^^^^^^^^^^
_`body`
``parts['body']`` is equivalent to parts['fragment_']. It is
@@ -319,7 +264,7 @@ _`title`
PEP/HTML Writer
-```````````````
+^^^^^^^^^^^^^^^
The PEP/HTML writer provides the same parts as the `HTML4 writer`_,
plus the following:
@@ -332,21 +277,21 @@ _`pepnum`
S5/HTML Writer
-``````````````
+^^^^^^^^^^^^^^
The S5/HTML writer provides the same parts as the `HTML4 writer`_.
HTML5 Writer
-````````````
+^^^^^^^^^^^^
The HTML5 writer provides the same parts as the `HTML4 writer`_.
However, it uses semantic HTML5 elements for the document, header and
footer.
-Parts Provided by the LaTeX2e Writer
-------------------------------------
+Parts Provided by the "LaTeX2e" and "XeTeX" Writers
+```````````````````````````````````````````````````
See the template files default.tex_, titlepage.tex_, titlingpage.tex_,
and xelatex.tex_ for examples how these parts can be combined
@@ -427,3 +372,100 @@ titledata
https://docutils.sourceforge.io/docutils/writers/latex2e/titlingpage.tex
.. _xelatex.tex:
https://docutils.sourceforge.io/docutils/writers/latex2e/xelatex.tex
+
+
+.. _docutils/examples.py: ../../docutils/examples.py
+
+
+Configuration
+=============
+
+To pass application-specific setting defaults to the Publisher
+convenience functions, use the ``settings_overrides`` parameter. Pass
+a dictionary of setting names & values, like this::
+
+ overrides = {'input_encoding': 'ascii',
+ 'output_encoding': 'latin-1'}
+ output = publish_string(..., settings_overrides=overrides)
+
+Settings from command-line options override configuration file
+settings, and they override application defaults. For details, see
+`Docutils Runtime Settings`_. See `Docutils Configuration`_ for
+details about individual settings.
+
+.. _Docutils Runtime Settings: ./runtime-settings.html
+.. _Docutils Configuration: ../user/config.html
+
+
+Encodings
+=========
+
+.. important:: Details will change over the next Docutils versions.
+ See RELEASE-NOTES_
+
+The default **input encoding** is UTF-8. A different encoding can be
+specified with the `input_encoding`_ setting.
+
+The encoding of a reStructuredText source can also be given by a
+`Unicode byte order mark` (BOM_) or a "magic comment" [#magic-comment]_
+similar to :PEP:`263`. This makes the input encoding both *visible* and
+*changeable* on a per-source basis.
+
+If the encoding is unspecified and decoding with UTF-8 fails, the locale's
+`preferred encoding`_ is used as a fallback (if it maps to a valid codec
+and differs from UTF-8).
+
+The default behaviour differs from Python's `open()`:
+
+- The UTF-8 encoding is tried before the `preferred encoding`_.
+ (This is almost sure to fail if the actual source encoding differs.)
+- An `explicit encoding declaration` [#magic-comment]_ in the source
+ takes precedence over the `preferred encoding`_.
+- An optional BOM_ is removed from UTF-8 encoded sources.
+
+The default **output encoding** is UTF-8.
+A different encoding can be specified with the `output_encoding`_ setting.
+
+.. Caution:: Docutils may introduce non-ASCII text if you use
+ `auto-symbol footnotes`_ or the `"contents" directive`_.
+
+.. [#magic-comment] A comment like ::
+
+ .. text encoding: <encoding name>
+
+ on the first or second line of a reStructuredText source
+ defines `<encoding name>` as the source's input encoding.
+
+ Examples: (using formats recognized by popular editors) ::
+
+ .. -*- mode: rst -*-
+ -*- coding: latin1 -*-
+
+ or::
+
+ .. vim: set fileencoding=cp737 :
+
+ More precisely, the first and second line are searched for the following
+ regular expression::
+
+ coding[:=]\s*([-\w.]+)
+
+ The first group of this expression is then interpreted as encoding name.
+ If the first line matches, the second line is ignored.
+
+ This feature is scheduled to be removed in Docutils 1.0.
+ See the `inspecting_codecs`_ package for a possible replacement.
+
+.. _RELEASE-NOTES: ../../RELEASE-NOTES.html#future-changes
+.. _input_encoding: ../user/config.html#input-encoding
+.. _preferred encoding:
+ https://docs.python.org/3/library/locale.html#locale.getpreferredencoding
+.. _BOM: https://docs.python.org/3/library/codecs.html#codecs.BOM
+.. _output_encoding: ../user/config.html#output-encoding
+.. _output_encoding_error_handler:
+ ../user/config.html#output-encoding-error-handler
+.. _auto-symbol footnotes:
+ ../ref/rst/restructuredtext.html#auto-symbol-footnotes
+.. _"contents" directive:
+ ../ref/rst/directives.html#table-of-contents
+.. _inspecting_codecs: https://codeberg.org/milde/inspecting-codecs
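
The input-decoding order described in the "Encodings" section of the publisher.txt diff above can be sketched with the standard library alone. This is an illustration under stated assumptions, not Docutils' implementation: the helper names ``declared_encoding`` and ``decode_source`` and the ``fallback`` parameter are hypothetical, only the UTF-8 BOM is handled (UTF-16 BOMs are left out), and only the regular expression is taken from the documentation.

```python
import codecs
import locale
import re

# Regular expression quoted in the documentation, applied to bytes here.
CODING_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")


def declared_encoding(data):
    """Return the encoding named by a "magic comment", else None.

    Only the first and second line are searched; if the first line
    matches, the second line is never inspected.
    """
    for line in data.splitlines()[:2]:
        match = CODING_RE.search(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def decode_source(data, fallback=None):
    """Decode `data`: declaration first, then UTF-8, then the fallback."""
    declared = declared_encoding(data)
    if declared:
        # An explicit declaration takes precedence over the fallback.
        return data.decode(declared)
    if fallback is None:
        fallback = locale.getpreferredencoding(False)
    try:
        # "utf-8-sig" also removes an optional BOM from UTF-8 sources.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        try:
            codec_name = codecs.lookup(fallback).name
        except LookupError:
            codec_name = None
        # Retry only if the fallback is a valid codec other than UTF-8.
        if codec_name in (None, "utf-8"):
            raise
        return data.decode(fallback)
```

For example, ``decode_source(b"caf\xe9", fallback="latin-1")`` fails the UTF-8 attempt and then decodes with the fallback, whereas a source whose first line contains ``.. vim: set fileencoding=cp737 :`` is decoded as cp737 without trying UTF-8 at all.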
diff --git a/docutils/docs/index.txt b/docutils/docs/index.txt
index d94b5115b..c17543024 100644
--- a/docutils/docs/index.txt
+++ b/docutils/docs/index.txt
@@ -185,9 +185,12 @@ Prehistoric:
API Reference Material for Client-Developers
============================================
-* `The Docutils Publisher <api/publisher.html>`__
-* `Docutils Runtime Settings <api/runtime-settings.html>`__
-* `Docutils Transforms <api/transforms.html>`__
+`The Docutils Publisher <api/publisher.html>`__
+ entry points for using Docutils as a library
+`Docutils Runtime Settings <api/runtime-settings.html>`__
+ configuration framework details
+`Docutils Transforms <api/transforms.html>`__
+ change the document tree in-place (resolve references, …)
The `Docutils Design Specification`_ (PEP 258) is a must-read for any
Docutils developer.