author: milde <milde@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> 2023-05-02 23:04:27 +0000
committer: milde <milde@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> 2023-05-02 23:04:27 +0000
commit: 327493250f147bdb6e60a5cefafee46cdc4e8acb
tree: 5fff5ac653669b7cc646922c53286db97c7d10de /docutils/docs/api/publisher.txt
parent: 7457c3e1896cfe3807929a2e1a18531e02a05f20
Revert addition of `io.OutString` and the "auto_encode" argument.
We need a review of the "string output" interface and a consensus on the "clean" end-state before starting with the implementation.

git-svn-id: https://svn.code.sf.net/p/docutils/code/trunk@9369 929543f6-e4f2-0310-98a6-ba3bd3dd1d04
Diffstat (limited to 'docutils/docs/api/publisher.txt')
-rw-r--r-- docutils/docs/api/publisher.txt | 123
1 files changed, 70 insertions, 53 deletions
diff --git a/docutils/docs/api/publisher.txt b/docutils/docs/api/publisher.txt
index 3ff5c4e6c..a31935879 100644
--- a/docutils/docs/api/publisher.txt
+++ b/docutils/docs/api/publisher.txt
@@ -16,9 +16,8 @@
The ``docutils.core.Publisher`` class is the core of Docutils,
managing all the processing and relationships between components. See
`PEP 258`_ for an overview of Docutils components.
-Configuration_ is done via `runtime settings`_ assembled from several sources.
-
-The `Publisher convenience functions`_ are the normal entry points for
+Configuration is done via `runtime settings`_ assembled from several sources.
+The *Publisher convenience functions* are the normal entry points for
using Docutils as a library.
.. _PEP 258: ../peps/pep-0258.html
@@ -42,14 +41,14 @@ a description of the function arguments.
publish_cmdline()
-----------------
-Function for command-line front-end tools, like ``rst2html.py``
-or functions for `"console_scripts" entry points`_ like `core.rst2html()`
-with file I/O.
-In addition to writing the output document to a file, also returns it as
-`str` instance (rsp. `bytes` for binary output document formats).
+Function for command-line front-end tools, like ``rst2html.py`` or
+`"console_scripts" entry points`_ like `core.rst2html()` with file I/O.
+In addition to writing the output document to a file-like object, also
+returns it as `str` instance (rsp. `bytes` for binary output document
+formats).
There are several examples in the ``tools/`` directory of the Docutils
-repository. A detailed analysis of one such tool is in `Inside A Docutils
+repository. A detailed analysis of one such tool is `Inside A Docutils
Command-Line Front-End Tool`_.
.. _"console_scripts" entry points:
@@ -60,9 +59,9 @@ Command-Line Front-End Tool`_.
publish_file()
--------------
-For programmatic use with file-like I/O.
-In addition to writing the output document to a file, also returns it as
-`str` instance (rsp. `bytes` for binary output document formats).
+For programmatic use with file I/O. In addition to writing the output
+document to a file-like object, also returns it as `str` instance
+(rsp. `bytes` for binary output document formats).
publish_string()
@@ -75,23 +74,23 @@ Input
`bytes` are decoded with input_encoding_.
Output
- is a memory object:
-
- * a `str` instance [#]_, if the "auto_encode" function argument is
- ``False`` or output_encoding_ is set to the special value
+ * is a `bytes` instance, if output_encoding_ is set to an encoding
+ registered with Python's "codecs_" module (default: "utf-8"),
+ * a `str` instance, if output_encoding_ is set to the special value
``"unicode"``.
- * a `bytes` instance, if the "auto_encode" argument is ``True`` and
- output_encoding_ is set to an encoding registered with
- Python's "codecs_" module (default: "utf-8").
+.. Caution::
+ The "output_encoding" and "output_encoding_error_handler" `runtime
+ settings`_ may affect the content of the output document:
+ Some document formats contain an *encoding declaration*,
+ some formats use substitutions for non-encodable characters.
- Calling ``output = bytes(publish_string(…))`` ensures that ``output``
- is a `bytes` instance encoded with the configured output_encoding_
- (matching the encoding indicated inside HTML, XML, and LaTeX documents).
+ Use `publish_parts()`_ to get a `str` instance of the output document
+ as well as the values of the output_encoding_ and
+ output_encoding_error_handler_ runtime settings.
-.. [#] More precisely, an instance of a `str` sub-class with the
- output_encoding_ and output_encoding_error_handler_ configuration
- settings stored as "encoding" and "errors" attributes.
+*This function is provisional* because in Python 3 the name and behaviour
+no longer match.
.. _codecs: https://docs.python.org/3/library/codecs.html
@@ -110,15 +109,16 @@ publish_from_doctree()
Render from an existing `document tree`_ data structure (doctree).
Returns the output document as a memory object (cf. `string I/O`_).
+*This function is provisional* because in Python 3 the name and behaviour
+of the *string output* interface no longer match.
+
publish_programmatically()
--------------------------
Auxilliary function used by `publish_file()`_, `publish_string()`_,
-`publish_doctree()`_, and `publish_parts()`_.
-It returns a 2-tuple: the output document as memory object (cf. `string
-I/O`_) and the Publisher object.
-
+`publish_doctree()`_, and `publish_parts()`_.
+Applications should not need to call this function directly.
.. _publish-parts-details:
@@ -126,14 +126,24 @@ publish_parts()
---------------
For programmatic use with string input (cf. `string I/O`_).
-Returns a dictionary of document parts. Dictionary keys are the names of
-parts, and values are `str` instances; encoding is up to the client.
-Useful when only portions of the processed document are desired.
+Returns a dictionary of document parts as `str` instances. [#binary-output]_
+Dictionary keys are the part names.
+Each Writer component may publish a different set of document parts,
+described below.
-There are usage examples in the `docutils/examples.py`_ module.
+Example: post-process the output document with a custom function
+``post_process()`` before encoding with user-customizable encoding
+and errors ::
-Each Writer component may publish a different set of document parts,
-described below. Not all writers implement all parts.
+ def publish_bytes_with_postprocessing(*args, **kwargs):
+ parts = publish_parts(*args, **kwargs)
+ out_str = post_process(parts['whole'])
+ return out_str.encode(parts['encoding'], parts['errors'])
+
+There are more usage examples in the `docutils/examples.py`_ module.
+
+.. _docutils/examples.py: ../../docutils/examples.py
+.. _ODT: ../user/odt.html
Parts Provided By All Writers
@@ -141,7 +151,7 @@ Parts Provided By All Writers
_`encoding`
The `output_encoding`_ setting.
-
+
_`errors`
The `output_encoding_error_handler`_ setting.
@@ -149,7 +159,10 @@ _`version`
The version of Docutils used.
_`whole`
- ``parts['whole']`` contains the entire formatted document.
+ Contains the entire formatted document. [#binary-output]_
+
+ .. [#binary-output] Output documents in binary formats (e.g. ODT_)
+ are stored as a `bytes` instance.
Parts Provided By the HTML Writers
@@ -233,8 +246,8 @@ _`html_body`
_`html_head`
``parts['html_head']`` contains the HTML ``<head>`` content, less
the stylesheet link and the ``<head>`` and ``</head>`` tags
- themselves. Since ``publish_parts`` returns `str` instances and
- does not know about the output encoding, the "Content-Type" meta
+ themselves. Since `publish_parts()` returns `str` instances which
+ do not know about the output encoding, the "Content-Type" meta
tag's "charset" value is left unresolved, as "%s"::
<meta http-equiv="Content-Type" content="text/html; charset=%s" />
@@ -388,39 +401,42 @@ titledata
https://docutils.sourceforge.io/docutils/writers/latex2e/xelatex.tex
-.. _docutils/examples.py: ../../docutils/examples.py
-
+.. _runtime settings:
Configuration
=============
-Docutils is configured by runtime settings assembled from several
+Docutils is configured by *runtime settings* assembled from several
sources:
* *settings specifications* of the selected components (reader, parser,
writer),
-* *configuration files* (if enabled), and
+* the ``settings_overrides`` argument of the `Publisher convenience
+ functions`_ (see below),
+* *configuration files* (unless disabled), and
* *command-line options* (if enabled).
-The individual settings are described in `Docutils Configuration`_.
-
Docutils overlays default and explicitly specified values from these
sources such that settings behave the way we want and expect them to
behave. For details, see `Docutils Runtime Settings`_.
+The individual settings are described in `Docutils Configuration`_.
To pass application-specific setting defaults to the Publisher
convenience functions, use the ``settings_overrides`` parameter. Pass
a dictionary of setting names & values, like this::
- overrides = {'input_encoding': 'ascii',
- 'output_encoding': 'latin-1'}
- output = publish_string(..., settings_overrides=overrides)
+ app_defaults = {'input_encoding': 'ascii',
+ 'output_encoding': 'latin-1'}
+ output = publish_string(..., settings_overrides=app_defaults)
Settings from command-line options override configuration file
settings, and they override application defaults.
-Further customization is possible creating custom component
-objects and passing *them* to ``publish_*()`` or the ``Publisher``.
+See `Docutils Runtime Settings`_ or the docstring of
+`publish_programmatically()` for a description of all `configuration
+arguments`_ of the Publisher convenience functions.
+
+.. _configuration arguments: runtime-settings.html#convenience-functions
Encodings
@@ -453,7 +469,9 @@ The default **output encoding** is UTF-8.
A different encoding can be specified with the `output_encoding`_ setting.
.. Caution:: Docutils may introduce non-ASCII text if you use
- `auto-symbol footnotes`_ or the `"contents" directive`_.
+ `auto-symbol footnotes`_ or the `"contents" directive`_.
+ In non-English documents, also auto-generated labels
+ may contain non-ASCII characters.
.. [#magic-comment] A comment like ::
@@ -496,9 +514,8 @@ A different encoding can be specified with the `output_encoding`_ setting.
../ref/rst/restructuredtext.html#auto-symbol-footnotes
.. _"contents" directive:
../ref/rst/directives.html#table-of-contents
-.. _document tree:
+.. _document tree:
.. _Docutils document tree: ../ref/doctree.html
-.. _runtime settings:
.. _Docutils Runtime Settings: ./runtime-settings.html
.. _Docutils Configuration: ../user/config.html
.. _inspecting_codecs: https://codeberg.org/milde/inspecting-codecs
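
Note on the ``publish_parts()`` hunk: the example added there encodes ``parts['whole']`` with ``parts['encoding']`` and ``parts['errors']``. The sketch below illustrates the same step without importing Docutils. The dictionary ``sample_parts`` and the helper ``encode_output()`` are hypothetical stand-ins, not part of the Docutils API. It also shows how an error handler can change the document content, as the new Caution describes.

```python
def encode_output(parts):
    """Encode parts['whole'] with the encoding and error handler
    reported in the same dictionary (hypothetical helper)."""
    whole = parts["whole"]
    if isinstance(whole, bytes):
        # Binary formats (e.g. ODT) are already stored as bytes.
        return whole
    # Assumption: parts['encoding'] names a codec known to Python.
    # The special value "unicode" is not a codec and is not handled here.
    return whole.encode(parts["encoding"], parts["errors"])


# Hypothetical stand-in for the dictionary returned by publish_parts().
sample_parts = {
    "whole": "Grüße €",
    "encoding": "ascii",
    "errors": "xmlcharrefreplace",
}

encoded = encode_output(sample_parts)
print(encoded)  # non-ASCII characters become numeric character references
```

With the ``"strict"`` error handler instead, the same call would raise ``UnicodeEncodeError``, so the choice of handler decides whether non-encodable characters are substituted or rejected.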