| author | milde <milde@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> | 2023-05-02 23:04:27 +0000 |
|---|---|---|
| committer | milde <milde@929543f6-e4f2-0310-98a6-ba3bd3dd1d04> | 2023-05-02 23:04:27 +0000 |
| commit | 327493250f147bdb6e60a5cefafee46cdc4e8acb (patch) | |
| tree | 5fff5ac653669b7cc646922c53286db97c7d10de /docutils/docs/api/publisher.txt | |
| parent | 7457c3e1896cfe3807929a2e1a18531e02a05f20 (diff) | |
| download | docutils-327493250f147bdb6e60a5cefafee46cdc4e8acb.tar.gz | |
Revert addition of `io.OutString` and the "auto_encode" argument.
We need a review of the "string output" interface and a consensus
on the "clean" end-state before starting with the implementation.
git-svn-id: https://svn.code.sf.net/p/docutils/code/trunk@9369 929543f6-e4f2-0310-98a6-ba3bd3dd1d04
Diffstat (limited to 'docutils/docs/api/publisher.txt')
| -rw-r--r-- | docutils/docs/api/publisher.txt | 123 |
1 files changed, 70 insertions, 53 deletions
diff --git a/docutils/docs/api/publisher.txt b/docutils/docs/api/publisher.txt
index 3ff5c4e6c..a31935879 100644
--- a/docutils/docs/api/publisher.txt
+++ b/docutils/docs/api/publisher.txt
@@ -16,9 +16,8 @@
 The ``docutils.core.Publisher`` class is the core of Docutils,
 managing all the processing and relationships between components. See
 `PEP 258`_ for an overview of Docutils components.
-Configuration_ is done via `runtime settings`_ assembled from several sources.
-
-The `Publisher convenience functions`_ are the normal entry points for
+Configuration is done via `runtime settings`_ assembled from several sources.
+The *Publisher convenience functions* are the normal entry points for
 using Docutils as a library.
 
 .. _PEP 258: ../peps/pep-0258.html
@@ -42,14 +41,14 @@ a description of the function arguments.
 publish_cmdline()
 -----------------
 
-Function for command-line front-end tools, like ``rst2html.py``
-or functions for `"console_scripts" entry points`_ like `core.rst2html()`
-with file I/O.
-In addition to writing the output document to a file, also returns it as
-`str` instance (rsp. `bytes` for binary output document formats).
+Function for command-line front-end tools, like ``rst2html.py`` or
+`"console_scripts" entry points`_ like `core.rst2html()` with file I/O.
+In addition to writing the output document to a file-like object, also
+returns it as `str` instance (rsp. `bytes` for binary output document
+formats).
 
 There are several examples in the ``tools/`` directory of the Docutils
-repository. A detailed analysis of one such tool is in `Inside A Docutils
+repository. A detailed analysis of one such tool is `Inside A Docutils
 Command-Line Front-End Tool`_.
 
 .. _"console_scripts" entry points:
@@ -60,9 +59,9 @@ Command-Line Front-End Tool`_.
 publish_file()
 --------------
 
-For programmatic use with file-like I/O.
-In addition to writing the output document to a file, also returns it as
-`str` instance (rsp. `bytes` for binary output document formats).
+For programmatic use with file I/O. In addition to writing the output
+document to a file-like object, also returns it as `str` instance
+(rsp. `bytes` for binary output document formats).
 
 
 publish_string()
@@ -75,23 +74,23 @@ Input
   `bytes` are decoded with input_encoding_.
 
 Output
-  is a memory object:
-
-  * a `str` instance [#]_, if the "auto_encode" function argument is
-    ``False`` or output_encoding_ is set to the special value
+  * is a `bytes` instance, if output_encoding_ is set to an encoding
+    registered with Python's "codecs_" module (default: "utf-8"),
+  * a `str` instance, if output_encoding_ is set to the special value
     ``"unicode"``.
 
-  * a `bytes` instance, if the "auto_encode" argument is ``True`` and
-    output_encoding_ is set to an encoding registered with
-    Python's "codecs_" module (default: "utf-8").
+.. Caution::
+  The "output_encoding" and "output_encoding_error_handler" `runtime
+  settings`_ may affect the content of the output document:
+  Some document formats contain an *encoding declaration*,
+  some formats use substitutions for non-encodable characters.
 
-  Calling ``output = bytes(publish_string(…))`` ensures that ``output``
-  is a `bytes` instance encoded with the configured output_encoding_
-  (matching the encoding indicated inside HTML, XML, and LaTeX documents).
+  Use `publish_parts()`_ to get a `str` instance of the output document
+  as well as the values of the output_encoding_ and
+  output_encoding_error_handler_ runtime settings.
 
-.. [#] More precisely, an instance of a `str` sub-class with the
-   output_encoding_ and output_encoding_error_handler_ configuration
-   settings stored as "encoding" and "errors" attributes.
+*This function is provisional* because in Python 3 the name and behaviour
+no longer match.
 
 .. _codecs: https://docs.python.org/3/library/codecs.html
 
@@ -110,15 +109,16 @@ publish_from_doctree()
 Render from an existing `document tree`_ data structure (doctree).
 Returns the output document as a memory object (cf. `string I/O`_).
 
+*This function is provisional* because in Python 3 the name and behaviour
+of the *string output* interface no longer match.
+
 
 publish_programmatically()
 --------------------------
 
 Auxilliary function used by `publish_file()`_, `publish_string()`_,
-`publish_doctree()`_, and `publish_parts()`_.
-It returns a 2-tuple: the output document as memory object (cf. `string
-I/O`_) and the Publisher object.
-
+`publish_doctree()`_, and `publish_parts()`_.
+Applications should not need to call this function directly.
 
 .. _publish-parts-details:
 
@@ -126,14 +126,24 @@ publish_parts()
 ---------------
 
 For programmatic use with string input (cf. `string I/O`_).
-Returns a dictionary of document parts. Dictionary keys are the names of
-parts, and values are `str` instances; encoding is up to the client.
-Useful when only portions of the processed document are desired.
+Returns a dictionary of document parts as `str` instances. [#binary-output]_
+Dictionary keys are the part names.
+Each Writer component may publish a different set of document parts,
+described below.
 
-There are usage examples in the `docutils/examples.py`_ module.
+Example: post-process the output document with a custom function
+``post_process()`` before encoding with user-customizable encoding
+and errors ::
 
-Each Writer component may publish a different set of document parts,
-described below. Not all writers implement all parts.
+    def publish_bytes_with_postprocessing(*args, **kwargs):
+        parts = publish_parts(*args, **kwargs)
+        out_str = post_process(parts['whole'])
+        return out_str.encode(parts['encoding'], parts['errors'])
+
+There are more usage examples in the `docutils/examples.py`_ module.
+
+.. _docutils/examples.py: ../../docutils/examples.py
+.. _ODT: ../user/odt.html
 
 
 Parts Provided By All Writers
@@ -141,7 +151,7 @@ Parts Provided By All Writers
 
 _`encoding`
     The `output_encoding`_ setting.
-    
+
 _`errors`
     The `output_encoding_error_handler`_ setting.
 
@@ -149,7 +159,10 @@ _`version`
     The version of Docutils used.
 
 _`whole`
-    ``parts['whole']`` contains the entire formatted document.
+    Contains the entire formatted document. [#binary-output]_
+
+    .. [#binary-output] Output documents in binary formats (e.g. ODT_)
+       are stored as a `bytes` instance.
 
 
 Parts Provided By the HTML Writers
@@ -233,8 +246,8 @@ _`html_body`
 _`html_head`
     ``parts['html_head']`` contains the HTML ``<head>`` content, less
     the stylesheet link and the ``<head>`` and ``</head>`` tags
-    themselves. Since ``publish_parts`` returns `str` instances and
-    does not know about the output encoding, the "Content-Type" meta
+    themselves. Since `publish_parts()` returns `str` instances which
+    do not know about the output encoding, the "Content-Type" meta
     tag's "charset" value is left unresolved, as "%s"::
 
         <meta http-equiv="Content-Type" content="text/html; charset=%s" />
@@ -388,39 +401,42 @@ titledata
    https://docutils.sourceforge.io/docutils/writers/latex2e/xelatex.tex
 
 
-.. _docutils/examples.py: ../../docutils/examples.py
-
+.. _runtime settings:
 
 Configuration
 =============
 
-Docutils is configured by runtime settings assembled from several
+Docutils is configured by *runtime settings* assembled from several
 sources:
 
 * *settings specifications* of the selected components (reader, parser,
   writer),
-* *configuration files* (if enabled), and
+* the ``settings_overrides`` argument of the `Publisher convenience
+  functions`_ (see below),
+* *configuration files* (unless disabled), and
 * *command-line options* (if enabled).
 
-The individual settings are described in `Docutils Configuration`_.
-
 Docutils overlays default and explicitly specified values from these
 sources such that settings behave the way we want and expect them to
 behave. For details, see `Docutils Runtime Settings`_.
+The individual settings are described in `Docutils Configuration`_.
 
 To pass application-specific setting defaults to the Publisher
 convenience functions, use the ``settings_overrides`` parameter. Pass
 a dictionary of setting names & values, like this::
 
-    overrides = {'input_encoding': 'ascii',
-                 'output_encoding': 'latin-1'}
-    output = publish_string(..., settings_overrides=overrides)
+    app_defaults = {'input_encoding': 'ascii',
+                    'output_encoding': 'latin-1'}
+    output = publish_string(..., settings_overrides=app_defaults)
 
 Settings from command-line options override configuration file
 settings, and they override application defaults.
 
-Further customization is possible creating custom component
-objects and passing *them* to ``publish_*()`` or the ``Publisher``.
+See `Docutils Runtime Settings`_ or the docstring of
+`publish_programmatically()` for a description of all `configuration
+arguments`_ of the Publisher convenience functions.
+
+.. _configuration arguments: runtime-settings.html#convenience-functions
 
 
 Encodings
@@ -453,7 +469,9 @@ The default **output encoding** is UTF-8.
 A different encoding can be specified with the `output_encoding`_ setting.
 
 .. Caution:: Docutils may introduce non-ASCII text if you use
-   `auto-symbol footnotes`_ or the `"contents" directive`_.
+   `auto-symbol footnotes`_ or the `"contents" directive`_.
+   In non-English documents, also auto-generated labels
+   may contain non-ASCII characters.
 
 .. [#magic-comment] A comment like ::
 
@@ -496,9 +514,8 @@ A different encoding can be specified with the `output_encoding`_ setting.
    ../ref/rst/restructuredtext.html#auto-symbol-footnotes
 .. _"contents" directive: ../ref/rst/directives.html#table-of-contents
 
-.. _document tree:
+.. _document tree:
 .. _Docutils document tree: ../ref/doctree.html
-.. _runtime settings:
 .. _Docutils Runtime Settings: ./runtime-settings.html
 .. _Docutils Configuration: ../user/config.html
 .. _inspecting_codecs: https://codeberg.org/milde/inspecting-codecs
