summaryrefslogtreecommitdiff
path: root/Doc/library/email.policy.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Doc/library/email.policy.rst')
-rw-r--r--Doc/library/email.policy.rst513
1 files changed, 513 insertions, 0 deletions
diff --git a/Doc/library/email.policy.rst b/Doc/library/email.policy.rst
new file mode 100644
index 0000000000..cb2023c0ed
--- /dev/null
+++ b/Doc/library/email.policy.rst
@@ -0,0 +1,513 @@
+:mod:`email.policy`: Policy Objects
+-----------------------------------
+
+.. module:: email.policy
+ :synopsis: Controlling the parsing and generating of messages
+
+.. moduleauthor:: R. David Murray <rdmurray@bitdance.com>
+.. sectionauthor:: R. David Murray <rdmurray@bitdance.com>
+
+.. versionadded:: 3.3
+
+
+The :mod:`email` package's prime focus is the handling of email messages as
+described by the various email and MIME RFCs. However, the general format of
+email messages (a block of header fields each consisting of a name followed by
+a colon followed by a value, the whole block followed by a blank line and an
+arbitrary 'body'), is a format that has found utility outside of the realm of
+email. Some of these uses conform fairly closely to the main RFCs, some do
+not. And even when working with email, there are times when it is desirable to
+break strict compliance with the RFCs.
+
+Policy objects give the email package the flexibility to handle all these
+disparate use cases.
+
+A :class:`Policy` object encapsulates a set of attributes and methods that
+control the behavior of various components of the email package during use.
+:class:`Policy` instances can be passed to various classes and methods in the
+email package to alter the default behavior. The settable values and their
+defaults are described below.
+
+There is a default policy used by all classes in the email package. This
+policy is named :class:`Compat32`, with a corresponding pre-defined instance
+named :const:`compat32`. It provides for complete backward compatibility (in
+some cases, including bug compatibility) with the pre-Python3.3 version of the
+email package.
+
+The first part of this documentation covers the features of :class:`Policy`, an
+:term:`abstract base class` that defines the features that are common to all
+policy objects, including :const:`compat32`. This includes certain hook
+methods that are called internally by the email package, which a custom policy
+could override to obtain different behavior.
+
+When a :class:`~email.message.Message` object is created, it acquires a policy.
+By default this will be :const:`compat32`, but a different policy can be
+specified. If the ``Message`` is created by a :mod:`~email.parser`, a policy
+passed to the parser will be the policy used by the ``Message`` it creates. If
+the ``Message`` is created by the program, then the policy can be specified
+when it is created. When a ``Message`` is passed to a :mod:`~email.generator`,
+the generator uses the policy from the ``Message`` by default, but you can also
+pass a specific policy to the generator that will override the one stored on
+the ``Message`` object.
+
+:class:`Policy` instances are immutable, but they can be cloned, accepting the
+same keyword arguments as the class constructor and returning a new
+:class:`Policy` instance that is a copy of the original but with the specified
+attributes values changed.
+
+As an example, the following code could be used to read an email message from a
+file on disk and pass it to the system ``sendmail`` program on a Unix system:
+
+.. testsetup::
+
+ >>> from unittest import mock
+ >>> mocker = mock.patch('subprocess.Popen')
+ >>> m = mocker.start()
+ >>> proc = mock.MagicMock()
+ >>> m.return_value = proc
+ >>> proc.stdin.close.return_value = None
+ >>> mymsg = open('mymsg.txt', 'w')
+ >>> mymsg.write('To: abc@xyz.com\n\n')
+ 17
+ >>> mymsg.flush()
+
+.. doctest::
+
+ >>> from email import message_from_binary_file
+ >>> from email.generator import BytesGenerator
+ >>> from email import policy
+ >>> from subprocess import Popen, PIPE
+ >>> with open('mymsg.txt', 'rb') as f:
+ ... msg = message_from_binary_file(f, policy=policy.default)
+ >>> p = Popen(['sendmail', msg['To'].addresses[0]], stdin=PIPE)
+ >>> g = BytesGenerator(p.stdin, policy=msg.policy.clone(linesep='\r\n'))
+ >>> g.flatten(msg)
+ >>> p.stdin.close()
+ >>> rc = p.wait()
+
+.. testsetup::
+
+ >>> mymsg.close()
+ >>> mocker.stop()
+ >>> import os
+ >>> os.remove('mymsg.txt')
+
+Here we are telling :class:`~email.generator.BytesGenerator` to use the RFC
+correct line separator characters when creating the binary string to feed into
+``sendmail's`` ``stdin``, where the default policy would use ``\n`` line
+separators.
+
+Policy objects can also be combined using the addition operator, producing a
+policy object whose settings are a combination of the non-default values of the
+summed objects::
+
+ >>> compat_SMTP = policy.compat32.clone(linesep='\r\n')
+ >>> compat_strict = policy.compat32.clone(raise_on_defect=True)
+ >>> compat_strict_SMTP = compat_SMTP + compat_strict
+
+This operation is not commutative; that is, the order in which the objects are
+added matters. To illustrate::
+
+ >>> policy100 = policy.compat32.clone(max_line_length=100)
+ >>> policy80 = policy.compat32.clone(max_line_length=80)
+ >>> apolicy = policy100 + policy80
+ >>> apolicy.max_line_length
+ 80
+ >>> apolicy = policy80 + policy100
+ >>> apolicy.max_line_length
+ 100
+
+
+.. class:: Policy(**kw)
+
+ This is the :term:`abstract base class` for all policy classes. It provides
+ default implementations for a couple of trivial methods, as well as the
+ implementation of the immutability property, the :meth:`clone` method, and
+ the constructor semantics.
+
+ The constructor of a policy class can be passed various keyword arguments.
+ The arguments that may be specified are any non-method properties on this
+ class, plus any additional non-method properties on the concrete class. A
+ value specified in the constructor will override the default value for the
+ corresponding attribute.
+
+ This class defines the following properties, and thus values for the
+ following may be passed in the constructor of any policy class:
+
+ .. attribute:: max_line_length
+
+ The maximum length of any line in the serialized output, not counting the
+ end of line character(s). Default is 78, per :rfc:`5322`. A value of
+ ``0`` or :const:`None` indicates that no line wrapping should be
+ done at all.
+
+ .. attribute:: linesep
+
+ The string to be used to terminate lines in serialized output. The
+ default is ``\n`` because that's the internal end-of-line discipline used
+ by Python, though ``\r\n`` is required by the RFCs.
+
+ .. attribute:: cte_type
+
+ Controls the type of Content Transfer Encodings that may be or are
+ required to be used. The possible values are:
+
+ .. tabularcolumns:: |l|L|
+
+ ======== ===============================================================
+ ``7bit`` all data must be "7 bit clean" (ASCII-only). This means that
+ where necessary data will be encoded using either
+ quoted-printable or base64 encoding.
+
+ ``8bit`` data is not constrained to be 7 bit clean. Data in headers is
+ still required to be ASCII-only and so will be encoded (see
+ 'binary_fold' below for an exception), but body parts may use
+ the ``8bit`` CTE.
+ ======== ===============================================================
+
+ A ``cte_type`` value of ``8bit`` only works with ``BytesGenerator``, not
+ ``Generator``, because strings cannot contain binary data. If a
+ ``Generator`` is operating under a policy that specifies
+ ``cte_type=8bit``, it will act as if ``cte_type`` is ``7bit``.
+
+ .. attribute:: raise_on_defect
+
+ If :const:`True`, any defects encountered will be raised as errors. If
+ :const:`False` (the default), defects will be passed to the
+ :meth:`register_defect` method.
+
+ The following :class:`Policy` method is intended to be called by code using
+ the email library to create policy instances with custom settings:
+
+ .. method:: clone(**kw)
+
+ Return a new :class:`Policy` instance whose attributes have the same
+ values as the current instance, except where those attributes are
+ given new values by the keyword arguments.
+
+ The remaining :class:`Policy` methods are called by the email package code,
+ and are not intended to be called by an application using the email package.
+ A custom policy must implement all of these methods.
+
+ .. method:: handle_defect(obj, defect)
+
+ Handle a *defect* found on *obj*. When the email package calls this
+ method, *defect* will always be a subclass of
+ :class:`~email.errors.Defect`.
+
+ The default implementation checks the :attr:`raise_on_defect` flag. If
+ it is ``True``, *defect* is raised as an exception. If it is ``False``
+ (the default), *obj* and *defect* are passed to :meth:`register_defect`.
+
+ .. method:: register_defect(obj, defect)
+
+ Register a *defect* on *obj*. In the email package, *defect* will always
+ be a subclass of :class:`~email.errors.Defect`.
+
+ The default implementation calls the ``append`` method of the ``defects``
+ attribute of *obj*. When the email package calls :attr:`handle_defect`,
+ *obj* will normally have a ``defects`` attribute that has an ``append``
+ method. Custom object types used with the email package (for example,
+ custom ``Message`` objects) should also provide such an attribute,
+ otherwise defects in parsed messages will raise unexpected errors.
+
+ .. method:: header_max_count(name)
+
+ Return the maximum allowed number of headers named *name*.
+
+ Called when a header is added to a :class:`~email.message.Message`
+ object. If the returned value is not ``0`` or ``None``, and there are
+ already a number of headers with the name *name* equal to the value
+ returned, a :exc:`ValueError` is raised.
+
+ Because the default behavior of ``Message.__setitem__`` is to append the
+ value to the list of headers, it is easy to create duplicate headers
+ without realizing it. This method allows certain headers to be limited
+ in the number of instances of that header that may be added to a
+ ``Message`` programmatically. (The limit is not observed by the parser,
+ which will faithfully produce as many headers as exist in the message
+ being parsed.)
+
+ The default implementation returns ``None`` for all header names.
+
+ .. method:: header_source_parse(sourcelines)
+
+ The email package calls this method with a list of strings, each string
+ ending with the line separation characters found in the source being
+ parsed. The first line includes the field header name and separator.
+ All whitespace in the source is preserved. The method should return the
+ ``(name, value)`` tuple that is to be stored in the ``Message`` to
+ represent the parsed header.
+
+ If an implementation wishes to retain compatibility with the existing
+ email package policies, *name* should be the case preserved name (all
+ characters up to the '``:``' separator), while *value* should be the
+ unfolded value (all line separator characters removed, but whitespace
+ kept intact), stripped of leading whitespace.
+
+ *sourcelines* may contain surrogateescaped binary data.
+
+ There is no default implementation
+
+ .. method:: header_store_parse(name, value)
+
+ The email package calls this method with the name and value provided by
+ the application program when the application program is modifying a
+ ``Message`` programmatically (as opposed to a ``Message`` created by a
+ parser). The method should return the ``(name, value)`` tuple that is to
+ be stored in the ``Message`` to represent the header.
+
+ If an implementation wishes to retain compatibility with the existing
+ email package policies, the *name* and *value* should be strings or
+ string subclasses that do not change the content of the passed in
+ arguments.
+
+ There is no default implementation
+
+ .. method:: header_fetch_parse(name, value)
+
+ The email package calls this method with the *name* and *value* currently
+ stored in the ``Message`` when that header is requested by the
+ application program, and whatever the method returns is what is passed
+ back to the application as the value of the header being retrieved.
+ Note that there may be more than one header with the same name stored in
+ the ``Message``; the method is passed the specific name and value of the
+ header destined to be returned to the application.
+
+ *value* may contain surrogateescaped binary data. There should be no
+ surrogateescaped binary data in the value returned by the method.
+
+ There is no default implementation
+
+ .. method:: fold(name, value)
+
+ The email package calls this method with the *name* and *value* currently
+ stored in the ``Message`` for a given header. The method should return a
+ string that represents that header "folded" correctly (according to the
+ policy settings) by composing the *name* with the *value* and inserting
+ :attr:`linesep` characters at the appropriate places. See :rfc:`5322`
+ for a discussion of the rules for folding email headers.
+
+ *value* may contain surrogateescaped binary data. There should be no
+ surrogateescaped binary data in the string returned by the method.
+
+ .. method:: fold_binary(name, value)
+
+ The same as :meth:`fold`, except that the returned value should be a
+ bytes object rather than a string.
+
+ *value* may contain surrogateescaped binary data. These could be
+ converted back into binary data in the returned bytes object.
+
+
+.. class:: Compat32(**kw)
+
+ This concrete :class:`Policy` is the backward compatibility policy. It
+ replicates the behavior of the email package in Python 3.2. The
+ :mod:`~email.policy` module also defines an instance of this class,
+ :const:`compat32`, that is used as the default policy. Thus the default
+ behavior of the email package is to maintain compatibility with Python 3.2.
+
+ The class provides the following concrete implementations of the
+ abstract methods of :class:`Policy`:
+
+ .. method:: header_source_parse(sourcelines)
+
+ The name is parsed as everything up to the '``:``' and returned
+ unmodified. The value is determined by stripping leading whitespace off
+ the remainder of the first line, joining all subsequent lines together,
+ and stripping any trailing carriage return or linefeed characters.
+
+ .. method:: header_store_parse(name, value)
+
+ The name and value are returned unmodified.
+
+ .. method:: header_fetch_parse(name, value)
+
+ If the value contains binary data, it is converted into a
+ :class:`~email.header.Header` object using the ``unknown-8bit`` charset.
+ Otherwise it is returned unmodified.
+
+ .. method:: fold(name, value)
+
+ Headers are folded using the :class:`~email.header.Header` folding
+ algorithm, which preserves existing line breaks in the value, and wraps
+ each resulting line to the ``max_line_length``. Non-ASCII binary data are
+ CTE encoded using the ``unknown-8bit`` charset.
+
+ .. method:: fold_binary(name, value)
+
+ Headers are folded using the :class:`~email.header.Header` folding
+ algorithm, which preserves existing line breaks in the value, and wraps
+ each resulting line to the ``max_line_length``. If ``cte_type`` is
+ ``7bit``, non-ascii binary data is CTE encoded using the ``unknown-8bit``
+ charset. Otherwise the original source header is used, with its existing
+ line breaks and any (RFC invalid) binary data it may contain.
+
+
+.. note::
+
+ The documentation below describes new policies that are included in the
+ standard library on a :term:`provisional basis <provisional package>`.
+ Backwards incompatible changes (up to and including removal of the feature)
+ may occur if deemed necessary by the core developers.
+
+
+.. class:: EmailPolicy(**kw)
+
+ This concrete :class:`Policy` provides behavior that is intended to be fully
+ compliant with the current email RFCs. These include (but are not limited
+ to) :rfc:`5322`, :rfc:`2047`, and the current MIME RFCs.
+
+ This policy adds new header parsing and folding algorithms. Instead of
+ simple strings, headers are custom objects with custom attributes depending
+ on the type of the field. The parsing and folding algorithm fully implement
+ :rfc:`2047` and :rfc:`5322`.
+
+ In addition to the settable attributes listed above that apply to all
+ policies, this policy adds the following additional attributes:
+
+ .. attribute:: refold_source
+
+ If the value for a header in the ``Message`` object originated from a
+ :mod:`~email.parser` (as opposed to being set by a program), this
+ attribute indicates whether or not a generator should refold that value
+ when transforming the message back into stream form. The possible values
+ are:
+
+ ======== ===============================================================
+ ``none`` all source values use original folding
+
+ ``long`` source values that have any line that is longer than
+ ``max_line_length`` will be refolded
+
+ ``all`` all values are refolded.
+ ======== ===============================================================
+
+ The default is ``long``.
+
+ .. attribute:: header_factory
+
+ A callable that takes two arguments, ``name`` and ``value``, where
+ ``name`` is a header field name and ``value`` is an unfolded header field
+ value, and returns a string subclass that represents that header. A
+ default ``header_factory`` (see :mod:`~email.headerregistry`) is provided
+ that understands some of the :RFC:`5322` header field types. (Currently
+ address fields and date fields have special treatment, while all other
+ fields are treated as unstructured. This list will be completed before
+ the extension is marked stable.)
+
+ The class provides the following concrete implementations of the abstract
+ methods of :class:`Policy`:
+
+ .. method:: header_max_count(name)
+
+ Returns the value of the
+ :attr:`~email.headerregistry.BaseHeader.max_count` attribute of the
+ specialized class used to represent the header with the given name.
+
+ .. method:: header_source_parse(sourcelines)
+
+ The implementation of this method is the same as that for the
+ :class:`Compat32` policy.
+
+ .. method:: header_store_parse(name, value)
+
+ The name is returned unchanged. If the input value has a ``name``
+ attribute and it matches *name* ignoring case, the value is returned
+ unchanged. Otherwise the *name* and *value* are passed to
+ ``header_factory``, and the resulting custom header object is returned as
+ the value. In this case a ``ValueError`` is raised if the input value
+ contains CR or LF characters.
+
+ .. method:: header_fetch_parse(name, value)
+
+ If the value has a ``name`` attribute, it is returned to unmodified.
+ Otherwise the *name*, and the *value* with any CR or LF characters
+ removed, are passed to the ``header_factory``, and the resulting custom
+ header object is returned. Any surrogateescaped bytes get turned into
+ the unicode unknown-character glyph.
+
+ .. method:: fold(name, value)
+
+ Header folding is controlled by the :attr:`refold_source` policy setting.
+ A value is considered to be a 'source value' if and only if it does not
+ have a ``name`` attribute (having a ``name`` attribute means it is a
+ header object of some sort). If a source value needs to be refolded
+ according to the policy, it is converted into a custom header object by
+ passing the *name* and the *value* with any CR and LF characters removed
+ to the ``header_factory``. Folding of a custom header object is done by
+ calling its ``fold`` method with the current policy.
+
+ Source values are split into lines using :meth:`~str.splitlines`. If
+ the value is not to be refolded, the lines are rejoined using the
+ ``linesep`` from the policy and returned. The exception is lines
+ containing non-ascii binary data. In that case the value is refolded
+ regardless of the ``refold_source`` setting, which causes the binary data
+ to be CTE encoded using the ``unknown-8bit`` charset.
+
+ .. method:: fold_binary(name, value)
+
+ The same as :meth:`fold` if :attr:`~Policy.cte_type` is ``7bit``, except
+ that the returned value is bytes.
+
+ If :attr:`~Policy.cte_type` is ``8bit``, non-ASCII binary data is
+ converted back
+ into bytes. Headers with binary data are not refolded, regardless of the
+ ``refold_header`` setting, since there is no way to know whether the
+ binary data consists of single byte characters or multibyte characters.
+
+The following instances of :class:`EmailPolicy` provide defaults suitable for
+specific application domains. Note that in the future the behavior of these
+instances (in particular the ``HTTP`` instance) may be adjusted to conform even
+more closely to the RFCs relevant to their domains.
+
+.. data:: default
+
+ An instance of ``EmailPolicy`` with all defaults unchanged. This policy
+ uses the standard Python ``\n`` line endings rather than the RFC-correct
+ ``\r\n``.
+
+.. data:: SMTP
+
+ Suitable for serializing messages in conformance with the email RFCs.
+ Like ``default``, but with ``linesep`` set to ``\r\n``, which is RFC
+ compliant.
+
+.. data:: HTTP
+
+ Suitable for serializing headers with for use in HTTP traffic. Like
+ ``SMTP`` except that ``max_line_length`` is set to ``None`` (unlimited).
+
+.. data:: strict
+
+ Convenience instance. The same as ``default`` except that
+ ``raise_on_defect`` is set to ``True``. This allows any policy to be made
+ strict by writing::
+
+ somepolicy + policy.strict
+
+With all of these :class:`EmailPolicies <.EmailPolicy>`, the effective API of
+the email package is changed from the Python 3.2 API in the following ways:
+
+ * Setting a header on a :class:`~email.message.Message` results in that
+ header being parsed and a custom header object created.
+
+ * Fetching a header value from a :class:`~email.message.Message` results
+ in that header being parsed and a custom header object created and
+ returned.
+
+ * Any custom header object, or any header that is refolded due to the
+ policy settings, is folded using an algorithm that fully implements the
+ RFC folding algorithms, including knowing where encoded words are required
+ and allowed.
+
+From the application view, this means that any header obtained through the
+:class:`~email.message.Message` is a custom header object with custom
+attributes, whose string value is the fully decoded unicode value of the
+header. Likewise, a header may be assigned a new value, or a new header
+created, using a unicode string, and the policy will take care of converting
+the unicode string into the correct RFC encoded form.
+
+The custom header objects and their attributes are described in
+:mod:`~email.headerregistry`.