summaryrefslogtreecommitdiff
path: root/Doc/library/email.headerregistry.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Doc/library/email.headerregistry.rst')
-rw-r--r--Doc/library/email.headerregistry.rst453
1 files changed, 453 insertions, 0 deletions
diff --git a/Doc/library/email.headerregistry.rst b/Doc/library/email.headerregistry.rst
new file mode 100644
index 0000000000..db3aade416
--- /dev/null
+++ b/Doc/library/email.headerregistry.rst
@@ -0,0 +1,453 @@
+:mod:`email.headerregistry`: Custom Header Objects
+--------------------------------------------------
+
+.. module:: email.headerregistry
+ :synopsis: Automatic Parsing of headers based on the field name
+
+.. moduleauthor:: R. David Murray <rdmurray@bitdance.com>
+.. sectionauthor:: R. David Murray <rdmurray@bitdance.com>
+
+
+.. note::
+
+ The headerregistry module has been included in the standard library on a
+ :term:`provisional basis <provisional package>`. Backwards incompatible
+ changes (up to and including removal of the module) may occur if deemed
+ necessary by the core developers.
+
+.. versionadded:: 3.3
+ as a :term:`provisional module <provisional package>`.
+
+Headers are represented by customized subclasses of :class:`str`. The
+particular class used to represent a given header is determined by the
+:attr:`~email.policy.EmailPolicy.header_factory` of the :mod:`~email.policy` in
+effect when the headers are created. This section documents the particular
+``header_factory`` implemented by the email package for handling :RFC:`5322`
+compliant email messages, which not only provides customized header objects for
+various header types, but also provides an extension mechanism for applications
+to add their own custom header types.
+
+When using any of the policy objects derived from
+:data:`~email.policy.EmailPolicy`, all headers are produced by
+:class:`.HeaderRegistry` and have :class:`.BaseHeader` as their last base
+class. Each header class has an additional base class that is determined by
+the type of the header. For example, many headers have the class
+:class:`.UnstructuredHeader` as their other base class. The specialized second
+class for a header is determined by the name of the header, using a lookup
+table stored in the :class:`.HeaderRegistry`. All of this is managed
+transparently for the typical application program, but interfaces are provided
+for modifying the default behavior for use by more complex applications.
+
+The sections below first document the header base classes and their attributes,
+followed by the API for modifying the behavior of :class:`.HeaderRegistry`, and
+finally the support classes used to represent the data parsed from structured
+headers.
+
+
+.. class:: BaseHeader(name, value)
+
+ *name* and *value* are passed to ``BaseHeader`` from the
+ :attr:`~email.policy.EmailPolicy.header_factory` call. The string value of
+ any header object is the *value* fully decoded to unicode.
+
+ This base class defines the following read-only properties:
+
+
+ .. attribute:: name
+
+ The name of the header (the portion of the field before the ':'). This
+ is exactly the value passed in the
+ :attr:`~email.policy.EmailPolicy.header_factory` call for *name*; that
+ is, case is preserved.
+
+
+ .. attribute:: defects
+
+ A tuple of :exc:`~email.errors.HeaderDefect` instances reporting any
+ RFC compliance problems found during parsing. The email package tries to
+ be complete about detecting compliance issues. See the :mod:`~email.errors`
+ module for a discussion of the types of defects that may be reported.
+
+
+ .. attribute:: max_count
+
+ The maximum number of headers of this type that can have the same
+ ``name``. A value of ``None`` means unlimited. The ``BaseHeader`` value
+ for this attribute is ``None``; it is expected that specialized header
+ classes will override this value as needed.
+
+ ``BaseHeader`` also provides the following method, which is called by the
+ email library code and should not in general be called by application
+ programs:
+
+ .. method:: fold(*, policy)
+
+ Return a string containing :attr:`~email.policy.Policy.linesep`
+ characters as required to correctly fold the header according
+ to *policy*. A :attr:`~email.policy.Policy.cte_type` of
+ ``8bit`` will be treated as if it were ``7bit``, since strings
+ may not contain binary data.
+
+
+ ``BaseHeader`` by itself cannot be used to create a header object. It
+ defines a protocol that each specialized header cooperates with in order to
+ produce the header object. Specifically, ``BaseHeader`` requires that
+ the specialized class provide a :func:`classmethod` named ``parse``. This
+ method is called as follows::
+
+ parse(string, kwds)
+
+ ``kwds`` is a dictionary containing one pre-initialized key, ``defects``.
+ ``defects`` is an empty list. The parse method should append any detected
+ defects to this list. On return, the ``kwds`` dictionary *must* contain
+ values for at least the keys ``decoded`` and ``defects``. ``decoded``
+ should be the string value for the header (that is, the header value fully
+ decoded to unicode). The parse method should assume that *string* may
+ contain transport encoded parts, but should correctly handle all valid
+ unicode characters as well so that it can parse un-encoded header values.
+
+ ``BaseHeader``'s ``__new__`` then creates the header instance, and calls its
+ ``init`` method. The specialized class only needs to provide an ``init``
+ method if it wishes to set additional attributes beyond those provided by
+ ``BaseHeader`` itself. Such an ``init`` method should look like this::
+
+ def init(self, *args, **kw):
+ self._myattr = kw.pop('myattr')
+ super().init(*args, **kw)
+
+ That is, anything extra that the specialized class puts in to the ``kwds``
+ dictionary should be removed and handled, and the remaining contents of
+ ``kw`` (and ``args``) passed to the ``BaseHeader`` ``init`` method.
+
+
+.. class:: UnstructuredHeader
+
+ An "unstructured" header is the default type of header in :rfc:`5322`.
+ Any header that does not have a specified syntax is treated as
+ unstructured. The classic example of an unstructured header is the
+ :mailheader:`Subject` header.
+
+ In :rfc:`5322`, an unstructured header is a run of arbitrary text in the
+ ASCII character set. :rfc:`2047`, however, has an :rfc:`5322` compatible
+ mechanism for encoding non-ASCII text as ASCII characters within a header
+ value. When a *value* containing encoded words is passed to the
+ constructor, the ``UnstructuredHeader`` parser converts such encoded words
+ back in to the original unicode, following the :rfc:`2047` rules for
+ unstructured text. The parser uses heuristics to attempt to decode certain
+ non-compliant encoded words. Defects are registered in such cases, as well
+ as defects for issues such as invalid characters within the encoded words or
+ the non-encoded text.
+
+ This header type provides no additional attributes.
+
+
+.. class:: DateHeader
+
+ :rfc:`5322` specifies a very specific format for dates within email headers.
+ The ``DateHeader`` parser recognizes that date format, as well as
+ recognizing a number of variant forms that are sometimes found "in the
+ wild".
+
+ This header type provides the following additional attributes:
+
+ .. attribute:: datetime
+
+ If the header value can be recognized as a valid date of one form or
+ another, this attribute will contain a :class:`~datetime.datetime`
+ instance representing that date. If the timezone of the input date is
+ specified as ``-0000`` (indicating it is in UTC but contains no
+ information about the source timezone), then :attr:`.datetime` will be a
+ naive :class:`~datetime.datetime`. If a specific timezone offset is
+ found (including `+0000`), then :attr:`.datetime` will contain an aware
+ ``datetime`` that uses :class:`datetime.timezone` to record the timezone
+ offset.
+
+ The ``decoded`` value of the header is determined by formatting the
+ ``datetime`` according to the :rfc:`5322` rules; that is, it is set to::
+
+ email.utils.format_datetime(self.datetime)
+
+ When creating a ``DateHeader``, *value* may be
+ :class:`~datetime.datetime` instance. This means, for example, that
+ the following code is valid and does what one would expect::
+
+ msg['Date'] = datetime(2011, 7, 15, 21)
+
+ Because this is a naive ``datetime`` it will be interpreted as a UTC
+ timestamp, and the resulting value will have a timezone of ``-0000``. Much
+ more useful is to use the :func:`~email.utils.localtime` function from the
+ :mod:`~email.utils` module::
+
+ msg['Date'] = utils.localtime()
+
+ This example sets the date header to the current time and date using
+ the current timezone offset.
+
+
+.. class:: AddressHeader
+
+ Address headers are one of the most complex structured header types.
+ The ``AddressHeader`` class provides a generic interface to any address
+ header.
+
+ This header type provides the following additional attributes:
+
+
+ .. attribute:: groups
+
+ A tuple of :class:`.Group` objects encoding the
+ addresses and groups found in the header value. Addresses that are
+ not part of a group are represented in this list as single-address
+ ``Groups`` whose :attr:`~.Group.display_name` is ``None``.
+
+
+ .. attribute:: addresses
+
+ A tuple of :class:`.Address` objects encoding all
+ of the individual addresses from the header value. If the header value
+ contains any groups, the individual addresses from the group are included
+ in the list at the point where the group occurs in the value (that is,
+ the list of addresses is "flattened" into a one dimensional list).
+
+ The ``decoded`` value of the header will have all encoded words decoded to
+ unicode. :class:`~encodings.idna` encoded domain names are also decoded to unicode. The
+ ``decoded`` value is set by :attr:`~str.join`\ ing the :class:`str` value of
+ the elements of the ``groups`` attribute with ``', '``.
+
+ A list of :class:`.Address` and :class:`.Group` objects in any combination
+ may be used to set the value of an address header. ``Group`` objects whose
+ ``display_name`` is ``None`` will be interpreted as single addresses, which
+ allows an address list to be copied with groups intact by using the list
+ obtained ``groups`` attribute of the source header.
+
+
+.. class:: SingleAddressHeader
+
+ A subclass of :class:`.AddressHeader` that adds one
+ additional attribute:
+
+
+ .. attribute:: address
+
+ The single address encoded by the header value. If the header value
+ actually contains more than one address (which would be a violation of
+ the RFC under the default :mod:`~email.policy`), accessing this attribute
+ will result in a :exc:`ValueError`.
+
+
+Many of the above classes also have a ``Unique`` variant (for example,
+``UniqueUnstructuredHeader``). The only difference is that in the ``Unique``
+variant, :attr:`~.BaseHeader.max_count` is set to 1.
+
+
+.. class:: MIMEVersionHeader
+
+ There is really only one valid value for the :mailheader:`MIME-Version`
+ header, and that is ``1.0``. For future proofing, this header class
+ supports other valid version numbers. If a version number has a valid value
+ per :rfc:`2045`, then the header object will have non-``None`` values for
+ the following attributes:
+
+ .. attribute:: version
+
+ The version number as a string, with any whitespace and/or comments
+ removed.
+
+ .. attribute:: major
+
+ The major version number as an integer
+
+ .. attribute:: minor
+
+ The minor version number as an integer
+
+
+.. class:: ParameterizedMIMEHeader
+
+ MOME headers all start with the prefix 'Content-'. Each specific header has
+ a certain value, described under the class for that header. Some can
+ also take a list of supplemental parameters, which have a common format.
+ This class serves as a base for all the MIME headers that take parameters.
+
+ .. attribute:: params
+
+ A dictionary mapping parameter names to parameter values.
+
+
+.. class:: ContentTypeHeader
+
+ A :class:`ParameterizedMIMEHeader` class that handles the
+ :mailheader:`Content-Type` header.
+
+ .. attribute:: content_type
+
+ The content type string, in the form ``maintype/subtype``.
+
+ .. attribute:: maintype
+
+ .. attribute:: subtype
+
+
+.. class:: ContentDispositionHeader
+
+ A :class:`ParameterizedMIMEHeader` class that handles the
+ :mailheader:`Content-Disposition` header.
+
+ .. attribute:: content-disposition
+
+ ``inline`` and ``attachment`` are the only valid values in common use.
+
+
+.. class:: ContentTransferEncoding
+
+ Handles the :mailheader:`Content-Transfer-Encoding` header.
+
+ .. attribute:: cte
+
+ Valid values are ``7bit``, ``8bit``, ``base64``, and
+ ``quoted-printable``. See :rfc:`2045` for more information.
+
+
+
+.. class:: HeaderRegistry(base_class=BaseHeader, \
+ default_class=UnstructuredHeader, \
+ use_default_map=True)
+
+ This is the factory used by :class:`~email.policy.EmailPolicy` by default.
+ ``HeaderRegistry`` builds the class used to create a header instance
+ dynamically, using *base_class* and a specialized class retrieved from a
+ registry that it holds. When a given header name does not appear in the
+ registry, the class specified by *default_class* is used as the specialized
+ class. When *use_default_map* is ``True`` (the default), the standard
+ mapping of header names to classes is copied in to the registry during
+ initialization. *base_class* is always the last class in the generated
+ class's ``__bases__`` list.
+
+ The default mappings are:
+
+ :subject: UniqueUnstructuredHeader
+ :date: UniqueDateHeader
+ :resent-date: DateHeader
+ :orig-date: UniqueDateHeader
+ :sender: UniqueSingleAddressHeader
+ :resent-sender: SingleAddressHeader
+ :to: UniqueAddressHeader
+ :resent-to: AddressHeader
+ :cc: UniqueAddressHeader
+ :resent-cc: AddressHeader
+ :from: UniqueAddressHeader
+ :resent-from: AddressHeader
+ :reply-to: UniqueAddressHeader
+
+ ``HeaderRegistry`` has the following methods:
+
+
+ .. method:: map_to_type(self, name, cls)
+
+ *name* is the name of the header to be mapped. It will be converted to
+ lower case in the registry. *cls* is the specialized class to be used,
+ along with *base_class*, to create the class used to instantiate headers
+ that match *name*.
+
+
+ .. method:: __getitem__(name)
+
+ Construct and return a class to handle creating a *name* header.
+
+
+ .. method:: __call__(name, value)
+
+ Retrieves the specialized header associated with *name* from the
+ registry (using *default_class* if *name* does not appear in the
+ registry) and composes it with *base_class* to produce a class,
+ calls the constructed class's constructor, passing it the same
+ argument list, and finally returns the class instance created thereby.
+
+
+The following classes are the classes used to represent data parsed from
+structured headers and can, in general, be used by an application program to
+construct structured values to assign to specific headers.
+
+
+.. class:: Address(display_name='', username='', domain='', addr_spec=None)
+
+ The class used to represent an email address. The general form of an
+ address is::
+
+ [display_name] <username@domain>
+
+ or::
+
+ username@domain
+
+ where each part must conform to specific syntax rules spelled out in
+ :rfc:`5322`.
+
+ As a convenience *addr_spec* can be specified instead of *username* and
+ *domain*, in which case *username* and *domain* will be parsed from the
+ *addr_spec*. An *addr_spec* must be a properly RFC quoted string; if it is
+ not ``Address`` will raise an error. Unicode characters are allowed and
+ will be property encoded when serialized. However, per the RFCs, unicode is
+ *not* allowed in the username portion of the address.
+
+ .. attribute:: display_name
+
+ The display name portion of the address, if any, with all quoting
+ removed. If the address does not have a display name, this attribute
+ will be an empty string.
+
+ .. attribute:: username
+
+ The ``username`` portion of the address, with all quoting removed.
+
+ .. attribute:: domain
+
+ The ``domain`` portion of the address.
+
+ .. attribute:: addr_spec
+
+ The ``username@domain`` portion of the address, correctly quoted
+ for use as a bare address (the second form shown above). This
+ attribute is not mutable.
+
+ .. method:: __str__()
+
+ The ``str`` value of the object is the address quoted according to
+ :rfc:`5322` rules, but with no Content Transfer Encoding of any non-ASCII
+ characters.
+
+ To support SMTP (:rfc:`5321`), ``Address`` handles one special case: if
+ ``username`` and ``domain`` are both the empty string (or ``None``), then
+ the string value of the ``Address`` is ``<>``.
+
+
+.. class:: Group(display_name=None, addresses=None)
+
+ The class used to represent an address group. The general form of an
+ address group is::
+
+ display_name: [address-list];
+
+ As a convenience for processing lists of addresses that consist of a mixture
+ of groups and single addresses, a ``Group`` may also be used to represent
+ single addresses that are not part of a group by setting *display_name* to
+ ``None`` and providing a list of the single address as *addresses*.
+
+ .. attribute:: display_name
+
+ The ``display_name`` of the group. If it is ``None`` and there is
+ exactly one ``Address`` in ``addresses``, then the ``Group`` represents a
+ single address that is not in a group.
+
+ .. attribute:: addresses
+
+ A possibly empty tuple of :class:`.Address` objects representing the
+ addresses in the group.
+
+ .. method:: __str__()
+
+ The ``str`` value of a ``Group`` is formatted according to :rfc:`5322`,
+ but with no Content Transfer Encoding of any non-ASCII characters. If
+ ``display_name`` is none and there is a single ``Address`` in the
+ ``addresses`` list, the ``str`` value will be the same as the ``str`` of
+ that single ``Address``.