summaryrefslogtreecommitdiff
path: root/Doc/library/email.parser.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Doc/library/email.parser.rst')
-rw-r--r--Doc/library/email.parser.rst358
1 files changed, 183 insertions, 175 deletions
diff --git a/Doc/library/email.parser.rst b/Doc/library/email.parser.rst
index b8eb7c516e..c323ebc640 100644
--- a/Doc/library/email.parser.rst
+++ b/Doc/library/email.parser.rst
@@ -8,210 +8,223 @@
--------------
-Message object structures can be created in one of two ways: they can be created
-from whole cloth by instantiating :class:`~email.message.Message` objects and
-stringing them together via :meth:`~email.message.Message.attach` and
-:meth:`~email.message.Message.set_payload` calls, or they
-can be created by parsing a flat text representation of the email message.
+Message object structures can be created in one of two ways: they can be
+created from whole cloth by creating an :class:`~email.message.EmailMessage`
+object, adding headers using the dictionary interface, and adding payload(s)
+using :meth:`~email.message.EmailMessage.set_content` and related methods, or
+they can be created by parsing a serialized representation of the email
+message.
The :mod:`email` package provides a standard parser that understands most email
-document structures, including MIME documents. You can pass the parser a string
-or a file object, and the parser will return to you the root
-:class:`~email.message.Message` instance of the object structure. For simple,
-non-MIME messages the payload of this root object will likely be a string
-containing the text of the message. For MIME messages, the root object will
-return ``True`` from its :meth:`~email.message.Message.is_multipart` method, and
-the subparts can be accessed via the :meth:`~email.message.Message.get_payload`
-and :meth:`~email.message.Message.walk` methods.
-
-There are actually two parser interfaces available for use, the classic
-:class:`Parser` API and the incremental :class:`FeedParser` API. The classic
-:class:`Parser` API is fine if you have the entire text of the message in memory
-as a string, or if the entire message lives in a file on the file system.
-:class:`FeedParser` is more appropriate for when you're reading the message from
-a stream which might block waiting for more input (e.g. reading an email message
-from a socket). The :class:`FeedParser` can consume and parse the message
-incrementally, and only returns the root object when you close the parser [#]_.
+document structures, including MIME documents. You can pass the parser a
+bytes, string or file object, and the parser will return to you the root
+:class:`~email.message.EmailMessage` instance of the object structure. For
+simple, non-MIME messages the payload of this root object will likely be a
+string containing the text of the message. For MIME messages, the root object
+will return ``True`` from its :meth:`~email.message.EmailMessage.is_multipart`
+method, and the subparts can be accessed via the payload manipulation methods,
+such as :meth:`~email.message.EmailMessage.get_body`,
+:meth:`~email.message.EmailMessage.iter_parts`, and
+:meth:`~email.message.EmailMessage.walk`.
+
+There are actually two parser interfaces available for use, the :class:`Parser`
+API and the incremental :class:`FeedParser` API. The :class:`Parser` API is
+most useful if you have the entire text of the message in memory, or if the
+entire message lives in a file on the file system. :class:`FeedParser` is more
+appropriate when you are reading the message from a stream which might block
+waiting for more input (such as reading an email message from a socket). The
+:class:`FeedParser` can consume and parse the message incrementally, and only
+returns the root object when you close the parser.
Note that the parser can be extended in limited ways, and of course you can
-implement your own parser completely from scratch. There is no magical
-connection between the :mod:`email` package's bundled parser and the
-:class:`~email.message.Message` class, so your custom parser can create message
-object trees any way it finds necessary.
+implement your own parser completely from scratch. All of the logic that
+connects the :mod:`email` package's bundled parser and the
+:class:`~email.message.EmailMessage` class is embodied in the :mod:`policy`
+class, so a custom parser can create message object trees any way it finds
+necessary by implementing custom versions of the appropriate :mod:`policy`
+methods.
FeedParser API
^^^^^^^^^^^^^^
-The :class:`FeedParser`, imported from the :mod:`email.feedparser` module,
-provides an API that is conducive to incremental parsing of email messages, such
-as would be necessary when reading the text of an email message from a source
-that can block (e.g. a socket). The :class:`FeedParser` can of course be used
-to parse an email message fully contained in a string or a file, but the classic
-:class:`Parser` API may be more convenient for such use cases. The semantics
-and results of the two parser APIs are identical.
-
-The :class:`FeedParser`'s API is simple; you create an instance, feed it a bunch
-of text until there's no more to feed it, then close the parser to retrieve the
-root message object. The :class:`FeedParser` is extremely accurate when parsing
-standards-compliant messages, and it does a very good job of parsing
-non-compliant messages, providing information about how a message was deemed
-broken. It will populate a message object's *defects* attribute with a list of
-any problems it found in a message. See the :mod:`email.errors` module for the
+The :class:`BytesFeedParser`, imported from the :mod:`email.feedparser` module,
+provides an API that is conducive to incremental parsing of email messages,
+such as would be necessary when reading the text of an email message from a
+source that can block (such as a socket). The :class:`BytesFeedParser` can of
+course be used to parse an email message fully contained in a :term:`bytes-like
+object`, string, or file, but the :class:`BytesParser` API may be more
+convenient for such use cases. The semantics and results of the two parser
+APIs are identical.
+
+The :class:`BytesFeedParser`'s API is simple; you create an instance, feed it a
+bunch of bytes until there's no more to feed it, then close the parser to
+retrieve the root message object. The :class:`BytesFeedParser` is extremely
+accurate when parsing standards-compliant messages, and it does a very good job
+of parsing non-compliant messages, providing information about how a message
+was deemed broken. It will populate a message object's
+:attr:`~email.message.EmailMessage.defects` attribute with a list of any
+problems it found in a message. See the :mod:`email.errors` module for the
list of defects that it can find.
-Here is the API for the :class:`FeedParser`:
+Here is the API for the :class:`BytesFeedParser`:
-.. class:: FeedParser(_factory=email.message.Message, *, policy=policy.compat32)
+.. class:: BytesFeedParser(_factory=None, *, policy=policy.compat32)
- Create a :class:`FeedParser` instance. Optional *_factory* is a no-argument
- callable that will be called whenever a new message object is needed. It
- defaults to the :class:`email.message.Message` class.
+ Create a :class:`BytesFeedParser` instance. Optional *_factory* is a
+ no-argument callable; if not specified use the
+ :attr:`~email.policy.Policy.message_factory` from the *policy*. Call
+ *_factory* whenever a new message object is needed.
- If *policy* is specified (it must be an instance of a :mod:`~email.policy`
- class) use the rules it specifies to update the representation of the
- message. If *policy* is not set, use the :class:`compat32
- <email.policy.Compat32>` policy, which maintains backward compatibility with
- the Python 3.2 version of the email package. For more information see the
+ If *policy* is specified use the rules it specifies to update the
+ representation of the message. If *policy* is not set, use the
+ :class:`compat32 <email.policy.Compat32>` policy, which maintains backward
+ compatibility with the Python 3.2 version of the email package and provides
+ :class:`~email.message.Message` as the default factory. All other policies
+ provide :class:`~email.message.EmailMessage` as the default *_factory*. For
+ more information on what else *policy* controls, see the
:mod:`~email.policy` documentation.
+ Note: **The policy keyword should always be specified**; The default will
+ change to :data:`email.policy.default` in a future version of Python.
+
+ .. versionadded:: 3.2
+
.. versionchanged:: 3.3 Added the *policy* keyword.
+ .. versionchanged:: 3.6 *_factory* defaults to the policy ``message_factory``.
+
.. method:: feed(data)
- Feed the :class:`FeedParser` some more data. *data* should be a string
- containing one or more lines. The lines can be partial and the
- :class:`FeedParser` will stitch such partial lines together properly. The
- lines in the string can have any of the common three line endings,
- carriage return, newline, or carriage return and newline (they can even be
- mixed).
+ Feed the parser some more data. *data* should be a :term:`bytes-like
+ object` containing one or more lines. The lines can be partial and the
+ parser will stitch such partial lines together properly. The lines can
+ have any of the three common line endings: carriage return, newline, or
+ carriage return and newline (they can even be mixed).
+
.. method:: close()
- Closing a :class:`FeedParser` completes the parsing of all previously fed
- data, and returns the root message object. It is undefined what happens
- if you feed more data to a closed :class:`FeedParser`.
+ Complete the parsing of all previously fed data and return the root
+ message object. It is undefined what happens if :meth:`~feed` is called
+ after this method has been called.
-.. class:: BytesFeedParser(_factory=email.message.Message)
+.. class:: FeedParser(_factory=None, *, policy=policy.compat32)
- Works exactly like :class:`FeedParser` except that the input to the
- :meth:`~FeedParser.feed` method must be bytes and not string.
+ Works like :class:`BytesFeedParser` except that the input to the
+ :meth:`~BytesFeedParser.feed` method must be a string. This is of limited
+ utility, since the only way for such a message to be valid is for it to
+ contain only ASCII text or, if :attr:`~email.policy.Policy.utf8` is
+ ``True``, no binary attachments.
- .. versionadded:: 3.2
+ .. versionchanged:: 3.3 Added the *policy* keyword.
-Parser class API
-^^^^^^^^^^^^^^^^
+Parser API
+^^^^^^^^^^
-The :class:`Parser` class, imported from the :mod:`email.parser` module,
+The :class:`BytesParser` class, imported from the :mod:`email.parser` module,
provides an API that can be used to parse a message when the complete contents
-of the message are available in a string or file. The :mod:`email.parser`
-module also provides header-only parsers, called :class:`HeaderParser` and
-:class:`BytesHeaderParser`, which can be used if you're only interested in the
-headers of the message. :class:`HeaderParser` and :class:`BytesHeaderParser`
+of the message are available in a :term:`bytes-like object` or file. The
+:mod:`email.parser` module also provides :class:`Parser` for parsing strings,
+and header-only parsers, :class:`BytesHeaderParser` and
+:class:`HeaderParser`, which can be used if you're only interested in the
+headers of the message. :class:`BytesHeaderParser` and :class:`HeaderParser`
can be much faster in these situations, since they do not attempt to parse the
-message body, instead setting the payload to the raw body as a string. They
-have the same API as the :class:`Parser` and :class:`BytesParser` classes.
+message body, instead setting the payload to the raw body.
-.. versionadded:: 3.3
- The BytesHeaderParser class.
+.. class:: BytesParser(_class=None, *, policy=policy.compat32)
-.. class:: Parser(_class=email.message.Message, *, policy=policy.compat32)
+ Create a :class:`BytesParser` instance. The *_class* and *policy*
+ arguments have the same meaning and sematnics as the *_factory*
+ and *policy* arguments of :class:`BytesFeedParser`.
- The constructor for the :class:`Parser` class takes an optional argument
- *_class*. This must be a callable factory (such as a function or a class), and
- it is used whenever a sub-message object needs to be created. It defaults to
- :class:`~email.message.Message` (see :mod:`email.message`). The factory will
- be called without arguments.
-
- If *policy* is specified (it must be an instance of a :mod:`~email.policy`
- class) use the rules it specifies to update the representation of the
- message. If *policy* is not set, use the :class:`compat32
- <email.policy.Compat32>` policy, which maintains backward compatibility with
- the Python 3.2 version of the email package. For more information see the
- :mod:`~email.policy` documentation.
+ Note: **The policy keyword should always be specified**; The default will
+ change to :data:`email.policy.default` in a future version of Python.
.. versionchanged:: 3.3
Removed the *strict* argument that was deprecated in 2.4. Added the
*policy* keyword.
-
- The other public :class:`Parser` methods are:
+ .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.
.. method:: parse(fp, headersonly=False)
- Read all the data from the file-like object *fp*, parse the resulting
- text, and return the root message object. *fp* must support both the
- :meth:`~io.TextIOBase.readline` and the :meth:`~io.TextIOBase.read`
- methods on file-like objects.
+ Read all the data from the binary file-like object *fp*, parse the
+ resulting bytes, and return the message object. *fp* must support
+ both the :meth:`~io.IOBase.readline` and the :meth:`~io.IOBase.read`
+ methods.
- The text contained in *fp* must be formatted as a block of :rfc:`2822`
+ The bytes contained in *fp* must be formatted as a block of :rfc:`5322`
+ (or, if :attr:`~email.policy.Policy.utf8` is ``True``, :rfc:`6532`)
style headers and header continuation lines, optionally preceded by an
envelope header. The header block is terminated either by the end of the
data or by a blank line. Following the header block is the body of the
- message (which may contain MIME-encoded subparts).
+ message (which may contain MIME-encoded subparts, including subparts
+ with a :mailheader:`Content-Transfer-Encoding` of ``8bit``.
Optional *headersonly* is a flag specifying whether to stop parsing after
reading the headers or not. The default is ``False``, meaning it parses
the entire contents of the file.
- .. method:: parsestr(text, headersonly=False)
- Similar to the :meth:`parse` method, except it takes a string object
- instead of a file-like object. Calling this method on a string is exactly
- equivalent to wrapping *text* in a :class:`~io.StringIO` instance first and
- calling :meth:`parse`.
+ .. method:: parsebytes(bytes, headersonly=False)
+
+ Similar to the :meth:`parse` method, except it takes a :term:`bytes-like
+ object` instead of a file-like object. Calling this method on a
+ :term:`bytes-like object` is equivalent to wrapping *bytes* in a
+ :class:`~io.BytesIO` instance first and calling :meth:`parse`.
Optional *headersonly* is as with the :meth:`parse` method.
+ .. versionadded:: 3.2
-.. class:: BytesParser(_class=email.message.Message, *, policy=policy.compat32)
- This class is exactly parallel to :class:`Parser`, but handles bytes input.
- The *_class* and *strict* arguments are interpreted in the same way as for
- the :class:`Parser` constructor.
+.. class:: BytesHeaderParser(_class=None, *, policy=policy.compat32)
- If *policy* is specified (it must be an instance of a :mod:`~email.policy`
- class) use the rules it specifies to update the representation of the
- message. If *policy* is not set, use the :class:`compat32
- <email.policy.Compat32>` policy, which maintains backward compatibility with
- the Python 3.2 version of the email package. For more information see the
- :mod:`~email.policy` documentation.
+ Exactly like :class:`BytesParser`, except that *headersonly*
+ defaults to ``True``.
+
+ .. versionadded:: 3.3
+
+
+.. class:: Parser(_class=None, *, policy=policy.compat32)
+
+ This class is parallel to :class:`BytesParser`, but handles string input.
.. versionchanged:: 3.3
Removed the *strict* argument. Added the *policy* keyword.
+ .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.
+
.. method:: parse(fp, headersonly=False)
- Read all the data from the binary file-like object *fp*, parse the
- resulting bytes, and return the message object. *fp* must support
- both the :meth:`~io.IOBase.readline` and the :meth:`~io.IOBase.read`
- methods on file-like objects.
+ Read all the data from the text-mode file-like object *fp*, parse the
+ resulting text, and return the root message object. *fp* must support
+ both the :meth:`~io.TextIOBase.readline` and the
+ :meth:`~io.TextIOBase.read` methods on file-like objects.
- The bytes contained in *fp* must be formatted as a block of :rfc:`2822`
- style headers and header continuation lines, optionally preceded by an
- envelope header. The header block is terminated either by the end of the
- data or by a blank line. Following the header block is the body of the
- message (which may contain MIME-encoded subparts, including subparts
- with a :mailheader:`Content-Transfer-Encoding` of ``8bit``.
+ Other than the text mode requirement, this method operates like
+ :meth:`BytesParser.parse`.
- Optional *headersonly* is a flag specifying whether to stop parsing after
- reading the headers or not. The default is ``False``, meaning it parses
- the entire contents of the file.
- .. method:: parsebytes(text, headersonly=False)
+ .. method:: parsestr(text, headersonly=False)
- Similar to the :meth:`parse` method, except it takes a :term:`bytes-like
- object` instead of a file-like object. Calling this method is equivalent
- to wrapping *text* in a :class:`~io.BytesIO` instance first and calling
- :meth:`parse`.
+ Similar to the :meth:`parse` method, except it takes a string object
+ instead of a file-like object. Calling this method on a string is
+ equivalent to wrapping *text* in a :class:`~io.StringIO` instance first
+ and calling :meth:`parse`.
Optional *headersonly* is as with the :meth:`parse` method.
- .. versionadded:: 3.2
+
+.. class:: HeaderParser(_class=None, *, policy=policy.compat32)
+
+ Exactly like :class:`Parser`, except that *headersonly*
+ defaults to ``True``.
Since creating a message object structure from a string or a file object is such
@@ -220,55 +233,58 @@ in the top-level :mod:`email` package namespace.
.. currentmodule:: email
-.. function:: message_from_string(s, _class=email.message.Message, *, \
- policy=policy.compat32)
- Return a message object structure from a string. This is exactly equivalent to
- ``Parser().parsestr(s)``. *_class* and *policy* are interpreted as
- with the :class:`~email.parser.Parser` class constructor.
+.. function:: message_from_bytes(s, _class=None, *, policy=policy.compat32)
+
+ Return a message object structure from a :term:`bytes-like object`. This is
+ equivalent to ``BytesParser().parsebytes(s)``. Optional *_class* and
+ *strict* are interpreted as with the :class:`~email.parser.BytesParser` class
+ constructor.
+ .. versionadded:: 3.2
.. versionchanged:: 3.3
Removed the *strict* argument. Added the *policy* keyword.
-.. function:: message_from_bytes(s, _class=email.message.Message, *, \
- policy=policy.compat32)
- Return a message object structure from a :term:`bytes-like object`. This is exactly
- equivalent to ``BytesParser().parsebytes(s)``. Optional *_class* and
- *strict* are interpreted as with the :class:`~email.parser.Parser` class
+.. function:: message_from_binary_file(fp, _class=None, *,
+ policy=policy.compat32)
+
+ Return a message object structure tree from an open binary :term:`file
+ object`. This is equivalent to ``BytesParser().parse(fp)``. *_class* and
+ *policy* are interpreted as with the :class:`~email.parser.BytesParser` class
constructor.
.. versionadded:: 3.2
.. versionchanged:: 3.3
Removed the *strict* argument. Added the *policy* keyword.
-.. function:: message_from_file(fp, _class=email.message.Message, *, \
- policy=policy.compat32)
- Return a message object structure tree from an open :term:`file object`.
- This is exactly equivalent to ``Parser().parse(fp)``. *_class*
- and *policy* are interpreted as with the :class:`~email.parser.Parser` class
- constructor.
+.. function:: message_from_string(s, _class=None, *, policy=policy.compat32)
+
+ Return a message object structure from a string. This is equivalent to
+ ``Parser().parsestr(s)``. *_class* and *policy* are interpreted as
+ with the :class:`~email.parser.Parser` class constructor.
.. versionchanged:: 3.3
Removed the *strict* argument. Added the *policy* keyword.
-.. function:: message_from_binary_file(fp, _class=email.message.Message, *, \
- policy=policy.compat32)
- Return a message object structure tree from an open binary :term:`file
- object`. This is exactly equivalent to ``BytesParser().parse(fp)``.
- *_class* and *policy* are interpreted as with the
- :class:`~email.parser.Parser` class constructor.
+.. function:: message_from_file(fp, _class=None, *, policy=policy.compat32)
+
+ Return a message object structure tree from an open :term:`file object`.
+ This is equivalent to ``Parser().parse(fp)``. *_class* and *policy* are
+ interpreted as with the :class:`~email.parser.Parser` class constructor.
- .. versionadded:: 3.2
.. versionchanged:: 3.3
Removed the *strict* argument. Added the *policy* keyword.
+ .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.
-Here's an example of how you might use this at an interactive Python prompt::
+
+Here's an example of how you might use :func:`message_from_bytes` at an
+interactive Python prompt::
>>> import email
- >>> msg = email.message_from_string(myString) # doctest: +SKIP
+ >>> msg = email.message_from_bytes(myBytes) # doctest: +SKIP
Additional notes
@@ -278,35 +294,27 @@ Here are some notes on the parsing semantics:
* Most non-\ :mimetype:`multipart` type messages are parsed as a single message
object with a string payload. These objects will return ``False`` for
- :meth:`~email.message.Message.is_multipart`. Their
- :meth:`~email.message.Message.get_payload` method will return a string object.
+ :meth:`~email.message.EmailMessage.is_multipart`, and
+ :meth:`~email.message.EmailMessage.iter_parts` will yield an empty list.
* All :mimetype:`multipart` type messages will be parsed as a container message
object with a list of sub-message objects for their payload. The outer
container message will return ``True`` for
- :meth:`~email.message.Message.is_multipart` and their
- :meth:`~email.message.Message.get_payload` method will return the list of
- :class:`~email.message.Message` subparts.
+ :meth:`~email.message.EmailMessage.is_multipart`, and
+ :meth:`~email.message.EmailMessage.iter_parts` will yield a list of subparts.
-* Most messages with a content type of :mimetype:`message/\*` (e.g.
- :mimetype:`message/delivery-status` and :mimetype:`message/rfc822`) will also be
- parsed as container object containing a list payload of length 1. Their
- :meth:`~email.message.Message.is_multipart` method will return ``True``.
- The single element in the list payload will be a sub-message object.
+* Most messages with a content type of :mimetype:`message/\*` (such as
+ :mimetype:`message/delivery-status` and :mimetype:`message/rfc822`) will also
+ be parsed as container object containing a list payload of length 1. Their
+ :meth:`~email.message.EmailMessage.is_multipart` method will return ``True``.
+ The single element yielded by :meth:`~email.message.EmailMessage.iter_parts`
+ will be a sub-message object.
-* Some non-standards compliant messages may not be internally consistent about
+* Some non-standards-compliant messages may not be internally consistent about
their :mimetype:`multipart`\ -edness. Such messages may have a
:mailheader:`Content-Type` header of type :mimetype:`multipart`, but their
- :meth:`~email.message.Message.is_multipart` method may return ``False``.
+ :meth:`~email.message.EmailMessage.is_multipart` method may return ``False``.
If such messages were parsed with the :class:`~email.parser.FeedParser`,
they will have an instance of the
:class:`~email.errors.MultipartInvariantViolationDefect` class in their
*defects* attribute list. See :mod:`email.errors` for details.
-
-.. rubric:: Footnotes
-
-.. [#] As of email package version 3.0, introduced in Python 2.4, the classic
- :class:`~email.parser.Parser` was re-implemented in terms of the
- :class:`~email.parser.FeedParser`, so the semantics and results are
- identical between the two parsers.
-