summaryrefslogtreecommitdiff
path: root/Doc/library/re.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Doc/library/re.rst')
-rw-r--r--Doc/library/re.rst83
1 files changed, 69 insertions, 14 deletions
diff --git a/Doc/library/re.rst b/Doc/library/re.rst
index c3c8b65d8d..0d305d58c9 100644
--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -281,9 +281,7 @@ The special characters are:
assertion`. ``(?<=abc)def`` will find a match in ``abcdef``, since the
lookbehind will back up 3 characters and check if the contained pattern matches.
The contained pattern must only match strings of some fixed length, meaning that
- ``abc`` or ``a|b`` are allowed, but ``a*`` and ``a{3,4}`` are not. Group
- references are not supported even if they match strings of some fixed length.
- Note that
+ ``abc`` or ``a|b`` are allowed, but ``a*`` and ``a{3,4}`` are not. Note that
patterns which start with positive lookbehind assertions will not match at the
beginning of the string being searched; you will most likely want to use the
:func:`search` function rather than the :func:`match` function:
@@ -299,12 +297,14 @@ The special characters are:
>>> m.group(0)
'egg'
+ .. versionchanged: 3.5
+ Added support for group references of fixed length.
+
``(?<!...)``
Matches if the current position in the string is not preceded by a match for
``...``. This is called a :dfn:`negative lookbehind assertion`. Similar to
positive lookbehind assertions, the contained pattern must only match strings of
- some fixed length and shouldn't contain group references.
- Patterns which start with negative lookbehind assertions may
+ some fixed length. Patterns which start with negative lookbehind assertions may
match at the beginning of the string being searched.
``(?(id/name)yes-pattern|no-pattern)``
@@ -524,7 +524,11 @@ form.
current locale. The use of this flag is discouraged as the locale mechanism
is very unreliable, and it only handles one "culture" at a time anyway;
you should use Unicode matching instead, which is the default in Python 3
- for Unicode (str) patterns.
+ for Unicode (str) patterns. This flag makes sense only with bytes patterns.
+
+ .. deprecated-removed:: 3.5 3.6
+ Deprecated the use of :const:`re.LOCALE` with string patterns or
+ :const:`re.ASCII`.
.. data:: M
@@ -625,17 +629,37 @@ form.
That way, separator components are always found at the same relative
indices within the result list.
- Note that *split* will never split a string on an empty pattern match.
- For example:
+ .. note::
+
+ :func:`split` doesn't currently split a string on an empty pattern match.
+ For example:
- >>> re.split('x*', 'foo')
- ['foo']
- >>> re.split("(?m)^$", "foo\n\nbar\n")
- ['foo\n\nbar\n']
+ >>> re.split('x*', 'axbc')
+ ['a', 'bc']
+
+ Even though ``'x*'`` also matches 0 'x' before 'a', between 'b' and 'c',
+ and after 'c', currently these matches are ignored. The correct behavior
+ (i.e. splitting on empty matches too and returning ``['', 'a', 'b', 'c',
+ '']``) will be implemented in future versions of Python, but since this
+ is a backward incompatible change, a :exc:`FutureWarning` will be raised
+ in the meanwhile.
+
+ Patterns that can only match empty strings currently never split the
+ string. Since this doesn't match the expected behavior, a
+ :exc:`ValueError` will be raised starting from Python 3.5::
+
+ >>> re.split("^$", "foo\n\nbar\n", flags=re.M)
+ Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ ...
+ ValueError: split() requires a non-empty pattern match.
.. versionchanged:: 3.1
Added the optional flags argument.
+ .. versionchanged:: 3.5
+ Splitting on a pattern that could match an empty string now raises
+ a warning. Patterns that can only match empty strings are now rejected.
.. function:: findall(pattern, string, flags=0)
@@ -705,6 +729,9 @@ form.
.. versionchanged:: 3.1
Added the optional flags argument.
+ .. versionchanged:: 3.5
+ Unmatched groups are replaced with an empty string.
+
.. function:: subn(pattern, repl, string, count=0, flags=0)
@@ -714,6 +741,9 @@ form.
.. versionchanged:: 3.1
Added the optional flags argument.
+ .. versionchanged:: 3.5
+ Unmatched groups are replaced with an empty string.
+
.. function:: escape(string)
@@ -730,13 +760,36 @@ form.
Clear the regular expression cache.
-.. exception:: error
+.. exception:: error(msg, pattern=None, pos=None)
Exception raised when a string passed to one of the functions here is not a
valid regular expression (for example, it might contain unmatched parentheses)
or when some other error occurs during compilation or matching. It is never an
- error if a string contains no match for a pattern.
+ error if a string contains no match for a pattern. The error instance has
+ the following additional attributes:
+
+ .. attribute:: msg
+
+ The unformatted error message.
+
+ .. attribute:: pattern
+
+ The regular expression pattern.
+
+ .. attribute:: pos
+
+ The index of *pattern* where compilation failed.
+
+ .. attribute:: lineno
+
+ The line corresponding to *pos*.
+
+ .. attribute:: colno
+
+ The column corresponding to *pos*.
+ .. versionchanged:: 3.5
+ Added additional attributes.
.. _re-objects:
@@ -889,6 +942,8 @@ Match objects support the following methods and attributes:
(``\g<1>``, ``\g<name>``) are replaced by the contents of the
corresponding group.
+ .. versionchanged:: 3.5
+ Unmatched groups are replaced with an empty string.
.. method:: match.group([group1, ...])