Diffstat (limited to 'Doc/library/re.rst')
-rw-r--r-- Doc/library/re.rst 93
1 files changed, 78 insertions, 15 deletions
diff --git a/Doc/library/re.rst b/Doc/library/re.rst
index c3c8b65d8d..888458449a 100644
--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -281,9 +281,7 @@ The special characters are:
assertion`. ``(?<=abc)def`` will find a match in ``abcdef``, since the
lookbehind will back up 3 characters and check if the contained pattern matches.
The contained pattern must only match strings of some fixed length, meaning that
- ``abc`` or ``a|b`` are allowed, but ``a*`` and ``a{3,4}`` are not. Group
- references are not supported even if they match strings of some fixed length.
- Note that
+ ``abc`` or ``a|b`` are allowed, but ``a*`` and ``a{3,4}`` are not. Note that
patterns which start with positive lookbehind assertions will not match at the
beginning of the string being searched; you will most likely want to use the
:func:`search` function rather than the :func:`match` function:
@@ -299,12 +297,14 @@ The special characters are:
>>> m.group(0)
'egg'
+ .. versionchanged:: 3.5
+ Added support for group references of fixed length.
+
``(?<!...)``
Matches if the current position in the string is not preceded by a match for
``...``. This is called a :dfn:`negative lookbehind assertion`. Similar to
positive lookbehind assertions, the contained pattern must only match strings of
- some fixed length and shouldn't contain group references.
- Patterns which start with negative lookbehind assertions may
+ some fixed length. Patterns which start with negative lookbehind assertions may
match at the beginning of the string being searched.
``(?(id/name)yes-pattern|no-pattern)``
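
The lookbehind change in the hunk above can be illustrated with a minimal sketch, assuming Python 3.5 or later. The patterns and variable names are illustrative and not part of the patch.

```python
import re

# Group 1 always matches exactly one character, so \1 has a fixed length
# and may appear inside a lookbehind (assumes Python 3.5+).
positive = re.match(r'(a)a(?<=\1)c', 'aac')
negative = re.match(r'(a)b(?<!\1)a', 'abaa')

# A variable-length pattern such as a* is still rejected at compile time.
try:
    re.compile(r'(?<=a*)b')
    variable_width_ok = True
except re.error:
    variable_width_ok = False
```

Both `positive` and `negative` are match objects here, while the `a*` lookbehind fails to compile.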
@@ -438,6 +438,10 @@ three digits in length.
.. versionchanged:: 3.3
The ``'\u'`` and ``'\U'`` escape sequences have been added.
+.. deprecated-removed:: 3.5 3.6
+ Unknown escapes consisting of ``'\'`` and an ASCII letter now raise a
+ deprecation warning and will be forbidden in Python 3.6.
+
.. seealso::
@@ -524,7 +528,11 @@ form.
current locale. The use of this flag is discouraged as the locale mechanism
is very unreliable, and it only handles one "culture" at a time anyway;
you should use Unicode matching instead, which is the default in Python 3
- for Unicode (str) patterns.
+ for Unicode (str) patterns. This flag makes sense only with bytes patterns.
+
+ .. deprecated-removed:: 3.5 3.6
+ Deprecated the use of :const:`re.LOCALE` with string patterns or
+ :const:`re.ASCII`.
.. data:: M
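
A rough sketch of how the :const:`re.LOCALE` restriction shows up in practice. The helper `try_compile` is hypothetical. It treats both the 3.5 deprecation warning and the outright rejection expected in later versions as "not accepted", which is an assumption about how a caller might want to handle the two cases uniformly.

```python
import re
import warnings

def try_compile(pattern, flags):
    """Return True if the pattern compiles without a warning or an error."""
    with warnings.catch_warnings():
        # Turn the deprecation warning into an exception so the outcome is
        # the same whether the interpreter warns or rejects the flags.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            re.compile(pattern, flags)
        except (ValueError, DeprecationWarning):
            return False
    return True

bytes_ok = try_compile(b'[a-z]+', re.LOCALE)              # bytes pattern
str_ok = try_compile('[a-z]+', re.LOCALE)                 # str pattern
mixed_ok = try_compile(b'[a-z]+', re.LOCALE | re.ASCII)   # conflicting flags
```

Only the plain bytes pattern is expected to be accepted.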
@@ -625,17 +633,37 @@ form.
That way, separator components are always found at the same relative
indices within the result list.
- Note that *split* will never split a string on an empty pattern match.
- For example:
+ .. note::
+
+ :func:`split` doesn't currently split a string on an empty pattern match.
+ For example:
- >>> re.split('x*', 'foo')
- ['foo']
- >>> re.split("(?m)^$", "foo\n\nbar\n")
- ['foo\n\nbar\n']
+ >>> re.split('x*', 'axbc')
+ ['a', 'bc']
+
+ Even though ``'x*'`` also matches 0 'x' before 'a', between 'b' and 'c',
+ and after 'c', currently these matches are ignored. The correct behavior
+ (i.e. splitting on empty matches too and returning ``['', 'a', 'b', 'c',
+ '']``) will be implemented in future versions of Python, but since this
+ is a backward incompatible change, a :exc:`FutureWarning` will be raised
+ in the meantime.
+
+ Patterns that can only match empty strings currently never split the
+ string. Since this doesn't match the expected behavior, a
+ :exc:`ValueError` will be raised starting from Python 3.5::
+
+ >>> re.split("^$", "foo\n\nbar\n", flags=re.M)
+ Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ ...
+ ValueError: split() requires a non-empty pattern match.
.. versionchanged:: 3.1
Added the optional flags argument.
+ .. versionchanged:: 3.5
+ Splitting on a pattern that could match an empty string now raises
+ a warning. Patterns that can only match empty strings are now rejected.
.. function:: findall(pattern, string, flags=0)
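
Since the handling of empty matches in :func:`split` is described above as subject to change, one way to get results that do not depend on it is to use patterns that cannot match an empty string. The following is a sketch under that assumption. The helper `split_on_blank_lines` is hypothetical.

```python
import re

# 'x+' needs at least one 'x', so it can never produce an empty match and
# the result does not depend on how empty matches are treated.
parts = re.split('x+', 'axbc')

def split_on_blank_lines(text):
    """Split on runs of two or more newlines instead of the empty-only '^$'."""
    return re.split(r'\n{2,}', text)

paragraphs = split_on_blank_lines("foo\n\nbar\n")
```

Here `parts` is `['a', 'bc']` and `paragraphs` is `['foo', 'bar\n']`.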
@@ -663,7 +691,7 @@ form.
*string* is returned unchanged. *repl* can be a string or a function; if it is
a string, any backslash escapes in it are processed. That is, ``\n`` is
converted to a single newline character, ``\r`` is converted to a carriage return, and
- so forth. Unknown escapes such as ``\j`` are left alone. Backreferences, such
+ so forth. Unknown escapes such as ``\&`` are left alone. Backreferences, such
as ``\6``, are replaced with the substring matched by group 6 in the pattern.
For example:
@@ -705,6 +733,13 @@ form.
.. versionchanged:: 3.1
Added the optional flags argument.
+ .. versionchanged:: 3.5
+ Unmatched groups are replaced with an empty string.
+
+ .. deprecated-removed:: 3.5 3.6
+ Unknown escapes consisting of ``'\'`` and an ASCII letter now raise a
+ deprecation warning and will be forbidden in Python 3.6.
+
.. function:: subn(pattern, repl, string, count=0, flags=0)
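
A small sketch of the two replacement-string behaviors noted in the hunk above, assuming Python 3.5 or later. The patterns are illustrative only.

```python
import re

# Only one of the two groups participates in each match; from 3.5 on the
# other one is substituted as an empty string instead of causing an error.
result = re.sub(r'(a)|(b)', r'[\1\2]', 'ab')

# A backslash followed by a non-letter is not a known escape, so it is
# left in the replacement unchanged.
kept = re.sub('a', r'\&', 'a')
```

With these inputs `result` is `'[a][b]'` and `kept` is the two-character string `\&`.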
@@ -714,6 +749,9 @@ form.
.. versionchanged:: 3.1
Added the optional flags argument.
+ .. versionchanged:: 3.5
+ Unmatched groups are replaced with an empty string.
+
.. function:: escape(string)
@@ -730,13 +768,36 @@ form.
Clear the regular expression cache.
-.. exception:: error
+.. exception:: error(msg, pattern=None, pos=None)
Exception raised when a string passed to one of the functions here is not a
valid regular expression (for example, it might contain unmatched parentheses)
or when some other error occurs during compilation or matching. It is never an
- error if a string contains no match for a pattern.
+ error if a string contains no match for a pattern. The error instance has
+ the following additional attributes:
+
+ .. attribute:: msg
+
+ The unformatted error message.
+
+ .. attribute:: pattern
+
+ The regular expression pattern.
+
+ .. attribute:: pos
+
+ The index of *pattern* where compilation failed.
+
+ .. attribute:: lineno
+
+ The line corresponding to *pos*.
+
+ .. attribute:: colno
+
+ The column corresponding to *pos*.
+ .. versionchanged:: 3.5
+ Added additional attributes.
.. _re-objects:
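
The new attributes on :exc:`error` can be inspected as in the following sketch, assuming Python 3.5 or later. The helper `describe_error` is hypothetical and only collects the attributes listed above.

```python
import re

def describe_error(pattern):
    """Compile *pattern*; return the error attributes, or None if it is valid."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return {
            'msg': exc.msg,          # unformatted message
            'pattern': exc.pattern,  # the pattern being compiled
            'pos': exc.pos,          # index where compilation failed
            'lineno': exc.lineno,    # line corresponding to pos
            'colno': exc.colno,      # column corresponding to pos
        }
    return None

info = describe_error('(abc')   # unmatched parenthesis
```

For a single-line pattern such as this one, `lineno` is 1 and `colno` is `pos + 1`.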
@@ -889,6 +950,8 @@ Match objects support the following methods and attributes:
(``\g<1>``, ``\g<name>``) are replaced by the contents of the
corresponding group.
+ .. versionchanged:: 3.5
+ Unmatched groups are replaced with an empty string.
.. method:: match.group([group1, ...])