diff options
Diffstat (limited to 'Doc/library/re.rst')
-rw-r--r-- | Doc/library/re.rst | 83 |
1 files changed, 69 insertions, 14 deletions
diff --git a/Doc/library/re.rst b/Doc/library/re.rst index c3c8b65d8d..0d305d58c9 100644 --- a/Doc/library/re.rst +++ b/Doc/library/re.rst @@ -281,9 +281,7 @@ The special characters are: assertion`. ``(?<=abc)def`` will find a match in ``abcdef``, since the lookbehind will back up 3 characters and check if the contained pattern matches. The contained pattern must only match strings of some fixed length, meaning that - ``abc`` or ``a|b`` are allowed, but ``a*`` and ``a{3,4}`` are not. Group - references are not supported even if they match strings of some fixed length. - Note that + ``abc`` or ``a|b`` are allowed, but ``a*`` and ``a{3,4}`` are not. Note that patterns which start with positive lookbehind assertions will not match at the beginning of the string being searched; you will most likely want to use the :func:`search` function rather than the :func:`match` function: @@ -299,12 +297,14 @@ The special characters are: >>> m.group(0) 'egg' + .. versionchanged: 3.5 + Added support for group references of fixed length. + ``(?<!...)`` Matches if the current position in the string is not preceded by a match for ``...``. This is called a :dfn:`negative lookbehind assertion`. Similar to positive lookbehind assertions, the contained pattern must only match strings of - some fixed length and shouldn't contain group references. - Patterns which start with negative lookbehind assertions may + some fixed length. Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched. ``(?(id/name)yes-pattern|no-pattern)`` @@ -524,7 +524,11 @@ form. current locale. The use of this flag is discouraged as the locale mechanism is very unreliable, and it only handles one "culture" at a time anyway; you should use Unicode matching instead, which is the default in Python 3 - for Unicode (str) patterns. + for Unicode (str) patterns. This flag makes sense only with bytes patterns. + + .. deprecated-removed:: 3.5 3.6 + Deprecated the use of :const:`re.LOCALE` with string patterns or + :const:`re.ASCII`. .. data:: M @@ -625,17 +629,37 @@ form. That way, separator components are always found at the same relative indices within the result list. - Note that *split* will never split a string on an empty pattern match. - For example: + .. note:: + + :func:`split` doesn't currently split a string on an empty pattern match. + For example: - >>> re.split('x*', 'foo') - ['foo'] - >>> re.split("(?m)^$", "foo\n\nbar\n") - ['foo\n\nbar\n'] + >>> re.split('x*', 'axbc') + ['a', 'bc'] + + Even though ``'x*'`` also matches 0 'x' before 'a', between 'b' and 'c', + and after 'c', currently these matches are ignored. The correct behavior + (i.e. splitting on empty matches too and returning ``['', 'a', 'b', 'c', + '']``) will be implemented in future versions of Python, but since this + is a backward incompatible change, a :exc:`FutureWarning` will be raised + in the meanwhile. + + Patterns that can only match empty strings currently never split the + string. Since this doesn't match the expected behavior, a + :exc:`ValueError` will be raised starting from Python 3.5:: + + >>> re.split("^$", "foo\n\nbar\n", flags=re.M) + Traceback (most recent call last): + File "<stdin>", line 1, in <module> + ... + ValueError: split() requires a non-empty pattern match. .. versionchanged:: 3.1 Added the optional flags argument. + .. versionchanged:: 3.5 + Splitting on a pattern that could match an empty string now raises + a warning. Patterns that can only match empty strings are now rejected. .. function:: findall(pattern, string, flags=0) @@ -705,6 +729,9 @@ form. .. versionchanged:: 3.1 Added the optional flags argument. + .. versionchanged:: 3.5 + Unmatched groups are replaced with an empty string. + .. function:: subn(pattern, repl, string, count=0, flags=0) @@ -714,6 +741,9 @@ form. .. versionchanged:: 3.1 Added the optional flags argument. + .. versionchanged:: 3.5 + Unmatched groups are replaced with an empty string. + .. function:: escape(string) @@ -730,13 +760,36 @@ form. Clear the regular expression cache. -.. exception:: error +.. exception:: error(msg, pattern=None, pos=None) Exception raised when a string passed to one of the functions here is not a valid regular expression (for example, it might contain unmatched parentheses) or when some other error occurs during compilation or matching. It is never an - error if a string contains no match for a pattern. + error if a string contains no match for a pattern. The error instance has + the following additional attributes: + + .. attribute:: msg + + The unformatted error message. + + .. attribute:: pattern + + The regular expression pattern. + + .. attribute:: pos + + The index of *pattern* where compilation failed. + + .. attribute:: lineno + + The line corresponding to *pos*. + + .. attribute:: colno + + The column corresponding to *pos*. + .. versionchanged:: 3.5 + Added additional attributes. .. _re-objects: @@ -889,6 +942,8 @@ Match objects support the following methods and attributes: (``\g<1>``, ``\g<name>``) are replaced by the contents of the corresponding group. + .. versionchanged:: 3.5 + Unmatched groups are replaced with an empty string. .. method:: match.group([group1, ...]) |