author | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2019-02-06 18:11:36 +0000 |
---|---|---|
committer | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2019-02-06 18:11:36 +0000 |
commit | 03c006cfda40d5218d2248674ddc3824f8169897 | |
tree | 8bfb007e8adba8eb8e1256afba09001b52509905 /doc/pcre2compat.3 | |
parent | 2aee0809b4ec6f9c2fdbb33a0c200b17a9fd333c | |
Allow non-ASCII in group names when UTF is set; revise group naming terminology
in documentation to use "capture group", as Perl does.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1066 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/pcre2compat.3')
-rw-r--r-- | doc/pcre2compat.3 | 52 |
1 files changed, 25 insertions, 27 deletions
diff --git a/doc/pcre2compat.3 b/doc/pcre2compat.3
index 6e448f6..a2fbf48 100644
--- a/doc/pcre2compat.3
+++ b/doc/pcre2compat.3
@@ -1,4 +1,4 @@
-.TH PCRE2COMPAT 3 "28 July 2018" "PCRE2 10.32"
+.TH PCRE2COMPAT 3 "03 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@@ -23,10 +23,9 @@ character is not "a" three times (in principle; PCRE2 optimizes this to run the
 assertion just once). Perl allows some repeat quantifiers on other assertions,
 for example, \eb* (but not \eb{3}), but these do not seem to have any use.
 .P
-3. Capturing subpatterns that occur inside negative lookaround assertions are
-counted, but their entries in the offsets vector are set only when a negative
-assertion is a condition that has a matching branch (that is, the condition is
-false).
+3. Capture groups that occur inside negative lookaround assertions are counted,
+but their entries in the offsets vector are set only when a negative assertion
+is a condition that has a matching branch (that is, the condition is false).
 .P
 4. The following Perl escape sequences are not supported: \eF, \el, \eL, \eu,
 \eU, and \eN when followed by a character name. \eN on its own, matching a
@@ -79,13 +78,13 @@ documentation for details.
 to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
 into subroutine calls is now supported, as in Perl.
 .P
-9. If any of the backtracking control verbs are used in a subpattern that is
-called as a subroutine (whether or not recursively), their effect is confined
-to that subpattern; it does not extend to the surrounding pattern. This is not
-always the case in Perl. In particular, if (*THEN) is present in a group that
-is called as a subroutine, its action is limited to that group, even if the
-group does not contain any | characters. Note that such subpatterns are
-processed as anchored at the point where they are tested.
+9. If any of the backtracking control verbs are used in a group that is called
+as a subroutine (whether or not recursively), their effect is confined to that
+group; it does not extend to the surrounding pattern. This is not always the
+case in Perl. In particular, if (*THEN) is present in a group that is called as
+a subroutine, its action is limited to that group, even if the group does not
+contain any | characters. Note that such groups are processed as anchored
+at the point where they are tested.
 .P
 10. If a pattern contains more than one backtracking control verb, the first
 one that is backtracked onto acts. For example, in the pattern
@@ -101,21 +100,20 @@ strings when part of a pattern is repeated. For example, matching "aba" against
 the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
 "b".
 .P
-13. PCRE2's handling of duplicate subpattern numbers and duplicate subpattern
-names is not as general as Perl's. This is a consequence of the fact the PCRE2
-works internally just with numbers, using an external table to translate
-between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B),
-where the two capturing parentheses have the same number but different names,
-is not supported, and causes an error at compile time. If it were allowed, it
-would not be possible to distinguish which parentheses matched, because both
-names map to capturing subpattern number 1. To avoid this confusing situation,
-an error is given at compile time.
+13. PCRE2's handling of duplicate capture group numbers and names is not as
+general as Perl's. This is a consequence of the fact the PCRE2 works internally
+just with numbers, using an external table to translate between numbers and
+names. In particular, a pattern such as (?|(?<a>A)|(?<b>B), where the two
+capture groups have the same number but different names, is not supported, and
+causes an error at compile time. If it were allowed, it would not be possible
+to distinguish which group matched, because both names map to capture group
+number 1. To avoid this confusing situation, an error is given at compile time.
 .P
-14. Perl used to recognize comments in some places that PCRE2 does not, for
-example, between the ( and ? at the start of a subpattern. If the /x modifier
-is set, Perl allowed white space between ( and ? though the latest Perls give
-an error (for a while it was just deprecated). There may still be some cases
-where Perl behaves differently.
+14. Perl used to recognize comments in some places that PCRE2 does not, for
+example, between the ( and ? at the start of a group. If the /x modifier is
+set, Perl allowed white space between ( and ? though the latest Perls give an
+error (for a while it was just deprecated). There may still be some cases where
+Perl behaves differently.
 .P
 15. Perl, when in warning mode, gives warnings for character classes such as
 [A-\ed] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no
@@ -200,6 +198,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 28 July 2018
-Copyright (c) 1997-2018 University of Cambridge.
+Last updated: 03 February 2019
+Copyright (c) 1997-2019 University of Cambridge.
 .fi
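
Item 13 in the patch above explains the compile-time error in terms of an external table that translates between group names and numbers. The sketch below is a toy Python model of that reasoning, not PCRE2 code: the function build_name_table and its dict-based layout are hypothetical, chosen only to show why two different names on one group number are rejected. PCRE2's real name table is a compiled C structure.

```python
# Toy model only: illustrates the reasoning in item 13, not PCRE2 internals.

def build_name_table(groups):
    """Map each group name to its capture number.

    `groups` is a list of (number, name) pairs in pattern order.
    Raises ValueError when one number is given two different names,
    mirroring the compile-time error described in item 13.
    """
    name_to_number = {}
    number_to_name = {}
    for number, name in groups:
        seen = number_to_name.get(number)
        if seen is not None and seen != name:
            # Both names would resolve to the same number, so a caller
            # could not tell which alternative actually matched.
            raise ValueError(
                f"group {number} is already named {seen!r}, cannot also be {name!r}"
            )
        number_to_name[number] = name
        name_to_number[name] = number
    return name_to_number


# Models (?|(?<a>A)|(?<b>B)): the branch reset gives both groups number 1.
try:
    build_name_table([(1, "a"), (1, "b")])
    rejected = False
except ValueError:
    rejected = True

# Models the same number with the same name, where the mapping stays
# unambiguous (assumed here to be the permitted case).
accepted = build_name_table([(1, "a"), (1, "a")])
```

In this model the first call is rejected and the second yields a single entry mapping "a" to 1, which is the distinction the paragraph draws between ambiguous and unambiguous name-to-number mappings.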