summaryrefslogtreecommitdiff
path: root/doc/html/pcre2compat.html
diff options
context:
space:
mode:
Diffstat (limited to 'doc/html/pcre2compat.html')
-rw-r--r--doc/html/pcre2compat.html77
1 files changed, 42 insertions, 35 deletions
diff --git a/doc/html/pcre2compat.html b/doc/html/pcre2compat.html
index 0c72a9a..5238918 100644
--- a/doc/html/pcre2compat.html
+++ b/doc/html/pcre2compat.html
@@ -16,10 +16,10 @@ please consult the man page, in case the conversion went wrong.
DIFFERENCES BETWEEN PCRE2 AND PERL
</b><br>
<P>
-This document describes the differences in the ways that PCRE2 and Perl handle
-regular expressions. The differences described here are with respect to Perl
-versions 5.26, but as both Perl and PCRE2 are continually changing, the
-information may sometimes be out of date.
+This document describes some of the differences in the ways that PCRE2 and Perl
+handle regular expressions. The differences described here are with respect to
+Perl version 5.32.0, but as both Perl and PCRE2 are continually changing, the
+information may at times be out of date.
</P>
<P>
1. PCRE2 has only a subset of Perl's Unicode support. Details of what it does
@@ -33,12 +33,15 @@ they do not mean what you might think. For example, (?!a){3} does not assert
that the next three characters are not "a". It just asserts that the next
character is not "a" three times (in principle; PCRE2 optimizes this to run the
assertion just once). Perl allows some repeat quantifiers on other assertions,
-for example, \b* (but not \b{3}), but these do not seem to have any use.
+for example, \b* (but not \b{3}, though oddly it does allow ^{3}), but these
+do not seem to have any use. PCRE2 does not allow any kind of quantifier on
+non-lookaround assertions.
</P>
<P>
3. Capture groups that occur inside negative lookaround assertions are counted,
but their entries in the offsets vector are set only when a negative assertion
-is a condition that has a matching branch (that is, the condition is false).
+is a condition that has a matching branch (that is, the condition is false).
+Perl may set such capture groups in other circumstances.
</P>
<P>
4. The following Perl escape sequences are not supported: \F, \l, \L, \u,
@@ -56,10 +59,12 @@ interprets them.
built with Unicode support (the default). The properties that can be tested
with \p and \P are limited to the general category properties such as Lu and
Nd, script names such as Greek or Han, and the derived properties Any and L&.
-PCRE2 does support the Cs (surrogate) property, which Perl does not; the Perl
-documentation says "Because Perl hides the need for the user to understand the
-internal representation of Unicode characters, there is no need to implement
-the somewhat messy concept of surrogates."
+Both PCRE2 and Perl support the Cs (surrogate) property, but in PCRE2 its use
+is limited. See the
+<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
+documentation for details. The long synonyms for property names that Perl
+supports (such as \p{Letter}) are not supported by PCRE2, nor is it permitted
+to prefix any of these properties with "Is".
</P>
<P>
6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
@@ -79,7 +84,8 @@ other character. Note the following examples:
\QA\B\E A\B A\B
\Q\\E \ \\E
</pre>
-The \Q...\E sequence is recognized both inside and outside character classes.
+The \Q...\E sequence is recognized both inside and outside character classes
+by both PCRE2 and Perl.
</P>
<P>
7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
@@ -94,13 +100,13 @@ to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
into subroutine calls is now supported, as in Perl.
</P>
<P>
-9. If any of the backtracking control verbs are used in a group that is called
-as a subroutine (whether or not recursively), their effect is confined to that
-group; it does not extend to the surrounding pattern. This is not always the
-case in Perl. In particular, if (*THEN) is present in a group that is called as
-a subroutine, its action is limited to that group, even if the group does not
-contain any | characters. Note that such groups are processed as anchored
-at the point where they are tested.
+9. In PCRE2, if any of the backtracking control verbs are used in a group that
+is called as a subroutine (whether or not recursively), their effect is
+confined to that group; it does not extend to the surrounding pattern. This is
+not always the case in Perl. In particular, if (*THEN) is present in a group
+that is called as a subroutine, its action is limited to that group, even if
+the group does not contain any | characters. Note that such groups are
+processed as anchored at the point where they are tested.
</P>
<P>
10. If a pattern contains more than one backtracking control verb, the first
@@ -110,55 +116,56 @@ triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the
same as PCRE2, but there are cases where it differs.
</P>
<P>
-11. Most backtracking verbs in assertions have their normal actions. They are
-not confined to the assertion.
-</P>
-<P>
-12. There are some differences that are concerned with the settings of captured
+11. There are some differences that are concerned with the settings of captured
strings when part of a pattern is repeated. For example, matching "aba" against
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
"b".
</P>
<P>
-13. PCRE2's handling of duplicate capture group numbers and names is not as
+12. PCRE2's handling of duplicate capture group numbers and names is not as
general as Perl's. This is a consequence of the fact the PCRE2 works internally
just with numbers, using an external table to translate between numbers and
-names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b&#62;B), where the two
+names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b&#62;B)), where the two
capture groups have the same number but different names, is not supported, and
causes an error at compile time. If it were allowed, it would not be possible
to distinguish which group matched, because both names map to capture group
number 1. To avoid this confusing situation, an error is given at compile time.
</P>
<P>
-14. Perl used to recognize comments in some places that PCRE2 does not, for
+13. Perl used to recognize comments in some places that PCRE2 does not, for
example, between the ( and ? at the start of a group. If the /x modifier is
set, Perl allowed white space between ( and ? though the latest Perls give an
error (for a while it was just deprecated). There may still be some cases where
Perl behaves differently.
</P>
<P>
-15. Perl, when in warning mode, gives warnings for character classes such as
+14. Perl, when in warning mode, gives warnings for character classes such as
[A-\d] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no
warning features, so it gives an error in these cases because they are almost
certainly user mistakes.
</P>
<P>
-16. In PCRE2, the upper/lower case character properties Lu and Ll are not
+15. In PCRE2, the upper/lower case character properties Lu and Ll are not
affected when case-independent matching is specified. For example, \p{Lu}
always matches an upper case letter. I think Perl has changed in this respect;
-in the release at the time of writing (5.24), \p{Lu} and \p{Ll} match all
+in the release at the time of writing (5.32), \p{Lu} and \p{Ll} match all
letters, regardless of case, when case independence is specified.
</P>
<P>
+16. From release 5.32.0, Perl locks out the use of \K in lookaround
+assertions. In PCRE2, \K is acted on when it occurs in positive assertions,
+but is ignored in negative assertions.
+</P>
+<P>
17. PCRE2 provides some extensions to the Perl regular expression facilities.
-Perl 5.10 includes new features that are not in earlier versions of Perl, some
+Perl 5.10 included new features that were not in earlier versions of Perl, some
of which (such as named parentheses) were in PCRE2 for some time before. This
-list is with respect to Perl 5.26:
+list is with respect to Perl 5.32:
<br>
<br>
(a) Although lookbehind assertions in PCRE2 must match fixed length strings,
-each alternative branch of a lookbehind assertion can match a different length
-of string. Perl requires them all to have the same length.
+each alternative toplevel branch of a lookbehind assertion can match a
+different length of string. Perl requires them all to have the same length.
<br>
<br>
(b) From PCRE2 10.23, backreferences to groups of fixed length are supported
@@ -203,7 +210,7 @@ different way and is not Perl-compatible.
<br>
<br>
(l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at
-the start of a pattern that set overall options that cannot be changed within
+the start of a pattern. These set overall options that cannot be changed within
the pattern.
<br>
<br>
@@ -239,7 +246,7 @@ Cambridge, England.
REVISION
</b><br>
<P>
-Last updated: 13 July 2019
+Last updated: 06 October 2020
<br>
Copyright &copy; 1997-2019 University of Cambridge.
<br>