author | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2017-03-31 16:49:33 +0000 |
---|---|---|
committer | ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> | 2017-03-31 16:49:33 +0000 |
commit | 0b5cd88c0756b6b9101feff0d580dba70afadc18 (patch) | |
tree | 74c0b1ea9c09df7052d4bdf263c52fc2bf39aeb9 /doc/html/pcre2compat.html | |
parent | 88703fe34a96e29afa7ce63472da4f6ba83a84b2 (diff) | |
Documentation update
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@722 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/html/pcre2compat.html')
-rw-r--r-- | doc/html/pcre2compat.html | 64 |
1 files changed, 29 insertions, 35 deletions
diff --git a/doc/html/pcre2compat.html b/doc/html/pcre2compat.html
index 993dfd1..b55ab82 100644
--- a/doc/html/pcre2compat.html
+++ b/doc/html/pcre2compat.html
@@ -18,7 +18,8 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
 <P>
 This document describes the differences in the ways that PCRE2 and Perl handle
 regular expressions. The differences described here are with respect to Perl
-versions 5.10 and above.
+versions 5.24, but as both Perl and PCRE2 are continually changing, the
+information may sometimes be out of date.
 </P>
 <P>
 1. PCRE2 has only a subset of Perl's Unicode support. Details of what it does
@@ -27,17 +28,18 @@ have are given in the
 page.
 </P>
 <P>
-2. PCRE2 allows repeat quantifiers only on parenthesized assertions, but they
-do not mean what you might think. For example, (?!a){3} does not assert that
-the next three characters are not "a". It just asserts that the next character
-is not "a" three times (in principle: PCRE2 optimizes this to run the assertion
-just once). Perl allows repeat quantifiers on other assertions such as \b, but
-these do not seem to have any use.
+2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
+they do not mean what you might think. For example, (?!a){3} does not assert
+that the next three characters are not "a". It just asserts that the next
+character is not "a" three times (in principle: PCRE2 optimizes this to run the
+assertion just once). Perl allows some repeat quantifiers on other assertions,
+for example, \b* (but not \b{3}), but these do not seem to have any use.
 </P>
 <P>
-3. Capturing subpatterns that occur inside negative lookahead assertions are
-counted, but their entries in the offsets vector are never set. Perl sometimes
-(but not always) sets its numerical variables from inside negative assertions.
+3. Capturing subpatterns that occur inside negative lookaround assertions are
+counted, but their entries in the offsets vector are set only if the assertion
+is a condition. Perl has changed its behaviour in this regard from time to
+time.
 </P>
 <P>
 4. The following Perl escape sequences are not supported: \l, \u, \L,
@@ -50,13 +52,13 @@ generated by default. However, if the PCRE2_ALT_BSUX option is set,
 </P>
 <P>
 5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
-built with Unicode support. The properties that can be tested with \p and \P
-are limited to the general category properties such as Lu and Nd, script names
-such as Greek or Han, and the derived properties Any and L&. PCRE2 does support
-the Cs (surrogate) property, which Perl does not; the Perl documentation says
-"Because Perl hides the need for the user to understand the internal
-representation of Unicode characters, there is no need to implement the
-somewhat messy concept of surrogates."
+built with Unicode support (the default). The properties that can be tested
+with \p and \P are limited to the general category properties such as Lu and
+Nd, script names such as Greek or Han, and the derived properties Any and L&.
+PCRE2 does support the Cs (surrogate) property, which Perl does not; the Perl
+documentation says "Because Perl hides the need for the user to understand the
+internal representation of Unicode characters, there is no need to implement
+the somewhat messy concept of surrogates."
 </P>
 <P>
 6. PCRE2 does support the \Q...\E escape for quoting substrings. Characters
@@ -75,23 +77,15 @@ The \Q...\E sequence is recognized both inside and outside character classes.
 </P>
 <P>
 7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
-constructions. However, there is support for recursive patterns. This is not
-available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE2 "callout"
-feature allows an external function to be called during pattern matching. See
-the
+constructions. However, there is support PCRE2's "callout" feature, which
+allows an external function to be called during pattern matching. See the
 <a href="pcre2callout.html"><b>pcre2callout</b></a>
 documentation for details.
 </P>
 <P>
-8. Subroutine calls (whether recursive or not) are treated as atomic groups.
-Atomic recursion is like Python, but unlike Perl. Captured values that are set
-outside a subroutine call can be referenced from inside in PCRE2, but not in
-Perl. There is a discussion that explains these differences in more detail in
-the
-<a href="pcre2pattern.html#recursiondifference">section on recursion differences from Perl</a>
-in the
-<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
-page.
+8. Subroutine calls (whether recursive or not) were treated as atomic groups up
+to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
+into subroutine calls is now supported, as in Perl.
 </P>
 <P>
 9. If any of the backtracking control verbs are used in a subpattern that is
@@ -147,14 +141,14 @@ certainly user mistakes.
 16. In PCRE2, the upper/lower case character properties Lu and Ll are not
 affected when case-independent matching is specified. For example, \p{Lu}
 always matches an upper case letter. I think Perl has changed in this respect;
-in the release at the time of writing (5.16), \p{Lu} and \p{Ll} match all
+in the release at the time of writing (5.24), \p{Lu} and \p{Ll} match all
 letters, regardless of case, when case independence is specified.
 </P>
 <P>
 17. PCRE2 provides some extensions to the Perl regular expression facilities.
 Perl 5.10 includes new features that are not in earlier versions of Perl, some
-of which (such as named parentheses) have been in PCRE2 for some time. This
-list is with respect to Perl 5.10:
+of which (such as named parentheses) were in PCRE2 for some time before. This
+list is with respect to Perl 5.24:
 <br>
 <br>
 (a) Although lookbehind assertions in PCRE2 must match fixed length strings,
@@ -220,9 +214,9 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 18 October 2016
+Last updated: 29 March 2017
 <br>
-Copyright © 1997-2016 University of Cambridge.
+Copyright © 1997-2017 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.