Diffstat (limited to 'doc/html/pcre2pattern.html')
-rw-r--r-- | doc/html/pcre2pattern.html | 46 |
1 files changed, 30 insertions, 16 deletions
diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html
index ec2e8c9..b2adf98 100644
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@@ -289,8 +289,11 @@ corresponding characters in the subject. As a trivial example, the pattern
   The quick brown fox
 </pre>
 matches a portion of a subject string that is identical to itself. When
-caseless matching is specified (the PCRE2_CASELESS option), letters are matched
-independently of case.
+caseless matching is specified (the PCRE2_CASELESS option or (?i) within the
+pattern), letters are matched independently of case. Note that there are two
+ASCII characters, K and S, that, in addition to their lower case ASCII
+equivalents, are case-equivalent with Unicode U+212A (Kelvin sign) and U+017F
+(long S) respectively when either PCRE2_UTF or PCRE2_UCP is set.
 </P>
 <P>
 The power of regular expressions comes from the ability to include wild cards,
@@ -326,6 +329,20 @@ a character class the only metacharacters are:
   [          POSIX character class (if followed by POSIX syntax)
   ]          terminates the character class
 </pre>
+If a pattern is compiled with the PCRE2_EXTENDED option, most white space in
+the pattern, other than in a character class, and characters between a #
+outside a character class and the next newline, inclusive, are ignored. An
+escaping backslash can be used to include a white space or a # character as
+part of the pattern. If the PCRE2_EXTENDED_MORE option is set, the same
+applies, but in addition unescaped space and horizontal tab characters are
+ignored inside a character class. Note: only these two characters are ignored,
+not the full set of pattern white space characters that are ignored outside a
+character class. Option settings can be changed within a pattern; see the
+section entitled
+<a href="#internaloptions">"Internal Option Setting"</a>
+below.
+</P>
+<P>
 The following sections describe the use of each of the metacharacters.
 </P>
 <br><a name="SEC5" href="#TOC1">BACKSLASH</a><br>
@@ -343,16 +360,9 @@ precede a non-alphanumeric with backslash to specify that it stands for itself.
 In particular, if you want to match a backslash, you write \\.
 </P>
 <P>
-In a UTF mode, only ASCII digits and letters have any special meaning after a
-backslash. All other characters (in particular, those whose code points are
-greater than 127) are treated as literals.
-</P>
-<P>
-If a pattern is compiled with the PCRE2_EXTENDED option, most white space in
-the pattern (other than in a character class), and characters between a #
-outside a character class and the next newline, inclusive, are ignored. An
-escaping backslash can be used to include a white space or # character as part
-of the pattern.
+Only ASCII digits and letters have any special meaning after a backslash. All
+other characters (in particular, those whose code points are greater than 127)
+are treated as literals.
 </P>
 <P>
 If you want to treat all characters in a sequence as literals, you can do so by
@@ -1165,8 +1175,9 @@ For example, when the pattern matches "foobar", the first substring is still
 set to "foo".
 </P>
 <P>
-Perl documents that the use of \K within assertions is "not well defined". In
-PCRE2, \K is acted upon when it occurs inside positive assertions, but is
+Perl used to document that the use of \K within lookaround assertions is "not
+well defined", but from version 5.32.0 Perl does not support this usage at all.
+In PCRE2, \K is acted upon when it occurs inside positive assertions, but is
 ignored in negative assertions. Note that when a pattern such as (?=ab\K)
 matches, the reported start of the match can be greater than the end of the
 match. Using \K in a lookbehind assertion at the start of a pattern can also
@@ -1443,7 +1454,10 @@ Characters in a class may be specified by their code points using \o, \x, or
 \N{U+hh..} in the usual way. When caseless matching is set, any letters in a
 class represent both their upper case and lower case versions, so for example,
 a caseless [aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not
-match "A", whereas a caseful version would.
+match "A", whereas a caseful version would. Note that there are two ASCII
+characters, K and S, that, in addition to their lower case ASCII equivalents,
+are case-equivalent with Unicode U+212A (Kelvin sign) and U+017F (long S)
+respectively when either PCRE2_UTF or PCRE2_UCP is set.
 </P>
 <P>
 Characters that might indicate line breaks are never treated in any special way
@@ -3838,7 +3852,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC32" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 24 February 2020
+Last updated: 06 October 2020
 <br>
 Copyright © 1997-2020 University of Cambridge.
 <br>