author: ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2019-02-12 17:50:19 +0000
committer: ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2019-02-12 17:50:19 +0000
commit: 4e01f37e73ba7afa29fbfbe45a5f923efb0a1c68
tree: 2f92e9bdf9f05dbe278c16ae6162b5dd725f2749 /doc/html/pcre2pattern.html
parent: 5a5285b1066d191d22eb858cbc9862b6e044ca9e
Implement PCRE2_EXTRA_ALT_BSUX to support ECMAscript 6's \u{hhh..} syntax.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1070 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/html/pcre2pattern.html')
-rw-r--r-- doc/html/pcre2pattern.html | 68
1 files changed, 37 insertions, 31 deletions
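The \x and \u rules documented by this patch can be modelled in a short Python sketch. This is only an illustration of the documented behaviour, not PCRE2's implementation: the names `read_hex` and `decode_escape` and the mode strings are hypothetical. Two details are assumptions, because the documentation text does not spell them out: empty braces are rejected, and a malformed \u{...} falls back to a literal "u".

```python
HEX = set("0123456789abcdefABCDEF")


def read_hex(text, pos, max_digits=None):
    """Read hex digits starting at pos; return (digits, next_pos)."""
    end = pos
    while (end < len(text) and text[end] in HEX
           and (max_digits is None or end - pos < max_digits)):
        end += 1
    return text[pos:end], end


def decode_escape(pattern, pos, mode="default"):
    """Decode a \\x or \\u escape whose backslash is at pattern[pos].

    Returns (code_point, next_pos). `mode` is a hypothetical label:
    "default", "alt_bsux" (PCRE2_ALT_BSUX) or "extra_alt_bsux"
    (PCRE2_EXTRA_ALT_BSUX).
    """
    letter = pattern[pos + 1]
    after = pos + 2
    braced = pattern.startswith("{", after)

    if letter == "x" and mode == "default":
        if braced:
            # \x{hhh..}: any number of digits; anything else is an error.
            digits, end = read_hex(pattern, after + 1)
            if not digits or not pattern.startswith("}", end):
                raise ValueError("malformed \\x{...}")
            return int(digits, 16), end + 1
        # \x without a brace: from zero to two hex digits are read.
        digits, end = read_hex(pattern, after, max_digits=2)
        return int(digits or "0", 16), end

    if letter == "x":
        # ECMAScript modes: exactly two hex digits, otherwise a literal "x".
        digits, end = read_hex(pattern, after, max_digits=2)
        if len(digits) == 2:
            return int(digits, 16), end
        return ord("x"), after

    if letter == "u" and mode != "default":
        if braced and mode == "extra_alt_bsux":
            # \u{hhh..} is recognized only with PCRE2_EXTRA_ALT_BSUX.
            digits, end = read_hex(pattern, after + 1)
            if digits and pattern.startswith("}", end):
                return int(digits, 16), end + 1
        else:
            # \uhhhh: exactly four hex digits.
            digits, end = read_hex(pattern, after, max_digits=4)
            if len(digits) == 4:
                return int(digits, 16), end
        return ord("u"), after  # otherwise a literal "u"

    # By default \u is not a supported escape; other letters are out of scope.
    raise ValueError("escape not handled by this sketch")
```

For instance, `decode_escape(r"\u{1f600}", 0, "extra_alt_bsux")` yields the code point 0x1F600, whereas the same call with `"alt_bsux"` yields a literal "u" and leaves the brace to be parsed as ordinary pattern text.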
diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html
index d57dcea..d69e6cb 100644
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@@ -399,12 +399,33 @@ environment, these escapes are as follows:
\xhh character with hex code hh
\x{hhh..} character with hex code hhh..
\N{U+hhh..} character with Unicode hex code point hhh..
- \uhhhh character with hex code hhhh (when PCRE2_ALT_BSUX is set)
</pre>
-There are some legacy applications where the escape sequence \r is expected to
-match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \r in a
-pattern is converted to \n so that it matches a LF (linefeed) instead of a CR
-(carriage return) character.
+By default, after \x that is not followed by {, from zero to two hexadecimal
+digits are read (letters can be in upper or lower case). Any number of
+hexadecimal digits may appear between \x{ and }. If a character other than a
+hexadecimal digit appears between \x{ and }, or if there is no terminating },
+an error occurs.
+</P>
+<P>
+Characters whose code points are less than 256 can be defined by either of the
+two syntaxes for \x or by an octal sequence. There is no difference in the way
+they are handled. For example, \xdc is exactly the same as \x{dc} or \334.
+However, using the braced versions does make such sequences easier to read.
+</P>
+<P>
+Support is available for some ECMAScript (aka JavaScript) escape sequences via
+two compile-time options. If PCRE2_ALT_BSUX is set, the sequence \x followed
+by { is not recognized. Only if \x is followed by two hexadecimal digits is it
+recognized as a character escape. Otherwise it is interpreted as a literal "x"
+character. In this mode, support for code points greater than 256 is provided
+by \u, which must be followed by four hexadecimal digits; otherwise it is
+interpreted as a literal "u" character.
+</P>
+<P>
+PCRE2_EXTRA_ALT_BSUX has the same effect as PCRE2_ALT_BSUX and, in addition,
+\u{hhh..} is recognized as the character specified by hexadecimal code point.
+There may be any number of hexadecimal digits. This syntax is from ECMAScript
+6.
</P>
<P>
The \N{U+hhh..} escape sequence is recognized only when the PCRE2_UTF option
@@ -414,6 +435,12 @@ Note that when \N is not followed by an opening brace (curly bracket) it has
an entirely different meaning, matching any character that is not a newline.
</P>
<P>
+There are some legacy applications where the escape sequence \r is expected to
+match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \r in a
+pattern is converted to \n so that it matches a LF (linefeed) instead of a CR
+(carriage return) character.
+</P>
+<P>
The precise effect of \cx on ASCII characters is as follows: if x is a lower
case letter, it is converted to upper case. Then bit 6 of the character (hex
40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
@@ -500,28 +527,6 @@ Note that octal values of 100 or greater that are specified using this syntax
must not be introduced by a leading zero, because no more than three octal
digits are ever read.
</P>
-<P>
-By default, after \x that is not followed by {, from zero to two hexadecimal
-digits are read (letters can be in upper or lower case). Any number of
-hexadecimal digits may appear between \x{ and }. If a character other than
-a hexadecimal digit appears between \x{ and }, or if there is no terminating
-}, an error occurs.
-</P>
-<P>
-If the PCRE2_ALT_BSUX option is set, the interpretation of \x is as just
-described only when it is followed by two hexadecimal digits. Otherwise, it
-matches a literal "x" character. In this mode, support for code points greater
-than 256 is provided by \u, which must be followed by four hexadecimal digits;
-otherwise it matches a literal "u" character. This syntax makes PCRE2 behave
-like ECMAscript (aka JavaScript). Code points greater than 0xFFFF are not
-supported.
-</P>
-<P>
-Characters whose value is less than 256 can be defined by either of the two
-syntaxes for \x (or by \u in PCRE2_ALT_BSUX mode). There is no difference in
-the way they are handled. For example, \xdc is exactly the same as \x{dc} (or
-\u00dc in PCRE2_ALT_BSUX mode).
-</P>
<br><b>
Constraints on character values
</b><br>
@@ -560,9 +565,10 @@ Unsupported escape sequences
<P>
In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its string
handler and used to modify the case of following characters. By default, PCRE2
-does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
-is set, \U matches a "U" character, and \u can be used to define a character
-by code point, as described above.
+does not support these escape sequences in patterns. However, if either of the
+PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \U matches a "U"
+character, and \u can be used to define a character by code point, as
+described above.
</P>
<br><b>
Absolute and relative backreferences
@@ -3721,7 +3727,7 @@ Cambridge, England.
</P>
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 04 February 2019
+Last updated: 12 February 2019
<br>
Copyright &copy; 1997-2019 University of Cambridge.
<br>