author ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2019-02-12 17:50:19 +0000
committer ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2019-02-12 17:50:19 +0000
commit 4e01f37e73ba7afa29fbfbe45a5f923efb0a1c68
tree 2f92e9bdf9f05dbe278c16ae6162b5dd725f2749 /doc/html
parent 5a5285b1066d191d22eb858cbc9862b6e044ca9e
Implement PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh..} syntax.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@1070 6239d852-aaf2-0410-a92c-79f79f948069
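
The documentation changed below describes how \u is read in "ALT_BSUX mode". As a rough illustration of that rule only, the following standalone C sketch parses the text that follows \u. It is not PCRE2's implementation: the names hex_value and parse_bsux_u are hypothetical, an empty \u{} is assumed to be unrecognized, and an over-long value is simply treated as unrecognized here rather than reported as an error.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): value of a hex digit, or -1. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical sketch of the documented \u rule. 's' points just past "\u"
 * in a NUL-terminated pattern; 'extra' is non-zero to model
 * PCRE2_EXTRA_ALT_BSUX and zero to model plain PCRE2_ALT_BSUX.
 *
 * Returns the number of characters consumed after "\u" and stores the code
 * point in *cp, or returns 0 when the sequence is not a recognized escape
 * (in which case \u stands for a literal "u").
 */
static size_t parse_bsux_u(const char *s, int extra, uint32_t *cp)
{
    uint32_t value = 0;
    size_t i;

    /* \u{hhh..}: any number of hex digits in braces, EXTRA_ALT_BSUX only. */
    if (extra && s[0] == '{') {
        for (i = 1; hex_value(s[i]) >= 0; i++) {
            if (value > 0x0FFFFFFFu) return 0;  /* would not fit in 32 bits */
            value = (value << 4) | (uint32_t)hex_value(s[i]);
        }
        /* Assumption: at least one digit and a closing brace are required. */
        if (i > 1 && s[i] == '}') {
            *cp = value;
            return i + 1;
        }
        return 0;
    }

    /* \uhhhh: exactly four hex digits, recognized in both modes. */
    for (i = 0; i < 4; i++) {
        int d = hex_value(s[i]);
        if (d < 0) return 0;
        value = (value << 4) | (uint32_t)d;
    }
    *cp = value;
    return 4;
}
```

In a real program the option is not parsed by hand: it is set in a compile context with pcre2_set_compile_extra_options() and the context is passed to pcre2_compile(), as the documentation in this commit describes.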
Diffstat (limited to 'doc/html')
-rw-r--r-- doc/html/pcre2_compile.html | 7
-rw-r--r-- doc/html/pcre2_set_compile_extra_options.html | 3
-rw-r--r-- doc/html/pcre2api.html | 43
-rw-r--r-- doc/html/pcre2compat.html | 7
-rw-r--r-- doc/html/pcre2pattern.html | 68
-rw-r--r-- doc/html/pcre2syntax.html | 29
-rw-r--r-- doc/html/pcre2test.html | 3
7 files changed, 102 insertions, 58 deletions
diff --git a/doc/html/pcre2_compile.html b/doc/html/pcre2_compile.html
index d109eeb..23f75e1 100644
--- a/doc/html/pcre2_compile.html
+++ b/doc/html/pcre2_compile.html
@@ -86,7 +86,12 @@ PCRE2 must be built with Unicode support (the default) in order to use
PCRE2_UTF, PCRE2_UCP and related options.
</P>
<P>
-The yield of the function is a pointer to a private data structure that
+Additional options may be set in the compile context via the
+<a href="pcre2_set_compile_extra_options.html"><b>pcre2_set_compile_extra_options</b></a>
+function.
+</P>
+<P>
+The yield of this function is a pointer to a private data structure that
contains the compiled pattern, or NULL if an error was detected.
</P>
<P>
diff --git a/doc/html/pcre2_set_compile_extra_options.html b/doc/html/pcre2_set_compile_extra_options.html
index 336852a..4e342cf 100644
--- a/doc/html/pcre2_set_compile_extra_options.html
+++ b/doc/html/pcre2_set_compile_extra_options.html
@@ -20,7 +20,7 @@ SYNOPSIS
</P>
<P>
<b>int pcre2_set_compile_extra_options(pcre2_compile_context *<i>ccontext</i>,</b>
-<b> PCRE2_SIZE <i>extra_options</i>);</b>
+<b> uint32_t <i>extra_options</i>);</b>
</P>
<br><b>
DESCRIPTION
@@ -31,6 +31,7 @@ housed in a compile context. It completely replaces all the bits. The extra
options are:
<pre>
 PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \x{d800} to \x{dfff} in UTF-8 and UTF-32 modes
+ PCRE2_EXTRA_ALT_BSUX Extended alternate \u, \U, and \x handling
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as a literal following character
PCRE2_EXTRA_ESCAPED_CR_IS_LF Interpret \r as \n
PCRE2_EXTRA_MATCH_LINE Pattern matches whole lines
diff --git a/doc/html/pcre2api.html b/doc/html/pcre2api.html
index bbfaeaa..c045ff5 100644
--- a/doc/html/pcre2api.html
+++ b/doc/html/pcre2api.html
@@ -1298,7 +1298,7 @@ are needed. The <b>pcre2_code_copy_with_tables()</b> provides this facility.
Copies of both the code and the tables are made, with the new code pointing to
the new tables. The memory for the new tables is automatically freed when
<b>pcre2_code_free()</b> is called for the new copy of the compiled code. If
-<b>pcre2_code_copy_withy_tables()</b> is called with a NULL argument, it returns
+<b>pcre2_code_copy_with_tables()</b> is called with a NULL argument, it returns
NULL.
</P>
<P>
@@ -1315,7 +1315,7 @@ PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
</P>
<P>
The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
-settings that affect the compilation. It should be zero if no options are
+settings that affect the compilation. It should be zero if none of them are
required. The available options are described below. Some of them (in
particular, those that are compatible with Perl, but some others as well) can
also be set and unset from within the pattern (see the detailed description in
@@ -1330,8 +1330,9 @@ compilation. The PCRE2_ANCHORED, PCRE2_ENDANCHORED, and PCRE2_NO_UTF_CHECK
options can be set at the time of matching as well as at compile time.
</P>
<P>
-Other, less frequently required compile-time parameters (for example, the
-newline setting) can be provided in a compile context (as described
+Some additional options and less frequently required compile-time parameters
+(for example, the newline setting) can be provided in a compile context (as
+described
<a href="#compilecontext">above).</a>
</P>
<P>
@@ -1384,7 +1385,13 @@ This code fragment shows a typical straightforward call to
&errorcode, /* for error code */
&erroffset, /* for error offset */
NULL); /* no compile context */
-</pre>
+
+</PRE>
+</P>
+<br><b>
+Main compile options
+</b><br>
+<P>
The following names for option bits are defined in the <b>pcre2.h</b> header
file:
<pre>
@@ -1424,6 +1431,14 @@ hexadecimal digits, in which case the hexadecimal number defines the code point
to match. By default, as in Perl, a hexadecimal number is always expected after
\x, but it may have zero, one, or two digits (so, for example, \xz matches a
binary zero character followed by z).
+</P>
+<P>
+ECMAScript 6 added additional functionality to \u. This can be accessed using
+the PCRE2_EXTRA_ALT_BSUX extra option (see "Extra compile options"
+<a href="#extracompileoptions">below).</a>
+Note that this alternative escape handling applies only to patterns. Neither of
+these options affects the processing of replacement strings passed to
+<b>pcre2_substitute()</b>.
<pre>
PCRE2_ALT_CIRCUMFLEX
</pre>
@@ -1830,9 +1845,8 @@ characters with code points greater than 127.
Extra compile options
</b><br>
<P>
-Unlike the main compile-time options, the extra options are not saved with the
-compiled pattern. The option bits that can be set in a compile context by
-calling the <b>pcre2_set_compile_extra_options()</b> function are as follows:
+The option bits that can be set in a compile context by calling the
+<b>pcre2_set_compile_extra_options()</b> function are as follows:
<pre>
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
</pre>
@@ -1858,6 +1872,14 @@ point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
incorporated in the compiled pattern. However, they can only match subject
characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
<pre>
+ PCRE2_EXTRA_ALT_BSUX
+</pre>
+The original option PCRE2_ALT_BSUX causes PCRE2 to process \U, \u, and \x in
+the way that ECMAScript (aka JavaScript) does. Additional functionality was
+defined by ECMAScript 6; setting PCRE2_EXTRA_ALT_BSUX has the effect of
+PCRE2_ALT_BSUX, but in addition it recognizes \u{hhh..} as a hexadecimal
+character code, where hhh.. is any number of hexadecimal digits.
+<pre>
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
</pre>
This is a dangerous option. Use with care. By default, an unrecognized escape
@@ -3382,7 +3404,8 @@ capture groups and letters within \Q...\E quoted sequences.
<P>
Note that case forcing sequences such as \U...\E do not nest. For example,
the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final \E has no
-effect.
+effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EXTRA_ALT_BSUX options do
+not apply to replacement strings.
</P>
<P>
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
@@ -3784,7 +3807,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 04 February 2019
+Last updated: 12 February 2019
<br>
Copyright &copy; 1997-2019 University of Cambridge.
<br>
diff --git a/doc/html/pcre2compat.html b/doc/html/pcre2compat.html
index f110e33..7d728f5 100644
--- a/doc/html/pcre2compat.html
+++ b/doc/html/pcre2compat.html
@@ -47,8 +47,9 @@ non-newline character, and \N{U+dd..}, matching a Unicode code point, are
supported. The escapes that modify the case of following letters are
implemented by Perl's general string-handling and are not part of its pattern
matching engine. If any of these are encountered by PCRE2, an error is
-generated by default. However, if the PCRE2_ALT_BSUX option is set, \U and \u
-are interpreted as ECMAScript interprets them.
+generated by default. However, if either of the PCRE2_ALT_BSUX or
+PCRE2_EXTRA_ALT_BSUX options is set, \U and \u are interpreted as ECMAScript
+interprets them.
</P>
<P>
5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
@@ -233,7 +234,7 @@ Cambridge, England.
REVISION
</b><br>
<P>
-Last updated: 03 February 2019
+Last updated: 12 February 2019
<br>
Copyright &copy; 1997-2019 University of Cambridge.
<br>
diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html
index d57dcea..d69e6cb 100644
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@@ -399,12 +399,33 @@ environment, these escapes are as follows:
\xhh character with hex code hh
\x{hhh..} character with hex code hhh..
\N{U+hhh..} character with Unicode hex code point hhh..
- \uhhhh character with hex code hhhh (when PCRE2_ALT_BSUX is set)
</pre>
-There are some legacy applications where the escape sequence \r is expected to
-match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \r in a
-pattern is converted to \n so that it matches a LF (linefeed) instead of a CR
-(carriage return) character.
+By default, after \x that is not followed by {, from zero to two hexadecimal
+digits are read (letters can be in upper or lower case). Any number of
+hexadecimal digits may appear between \x{ and }. If a character other than a
+hexadecimal digit appears between \x{ and }, or if there is no terminating },
+an error occurs.
+</P>
+<P>
+Characters whose code points are less than 256 can be defined by either of the
+two syntaxes for \x or by an octal sequence. There is no difference in the way
+they are handled. For example, \xdc is exactly the same as \x{dc} or \334.
+However, using the braced versions does make such sequences easier to read.
+</P>
+<P>
+Support is available for some ECMAScript (aka JavaScript) escape sequences via
+two compile-time options. If PCRE2_ALT_BSUX is set, the sequence \x followed
+by { is not recognized. Only if \x is followed by two hexadecimal digits is it
+recognized as a character escape. Otherwise it is interpreted as a literal "x"
+character. In this mode, support for code points greater than 256 is provided
+by \u, which must be followed by four hexadecimal digits; otherwise it is
+interpreted as a literal "u" character.
+</P>
+<P>
+PCRE2_EXTRA_ALT_BSUX has the same effect as PCRE2_ALT_BSUX and, in addition,
+\u{hhh..} is recognized as the character with hexadecimal code point hhh..
+There may be any number of hexadecimal digits. This syntax is from ECMAScript
+6.
</P>
<P>
The \N{U+hhh..} escape sequence is recognized only when the PCRE2_UTF option
@@ -414,6 +435,12 @@ Note that when \N is not followed by an opening brace (curly bracket) it has
an entirely different meaning, matching any character that is not a newline.
</P>
<P>
+There are some legacy applications where the escape sequence \r is expected to
+match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \r in a
+pattern is converted to \n so that it matches a LF (linefeed) instead of a CR
+(carriage return) character.
+</P>
+<P>
The precise effect of \cx on ASCII characters is as follows: if x is a lower
case letter, it is converted to upper case. Then bit 6 of the character (hex
40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
@@ -500,28 +527,6 @@ Note that octal values of 100 or greater that are specified using this syntax
must not be introduced by a leading zero, because no more than three octal
digits are ever read.
</P>
-<P>
-By default, after \x that is not followed by {, from zero to two hexadecimal
-digits are read (letters can be in upper or lower case). Any number of
-hexadecimal digits may appear between \x{ and }. If a character other than
-a hexadecimal digit appears between \x{ and }, or if there is no terminating
-}, an error occurs.
-</P>
-<P>
-If the PCRE2_ALT_BSUX option is set, the interpretation of \x is as just
-described only when it is followed by two hexadecimal digits. Otherwise, it
-matches a literal "x" character. In this mode, support for code points greater
-than 256 is provided by \u, which must be followed by four hexadecimal digits;
-otherwise it matches a literal "u" character. This syntax makes PCRE2 behave
-like ECMAscript (aka JavaScript). Code points greater than 0xFFFF are not
-supported.
-</P>
-<P>
-Characters whose value is less than 256 can be defined by either of the two
-syntaxes for \x (or by \u in PCRE2_ALT_BSUX mode). There is no difference in
-the way they are handled. For example, \xdc is exactly the same as \x{dc} (or
-\u00dc in PCRE2_ALT_BSUX mode).
-</P>
<br><b>
Constraints on character values
</b><br>
@@ -560,9 +565,10 @@ Unsupported escape sequences
<P>
In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its string
handler and used to modify the case of following characters. By default, PCRE2
-does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
-is set, \U matches a "U" character, and \u can be used to define a character
-by code point, as described above.
+does not support these escape sequences in patterns. However, if either of the
+PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \U matches a "U"
+character, and \u can be used to define a character by code point, as
+described above.
</P>
<br><b>
Absolute and relative backreferences
@@ -3721,7 +3727,7 @@ Cambridge, England.
</P>
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 04 February 2019
+Last updated: 12 February 2019
<br>
Copyright &copy; 1997-2019 University of Cambridge.
<br>
diff --git a/doc/html/pcre2syntax.html b/doc/html/pcre2syntax.html
index 73da500..5022e12 100644
--- a/doc/html/pcre2syntax.html
+++ b/doc/html/pcre2syntax.html
@@ -58,7 +58,8 @@ documentation. This document contains a quick-reference summary of the syntax.
</P>
<br><a name="SEC3" href="#TOC1">ESCAPED CHARACTERS</a><br>
<P>
-This table applies to ASCII and Unicode environments.
+This table applies to ASCII and Unicode environments. An unrecognized escape
+sequence causes an error.
<pre>
\a alarm, that is, the BEL character (hex 07)
\cx "control-x", where x is any ASCII printing character
@@ -70,12 +71,25 @@ This table applies to ASCII and Unicode environments.
\0dd character with octal code 0dd
\ddd character with octal code ddd, or backreference
\o{ddd..} character with octal code ddd..
- \U "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
\N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
- \uhhhh character with hex code hhhh (if PCRE2_ALT_BSUX is set)
\xhh character with hex code hh
\x{hh..} character with hex code hh..
</pre>
+If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
+following are also recognized:
+<pre>
+ \U the character "U"
+ \uhhhh character with hex code hhhh
+ \u{hh..} character with hex code hh.. but only for EXTRA_ALT_BSUX
+</pre>
+When \x is not followed by {, from zero to two hexadecimal digits are read,
+but in ALT_BSUX mode \x must be followed by two hexadecimal digits to be
+recognized as a hexadecimal escape; otherwise it matches a literal "x".
+Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits
+or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
+matches a literal "u".
+</P>
+<P>
Note that \0dd is always an octal code. The treatment of backslash followed by
a non-zero digit is complicated; for details see the section
<a href="pcre2pattern.html#digitsafterbackslash">"Non-printing characters"</a>
@@ -86,13 +100,6 @@ also given. \N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not
supported in EBCDIC environments. Note that \N not followed by an opening
curly bracket has a different meaning (see below).
</P>
-<P>
-When \x is not followed by {, from zero to two hexadecimal digits are read,
-but if PCRE2_ALT_BSUX is set, \x must be followed by two hexadecimal digits to
-be recognized as a hexadecimal escape; otherwise it matches a literal "x".
-Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits,
-it matches a literal "u".
-</P>
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
<P>
<pre>
@@ -660,7 +667,7 @@ Cambridge, England.
</P>
<br><a name="SEC28" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 03 February 2019
+Last updated: 11 February 2019
<br>
Copyright &copy; 1997-2019 University of Cambridge.
<br>
diff --git a/doc/html/pcre2test.html b/doc/html/pcre2test.html
index db5001f..1eb1553 100644
--- a/doc/html/pcre2test.html
+++ b/doc/html/pcre2test.html
@@ -609,6 +609,7 @@ for a description of the effects of these options.
escaped_cr_is_lf set PCRE2_EXTRA_ESCAPED_CR_IS_LF
/x extended set PCRE2_EXTENDED
/xx extended_more set PCRE2_EXTENDED_MORE
+ extra_alt_bsux set PCRE2_EXTRA_ALT_BSUX
firstline set PCRE2_FIRSTLINE
literal set PCRE2_LITERAL
match_line set PCRE2_EXTRA_MATCH_LINE
@@ -2075,7 +2076,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 03 February 2019
+Last updated: 11 February 2019
<br>
Copyright &copy; 1997-2019 University of Cambridge.
<br>