author ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2012-01-13 17:16:32 +0000
committer ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2012-01-13 17:16:32 +0000
commit 36aa4021b0390d7727d5e1b11aac2fc87765792a
tree 12b276ca6ee48d05b3ba5185bba76e11e416476e
parent 873f3e06c2aff02a618a67ecb4167bbec0bcfd58
The last of the 16-bit documentation major updates.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@868 2f5784b3-3f2a-0410-8824-cb99058d5e15
-rw-r--r-- Makefile.am | 4
-rw-r--r-- doc/index.html.src | 8
-rw-r--r-- doc/pcre_assign_jit_stack.3 | 14
-rw-r--r-- doc/pcre_compile.3 | 20
-rw-r--r-- doc/pcre_compile2.3 | 22
-rw-r--r-- doc/pcre_config.3 | 11
-rw-r--r-- doc/pcre_copy_named_substring.3 | 12
-rw-r--r-- doc/pcre_copy_substring.3 | 10
-rw-r--r-- doc/pcre_dfa_exec.3 | 20
-rw-r--r-- doc/pcre_exec.3 | 13
-rw-r--r-- doc/pcre_free_study.3 | 4
-rw-r--r-- doc/pcre_free_substring.3 | 6
-rw-r--r-- doc/pcre_free_substring_list.3 | 6
-rw-r--r-- doc/pcre_fullinfo.3 | 16
-rw-r--r-- doc/pcre_get_named_substring.3 | 21
-rw-r--r-- doc/pcre_get_stringnumber.3 | 8
-rw-r--r-- doc/pcre_get_stringtable_entries.3 | 6
-rw-r--r-- doc/pcre_get_substring.3 | 19
-rw-r--r-- doc/pcre_get_substring_list.3 | 20
-rw-r--r-- doc/pcre_jit_stack_alloc.3 | 8
-rw-r--r-- doc/pcre_jit_stack_free.3 | 6
-rw-r--r-- doc/pcre_maketables.3 | 10
-rw-r--r-- doc/pcre_pattern_to_host_byte_order.3 | 43
-rw-r--r-- doc/pcre_refcount.3 | 2
-rw-r--r-- doc/pcre_study.3 | 9
-rw-r--r-- doc/pcre_utf16_to_host_byte_order.3 | 46
-rw-r--r-- doc/pcre_version.3 | 9
-rw-r--r-- doc/pcreunicode.3 | 157
-rw-r--r-- doc/perltest.txt | 23
29 files changed, 413 insertions, 140 deletions
diff --git a/Makefile.am b/Makefile.am
index fa88218..7f25bca 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -39,8 +39,10 @@ dist_html_DATA = \
doc/html/pcre_jit_stack_alloc.html \
doc/html/pcre_jit_stack_free.html \
doc/html/pcre_maketables.html \
+ doc/html/pcre_pattern_to_host_byte_order.html \
doc/html/pcre_refcount.html \
doc/html/pcre_study.html \
+ doc/html/pcre_utf16_to_host_byte_order.html \
doc/html/pcre_version.html \
doc/html/pcreapi.html \
doc/html/pcrebuild.html \
@@ -489,8 +491,10 @@ dist_man_MANS = \
doc/pcre_jit_stack_alloc.3 \
doc/pcre_jit_stack_free.3 \
doc/pcre_maketables.3 \
+ doc/pcre_pattern_to_host_byte_order.3 \
doc/pcre_refcount.3 \
doc/pcre_study.3 \
+ doc/pcre_utf16_to_host_byte_order.3 \
doc/pcre_version.3 \
doc/pcreapi.3 \
doc/pcrebuild.3 \
diff --git a/doc/index.html.src b/doc/index.html.src
index 0903f5d..20720df 100644
--- a/doc/index.html.src
+++ b/doc/index.html.src
@@ -87,7 +87,7 @@ The HTML documentation for PCRE comprises the following pages:
<p>
There are also individual pages that summarize the interface for each function
-in the library:
+in the library. There is a single page for each pair of 8-bit/16-bit functions.
</p>
<table>
@@ -154,12 +154,18 @@ in the library:
<tr><td><a href="pcre_maketables.html">pcre_maketables</a></td>
<td>&nbsp;&nbsp;Build character tables in current locale</td></tr>
+<tr><td><a href="pcre_pattern_to_host_byte_order.html">pcre_pattern_to_host_byte_order</a></td>
+ <td>&nbsp;&nbsp;Convert compiled pattern to host byte order if necessary</td></tr>
+
<tr><td><a href="pcre_refcount.html">pcre_refcount</a></td>
<td>&nbsp;&nbsp;Maintain reference count in compiled pattern</td></tr>
<tr><td><a href="pcre_study.html">pcre_study</a></td>
<td>&nbsp;&nbsp;Study a compiled pattern</td></tr>
+<tr><td><a href="pcre_utf16_to_host_byte_order.html">pcre_utf16_to_host_byte_order</a></td>
+ <td>&nbsp;&nbsp;Convert UTF-16 string to host byte order if necessary</td></tr>
+
<tr><td><a href="pcre_version.html">pcre_version</a></td>
<td>&nbsp;&nbsp;Return PCRE version and release date</td></tr>
</table>
diff --git a/doc/pcre_assign_jit_stack.3 b/doc/pcre_assign_jit_stack.3
index a220c49..b5944a4 100644
--- a/doc/pcre_assign_jit_stack.3
+++ b/doc/pcre_assign_jit_stack.3
@@ -10,15 +10,19 @@ PCRE - Perl-compatible regular expressions
.B void pcre_assign_jit_stack(pcre_extra *\fIextra\fP,
.ti +5n
.B pcre_jit_callback \fIcallback\fP, void *\fIdata\fP);
+.PP
+.B void pcre16_assign_jit_stack(pcre16_extra *\fIextra\fP,
+.ti +5n
+.B pcre16_jit_callback \fIcallback\fP, void *\fIdata\fP);
.
.SH DESCRIPTION
.rs
.sp
This function provides control over the memory used as a stack at runtime by a
-call to \fBpcre_exec()\fP with a pattern that has been successfully compiled
-with JIT optimization. The arguments are:
+call to \fBpcre[16]_exec()\fP with a pattern that has been successfully
+compiled with JIT optimization. The arguments are:
.sp
- extra the data pointer returned by \fBpcre_study()\fP
+ extra the data pointer returned by \fBpcre[16]_study()\fP
callback a callback function
data a JIT stack or a value to be passed to the callback
function
@@ -27,12 +31,12 @@ If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32K block on
the machine stack is used.
.P
If \fIcallback\fP is NULL and \fIdata\fP is not NULL, \fIdata\fP must
-be a valid JIT stack, the result of calling \fBpcre_jit_stack_alloc()\fP.
+be a valid JIT stack, the result of calling \fBpcre[16]_jit_stack_alloc()\fP.
.P
If \fIcallback\fP not NULL, it is called with \fIdata\fP as an argument at
the start of matching, in order to set up a JIT stack. If the result is NULL,
the internal 32K stack is used; otherwise the return value must be a valid JIT
-stack, the result of calling \fBpcre_jit_stack_alloc()\fP.
+stack, the result of calling \fBpcre[16]_jit_stack_alloc()\fP.
.P
You may safely assign the same JIT stack to multiple patterns, as long as they
are all matched in the same thread. In a multithread application, each thread
diff --git a/doc/pcre_compile.3 b/doc/pcre_compile.3
index 03b3a32..0dec9e9 100644
--- a/doc/pcre_compile.3
+++ b/doc/pcre_compile.3
@@ -12,13 +12,19 @@ PCRE - Perl-compatible regular expressions
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
.ti +5n
.B const unsigned char *\fItableptr\fP);
+.PP
+.B pcre16 *pcre16_compile(PCRE_SPTR16 \fIpattern\fP, int \fIoptions\fP,
+.ti +5n
+.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
+.ti +5n
+.B const unsigned char *\fItableptr\fP);
.
.SH DESCRIPTION
.rs
.sp
This function compiles a regular expression into an internal form. It is the
-same as \fBpcre_compile2()\fP, except for the absence of the \fIerrorcodeptr\fP
-argument. Its arguments are:
+same as \fBpcre[16]_compile2()\fP, except for the absence of the
+\fIerrorcodeptr\fP argument. Its arguments are:
.sp
\fIpattern\fP A zero-terminated string containing the
regular expression to be compiled
@@ -52,15 +58,19 @@ The option bits are:
PCRE_NEWLINE_LF Set LF as the newline sequence
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
theses (named ones available)
+ PCRE_NO_UTF16_CHECK Do not check the pattern for UTF-16
+ validity (only relevant if
+ PCRE_UTF16 is set)
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
validity (only relevant if
PCRE_UTF8 is set)
PCRE_UCP Use Unicode properties for \ed, \ew, etc.
PCRE_UNGREEDY Invert greediness of quantifiers
- PCRE_UTF8 Run in UTF-8 mode
+ PCRE_UTF16 Run in \fBpcre16_compile()\fP UTF-16 mode
+ PCRE_UTF8 Run in \fBpcre_compile()\fP UTF-8 mode
.sp
-PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
-PCRE_NO_UTF8_CHECK, and with UCP support if PCRE_UCP is used.
+PCRE must be built with UTF support in order to use PCRE_UTF8/16 and
+PCRE_NO_UTF8/16_CHECK, and with UCP support if PCRE_UCP is used.
.P
The yield of the function is a pointer to a private data structure that
contains the compiled pattern, or NULL if an error was detected. Note that
diff --git a/doc/pcre_compile2.3 b/doc/pcre_compile2.3
index f0c5c3d..99f3872 100644
--- a/doc/pcre_compile2.3
+++ b/doc/pcre_compile2.3
@@ -14,13 +14,21 @@ PCRE - Perl-compatible regular expressions
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
.ti +5n
.B const unsigned char *\fItableptr\fP);
+.PP
+.B pcre16 *pcre16_compile2(PCRE_SPTR16 \fIpattern\fP, int \fIoptions\fP,
+.ti +5n
+.B int *\fIerrorcodeptr\fP,
+.ti +5n
+.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
+.ti +5n
+.B const unsigned char *\fItableptr\fP);
.
.SH DESCRIPTION
.rs
.sp
This function compiles a regular expression into an internal form. It is the
-same as \fBpcre_compile()\fP, except for the addition of the \fIerrorcodeptr\fP
-argument. The arguments are:
+same as \fBpcre[16]_compile()\fP, except for the addition of the
+\fIerrorcodeptr\fP argument. The arguments are:
.
.sp
\fIpattern\fP A zero-terminated string containing the
@@ -56,15 +64,19 @@ The option bits are:
PCRE_NEWLINE_LF Set LF as the newline sequence
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
theses (named ones available)
+ PCRE_NO_UTF16_CHECK Do not check the pattern for UTF-16
+ validity (only relevant if
+ PCRE_UTF16 is set)
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
validity (only relevant if
PCRE_UTF8 is set)
PCRE_UCP Use Unicode properties for \ed, \ew, etc.
PCRE_UNGREEDY Invert greediness of quantifiers
- PCRE_UTF8 Run in UTF-8 mode
+ PCRE_UTF16 Run \fBpcre16_compile()\fP in UTF-16 mode
+ PCRE_UTF8 Run \fBpcre_compile()\fP in UTF-8 mode
.sp
-PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
-PCRE_NO_UTF8_CHECK, and with UCP support if PCRE_UCP is used.
+PCRE must be built with UTF support in order to use PCRE_UTF8/16 and
+PCRE_NO_UTF8/16_CHECK, and with UCP support if PCRE_UCP is used.
.P
The yield of the function is a pointer to a private data structure that
contains the compiled pattern, or NULL if an error was detected. Note that
diff --git a/doc/pcre_config.3 b/doc/pcre_config.3
index d0846e1..e157b02 100644
--- a/doc/pcre_config.3
+++ b/doc/pcre_config.3
@@ -8,6 +8,8 @@ PCRE - Perl-compatible regular expressions
.PP
.SM
.B int pcre_config(int \fIwhat\fP, void *\fIwhere\fP);
+.PP
+.B int pcre16_config(int \fIwhat\fP, void *\fIwhere\fP);
.
.SH DESCRIPTION
.rs
@@ -42,12 +44,17 @@ point to an unsigned long integer. The available codes are:
Threshold of return slots, above which
\fBmalloc()\fP is used by the POSIX API
PCRE_CONFIG_STACKRECURSE Recursion implementation (1=stack 0=heap)
- PCRE_CONFIG_UTF8 Availability of UTF-8 support (1=yes 0=no)
+ PCRE_CONFIG_UTF16 Availability of UTF-16 support (1=yes
+ 0=no); option for \fBpcre16_config()\fP
+ PCRE_CONFIG_UTF8 Availability of UTF-8 support (1=yes 0=no);
+ option for \fBpcre_config()\fP
PCRE_CONFIG_UNICODE_PROPERTIES
Availability of Unicode property support
(1=yes 0=no)
.sp
-The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise.
+The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise. That error
+is also given if PCRE_CONFIG_UTF16 is passed to \fBpcre_config()\fP or if
+PCRE_CONFIG_UTF8 is passed to \fBpcre16_config()\fP.
.P
There is a complete description of the PCRE native API in the
.\" HREF
diff --git a/doc/pcre_copy_named_substring.3 b/doc/pcre_copy_named_substring.3
index 9ad6826..266e33d 100644
--- a/doc/pcre_copy_named_substring.3
+++ b/doc/pcre_copy_named_substring.3
@@ -14,6 +14,14 @@ PCRE - Perl-compatible regular expressions
.B int \fIstringcount\fP, const char *\fIstringname\fP,
.ti +5n
.B char *\fIbuffer\fP, int \fIbuffersize\fP);
+.PP
+.B int pcre16_copy_named_substring(const pcre16 *\fIcode\fP,
+.ti +5n
+.B PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,
+.ti +5n
+.B int \fIstringcount\fP, PCRE_SPTR16 \fIstringname\fP,
+.ti +5n
+.B PCRE_UCHAR16 *\fIbuffer\fP, int \fIbuffersize\fP);
.
.SH DESCRIPTION
.rs
@@ -23,8 +31,8 @@ by name, into a given buffer. The arguments are:
.sp
\fIcode\fP Pattern that was successfully matched
\fIsubject\fP Subject that has been successfully matched
- \fIovector\fP Offset vector that \fBpcre_exec()\fP used
- \fIstringcount\fP Value returned by \fBpcre_exec()\fP
+ \fIovector\fP Offset vector that \fBpcre[16]_exec()\fP used
+ \fIstringcount\fP Value returned by \fBpcre[16]_exec()\fP
\fIstringname\fP Name of the required substring
\fIbuffer\fP Buffer to receive the string
\fIbuffersize\fP Size of buffer
diff --git a/doc/pcre_copy_substring.3 b/doc/pcre_copy_substring.3
index 1910d18..7285173 100644
--- a/doc/pcre_copy_substring.3
+++ b/doc/pcre_copy_substring.3
@@ -12,6 +12,12 @@ PCRE - Perl-compatible regular expressions
.B int \fIstringcount\fP, int \fIstringnumber\fP, char *\fIbuffer\fP,
.ti +5n
.B int \fIbuffersize\fP);
+.PP
+.B int pcre16_copy_substring(PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,
+.ti +5n
+.B int \fIstringcount\fP, int \fIstringnumber\fP, PCRE_UCHAR16 *\fIbuffer\fP,
+.ti +5n
+.B int \fIbuffersize\fP);
.
.SH DESCRIPTION
.rs
@@ -20,8 +26,8 @@ This is a convenience function for extracting a captured substring into a given
buffer. The arguments are:
.sp
\fIsubject\fP Subject that has been successfully matched
- \fIovector\fP Offset vector that \fBpcre_exec()\fP used
- \fIstringcount\fP Value returned by \fBpcre_exec()\fP
+ \fIovector\fP Offset vector that \fBpcre[16]_exec()\fP used
+ \fIstringcount\fP Value returned by \fBpcre[16]_exec()\fP
\fIstringnumber\fP Number of the required substring
\fIbuffer\fP Buffer to receive the string
\fIbuffersize\fP Size of buffer
diff --git a/doc/pcre_dfa_exec.3 b/doc/pcre_dfa_exec.3
index a5064b8..170ac4f 100644
--- a/doc/pcre_dfa_exec.3
+++ b/doc/pcre_dfa_exec.3
@@ -14,6 +14,14 @@ PCRE - Perl-compatible regular expressions
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
.ti +5n
.B int *\fIworkspace\fP, int \fIwscount\fP);
+.PP
+.B int pcre16_dfa_exec(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP,"
+.ti +5n
+.B "PCRE_SPTR16 \fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
+.ti +5n
+.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
+.ti +5n
+.B int *\fIworkspace\fP, int \fIwscount\fP);
.
.SH DESCRIPTION
.rs
@@ -21,10 +29,11 @@ PCRE - Perl-compatible regular expressions
This function matches a compiled regular expression against a given subject
string, using an alternative matching algorithm that scans the subject string
just once (\fInot\fP Perl-compatible). Note that the main, Perl-compatible,
-matching function is \fBpcre_exec()\fP. The arguments for this function are:
+matching function is \fBpcre[16]_exec()\fP. The arguments for this function
+are:
.sp
\fIcode\fP Points to the compiled pattern
- \fIextra\fP Points to an associated \fBpcre_extra\fP structure,
+ \fIextra\fP Points to an associated \fBpcre[16]_extra\fP structure,
or is NULL
\fIsubject\fP Points to the subject string
\fIlength\fP Length of the subject string, in bytes
@@ -52,6 +61,9 @@ The options are:
PCRE_NOTEMPTY_ATSTART An empty string at the start of the subject
is not a valid match
PCRE_NO_START_OPTIMIZE Do not do "start-match" optimizations
+ PCRE_NO_UTF16_CHECK Do not check the subject for UTF-16
+ validity (only relevant if PCRE_UTF16
+ was set at compile time)
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
validity (only relevant if PCRE_UTF8
was set at compile time)
@@ -73,10 +85,10 @@ documentation. For details of partial matching, see the
.\"
page.
.P
-A \fBpcre_extra\fP structure contains the following fields:
+A \fBpcre[16]_extra\fP structure contains the following fields:
.sp
\fIflags\fP Bits indicating which fields are set
- \fIstudy_data\fP Opaque data from \fBpcre_study()\fP
+ \fIstudy_data\fP Opaque data from \fBpcre[16]_study()\fP
\fImatch_limit\fP Limit on internal resource use
\fImatch_limit_recursion\fP Limit on internal recursion depth
\fIcallout_data\fP Opaque data passed back to callouts
diff --git a/doc/pcre_exec.3 b/doc/pcre_exec.3
index 76e7f81..68db3c0 100644
--- a/doc/pcre_exec.3
+++ b/doc/pcre_exec.3
@@ -12,6 +12,12 @@ PCRE - Perl-compatible regular expressions
.B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
.ti +5n
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);
+.PP
+.B int pcre16_exec(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP,"
+.ti +5n
+.B "PCRE_SPTR16 \fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
+.ti +5n
+.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);
.
.SH DESCRIPTION
.rs
@@ -21,7 +27,7 @@ string, using a matching algorithm that is similar to Perl's. It returns
offsets to captured substrings. Its arguments are:
.sp
\fIcode\fP Points to the compiled pattern
- \fIextra\fP Points to an associated \fBpcre_extra\fP structure,
+ \fIextra\fP Points to an associated \fBpcre[16]_extra\fP structure,
or is NULL
\fIsubject\fP Points to the subject string
\fIlength\fP Length of the subject string, in bytes
@@ -47,6 +53,9 @@ The options are:
PCRE_NOTEMPTY_ATSTART An empty string at the start of the subject
is not a valid match
PCRE_NO_START_OPTIMIZE Do not do "start-match" optimizations
+ PCRE_NO_UTF16_CHECK Do not check the subject for UTF-16
+ validity (only relevant if PCRE_UTF16
+ was set at compile time)
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
validity (only relevant if PCRE_UTF8
was set at compile time)
@@ -62,7 +71,7 @@ For details of partial matching, see the
page. A \fBpcre_extra\fP structure contains the following fields:
.sp
\fIflags\fP Bits indicating which fields are set
- \fIstudy_data\fP Opaque data from \fBpcre_study()\fP
+ \fIstudy_data\fP Opaque data from \fBpcre[16]_study()\fP
\fImatch_limit\fP Limit on internal resource use
\fImatch_limit_recursion\fP Limit on internal recursion depth
\fIcallout_data\fP Opaque data passed back to callouts
diff --git a/doc/pcre_free_study.3 b/doc/pcre_free_study.3
index 846582e..308ca94 100644
--- a/doc/pcre_free_study.3
+++ b/doc/pcre_free_study.3
@@ -8,12 +8,14 @@ PCRE - Perl-compatible regular expressions
.PP
.SM
.B void pcre_free_study(pcre_extra *\fIextra\fP);
+.PP
+.B void pcre16_free_study(pcre16_extra *\fIextra\fP);
.
.SH DESCRIPTION
.rs
.sp
This function is used to free the memory used for the data generated by a call
-to \fBpcre_study()\fP when it is no longer needed. The argument must be the
+to \fBpcre[16]_study()\fP when it is no longer needed. The argument must be the
result of such a call.
.P
There is a complete description of the PCRE native API in the
diff --git a/doc/pcre_free_substring.3 b/doc/pcre_free_substring.3
index ed3999a..9f1d700 100644
--- a/doc/pcre_free_substring.3
+++ b/doc/pcre_free_substring.3
@@ -8,13 +8,15 @@ PCRE - Perl-compatible regular expressions
.PP
.SM
.B void pcre_free_substring(const char *\fIstringptr\fP);
+.PP
+.B void pcre16_free_substring(PCRE_SPTR16 \fIstringptr\fP);
.
.SH DESCRIPTION
.rs
.sp
This is a convenience function for freeing the store obtained by a previous
-call to \fBpcre_get_substring()\fP or \fBpcre_get_named_substring()\fP. Its
-only argument is a pointer to the string.
+call to \fBpcre[16]_get_substring()\fP or \fBpcre[16]_get_named_substring()\fP.
+Its only argument is a pointer to the string.
.P
There is a complete description of the PCRE native API in the
.\" HREF
diff --git a/doc/pcre_free_substring_list.3 b/doc/pcre_free_substring_list.3
index 89b7078..b8d8bbb 100644
--- a/doc/pcre_free_substring_list.3
+++ b/doc/pcre_free_substring_list.3
@@ -8,13 +8,15 @@ PCRE - Perl-compatible regular expressions
.PP
.SM
.B void pcre_free_substring_list(const char **\fIstringptr\fP);
+.PP
+.B void pcre16_free_substring_list(PCRE_SPTR16 *\fIstringptr\fP);
.
.SH DESCRIPTION
.rs
.sp
This is a convenience function for freeing the store obtained by a previous
-call to \fBpcre_get_substring_list()\fP. Its only argument is a pointer to the
-list of string pointers.
+call to \fBpcre[16]_get_substring_list()\fP. Its only argument is a pointer to
+the list of string pointers.
.P
There is a complete description of the PCRE native API in the
.\" HREF
diff --git a/doc/pcre_fullinfo.3 b/doc/pcre_fullinfo.3
index e12fd66..c16406b 100644
--- a/doc/pcre_fullinfo.3
+++ b/doc/pcre_fullinfo.3
@@ -10,6 +10,10 @@ PCRE - Perl-compatible regular expressions
.B int pcre_fullinfo(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
.ti +5n
.B int \fIwhat\fP, void *\fIwhere\fP);
+.PP
+.B int pcre16_fullinfo(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP,"
+.ti +5n
+.B int \fIwhat\fP, void *\fIwhere\fP);
.
.SH DESCRIPTION
.rs
@@ -17,7 +21,7 @@ PCRE - Perl-compatible regular expressions
This function returns information about a compiled pattern. Its arguments are:
.sp
\fIcode\fP Compiled regular expression
- \fIextra\fP Result of \fBpcre_study()\fP or NULL
+ \fIextra\fP Result of \fBpcre[16]_study()\fP or NULL
\fIwhat\fP What information is required
\fIwhere\fP Where to put the information
.sp
@@ -26,15 +30,16 @@ The following information is available:
PCRE_INFO_BACKREFMAX Number of highest back reference
PCRE_INFO_CAPTURECOUNT Number of capturing subpatterns
PCRE_INFO_DEFAULT_TABLES Pointer to default tables
- PCRE_INFO_FIRSTBYTE Fixed first byte for a match, or
+ PCRE_INFO_FIRSTBYTE Fixed first data unit for a match, or
-1 for start of string
or after newline, or
-2 otherwise
- PCRE_INFO_FIRSTTABLE Table of first bytes (after studying)
+ PCRE_INFO_FIRSTTABLE Table of first data units (after studying)
PCRE_INFO_HASCRORLF Return 1 if explicit CR or LF matches exist
PCRE_INFO_JCHANGED Return 1 if (?J) or (?-J) was used
PCRE_INFO_JIT Return 1 after successful JIT compilation
- PCRE_INFO_LASTLITERAL Literal last byte required
+ PCRE_INFO_JITSIZE Size of JIT compiled code
+ PCRE_INFO_LASTLITERAL Literal last data unit required
PCRE_INFO_MINLENGTH Lower bound length of matching strings
PCRE_INFO_NAMECOUNT Number of named subpatterns
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
@@ -50,7 +55,8 @@ following \fIwhat\fP values:
.sp
PCRE_INFO_DEFAULT_TABLES const unsigned char *
PCRE_INFO_FIRSTTABLE const unsigned char *
- PCRE_INFO_NAMETABLE const unsigned char *
+ PCRE_INFO_NAMETABLE PCRE_SPTR16 (16-bit library)
+ PCRE_INFO_NAMETABLE const unsigned char * (8-bit library)
PCRE_INFO_OPTIONS unsigned long int
PCRE_INFO_SIZE size_t
.sp
diff --git a/doc/pcre_get_named_substring.3 b/doc/pcre_get_named_substring.3
index 22d0c1b..60b107c 100644
--- a/doc/pcre_get_named_substring.3
+++ b/doc/pcre_get_named_substring.3
@@ -14,6 +14,14 @@ PCRE - Perl-compatible regular expressions
.B int \fIstringcount\fP, const char *\fIstringname\fP,
.ti +5n
.B const char **\fIstringptr\fP);
+.PP
+.B int pcre16_get_named_substring(const pcre16 *\fIcode\fP,
+.ti +5n
+.B PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,
+.ti +5n
+.B int \fIstringcount\fP, PCRE_SPTR16 \fIstringname\fP,
+.ti +5n
+.B PCRE_SPTR16 *\fIstringptr\fP);
.
.SH DESCRIPTION
.rs
@@ -23,16 +31,17 @@ arguments are:
.sp
\fIcode\fP Compiled pattern
\fIsubject\fP Subject that has been successfully matched
- \fIovector\fP Offset vector that \fBpcre_exec()\fP used
- \fIstringcount\fP Value returned by \fBpcre_exec()\fP
+ \fIovector\fP Offset vector that \fBpcre[16]_exec()\fP used
+ \fIstringcount\fP Value returned by \fBpcre[16]_exec()\fP
\fIstringname\fP Name of the required substring
\fIstringptr\fP Where to put the string pointer
.sp
The memory in which the substring is placed is obtained by calling
-\fBpcre_malloc()\fP. The convenience function \fBpcre_free_substring()\fP can
-be used to free it when it is no longer needed. The yield of the function is
-the length of the extracted substring, PCRE_ERROR_NOMEMORY if sufficient memory
-could not be obtained, or PCRE_ERROR_NOSUBSTRING if the string name is invalid.
+\fBpcre[16]_malloc()\fP. The convenience function
+\fBpcre[16]_free_substring()\fP can be used to free it when it is no longer
+needed. The yield of the function is the length of the extracted substring,
+PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
+PCRE_ERROR_NOSUBSTRING if the string name is invalid.
.P
There is a complete description of the PCRE native API in the
.\" HREF
diff --git a/doc/pcre_get_stringnumber.3 b/doc/pcre_get_stringnumber.3
index f6017ff..62bf737 100644
--- a/doc/pcre_get_stringnumber.3
+++ b/doc/pcre_get_stringnumber.3
@@ -10,6 +10,10 @@ PCRE - Perl-compatible regular expressions
.B int pcre_get_stringnumber(const pcre *\fIcode\fP,
.ti +5n
.B const char *\fIname\fP);
+.PP
+.B int pcre16_get_stringnumber(const pcre16 *\fIcode\fP,
+.ti +5n
+.B PCRE_SPTR16 \fIname\fP);
.
.SH DESCRIPTION
.rs
@@ -23,8 +27,8 @@ parenthesis in a compiled pattern. Its arguments are:
The yield of the function is the number of the parenthesis if the name is
found, or PCRE_ERROR_NOSUBSTRING otherwise. When duplicate names are allowed
(PCRE_DUPNAMES is set), it is not defined which of the numbers is returned by
-\fBpcre_get_stringnumber()\fP. You can obtain the complete list by calling
-\fBpcre_get_stringtable_entries()\fP.
+\fBpcre[16]_get_stringnumber()\fP. You can obtain the complete list by calling
+\fBpcre[16]_get_stringtable_entries()\fP.
.P
There is a complete description of the PCRE native API in the
.\" HREF
diff --git a/doc/pcre_get_stringtable_entries.3 b/doc/pcre_get_stringtable_entries.3
index 979c4be..9a862bb 100644
--- a/doc/pcre_get_stringtable_entries.3
+++ b/doc/pcre_get_stringtable_entries.3
@@ -10,6 +10,10 @@ PCRE - Perl-compatible regular expressions
.B int pcre_get_stringtable_entries(const pcre *\fIcode\fP,
.ti +5n
.B const char *\fIname\fP, char **\fIfirst\fP, char **\fIlast\fP);
+.PP
+.B int pcre16_get_stringtable_entries(const pcre16 *\fIcode\fP,
+.ti +5n
+.B PCRE_SPTR16 \fIname\fP, PCRE_UCHAR16 **\fIfirst\fP, PCRE_UCHAR16 **\fIlast\fP);
.
.SH DESCRIPTION
.rs
@@ -17,7 +21,7 @@ PCRE - Perl-compatible regular expressions
This convenience function finds, for a compiled pattern, the first and last
entries for a given name in the table that translates capturing parenthesis
names into numbers. When names are required to be unique (PCRE_DUPNAMES is
-\fInot\fP set), it is usually easier to use \fBpcre_get_stringnumber()\fP
+\fInot\fP set), it is usually easier to use \fBpcre[16]_get_stringnumber()\fP
instead.
.sp
\fIcode\fP Compiled regular expression
diff --git a/doc/pcre_get_substring.3 b/doc/pcre_get_substring.3
index 8fb11ec..f27cc99 100644
--- a/doc/pcre_get_substring.3
+++ b/doc/pcre_get_substring.3
@@ -12,6 +12,12 @@ PCRE - Perl-compatible regular expressions
.B int \fIstringcount\fP, int \fIstringnumber\fP,
.ti +5n
.B const char **\fIstringptr\fP);
+.PP
+.B int pcre16_get_substring(PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP,
+.ti +5n
+.B int \fIstringcount\fP, int \fIstringnumber\fP,
+.ti +5n
+.B PCRE_SPTR16 *\fIstringptr\fP);
.
.SH DESCRIPTION
.rs
@@ -20,16 +26,17 @@ This is a convenience function for extracting a captured substring. The
arguments are:
.sp
\fIsubject\fP Subject that has been successfully matched
- \fIovector\fP Offset vector that \fBpcre_exec()\fP used
- \fIstringcount\fP Value returned by \fBpcre_exec()\fP
+ \fIovector\fP Offset vector that \fBpcre[16]_exec()\fP used
+ \fIstringcount\fP Value returned by \fBpcre[16]_exec()\fP
\fIstringnumber\fP Number of the required substring
\fIstringptr\fP Where to put the string pointer
.sp
The memory in which the substring is placed is obtained by calling
-\fBpcre_malloc()\fP. The convenience function \fBpcre_free_substring()\fP can
-be used to free it when it is no longer needed. The yield of the function is
-the length of the substring, PCRE_ERROR_NOMEMORY if sufficient memory could not
-be obtained, or PCRE_ERROR_NOSUBSTRING if the string number is invalid.
+\fBpcre[16]_malloc()\fP. The convenience function
+\fBpcre[16]_free_substring()\fP can be used to free it when it is no longer
+needed. The yield of the function is the length of the substring,
+PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
+PCRE_ERROR_NOSUBSTRING if the string number is invalid.
.P
There is a complete description of the PCRE native API in the
.\" HREF
diff --git a/doc/pcre_get_substring_list.3 b/doc/pcre_get_substring_list.3
index 647ae39..2df985a 100644
--- a/doc/pcre_get_substring_list.3
+++ b/doc/pcre_get_substring_list.3
@@ -10,6 +10,10 @@ PCRE - Perl-compatible regular expressions
.B int pcre_get_substring_list(const char *\fIsubject\fP,
.ti +5n
.B int *\fIovector\fP, int \fIstringcount\fP, "const char ***\fIlistptr\fP);"
+.PP
+.B int pcre16_get_substring_list(PCRE_SPTR16 \fIsubject\fP,
+.ti +5n
+.B int *\fIovector\fP, int \fIstringcount\fP, "PCRE_SPTR16 **\fIlistptr\fP);"
.
.SH DESCRIPTION
.rs
@@ -18,17 +22,17 @@ This is a convenience function for extracting a list of all the captured
substrings. The arguments are:
.sp
\fIsubject\fP Subject that has been successfully matched
- \fIovector\fP Offset vector that \fBpcre_exec\fP used
- \fIstringcount\fP Value returned by \fBpcre_exec\fP
+ \fIovector\fP Offset vector that \fBpcre[16]_exec\fP used
+ \fIstringcount\fP Value returned by \fBpcre[16]_exec\fP
\fIlistptr\fP Where to put a pointer to the list
.sp
The memory in which the substrings and the list are placed is obtained by
-calling \fBpcre_malloc()\fP. The convenience function
-\fBpcre_free_substring_list()\fP can be used to free it when it is no longer
-needed. A pointer to a list of pointers is put in the variable whose address is
-in \fIlistptr\fP. The list is terminated by a NULL pointer. The yield of the
-function is zero on success or PCRE_ERROR_NOMEMORY if sufficient memory could
-not be obtained.
+calling \fBpcre[16]_malloc()\fP. The convenience function
+\fBpcre[16]_free_substring_list()\fP can be used to free it when it is no
+longer needed. A pointer to a list of pointers is put in the variable whose
+address is in \fIlistptr\fP. The list is terminated by a NULL pointer. The
+yield of the function is zero on success or PCRE_ERROR_NOMEMORY if sufficient
+memory could not be obtained.
.P
There is a complete description of the PCRE native API in the
.\" HREF
diff --git a/doc/pcre_jit_stack_alloc.3 b/doc/pcre_jit_stack_alloc.3
index 9b35582..adf33d5 100644
--- a/doc/pcre_jit_stack_alloc.3
+++ b/doc/pcre_jit_stack_alloc.3
@@ -10,14 +10,16 @@ PCRE - Perl-compatible regular expressions
.B pcre_jit_stack *pcre_jit_stack_alloc(int \fIstartsize\fP,
.ti +5n
.B int \fImaxsize\fP);
+.PP
+.B pcre16_jit_stack *pcre16_jit_stack_alloc(int \fIstartsize\fP, int \fImaxsize\fP);
.
.SH DESCRIPTION
.rs
.sp
This function is used to create a stack for use by the code compiled by the JIT
-optimization of \fBpcre_study()\fP. The arguments are a starting size for the
-stack, and a maximum size to which it is allowed to grow. The result can be
-passed to the JIT runtime code by \fBpcre_assign_jit_stack()\fP, or that
+optimization of \fBpcre[16]_study()\fP. The arguments are a starting size for
+the stack, and a maximum size to which it is allowed to grow. The result can be
+passed to the JIT runtime code by \fBpcre[16]_assign_jit_stack()\fP, or that
function can set up a callback for obtaining a stack. A maximum stack size of
512K to 1M should be more than enough for any pattern. For more details, see
the
diff --git a/doc/pcre_jit_stack_free.3 b/doc/pcre_jit_stack_free.3
index f03c86b..c0daacb 100644
--- a/doc/pcre_jit_stack_free.3
+++ b/doc/pcre_jit_stack_free.3
@@ -8,13 +8,15 @@ PCRE - Perl-compatible regular expressions
.PP
.SM
.B void pcre_jit_stack_free(pcre_jit_stack *\fIstack\fP);
+.PP
+.B void pcre16_jit_stack_free(pcre16_jit_stack *\fIstack\fP);
.
.SH DESCRIPTION
.rs
.sp
This function is used to free a JIT stack that was created by
-\fBpcre_jit_stack_alloc()\fP when it is no longer needed. For more details, see
-the
+\fBpcre[16]_jit_stack_alloc()\fP when it is no longer needed. For more details,
+see the
.\" HREF
\fBpcrejit\fP
.\"
diff --git a/doc/pcre_maketables.3 b/doc/pcre_maketables.3
index 8d3978c..8b2c0b2 100644
--- a/doc/pcre_maketables.3
+++ b/doc/pcre_maketables.3
@@ -8,15 +8,17 @@ PCRE - Perl-compatible regular expressions
.PP
.SM
.B const unsigned char *pcre_maketables(void);
+.PP
+.B const unsigned char *pcre16_maketables(void);
.
.SH DESCRIPTION
.rs
.sp
This function builds a set of character tables for character values less than
-256. These can be passed to \fBpcre_compile()\fP to override PCRE's internal,
-built-in tables (which were made by \fBpcre_maketables()\fP when PCRE was
-compiled). You might want to do this if you are using a non-standard locale.
-The function yields a pointer to the tables.
+256. These can be passed to \fBpcre[16]_compile()\fP to override PCRE's
+internal, built-in tables (which were made by \fBpcre[16]_maketables()\fP when
+PCRE was compiled). You might want to do this if you are using a non-standard
+locale. The function yields a pointer to the tables.
.P
There is a complete description of the PCRE native API in the
.\" HREF
diff --git a/doc/pcre_pattern_to_host_byte_order.3 b/doc/pcre_pattern_to_host_byte_order.3
new file mode 100644
index 0000000..adb51c0
--- /dev/null
+++ b/doc/pcre_pattern_to_host_byte_order.3
@@ -0,0 +1,43 @@
+.TH PCRE_PATTERN_TO_HOST_BYTE_ORDER 3
+.SH NAME
+PCRE - Perl-compatible regular expressions
+.SH SYNOPSIS
+.rs
+.sp
+.B #include <pcre.h>
+.PP
+.SM
+.B int pcre_pattern_to_host_byte_order(pcre *\fIcode\fP,
+.ti +5n
+.B pcre_extra *\fIextra\fP, const unsigned char *\fItables\fP);
+.PP
+.B int pcre16_pattern_to_host_byte_order(pcre16 *\fIcode\fP,
+.ti +5n
+.B pcre16_extra *\fIextra\fP, const unsigned char *\fItables\fP);
+.
+.
+.SH DESCRIPTION
+.rs
+.sp
+This function ensures that the bytes in 2-byte and 4-byte values in a compiled
+pattern are in the correct order for the current host. It is useful when a
+pattern that has been compiled on one host is transferred to another that might
+have different endianness. The arguments are:
+.sp
+ \fIcode\fP A compiled regular expression
+ \fIextra\fP Points to an associated \fBpcre[16]_extra\fP structure,
+ or is NULL
+ \fItables\fP Pointer to character tables, or NULL to
+ set the built-in default
+.sp
+The result is 0 for success, a negative PCRE_ERROR_xxx value otherwise.
+.P
+There is a complete description of the PCRE native API in the
+.\" HREF
+\fBpcreapi\fP
+.\"
+page and a description of the POSIX API in the
+.\" HREF
+\fBpcreposix\fP
+.\"
+page.
diff --git a/doc/pcre_refcount.3 b/doc/pcre_refcount.3
index 6ab9f4f..57c0ddc 100644
--- a/doc/pcre_refcount.3
+++ b/doc/pcre_refcount.3
@@ -8,6 +8,8 @@ PCRE - Perl-compatible regular expressions
.PP
.SM
.B int pcre_refcount(pcre *\fIcode\fP, int \fIadjust\fP);
+.PP
+.B int pcre16_refcount(pcre16 *\fIcode\fP, int \fIadjust\fP);
.
.SH DESCRIPTION
.rs
diff --git a/doc/pcre_study.3 b/doc/pcre_study.3
index f37a5e1..092a4d7 100644
--- a/doc/pcre_study.3
+++ b/doc/pcre_study.3
@@ -10,6 +10,10 @@ PCRE - Perl-compatible regular expressions
.B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP,
.ti +5n
.B const char **\fIerrptr\fP);
+.PP
+.B pcre16_extra *pcre16_study(const pcre16 *\fIcode\fP, int \fIoptions\fP,
+.ti +5n
+.B const char **\fIerrptr\fP);
.
.SH DESCRIPTION
.rs
@@ -18,11 +22,12 @@ This function studies a compiled pattern, to see if additional information can
be extracted that might speed up matching. Its arguments are:
.sp
\fIcode\fP A compiled regular expression
- \fIoptions\fP Options for \fBpcre_study()\fP
+ \fIoptions\fP Options for \fBpcre[16]_study()\fP
\fIerrptr\fP Where to put an error message
.sp
If the function succeeds, it returns a value that can be passed to
-\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP via their \fIextra\fP arguments.
+\fBpcre[16]_exec()\fP or \fBpcre[16]_dfa_exec()\fP via their \fIextra\fP
+arguments.
.P
If the function returns NULL, either it could not find any additional
information, or there was an error. You can tell the difference by looking at
diff --git a/doc/pcre_utf16_to_host_byte_order.3 b/doc/pcre_utf16_to_host_byte_order.3
new file mode 100644
index 0000000..158aaea
--- /dev/null
+++ b/doc/pcre_utf16_to_host_byte_order.3
@@ -0,0 +1,46 @@
+.TH PCRE_UTF16_TO_HOST_BYTE_ORDER 3
+.SH NAME
+PCRE - Perl-compatible regular expressions
+.SH SYNOPSIS
+.rs
+.sp
+.B #include <pcre.h>
+.PP
+.SM
+.B int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *\fIoutput\fP,
+.ti +5n
+.B PCRE_SPTR16 \fIinput\fP, int \fIlength\fP, int *\fIbyte_order\fP,
+.ti +5n
+.B int \fIkeep_boms\fP);
+.
+.
+.SH DESCRIPTION
+.rs
+.sp
+This function, which exists only in the 16-bit library, converts a UTF-16
+string to the correct order for the current host, taking account of any byte
+order marks (BOMs) within the string. Its arguments are:
+.sp
+ \fIoutput\fP pointer to output buffer, may be the same as \fIinput\fP
+ \fIinput\fP pointer to input buffer
+ \fIlength\fP number of 16-bit units in the input, or negative for
+ a zero-terminated string
+ \fIbyte_order\fP a NULL value or a value of 0 pointed to means start
+ in host byte order
+ \fIkeep_boms\fP if non-zero, BOMs are copied to the output string
+.sp
+The result of the function is the number of 16-bit units placed into the output
+buffer, including the zero terminator if the string was zero-terminated.
+.P
+If \fIbyte_order\fP is not NULL, it is set to indicate the byte order that is
+current at the end of the string.
+.P
+There is a complete description of the PCRE native API in the
+.\" HREF
+\fBpcreapi\fP
+.\"
+page and a description of the POSIX API in the
+.\" HREF
+\fBpcreposix\fP
+.\"
+page.
diff --git a/doc/pcre_version.3 b/doc/pcre_version.3
index f1563fa..4658b5c 100644
--- a/doc/pcre_version.3
+++ b/doc/pcre_version.3
@@ -7,13 +7,16 @@ PCRE - Perl-compatible regular expressions
.B #include <pcre.h>
.PP
.SM
-.B char *pcre_version(void);
+.B const char *pcre_version(void);
+.PP
+.B const char *pcre16_version(void);
.
.SH DESCRIPTION
.rs
.sp
-This function returns a character string that gives the version number of the
-PCRE library and the date of its release.
+This function (even in the 16-bit library) returns a zero-terminated, 8-bit
+character string that gives the version number of the PCRE library and the date
+of its release.
.P
There is a complete description of the PCRE native API in the
.\" HREF
diff --git a/doc/pcreunicode.3 b/doc/pcreunicode.3
index b805a64..e480647 100644
--- a/doc/pcreunicode.3
+++ b/doc/pcreunicode.3
@@ -1,26 +1,55 @@
.TH PCREUNICODE 3
.SH NAME
PCRE - Perl-compatible regular expressions
-.SH "UTF-8 AND UNICODE PROPERTY SUPPORT"
+.SH "UTF-8, UTF-16, AND UNICODE PROPERTY SUPPORT"
.rs
.sp
-In order process UTF-8 strings, you must build PCRE to include UTF-8 support in
-the code, and, in addition, you must call
+From Release 8.30, in addition to its previous UTF-8 support, PCRE also
+supports UTF-16 by means of a separate 16-bit library. This can be built as
+well as, or instead of, the 8-bit library.
+.
+.
+.SH "UTF-8 SUPPORT"
+.rs
+.sp
+In order to process UTF-8 strings, you must build PCRE's 8-bit library with UTF
+support, and, in addition, you must call
.\" HREF
\fBpcre_compile()\fP
.\"
with the PCRE_UTF8 option flag, or the pattern must start with the sequence
(*UTF8). When either of these is the case, both the pattern and any subject
strings that are matched against it are treated as UTF-8 strings instead of
-strings of 1-byte characters. PCRE does not support any other formats (in
-particular, it does not support UTF-16).
-.P
-If you compile PCRE with UTF-8 support, but do not use it at run time, the
+strings of 1-byte characters.
+.
+.
+.SH "UTF-16 SUPPORT"
+.rs
+.sp
+In order to process UTF-16 strings, you must build PCRE's 16-bit library with UTF
+support, and, in addition, you must call
+.\" HREF
+\fBpcre16_compile()\fP
+.\"
+with the PCRE_UTF16 option flag, or the pattern must start with the sequence
+(*UTF16). When either of these is the case, both the pattern and any subject
+strings that are matched against it are treated as UTF-16 strings instead of
+strings of 16-bit characters.
+.
+.
+.SH "UTF SUPPORT OVERHEAD"
+.rs
+.sp
+If you compile PCRE with UTF support, but do not use it at run time, the
library will be a bit bigger, but the additional run time overhead is limited
-to testing the PCRE_UTF8 flag occasionally, so should not be very big.
-.P
-If PCRE is built with Unicode character property support (which implies UTF-8
-support), the escape sequences \ep{..}, \eP{..}, and \eX are supported.
+to testing the PCRE_UTF8/16 flag occasionally, so should not be very big.
+.
+.
+.SH "UNICODE PROPERTY SUPPORT"
+.rs
+.sp
+If PCRE is built with Unicode character property support (which implies UTF
+support), the escape sequences \ep{..}, \eP{..}, and \eX can be used.
The available properties that can be tested are limited to the general
category properties such as Lu for an upper case letter or Nd for a decimal
number, the Unicode script names such as Arabic or Han, and the derived
@@ -38,22 +67,19 @@ compatibility with Perl 5.6. PCRE does not support this.
.SS "Validity of UTF-8 strings"
.rs
.sp
-When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
-are (by default) checked for validity on entry to the relevant functions. From
-release 7.3 of PCRE, the check is according the rules of RFC 3629, which are
-themselves derived from the Unicode specification. Earlier releases of PCRE
-followed the rules of RFC 2279, which allows the full range of 31-bit values (0
-to 0x7FFFFFFF). The current check allows only values in the range U+0 to
-U+10FFFF, excluding U+D800 to U+DFFF.
+When you set the PCRE_UTF8 flag, the byte strings passed as patterns and
+subjects are (by default) checked for validity on entry to the relevant
+functions. From release 7.3 of PCRE, the check is according to the rules of RFC
+3629, which are themselves derived from the Unicode specification. Earlier
+releases of PCRE followed the rules of RFC 2279, which allows the full range of
+31-bit values (0 to 0x7FFFFFFF). The current check allows only values in the
+range U+0 to U+10FFFF, excluding U+D800 to U+DFFF.
.P
-The excluded code points are the "Low Surrogate Area" of Unicode, of which the
-Unicode Standard says this: "The Low Surrogate Area does not contain any
-character assignments, consequently no character code charts or namelists are
-provided for this area. Surrogates are reserved for use with UTF-16 and then
-must be used in pairs." The code points that are encoded by UTF-16 pairs are
-available as independent code points in the UTF-8 encoding. (In other words,
-the whole surrogate thing is a fudge for UTF-16 which unfortunately messes up
-UTF-8.)
+The excluded code points are the "Surrogate Area" of Unicode. They are reserved
+for use by UTF-16, where they are used in pairs to encode codepoints with
+values greater than 0xFFFF. The code points that are encoded by UTF-16 pairs
+are available independently in the UTF-8 encoding. (In other words, the whole
+surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8.)
.P
If an invalid UTF-8 string is passed to PCRE, an error return is given. At
compile time, the only additional information is the offset to the first byte
@@ -85,43 +111,70 @@ situation, you will have to apply your own validity check, and avoid the use of
JIT optimization.
.
.
-.SS "General comments about UTF-8 mode"
+.\" HTML <a name="utf16strings"></a>
+.SS "Validity of UTF-16 strings"
.rs
.sp
-1. An unbraced hexadecimal escape sequence (such as \exb3) matches a two-byte
-UTF-8 character if the value is greater than 127.
+When you set the PCRE_UTF16 flag, the strings of 16-bit data units that are
+passed as patterns and subjects are (by default) checked for validity on entry
+to the relevant functions. Values other than those in the surrogate range
+U+D800 to U+DFFF are independent code points. Values in the surrogate range
+must be used in pairs in the correct manner.
.P
-2. Octal numbers up to \e777 are recognized, and match two-byte UTF-8
-characters for values greater than \e177.
+If an invalid UTF-16 string is passed to PCRE, an error return is given. At
+compile time, the only additional information is the offset to the first data
+unit of the failing character. The runtime functions \fBpcre16_exec()\fP and
+\fBpcre16_dfa_exec()\fP also pass back this information, as well as a more
+detailed reason code if the caller has provided memory in which to do this.
.P
-3. Repeat quantifiers apply to complete UTF-8 characters, not to individual
-bytes, for example: \ex{100}{3}.
+In some situations, you may already know that your strings are valid, and
+therefore want to skip these checks in order to improve performance. If you set
+the PCRE_NO_UTF16_CHECK flag at compile time or at run time, PCRE assumes that
+the pattern or subject it is given (respectively) contains only valid UTF-16
+sequences. In this case, it does not diagnose an invalid UTF-16 string.
+.
+.
+.SS "General comments about UTF modes"
+.rs
+.sp
+1. Codepoints less than 256 can be specified by either braced or unbraced
+hexadecimal escape sequences (for example, \ex{b3} or \exb3). Larger values
+have to use braced sequences.
.P
-4. The dot metacharacter matches one UTF-8 character instead of a single byte.
+2. Octal numbers up to \e777 are recognized, and in UTF-8 mode, they match
+two-byte characters for values greater than \e177.
.P
-5. The escape sequence \eC can be used to match a single byte in UTF-8 mode,
-but its use can lead to some strange effects because it breaks up multibyte
-characters (see the description of \eC in the
+3. Repeat quantifiers apply to complete UTF characters, not to individual
+data units, for example: \ex{100}{3}.
+.P
+4. The dot metacharacter matches one UTF character instead of a single data
+unit.
+.P
+5. The escape sequence \eC can be used to match a single byte in UTF-8 mode, or
+a single 16-bit data unit in UTF-16 mode, but its use can lead to some strange
+effects because it breaks up multi-unit characters (see the description of \eC
+in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation). The use of \eC is not supported in the alternative matching
-function \fBpcre_dfa_exec()\fP, nor is it supported in UTF-8 mode by the JIT
-optimization of \fBpcre_exec()\fP. If JIT optimization is requested for a UTF-8
-pattern that contains \eC, it will not succeed, and so the matching will be
-carried out by the normal interpretive function.
+function \fBpcre[16]_dfa_exec()\fP, nor is it supported in UTF mode by the JIT
+optimization of \fBpcre[16]_exec()\fP. If JIT optimization is requested for a
+UTF pattern that contains \eC, it will not succeed, and so the matching will
+be carried out by the normal interpretive function.
.P
6. The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly
test characters of any code value, but, by default, the characters that PCRE
-recognizes as digits, spaces, or word characters remain the same set as before,
-all with values less than 256. This remains true even when PCRE is built to
-include Unicode property support, because to do otherwise would slow down PCRE
-in many common cases. Note in particular that this applies to \eb and \eB,
-because they are defined in terms of \ew and \eW. If you really want to test
-for a wider sense of, say, "digit", you can use explicit Unicode property tests
-such as \ep{Nd}. Alternatively, if you set the PCRE_UCP option, the way that
-the character escapes work is changed so that Unicode properties are used to
-determine which characters match. There are more details in the section on
+recognizes as digits, spaces, or word characters remain the same set as in
+non-UTF mode, all with values less than 256. This remains true even when PCRE
+is built to include Unicode property support, because to do otherwise would
+slow down PCRE in many common cases. Note in particular that this applies to
+\eb and \eB, because they are defined in terms of \ew and \eW. If you really
+want to test for a wider sense of, say, "digit", you can use explicit Unicode
+property tests such as \ep{Nd}. Alternatively, if you set the PCRE_UCP option,
+the way that the character escapes work is changed so that Unicode properties
+are used to determine which characters match. There are more details in the
+section on
.\" HTML <a href="pcrepattern.html#genericchartypes">
.\" </a>
generic character types
@@ -163,6 +216,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 19 October 2011
-Copyright (c) 1997-2011 University of Cambridge.
+Last updated: 13 January 2012
+Copyright (c) 1997-2012 University of Cambridge.
.fi
diff --git a/doc/perltest.txt b/doc/perltest.txt
index bd13ada..3785bdd 100644
--- a/doc/perltest.txt
+++ b/doc/perltest.txt
@@ -14,16 +14,16 @@ other pcretest modifiers that are either handled or ignored:
/W ignored
/S ignored
/SS ignored
+ /Y ignored
-The data lines are processed as Perl double-quoted strings, so if they contain
-" $ or @ characters, these have to be escaped. For this reason, all such
-characters in testinput1, testinput4, testinput6, and testinput11 are escaped
-so that they can be used for perltest as well as for pcretest. The pcretest \Y
-escape in data lines is removed.
-
-The special upper case pattern modifiers such as /A that pcretest recognizes,
-and its special data line escapes, are not used in these files. The output
-should be identical, apart from the initial identifying banner.
+The pcretest \Y escape in data lines is removed before matching. The data lines
+are processed as Perl double-quoted strings, so if they contain " $ or @
+characters, these have to be escaped. For this reason, all such characters in
+the Perl-compatible testinput1 file are escaped so that they can be used for
+perltest as well as for pcretest. The special upper case pattern modifiers such
+as /A that pcretest recognizes, and its special data line escapes, are not used
+in the Perl-compatible test file. The output should be identical, apart from
+the initial identifying banner.
The perltest.pl script can also test UTF-8 features. It recognizes the special
modifier /8 that pcretest uses to invoke UTF-8 functionality. The testinput4
@@ -31,13 +31,10 @@ and testinput6 files can be fed to perltest to run compatible UTF-8 tests.
However, it is necessary to add "use utf8;" to the script to make this work
correctly.
-The testinput11 file contains tests that use features of Perl 5.10, so does not
-work with Perl 5.8.
-
The other testinput files are not suitable for feeding to perltest.pl, since
they make use of the special upper case modifiers and escapes that pcretest
uses to test some features of PCRE. Some of these files also contain malformed
regular expressions, in order to check that PCRE diagnoses them correctly.
Philip Hazel
-August 2011
+January 2012