author    ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2014-11-23 18:38:38 +0000
committer ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2014-11-23 18:38:38 +0000
commit    469ce4c0cdbb50172723c20b6ce2590a5e593023
tree      59e68ed0cc94d6367c7d19e230a778b58a89e46b /doc/html
parent    ed4ed4376d5c874b42ca5817e91189b6ca1c7298
More documentation and test updates.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@158 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/html')
-rw-r--r--  doc/html/pcre2.html           2
-rw-r--r--  doc/html/pcre2build.html    115
-rw-r--r--  doc/html/pcre2callout.html   40
-rw-r--r--  doc/html/pcre2grep.html       6
-rw-r--r--  doc/html/pcre2jit.html       16
-rw-r--r--  doc/html/pcre2syntax.html     4
-rw-r--r--  doc/html/pcre2unicode.html   14
7 files changed, 100 insertions, 97 deletions
diff --git a/doc/html/pcre2.html b/doc/html/pcre2.html
index a01c63f..2c2b106 100644
--- a/doc/html/pcre2.html
+++ b/doc/html/pcre2.html
@@ -148,7 +148,7 @@ listing), and the short pages for individual functions, are concatenated in
pcre2limits details of size and other limits
pcre2matching discussion of the two matching algorithms
pcre2partial details of the partial matching facility
- pcre2pattern syntax and semantics of supported regular expression patterns
+ pcre2pattern syntax and semantics of supported regular expression patterns
pcre2perform discussion of performance issues
pcre2posix the POSIX-compatible C API for the 8-bit library
pcre2sample discussion of the pcre2demo program
diff --git a/doc/html/pcre2build.html b/doc/html/pcre2build.html
index c6ba6de..b87dad7 100644
--- a/doc/html/pcre2build.html
+++ b/doc/html/pcre2build.html
@@ -17,9 +17,9 @@ please consult the man page, in case the conversion went wrong.
<li><a name="TOC2" href="#SEC2">PCRE2 BUILD-TIME OPTIONS</a>
<li><a name="TOC3" href="#SEC3">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
<li><a name="TOC4" href="#SEC4">BUILDING SHARED AND STATIC LIBRARIES</a>
-<li><a name="TOC5" href="#SEC5">Unicode and UTF SUPPORT</a>
+<li><a name="TOC5" href="#SEC5">UNICODE AND UTF SUPPORT</a>
<li><a name="TOC6" href="#SEC6">JUST-IN-TIME COMPILER SUPPORT</a>
-<li><a name="TOC7" href="#SEC7">CODE VALUE OF NEWLINE</a>
+<li><a name="TOC7" href="#SEC7">NEWLINE RECOGNITION</a>
<li><a name="TOC8" href="#SEC8">WHAT \R MATCHES</a>
<li><a name="TOC9" href="#SEC9">HANDLING VERY LARGE PATTERNS</a>
<li><a name="TOC10" href="#SEC10">AVOIDING EXCESSIVE STACK USAGE</a>
@@ -91,12 +91,12 @@ respectively. These can be interpreted either as single-unit characters or
UTF-16/UTF-32 strings. To build these additional libraries, add one or both of
the following to the <b>configure</b> command:
<pre>
- --enable-pcre16
- --enable-pcre32
+ --enable-pcre2-16
+ --enable-pcre2-32
</pre>
If you do not want the 8-bit library, add
<pre>
- --disable-pcre8
+ --disable-pcre2-8
</pre>
as well. At least one of the three libraries must be built. Note that the POSIX
wrapper is for the 8-bit library only, and that <b>pcre2grep</b> is an 8-bit
@@ -106,14 +106,15 @@ libraries.
<br><a name="SEC4" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
<P>
The Autotools PCRE2 building process uses <b>libtool</b> to build both shared
-and static libraries by default. You can suppress one of these by adding one of
+and static libraries by default. You can suppress an unwanted library by adding
+one of
<pre>
--disable-shared
--disable-static
</pre>
-to the <b>configure</b> command, as required.
+to the <b>configure</b> command.
</P>
-<br><a name="SEC5" href="#TOC1">Unicode and UTF SUPPORT</a><br>
+<br><a name="SEC5" href="#TOC1">UNICODE AND UTF SUPPORT</a><br>
<P>
By default, PCRE2 is built with support for Unicode and UTF character strings.
To build it without Unicode support, add
@@ -126,20 +127,15 @@ in the same configuration.
</P>
<P>
Of itself, Unicode support does not make PCRE2 treat strings as UTF-8, UTF-16
-or UTF-32. To do that you have have to set the PCRE2_UTF option when you call
-<b>pcre2_compile()</b> to compile a pattern.
+or UTF-32. To do that, applications that use the library have to set the
+PCRE2_UTF option when they call <b>pcre2_compile()</b> to compile a pattern.
</P>
<P>
-It is not possible to support both EBCDIC and UTF-8 codes in the same version
-of the library. Consequently, --enable-unicode and --enable-ebcdic are mutually
-exclusive.
-</P>
-<P>
-UTF support allows the libraries to process character codepoints up to 0x10ffff
-in the strings that they handle. It also provides support for accessing the
-properties of such characters, using pattern escapes such as \P, \p, and \X.
-Only the general category properties such as <i>Lu</i> and <i>Nd</i> are
-supported. Details are given in the
+UTF support allows the libraries to process character code points up to
+0x10ffff in the strings that they handle. It also provides support for
+accessing the Unicode properties of such characters, using pattern escapes such
+as \P, \p, and \X. Only the general category properties such as <i>Lu</i> and
+<i>Nd</i> are supported. Details are given in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation.
</P>
@@ -150,7 +146,7 @@ Just-in-time compiler support is included in the build by specifying
--enable-jit
</pre>
This support is available only for certain hardware architectures. If this
-option is set for an unsupported architecture, a compile time error occurs.
+option is set for an unsupported architecture, a building error occurs.
See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for a discussion of JIT usage. When JIT support is enabled,
@@ -160,7 +156,7 @@ pcre2grep automatically makes use of it, unless you add
</pre>
to the "configure" command.
</P>
-<br><a name="SEC7" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
+<br><a name="SEC7" href="#TOC1">NEWLINE RECOGNITION</a><br>
<P>
By default, PCRE2 interprets the linefeed (LF) character as indicating the end
of a line. This is the normal newline character on Unix-like systems. You can
@@ -168,12 +164,13 @@ compile PCRE2 to use carriage return (CR) instead, by adding
<pre>
--enable-newline-is-cr
</pre>
-to the <b>configure</b> command. There is also a --enable-newline-is-lf option,
+to the <b>configure</b> command. There is also an --enable-newline-is-lf option,
which explicitly specifies linefeed as the newline character.
-<br>
-<br>
-Alternatively, you can specify that line endings are to be indicated by the two
-character sequence CRLF. If you want this, add
+</P>
+<P>
+Alternatively, you can specify that line endings are to be indicated by the
+two-character sequence CRLF (CR immediately followed by LF). If you want this,
+add
<pre>
--enable-newline-is-crlf
</pre>
@@ -186,22 +183,26 @@ indicating a line ending. Finally, a fifth option, specified by
<pre>
--enable-newline-is-any
</pre>
-causes PCRE2 to recognize any Unicode newline sequence.
+causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline
+sequences are the three just mentioned, plus the single characters VT (vertical
+tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
+separator, U+2028), and PS (paragraph separator, U+2029).
</P>
<P>
-Whatever line ending convention is selected when PCRE2 is built can be
-overridden when the library functions are called. At build time it is
+Whatever default line ending convention is selected when PCRE2 is built can be
+overridden by applications that use the library. At build time it is
conventional to use the standard for your operating system.
</P>
<br><a name="SEC8" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
By default, the sequence \R in a pattern matches any Unicode newline sequence,
-whatever has been selected as the line ending sequence. If you specify
+independently of what has been selected as the line ending sequence. If you
+specify
<pre>
--enable-bsr-anycrlf
</pre>
the default is changed so that \R matches only CR, LF, or CRLF. Whatever is
-selected when PCRE2 is built can be overridden when the library functions are
+selected when PCRE2 is built can be overridden by applications that use the
called.
</P>
<br><a name="SEC9" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
@@ -210,10 +211,10 @@ Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
are used for these offsets, leading to a maximum size for a compiled pattern of
-around 64K. This is sufficient to handle all but the most gigantic patterns.
-Nevertheless, some people do want to process truly enormous patterns, so it is
-possible to compile PCRE2 to use three-byte or four-byte offsets by adding a
-setting such as
+around 64K code units. This is sufficient to handle all but the most gigantic
+patterns. Nevertheless, some people do want to process truly enormous patterns,
+so it is possible to compile PCRE2 to use three-byte or four-byte offsets by
+adding a setting such as
<pre>
--with-link-size=3
</pre>
@@ -294,16 +295,20 @@ hand".)
<br><a name="SEC13" href="#TOC1">USING EBCDIC CODE</a><br>
<P>
PCRE2 assumes by default that it will run in an environment where the character
-code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
+code is ASCII or Unicode, which is a superset of ASCII. This is the case for
most computer operating systems. PCRE2 can, however, be compiled to run in an
-EBCDIC environment by adding
+8-bit EBCDIC environment by adding
<pre>
--enable-ebcdic --disable-unicode
</pre>
to the <b>configure</b> command. This setting implies
--enable-rebuild-chartables. You should only use it if you know that you are in
-an EBCDIC environment (for example, an IBM mainframe operating system). The
---enable-ebcdic option is incompatible with Unicode support.
+an EBCDIC environment (for example, an IBM mainframe operating system).
+</P>
+<P>
+It is not possible to support both EBCDIC and UTF-8 codes in the same version
+of the library. Consequently, --enable-unicode and --enable-ebcdic are mutually
+exclusive.
</P>
<P>
The EBCDIC character that corresponds to an ASCII LF is assumed to have the
@@ -347,8 +352,8 @@ parameter value by adding, for example,
<pre>
--with-pcre2grep-bufsize=50K
</pre>
-to the <b>configure</b> command. The caller of \fPpcre2grep\fP can, however,
-override this value by specifying a run-time option.
+to the <b>configure</b> command. The caller of \fPpcre2grep\fP can override this
+value by using --buffer-size on the command line..
</P>
<br><a name="SEC16" href="#TOC1">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a><br>
<P>
@@ -362,16 +367,16 @@ to the <b>configure</b> command, <b>pcre2test</b> is linked with the
from a terminal, it reads it using the <b>readline()</b> function. This provides
line-editing and history facilities. Note that <b>libreadline</b> is
GPL-licensed, so if you distribute a binary of <b>pcre2test</b> linked in this
-way, there may be licensing issues. These can be avoided by linking with
-<b>libedit</b> (which has a BSD licence) instead.
+way, there may be licensing issues. These can be avoided by linking instead
+with <b>libedit</b>, which has a BSD licence.
</P>
<P>
-Setting this option causes the <b>-lreadline</b> option to be added to the
-<b>pcre2test</b> build. In many operating environments with a sytem-installed
-readline library this is sufficient. However, in some environments (e.g. if an
-unmodified distribution version of readline is in use), some extra
-configuration may be necessary. The INSTALL file for <b>libreadline</b> says
-this:
+Setting --enable-pcre2test-libreadline causes the <b>-lreadline</b> option to be
+added to the <b>pcre2test</b> build. In many operating environments with a
+sytem-installed readline library this is sufficient. However, in some
+environments (e.g. if an unmodified distribution version of readline is in
+use), some extra configuration may be necessary. The INSTALL file for
+<b>libreadline</b> says this:
<pre>
"Readline uses the termcap functions, but does not link with
the termcap or curses library itself, allowing applications
@@ -386,13 +391,13 @@ immediately before the <b>configure</b> command.
</P>
<br><a name="SEC17" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
<P>
-By adding the
+If you add
<pre>
--enable-valgrind
</pre>
-option to to the <b>configure</b> command, PCRE2 will use valgrind annotations
-to mark certain memory regions as unaddressable. This allows it to detect
-invalid memory accesses, and is mostly useful for debugging PCRE2 itself.
+to the <b>configure</b> command, PCRE2 will use valgrind annotations to mark
+certain memory regions as unaddressable. This allows it to detect invalid
+memory accesses, and is mostly useful for debugging PCRE2 itself.
</P>
<br><a name="SEC18" href="#TOC1">CODE COVERAGE REPORTING</a><br>
<P>
@@ -466,7 +471,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 03 November 2014
+Last updated: 23 November 2014
<br>
Copyright &copy; 1997-2014 University of Cambridge.
<br>
diff --git a/doc/html/pcre2callout.html b/doc/html/pcre2callout.html
index baf76e1..4d0238a 100644
--- a/doc/html/pcre2callout.html
+++ b/doc/html/pcre2callout.html
@@ -85,29 +85,27 @@ expect.
<P>
At compile time, PCRE2 "auto-possessifies" repeated items when it knows that
what follows cannot be part of the repeat. For example, a+[bc] is compiled as
-if it were a++[bc]. The <b>pcre2test</b> output when this pattern is anchored
-and then applied with automatic callouts to the string "aaaa" is:
+if it were a++[bc]. The <b>pcre2test</b> output when this pattern is compiled
+with PCRE2_ANCHORED and PCRE2_AUTO_CALLOUT and then applied to the string
+"aaaa" is:
<pre>
---&#62;aaaa
- +0 ^ ^
- +1 ^ a+
- +3 ^ ^ [bc]
+ +0 ^ a+
+ +2 ^ ^ [bc]
No match
</pre>
This indicates that when matching [bc] fails, there is no backtracking into a+
and therefore the callouts that would be taken for the backtracks do not occur.
-You can disable the auto-possessify feature by passing PCRE2_NO_AUTO_POSSESS
-to <b>pcre2_compile()</b>, or starting the pattern with (*NO_AUTO_POSSESS). If
-this is done in <b>pcre2test</b> (using the /no_auto_possess qualifier), the
-output changes to this:
+You can disable the auto-possessify feature by passing PCRE2_NO_AUTO_POSSESS to
+<b>pcre2_compile()</b>, or starting the pattern with (*NO_AUTO_POSSESS). In this
+case, the output changes to this:
<pre>
---&#62;aaaa
- +0 ^ ^
- +1 ^ a+
- +3 ^ ^ [bc]
- +3 ^ ^ [bc]
- +3 ^ ^ [bc]
- +3 ^^ [bc]
+ +0 ^ a+
+ +2 ^ ^ [bc]
+ +2 ^ ^ [bc]
+ +2 ^ ^ [bc]
+ +2 ^^ [bc]
No match
</pre>
This time, when matching [bc] fails, the matcher backtracks into a+ and tries
@@ -137,10 +135,10 @@ callouts such as the example above are obeyed.
</P>
<br><a name="SEC4" href="#TOC1">THE CALLOUT INTERFACE</a><br>
<P>
-During matching, when PCRE2 reaches a callout point, the external function that
-is set in the match context is called (if it is set). This applies to both
-normal and DFA matching. The only argument to the callout function is a pointer
-to a <b>pcre2_callout</b> block. This structure contains the following fields:
+During matching, when PCRE2 reaches a callout point, if an external function is
+set in the match context, it is called. This applies to both normal and DFA
+matching. The only argument to the callout function is a pointer to a
+<b>pcre2_callout</b> block. This structure contains the following fields:
<pre>
uint32_t <i>version</i>;
uint32_t <i>callout_number</i>;
@@ -169,7 +167,7 @@ automatically generated callouts).
<P>
The <i>offset_vector</i> field is a pointer to the vector of capturing offsets
(the "ovector") that was passed to the matching function in the match data
-block. When <b>pcre2_match()</b> is used, the contents can be inspected, in
+block. When <b>pcre2_match()</b> is used, the contents can be inspected in
order to extract substrings that have been matched so far, in the same way as
for extracting substrings after a match has completed. For the DFA matching
function, this field is not useful.
@@ -261,7 +259,7 @@ Cambridge, England.
</P>
<br><a name="SEC7" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 19 October 2014
+Last updated: 23 November 2014
<br>
Copyright &copy; 1997-2014 University of Cambridge.
<br>
diff --git a/doc/html/pcre2grep.html b/doc/html/pcre2grep.html
index 0a5b5cb..dd498d4 100644
--- a/doc/html/pcre2grep.html
+++ b/doc/html/pcre2grep.html
@@ -467,8 +467,8 @@ used. There is no short form for this option.
Processing some regular expression patterns can require a very large amount of
memory, leading in some cases to a program crash if not enough is available.
Other patterns may take a very long time to search for all possible matching
-strings. The <b>pcre2_exec()</b> function that is called by <b>pcre2grep</b> to do
-the matching has two parameters that can limit the resources that it uses.
+strings. The <b>pcre2_match()</b> function that is called by <b>pcre2grep</b> to
+do the matching has two parameters that can limit the resources that it uses.
<br>
<br>
The <b>--match-limit</b> option provides a means of limiting resource usage
@@ -750,7 +750,7 @@ Cambridge, England.
</P>
<br><a name="SEC14" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 28 September 2014
+Last updated: 23 November 2014
<br>
Copyright &copy; 1997-2014 University of Cambridge.
<br>
diff --git a/doc/html/pcre2jit.html b/doc/html/pcre2jit.html
index 3e3092e..08e520b 100644
--- a/doc/html/pcre2jit.html
+++ b/doc/html/pcre2jit.html
@@ -31,11 +31,11 @@ please consult the man page, in case the conversion went wrong.
<P>
Just-in-time compiling is a heavyweight optimization that can greatly speed up
pattern matching. However, it comes at the cost of extra processing before the
-match is performed. Therefore, it is of most benefit when the same pattern is
-going to be matched many times. This does not necessarily mean many calls of a
-matching function; if the pattern is not anchored, matching attempts may take
-place many times at various positions in the subject, even for a single call.
-Therefore, if the subject string is very long, it may still pay to use JIT for
+match is performed, so it is of most benefit when the same pattern is going to
+be matched many times. This does not necessarily mean many calls of a matching
+function; if the pattern is not anchored, matching attempts may take place many
+times at various positions in the subject, even for a single call. Therefore,
+if the subject string is very long, it may still pay to use JIT even for
one-off matches. JIT support is available for all of the 8-bit, 16-bit and
32-bit PCRE2 libraries.
</P>
@@ -103,7 +103,7 @@ option bits. For example, you can call it once with PCRE2_JIT_COMPLETE and
PCRE2_JIT_COMPLETE and PCRE2_JIT_PARTIAL_HARD. This time it will ignore
PCRE2_JIT_COMPLETE and just compile code for partial matching. If
<b>pcre2_jit_compile()</b> is called with no option bits set, it immediately
-returns zero. This is an alternative way of testing if JIT is available.
+returns zero. This is an alternative way of testing whether JIT is available.
</P>
<P>
At present, it is not possible to free JIT compiled code except when the entire
@@ -299,7 +299,7 @@ compiled patterns, contexts, and stacks in any order, anytime. Just \fIdo
not\fP call <b>pcre2_match()</b> with a match context pointing to an already
freed stack, as that will cause SEGFAULT. (Also, do not free a stack currently
used by <b>pcre2_match()</b> in another thread). You can also replace the stack
-in a context at any time when it is not in use. You can also free the previous
+in a context at any time when it is not in use. You should free the previous
stack before assigning a replacement.
</P>
<P>
@@ -418,7 +418,7 @@ Cambridge, England.
</P>
<br><a name="SEC13" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 12 November 2014
+Last updated: 23 November 2014
<br>
Copyright &copy; 1997-2014 University of Cambridge.
<br>
diff --git a/doc/html/pcre2syntax.html b/doc/html/pcre2syntax.html
index 7937059..dca9868 100644
--- a/doc/html/pcre2syntax.html
+++ b/doc/html/pcre2syntax.html
@@ -421,7 +421,7 @@ appear.
(*UCP) set PCRE2_UCP (use Unicode properties for \d etc)
</pre>
Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
-limits set by the caller of pcre2_exec(), not increase them.
+limits set by the caller of pcre2_match(), not increase them.
</P>
<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
<P>
@@ -553,7 +553,7 @@ Cambridge, England.
</P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 14 November 2014
+Last updated: 23 November 2014
<br>
Copyright &copy; 1997-2014 University of Cambridge.
<br>
diff --git a/doc/html/pcre2unicode.html b/doc/html/pcre2unicode.html
index 4e8c4ea..d3793f9 100644
--- a/doc/html/pcre2unicode.html
+++ b/doc/html/pcre2unicode.html
@@ -72,7 +72,7 @@ but its use can lead to some strange effects because it breaks up multi-unit
characters (see the description of \C in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation). The use of \C is not supported in the alternative matching
-function <b>pcre2_dfa_exec()</b>, nor is it supported in UTF mode by the JIT
+function <b>pcre2_dfa_match()</b>, nor is it supported in UTF mode by the JIT
optimization. If JIT optimization is requested for a UTF pattern that contains
\C, it will not succeed, and so the matching will be carried out by the normal
interpretive function.
@@ -141,15 +141,15 @@ UTF-32.)
In some situations, you may already know that your strings are valid, and
therefore want to skip these checks in order to improve performance, for
example in the case of a long subject string that is being scanned repeatedly.
-If you set the PCRE2_NO_UTF_CHECK flag at compile time or at run time, PCRE2
-assumes that the pattern or subject it is given (respectively) contains only
-valid UTF code unit sequences.
+If you set the PCRE2_NO_UTF_CHECK option at compile time or at match time,
+PCRE2 assumes that the pattern or subject it is given (respectively) contains
+only valid UTF code unit sequences.
</P>
<P>
Passing PCRE2_NO_UTF_CHECK to <b>pcre2_compile()</b> just disables the check for
the pattern; it does not also apply to subject strings. If you want to disable
-the check for a subject string you must pass this option to <b>pcre2_exec()</b>
-or <b>pcre2_dfa_exec()</b>.
+the check for a subject string you must pass this option to <b>pcre2_match()</b>
+or <b>pcre2_dfa_match()</b>.
</P>
<P>
If you pass an invalid UTF string when PCRE2_NO_UTF_CHECK is set, the result
@@ -261,7 +261,7 @@ Cambridge, England.
REVISION
</b><br>
<P>
-Last updated: 03 November 2014
+Last updated: 23 November 2014
<br>
Copyright &copy; 1997-2014 University of Cambridge.
<br>
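
Note on the NEWLINE RECOGNITION and WHAT \R MATCHES hunks in pcre2build.html above: the added text lists the sequences that --enable-newline-is-any recognizes (CR, LF, CRLF, VT, FF, NEL, LS, PS) and the narrower CR/LF/CRLF set selected by --enable-bsr-anycrlf. The sketch below only illustrates the difference between those two sets. It uses Python's re module, not PCRE2, and the names ANY_NEWLINE, ANYCRLF and split_lines are hypothetical, chosen for this example.

```python
import re

# Newline sequences named in the documentation text: CR, LF, CRLF, plus
# VT (U+000B), FF (U+000C), NEL (U+0085), LS (U+2028) and PS (U+2029).
# CRLF is listed first so that the pair counts as a single line ending.
ANY_NEWLINE = re.compile("\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")

# The narrower set described for --enable-bsr-anycrlf: CR, LF, or CRLF only.
ANYCRLF = re.compile("\r\n|\r|\n")


def split_lines(text, newline=ANY_NEWLINE):
    """Split text at every match of the given newline pattern."""
    return newline.split(text)


if __name__ == "__main__":
    sample = "one\r\ntwo\u2028three\x0bfour\nfive"
    print(split_lines(sample))           # splits at all four line endings
    print(split_lines(sample, ANYCRLF))  # LS and VT are left inside a line
```

With the "any" set the sample splits into five pieces; with the CR/LF/CRLF set the LS and VT characters stay inside the middle piece, which mirrors the behavioural difference the documentation describes.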