Diffstat (limited to 'doc/html')
-rw-r--r-- doc/html/pcre.html 42
-rw-r--r-- doc/html/pcre_config.html 2
-rw-r--r-- doc/html/pcre_dfa_exec.html 8
-rw-r--r-- doc/html/pcre_exec.html 6
-rw-r--r-- doc/html/pcreapi.html 81
-rw-r--r-- doc/html/pcrecompat.html 11
-rw-r--r-- doc/html/pcrecpp.html 8
-rw-r--r-- doc/html/pcregrep.html 342
-rw-r--r-- doc/html/pcrepartial.html 24
-rw-r--r-- doc/html/pcrepattern.html 173
-rw-r--r-- doc/html/pcreposix.html 67
-rw-r--r-- doc/html/pcreprecompile.html 9
-rw-r--r-- doc/html/pcretest.html 33
13 files changed, 568 insertions, 238 deletions
diff --git a/doc/html/pcre.html b/doc/html/pcre.html
index 1f36924..f392371 100644
--- a/doc/html/pcre.html
+++ b/doc/html/pcre.html
@@ -156,10 +156,13 @@ If PCRE is built with Unicode character property support (which implies UTF-8
support), the escape sequences \p{..}, \P{..}, and \X are supported.
The available properties that can be tested are limited to the general
category properties such as Lu for an upper case letter or Nd for a decimal
-number. A full list is given in the
+number, the Unicode script names such as Arabic or Han, and the derived
+properties Any and L&. A full list is given in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
-documentation. The PCRE library is increased in size by about 90K when Unicode
-property support is included.
+documentation. Only the short names for properties are supported. For example,
+\p{L} matches a letter. Its Perl synonym, \p{Letter}, is not supported.
+Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
+compatibility with Perl 5.6. PCRE does not support this.
</P>
<P>
The following comments apply when PCRE is running in UTF-8 mode:
@@ -177,31 +180,23 @@ PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
may crash.
</P>
<P>
-2. In a pattern, the escape sequence \x{...}, where the contents of the braces
-is a string of hexadecimal digits, is interpreted as a UTF-8 character whose
-code number is the given hexadecimal number, for example: \x{1234}. If a
-non-hexadecimal digit appears between the braces, the item is not recognized.
-This escape sequence can be used either as a literal, or within a character
-class.
+2. An unbraced hexadecimal escape sequence (such as \xb3) matches a two-byte
+UTF-8 character if the value is greater than 127.
</P>
<P>
-3. The original hexadecimal escape sequence, \xhh, matches a two-byte UTF-8
-character if the value is greater than 127.
-</P>
-<P>
-4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
+3. Repeat quantifiers apply to complete UTF-8 characters, not to individual
bytes, for example: \x{100}{3}.
</P>
<P>
-5. The dot metacharacter matches one UTF-8 character instead of a single byte.
+4. The dot metacharacter matches one UTF-8 character instead of a single byte.
</P>
<P>
-6. The escape sequence \C can be used to match a single byte in UTF-8 mode,
+5. The escape sequence \C can be used to match a single byte in UTF-8 mode,
but its use can lead to some strange effects. This facility is not available in
the alternative matching function, <b>pcre_dfa_exec()</b>.
</P>
<P>
-7. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
+6. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
test characters of any code value, but the characters that PCRE recognizes as
digits, spaces, or word characters remain the same set as before, all with
values less than 256. This remains true even when PCRE includes Unicode
@@ -210,16 +205,19 @@ cases. If you really want to test for a wider sense of, say, "digit", you
must use Unicode property tests such as \p{Nd}.
</P>
<P>
-8. Similarly, characters that match the POSIX named character classes are all
+7. Similarly, characters that match the POSIX named character classes are all
low-valued characters.
</P>
<P>
-9. Case-insensitive matching applies only to characters whose values are less
+8. Case-insensitive matching applies only to characters whose values are less
than 128, unless PCRE is built with Unicode property support. Even when Unicode
property support is available, PCRE still uses its own character tables when
checking the case of low-valued characters, so as not to degrade performance.
The Unicode property information is used only for characters with higher
-values.
+values. Even when Unicode property support is available, PCRE supports
+case-insensitive matching only when there is a one-to-one mapping between a
+letter's cases. There are a small number of many-to-one mappings in Unicode;
+these are not supported by PCRE.
</P>
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
@@ -233,9 +231,9 @@ Cambridge CB2 3QG, England.
Putting an actual email address here seems to have been a spam magnet, so I've
taken it away. If you want to email me, use my initial and surname, separated
by a dot, at the domain ucs.cam.ac.uk.
-Last updated: 07 March 2005
+Last updated: 24 January 2006
<br>
-Copyright &copy; 1997-2005 University of Cambridge.
+Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
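Note on item 2 of the UTF-8 comments in the pcre.html hunk above: an unbraced escape such as \xb3 names a code point between 128 and 255, and such a code point occupies two bytes in UTF-8. The sketch below is not PCRE code; `utf8_encode_small` is a hypothetical helper written only to show the byte arithmetic for code points below 0x800.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: encode a code point below 0x800
   as UTF-8 and return the number of bytes written (1 or 2). */
size_t utf8_encode_small(unsigned int cp, unsigned char out[2])
{
    assert(cp < 0x800);              /* larger values need 3 or 4 bytes */
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;  /* values up to 127 stay a single byte */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));    /* lead byte: 110xxxxx */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));  /* trail byte: 10xxxxxx */
    return 2;
}
```

For 0xB3 this yields the bytes 0xC2 0xB3, which is the two-byte sequence that \xb3 has to match in a UTF-8 subject string.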
diff --git a/doc/html/pcre_config.html b/doc/html/pcre_config.html
index 8d8cc60..62ee2b6 100644
--- a/doc/html/pcre_config.html
+++ b/doc/html/pcre_config.html
@@ -36,6 +36,8 @@ The available codes are:
<pre>
PCRE_CONFIG_LINK_SIZE Internal link size: 2, 3, or 4
PCRE_CONFIG_MATCH_LIMIT Internal resource limit
+ PCRE_CONFIG_MATCH_LIMIT_RECURSION
+ Internal recursion depth limit
PCRE_CONFIG_NEWLINE Value of the newline character
PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
Threshold of return slots, above
diff --git a/doc/html/pcre_dfa_exec.html b/doc/html/pcre_dfa_exec.html
index ceadb8b..10c29e2 100644
--- a/doc/html/pcre_dfa_exec.html
+++ b/doc/html/pcre_dfa_exec.html
@@ -69,13 +69,15 @@ A <b>pcre_extra</b> structure contains the following fields:
<pre>
<i>flags</i> Bits indicating which fields are set
<i>study_data</i> Opaque data from <b>pcre_study()</b>
- <i>match_limit</i> Limit on internal recursion
+ <i>match_limit</i> Limit on internal resource use
+ <i>match_limit_recursion</i> Limit on internal recursion depth
<i>callout_data</i> Opaque data passed back to callouts
<i>tables</i> Points to character tables or is NULL
</pre>
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
-PCRE_EXTRA_CALLOUT_DATA, and PCRE_EXTRA_TABLES. For DFA matching, the
-<i>match_limit</i> field is not used, and must not be set.
+PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
+PCRE_EXTRA_TABLES. For DFA matching, the <i>match_limit</i> and
+<i>match_limit_recursion</i> fields are not used, and must not be set.
</P>
<P>
There is a complete description of the PCRE native API in the
diff --git a/doc/html/pcre_exec.html b/doc/html/pcre_exec.html
index 5fae92f..c2581ee 100644
--- a/doc/html/pcre_exec.html
+++ b/doc/html/pcre_exec.html
@@ -61,12 +61,14 @@ A <b>pcre_extra</b> structure contains the following fields:
<pre>
<i>flags</i> Bits indicating which fields are set
<i>study_data</i> Opaque data from <b>pcre_study()</b>
- <i>match_limit</i> Limit on internal recursion
+ <i>match_limit</i> Limit on internal resource use
+ <i>match_limit_recursion</i> Limit on internal recursion depth
<i>callout_data</i> Opaque data passed back to callouts
<i>tables</i> Points to character tables or is NULL
</pre>
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
-PCRE_EXTRA_CALLOUT_DATA, and PCRE_EXTRA_TABLES.
+PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
+PCRE_EXTRA_TABLES.
</P>
<P>
There is a complete description of the PCRE native API in the
diff --git a/doc/html/pcreapi.html b/doc/html/pcreapi.html
index 29492a7..b4ca4c4 100644
--- a/doc/html/pcreapi.html
+++ b/doc/html/pcreapi.html
@@ -302,6 +302,12 @@ The output is an integer that gives the default limit for the number of
internal matching function calls in a <b>pcre_exec()</b> execution. Further
details are given with <b>pcre_exec()</b> below.
<pre>
+ PCRE_CONFIG_MATCH_LIMIT_RECURSION
+</pre>
+The output is an integer that gives the default limit for the depth of
+recursion when calling the internal matching function in a <b>pcre_exec()</b>
+execution. Further details are given with <b>pcre_exec()</b> below.
+<pre>
PCRE_CONFIG_STACKRECURSE
</pre>
The output is an integer that is set to one if internal recursion when running
@@ -358,8 +364,9 @@ time.
If <i>errptr</i> is NULL, <b>pcre_compile()</b> returns NULL immediately.
Otherwise, if compilation of a pattern fails, <b>pcre_compile()</b> returns
NULL, and sets the variable pointed to by <i>errptr</i> to point to a textual
-error message. The offset from the start of the pattern to the character where
-the error was discovered is placed in the variable pointed to by
+error message. This is a static string that is part of the library. You must
+not try to free it. The offset from the start of the pattern to the character
+where the error was discovered is placed in the variable pointed to by
<i>erroffset</i>, which must not be NULL. If it is, an immediate error is given.
</P>
<P>
@@ -615,9 +622,10 @@ options are defined, and this argument should always be zero.
<P>
The third argument for <b>pcre_study()</b> is a pointer for an error message. If
studying succeeds (even if no data is returned), the variable it points to is
-set to NULL. Otherwise it points to a textual error message. You should
-therefore test the error pointer for NULL after calling <b>pcre_study()</b>, to
-be sure that it has run successfully.
+set to NULL. Otherwise it is set to point to a textual error message. This is a
+static string that is part of the library. You must not try to free it. You
+should test the error pointer for NULL after calling <b>pcre_study()</b>, to be
+sure that it has run successfully.
</P>
<P>
This is a typical call to <b>pcre_study</b>():
@@ -639,7 +647,7 @@ digits, or whatever, by reference to a set of tables, indexed by character
value. When running in UTF-8 mode, this applies only to characters with codes
less than 128. Higher-valued codes never match escapes such as \w or \d, but
can be tested with \p if PCRE is built with Unicode character property
-support.
+support. The use of locales with Unicode is discouraged.
</P>
<P>
An internal set of tables is created in the default C locale when PCRE is
@@ -947,12 +955,13 @@ Extra data for <b>pcre_exec()</b>
If the <i>extra</i> argument is not NULL, it must point to a <b>pcre_extra</b>
data block. The <b>pcre_study()</b> function returns such a block (when it
doesn't return NULL), but you can also create one for yourself, and pass
-additional information in it. The fields in a <b>pcre_extra</b> block are as
-follows:
+additional information in it. The <b>pcre_extra</b> block contains the following
+fields (not necessarily in this order):
<pre>
unsigned long int <i>flags</i>;
void *<i>study_data</i>;
unsigned long int <i>match_limit</i>;
+ unsigned long int <i>match_limit_recursion</i>;
void *<i>callout_data</i>;
const unsigned char *<i>tables</i>;
</pre>
@@ -961,6 +970,7 @@ are set. The flag bits are:
<pre>
PCRE_EXTRA_STUDY_DATA
PCRE_EXTRA_MATCH_LIMIT
+ PCRE_EXTRA_MATCH_LIMIT_RECURSION
PCRE_EXTRA_CALLOUT_DATA
PCRE_EXTRA_TABLES
</pre>
@@ -977,18 +987,39 @@ classic example is the use of nested unlimited repeats.
</P>
<P>
Internally, PCRE uses a function called <b>match()</b> which it calls repeatedly
-(sometimes recursively). The limit is imposed on the number of times this
-function is called during a match, which has the effect of limiting the amount
-of recursion and backtracking that can take place. For patterns that are not
-anchored, the count starts from zero for each position in the subject string.
+(sometimes recursively). The limit set by <i>match_limit</i> is imposed on the
+number of times this function is called during a match, which has the effect of
+limiting the amount of backtracking that can take place. For patterns that are
+not anchored, the count restarts from zero for each position in the subject
+string.
</P>
<P>
-The default limit for the library can be set when PCRE is built; the default
+The default value for the limit can be set when PCRE is built; the default
default is 10 million, which handles all but the most extreme cases. You can
-reduce the default by suppling <b>pcre_exec()</b> with a <b>pcre_extra</b> block
-in which <i>match_limit</i> is set to a smaller value, and
-PCRE_EXTRA_MATCH_LIMIT is set in the <i>flags</i> field. If the limit is
-exceeded, <b>pcre_exec()</b> returns PCRE_ERROR_MATCHLIMIT.
+override the default by supplying <b>pcre_exec()</b> with a <b>pcre_extra</b>
+block in which <i>match_limit</i> is set, and PCRE_EXTRA_MATCH_LIMIT is set in
+the <i>flags</i> field. If the limit is exceeded, <b>pcre_exec()</b> returns
+PCRE_ERROR_MATCHLIMIT.
+</P>
+<P>
+The <i>match_limit_recursion</i> field is similar to <i>match_limit</i>, but
+instead of limiting the total number of times that <b>match()</b> is called, it
+limits the depth of recursion. The recursion depth is a smaller number than the
+total number of calls, because not all calls to <b>match()</b> are recursive.
+This limit is of use only if it is set smaller than <i>match_limit</i>.
+</P>
+<P>
+Limiting the recursion depth limits the amount of stack that can be used, or,
+when PCRE has been compiled to use memory on the heap instead of the stack, the
+amount of heap memory that can be used.
+</P>
+<P>
+The default value for <i>match_limit_recursion</i> can be set when PCRE is
+built; the default default is the same value as the default for
+<i>match_limit</i>. You can override the default by supplying <b>pcre_exec()</b>
+with a <b>pcre_extra</b> block in which <i>match_limit_recursion</i> is set, and
+PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the <i>flags</i> field. If the limit
+is exceeded, <b>pcre_exec()</b> returns PCRE_ERROR_RECURSIONLIMIT.
</P>
<P>
The <i>pcre_callout</i> field is used in conjunction with the "callout" feature,
@@ -1247,7 +1278,13 @@ below). It is never returned by <b>pcre_exec()</b>.
<pre>
PCRE_ERROR_MATCHLIMIT (-8)
</pre>
-The recursion and backtracking limit, as specified by the <i>match_limit</i>
+The backtracking limit, as specified by the <i>match_limit</i> field in a
+<b>pcre_extra</b> structure (or defaulted) was reached. See the description
+above.
+<pre>
+ PCRE_ERROR_RECURSIONLIMIT (-21)
+</pre>
+The internal recursion limit, as specified by the <i>match_limit_recursion</i>
field in a <b>pcre_extra</b> structure (or defaulted) was reached. See the
description above.
<pre>
@@ -1478,12 +1515,12 @@ multiple paths through the pattern tree. More workspace will be needed for
patterns and subjects where there are a lot of possible matches.
</P>
<P>
-Here is an example of a simple call to <b>pcre_exec()</b>:
+Here is an example of a simple call to <b>pcre_dfa_exec()</b>:
<pre>
int rc;
int ovector[10];
int wspace[20];
- rc = pcre_exec(
+ rc = pcre_dfa_exec(
re, /* result of pcre_compile() */
NULL, /* we didn't study the pattern */
"some string", /* the subject string */
@@ -1610,9 +1647,9 @@ error is given if the output vector is not large enough. This should be
extremely rare, as a vector of size 1000 is used.
</P>
<P>
-Last updated: 16 May 2005
+Last updated: 18 January 2006
<br>
-Copyright &copy; 1997-2005 University of Cambridge.
+Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
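Note on the two limits described in the pcreapi.html hunks above: <i>match_limit</i> caps the total number of calls to the internal matching function, while <i>match_limit_recursion</i> caps how deeply those calls nest. The toy backtracking matcher below is a sketch only and is not PCRE's match() function; every name in it (`toy_limits`, `toy_match`, `TOY_ERR_CALLS`, `TOY_ERR_DEPTH`) is hypothetical. It supports literal characters and `x*`, and counts both quantities so that the difference is visible.

```c
#include <stddef.h>

enum { TOY_ERR_CALLS = -1, TOY_ERR_DEPTH = -2 };

typedef struct {
    unsigned long calls;       /* total calls made so far */
    unsigned long depth;       /* current nesting depth */
    unsigned long peak_depth;  /* deepest nesting reached */
    unsigned long max_calls;   /* plays the role of match_limit */
    unsigned long max_depth;   /* plays the role of match_limit_recursion */
    int error;                 /* 0, TOY_ERR_CALLS or TOY_ERR_DEPTH */
} toy_limits;

void toy_limits_init(toy_limits *lim, unsigned long max_calls,
                     unsigned long max_depth)
{
    lim->calls = lim->depth = lim->peak_depth = 0;
    lim->max_calls = max_calls;
    lim->max_depth = max_depth;
    lim->error = 0;
}

/* Returns 1 if pattern p matches a prefix of s, 0 otherwise.
   On a limit violation, lim->error is set and 0 is returned. */
int toy_match(const char *p, const char *s, toy_limits *lim)
{
    int rc = 0;
    if (lim->error) return 0;
    if (++lim->calls > lim->max_calls) { lim->error = TOY_ERR_CALLS; return 0; }
    if (++lim->depth > lim->max_depth) {
        lim->error = TOY_ERR_DEPTH;
        lim->depth--;
        return 0;
    }
    if (lim->depth > lim->peak_depth) lim->peak_depth = lim->depth;

    if (*p == '\0') {
        rc = 1;                                   /* pattern exhausted */
    } else if (p[1] == '*') {
        /* Greedy: first try to consume one more character, then backtrack
           and try the rest of the pattern without consuming it. */
        if (*s == *p && toy_match(p, s + 1, lim)) rc = 1;
        else if (!lim->error) rc = toy_match(p + 2, s, lim);
    } else if (*s == *p) {
        rc = toy_match(p + 1, s + 1, lim);        /* literal character */
    }
    lim->depth--;
    return rc;
}
```

Matching `a*b` against `aaac` with generous limits makes 8 calls in this toy but never nests deeper than 5, because backtracking returns from a call before the next alternative is tried. With `max_depth` set to 3 the same match stops with `TOY_ERR_DEPTH`, and with `max_calls` set to 6 it stops with `TOY_ERR_CALLS`. This mirrors the statement above that a recursion limit is useful only when it is smaller than the call limit.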
diff --git a/doc/html/pcrecompat.html b/doc/html/pcrecompat.html
index d15b3b0..21522f2 100644
--- a/doc/html/pcrecompat.html
+++ b/doc/html/pcrecompat.html
@@ -21,8 +21,8 @@ regular expressions. The differences described here are with respect to Perl
5.8.
</P>
<P>
-1. PCRE does not have full UTF-8 support. Details of what it does have are
-given in the
+1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what
+it does have are given in the
<a href="pcre.html#utf8support">section on UTF-8 support</a>
in the main
<a href="pcre.html"><b>pcre</b></a>
@@ -57,7 +57,8 @@ encountered by PCRE, an error is generated.
6. The Perl escape sequences \p, \P, and \X are supported only if PCRE is
built with Unicode character property support. The properties that can be
tested with \p and \P are limited to the general category properties such as
-Lu and Nd.
+Lu and Nd, script names such as Greek or Han, and the derived properties Any
+and L&.
</P>
<P>
7. PCRE does support the \Q...\E escape for quoting substrings. Characters in
@@ -146,9 +147,9 @@ different hosts that have the other endianness.
different way and is not Perl-compatible.
</P>
<P>
-Last updated: 28 February 2005
+Last updated: 24 January 2006
<br>
-Copyright &copy; 1997-2005 University of Cambridge.
+Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
diff --git a/doc/html/pcrecpp.html b/doc/html/pcrecpp.html
index 1d5acb7..8c3216d 100644
--- a/doc/html/pcrecpp.html
+++ b/doc/html/pcrecpp.html
@@ -188,13 +188,17 @@ which returns true if the modifier is set, and
<pre>
RE_Options & set_caseless(bool)
</pre>
-which sets or unsets the modifier. Moreover, PCRE_CONFIG_MATCH_LIMIT can be
+which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member
functions. Setting <i>match_limit</i> to a non-zero value will limit the
execution of pcre to keep it from doing bad things like blowing the stack or
taking an eternity to return a result. A value of 5000 is good enough to stop
stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables
-match limiting.
+match limiting. Alternatively, you can call <b>match_limit_recursion()</b>
+which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
+recurses. <b>match_limit()</b> limits the number of matches PCRE does;
+<b>match_limit_recursion()</b> limits the depth of internal recursion, and
+therefore the amount of stack that is used.
</P>
<P>
Normally, to pass one or more modifiers to a RE class, you declare
diff --git a/doc/html/pcregrep.html b/doc/html/pcregrep.html
index 614034d..7b66c57 100644
--- a/doc/html/pcregrep.html
+++ b/doc/html/pcregrep.html
@@ -16,14 +16,16 @@ man page, in case the conversion went wrong.
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">OPTIONS</a>
-<li><a name="TOC4" href="#SEC4">LONG OPTIONS</a>
-<li><a name="TOC5" href="#SEC5">OPTIONS WITH DATA</a>
-<li><a name="TOC6" href="#SEC6">DIAGNOSTICS</a>
-<li><a name="TOC7" href="#SEC7">AUTHOR</a>
+<li><a name="TOC4" href="#SEC4">ENVIRONMENT VARIABLES</a>
+<li><a name="TOC5" href="#SEC5">OPTIONS COMPATIBILITY</a>
+<li><a name="TOC6" href="#SEC6">OPTIONS WITH DATA</a>
+<li><a name="TOC7" href="#SEC7">MATCHING ERRORS</a>
+<li><a name="TOC8" href="#SEC8">DIAGNOSTICS</a>
+<li><a name="TOC9" href="#SEC9">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
<P>
-<b>pcregrep [options] [long options] [pattern] [file1 file2 ...]</b>
+<b>pcregrep [options] [long options] [pattern] [path1 path2 ...]</b>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
@@ -35,8 +37,23 @@ for a full description of syntax and semantics of the regular expressions that
PCRE supports.
</P>
<P>
-A pattern must be specified on the command line unless the <b>-f</b> option is
-used (see below).
+Patterns, whether supplied on the command line or in a separate file, are given
+without delimiters. For example:
+<pre>
+ pcregrep Thursday /etc/motd
+</pre>
+If you attempt to use delimiters (for example, by surrounding a pattern with
+slashes, as is common in Perl scripts), they are interpreted as part of the
+pattern. Quotes can of course be used on the command line because they are
+interpreted by the shell, and indeed they are required if a pattern contains
+white space or shell metacharacters.
+</P>
+<P>
+The first argument that follows any option settings is treated as the single
+pattern to be matched when neither <b>-e</b> nor <b>-f</b> is present.
+Conversely, when one or both of these options are used to specify patterns, all
+arguments are treated as path names. At least one of <b>-e</b>, <b>-f</b>, or an
+argument pattern must be provided.
</P>
<P>
If no files are specified, <b>pcregrep</b> reads the standard input. The
@@ -46,8 +63,8 @@ For example:
pcregrep some-pattern /file1 - /file3
</pre>
By default, each line that matches the pattern is copied to the standard
-output, and if there is more than one file, the file name is printed before
-each line of output. However, there are options that can change how
+output, and if there is more than one file, the file name is output at the
+start of each line. However, there are options that can change how
<b>pcregrep</b> behaves. In particular, the <b>-M</b> option makes it possible to
search for patterns that span line boundaries.
</P>
@@ -55,40 +72,98 @@ search for patterns that span line boundaries.
Patterns are limited to 8K or BUFSIZ characters, whichever is the greater.
BUFSIZ is defined in <b>&#60;stdio.h&#62;</b>.
</P>
+<P>
+If the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variable is set,
+<b>pcregrep</b> uses the value to set a locale when calling the PCRE library.
+The <b>--locale</b> option can be used to override this.
+</P>
<br><a name="SEC3" href="#TOC1">OPTIONS</a><br>
<P>
<b>--</b>
This terminates the list of options. It is useful if the next item on the
-command line starts with a hyphen, but is not an option.
+command line starts with a hyphen but is not an option. This allows for the
+processing of patterns and filenames that start with hyphens.
</P>
<P>
-<b>-A</b> <i>number</i>
-Print <i>number</i> lines of context after each matching line. If file names
-and/or line numbers are being printed, a hyphen separator is used instead of a
-colon for the context lines. A line containing "--" is printed between each
+<b>-A</b> <i>number</i>, <b>--after-context=</b><i>number</i>
+Output <i>number</i> lines of context after each matching line. If filenames
+and/or line numbers are being output, a hyphen separator is used instead of a
+colon for the context lines. A line containing "--" is output between each
group of lines, unless they are in fact contiguous in the input file. The value
of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
-guarantees to have up to 8K of following text available for context printing.
+guarantees to have up to 8K of following text available for context output.
</P>
<P>
-<b>-B</b> <i>number</i>
-Print <i>number</i> lines of context before each matching line. If file names
-and/or line numbers are being printed, a hyphen separator is used instead of a
-colon for the context lines. A line containing "--" is printed between each
+<b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
+Output <i>number</i> lines of context before each matching line. If filenames
+and/or line numbers are being output, a hyphen separator is used instead of a
+colon for the context lines. A line containing "--" is output between each
group of lines, unless they are in fact contiguous in the input file. The value
of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
-guarantees to have up to 8K of preceding text available for context printing.
+guarantees to have up to 8K of preceding text available for context output.
</P>
<P>
-<b>-C</b> <i>number</i>
-Print <i>number</i> lines of context both before and after each matching line.
+<b>-C</b> <i>number</i>, <b>--context=</b><i>number</i>
+Output <i>number</i> lines of context both before and after each matching line.
This is equivalent to setting both <b>-A</b> and <b>-B</b> to the same value.
</P>
<P>
-<b>-c</b>
-Do not print individual lines; instead just print a count of the number of
-lines that would otherwise have been printed. If several files are given, a
-count is printed for each of them.
+<b>-c</b>, <b>--count</b>
+Do not output individual lines; instead just output a count of the number of
+lines that would otherwise have been output. If several files are given, a
+count is output for each of them. In this mode, the <b>-A</b>, <b>-B</b>, and
+<b>-C</b> options are ignored.
+</P>
+<P>
+<b>--colour</b>, <b>--color</b>
+If this option is given without any data, it is equivalent to "--colour=auto".
+If data is required, it must be given in the same shell item, separated by an
+equals sign.
+</P>
+<P>
+<b>--colour=</b><i>value</i>, <b>--color=</b><i>value</i>
+This option specifies under what circumstances the part of a line that matched
+a pattern should be coloured in the output. The value may be "never" (the
+default), "always", or "auto". In the latter case, colouring happens only if
+the standard output is connected to a terminal. The colour can be specified by
+setting the environment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value
+of this variable should be a string of two numbers, separated by a semicolon.
+They are copied directly into the control string for setting colour on a
+terminal, so it is your responsibility to ensure that they make sense. If
+neither of the environment variables is set, the default is "1;31", which gives
+red.
+</P>
+<P>
+<b>-D</b> <i>action</i>, <b>--devices=</b><i>action</i>
+If an input path is not a regular file or a directory, "action" specifies how
+it is to be processed. Valid values are "read" (the default) or "skip"
+(silently skip the path).
+</P>
+<P>
+<b>-d</b> <i>action</i>, <b>--directories=</b><i>action</i>
+If an input path is a directory, "action" specifies how it is to be processed.
+Valid values are "read" (the default), "recurse" (equivalent to the <b>-r</b>
+option), or "skip" (silently skip the path). In the default case, directories
+are read as if they were ordinary files. In some operating systems the effect
+of reading a directory like this is an immediate end-of-file.
+</P>
+<P>
+<b>-e</b> <i>pattern</i>, <b>--regex=</b><i>pattern</i>,
+<b>--regexp=</b><i>pattern</i> Specify a pattern to be matched. This option can
+be used multiple times in order to specify several patterns. It can also be
+used as a way of specifying a single pattern that starts with a hyphen. When
+<b>-e</b> is used, no argument pattern is taken from the command line; all
+arguments are treated as file names. There is an overall maximum of 100
+patterns. They are applied to each line in the order in which they are defined
+until one matches (or fails to match if <b>-v</b> is used). If <b>-f</b> is used
+with <b>-e</b>, the command line patterns are matched first, followed by the
+patterns from the file, independent of the order in which these options are
+specified. Note that multiple use of <b>-e</b> is not the same as a single
+pattern with alternatives. For example, X|Y finds the first character in a line
+that is X or Y, whereas if the two patterns are given separately,
+<b>pcregrep</b> finds X if it is present, even if it follows Y in the line. It
+finds Y only if there is no X in the line. This really matters only if you are
+using <b>-o</b> to show the portion of the line that matched.
</P>
<P>
<b>--exclude</b>=<i>pattern</i>
@@ -99,50 +174,85 @@ both <b>--include</b> and <b>--exclude</b>, it is excluded. There is no short
form for this option.
</P>
<P>
-<b>-f</b><i>filename</i>
-Read a number of patterns from the file, one per line, and match all of them
-against each line of input. A line is output if any of the patterns match it.
-When <b>-f</b> is used, no pattern is taken from the command line; all arguments
-are treated as file names. There is a maximum of 100 patterns. Trailing white
-space is removed, and blank lines are ignored. An empty file contains no
-patterns and therefore matches nothing.
+<b>-F</b>, <b>--fixed-strings</b>
+Interpret each pattern as a list of fixed strings, separated by newlines,
+instead of as a regular expression. The <b>-w</b> (match as a word) and <b>-x</b>
+(match whole line) options can be used with <b>-F</b>. They apply to each of the
+fixed strings. A line is selected if any of the fixed strings are found in it
+(subject to <b>-w</b> or <b>-x</b>, if present).
</P>
<P>
-<b>-h</b>
-Suppress printing of filenames when searching multiple files.
+<b>-f</b> <i>filename</i>, <b>--file=</b><i>filename</i>
+Read a number of patterns from the file, one per line, and match them against
+each line of input. A data line is output if any of the patterns match it. The
+filename can be given as "-" to refer to the standard input. When <b>-f</b> is
+used, patterns specified on the command line using <b>-e</b> may also be
+present; they are tested before the file's patterns. However, no other pattern
+is taken from the command line; all arguments are treated as file names. There
+is an overall maximum of 100 patterns. Trailing white space is removed from
+each line, and blank lines are ignored. An empty file contains no patterns and
+therefore matches nothing.
</P>
<P>
-<b>-i</b>
+<b>-H</b>, <b>--with-filename</b>
+Force the inclusion of the filename at the start of output lines when searching
+a single file. By default, the filename is not shown in this case. For matching
+lines, the filename is followed by a colon and a space; for context lines, a
+hyphen separator is used. If a line number is also being output, it follows the
+file name without a space.
+</P>
+<P>
+<b>-h</b>, <b>--no-filename</b>
+Suppress the output of filenames when searching multiple files. By default,
+filenames are shown when multiple files are searched. For matching lines, the
+filename is followed by a colon and a space; for context lines, a hyphen
+separator is used. If a line number is also being output, it follows the file
+name without a space.
+</P>
+<P>
+<b>--help</b>
+Output a brief help message and exit.
+</P>
+<P>
+<b>-i</b>, <b>--ignore-case</b>
Ignore upper/lower case distinctions during comparisons.
</P>
<P>
<b>--include</b>=<i>pattern</i>
When <b>pcregrep</b> is searching the files in a directory as a consequence of
-the <b>-r</b> (recursive search) option, only files whose names match the
+the <b>-r</b> (recursive search) option, only those files whose names match the
pattern are included. The pattern is a PCRE regular expression. If a file name
matches both <b>--include</b> and <b>--exclude</b>, it is excluded. There is no
short form for this option.
</P>
<P>
-<b>-L</b>
-Instead of printing lines from the files, just print the names of the files
-that do not contain any lines that would have been printed. Each file name is
-printed once, on a separate line.
+<b>-L</b>, <b>--files-without-match</b>
+Instead of outputting lines from the files, just output the names of the files
+that do not contain any lines that would have been output. Each file name is
+output once, on a separate line.
</P>
<P>
-<b>-l</b>
-Instead of printing lines from the files, just print the names of the files
-containing lines that would have been printed. Each file name is printed
-once, on a separate line.
+<b>-l</b>, <b>--files-with-matches</b>
+Instead of outputting lines from the files, just output the names of the files
+containing lines that would have been output. Each file name is output
+once, on a separate line. Searching stops as soon as a matching line is found
+in a file.
</P>
<P>
<b>--label</b>=<i>name</i>
This option supplies a name to be used for the standard input when file names
-are being printed. If not supplied, "(standard input)" is used. There is no
+are being output. If not supplied, "(standard input)" is used. There is no
short form for this option.
</P>
<P>
-<b>-M</b>
+<b>--locale</b>=<i>locale-name</i>
+This option specifies a locale to be used for pattern matching. It overrides
+the value in the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variables. If no
+locale is specified, the PCRE library's default (usually the "C" locale) is
+used. There is no short form for this option.
+</P>
+<P>
+<b>-M</b>, <b>--multiline</b>
Allow patterns to match more than one line. When this option is given, patterns
may usefully contain literal newline characters and internal occurrences of ^
and $ characters. The output for any one match may consist of more than one
@@ -155,84 +265,80 @@ the previous 8K characters (or all the previous characters, if fewer than 8K)
are guaranteed to be available for lookbehind assertions.
</P>
<P>
-<b>-n</b>
-Precede each line by its line number in the file.
+<b>-n</b>, <b>--line-number</b>
+Precede each output line by its line number in the file, followed by a colon
+and a space for matching lines or a hyphen and a space for context lines. If
+the filename is also being output, it precedes the line number.
+</P>
+<P>
+<b>-o</b>, <b>--only-matching</b>
+Show only the part of the line that matched a pattern. In this mode, no
+context is shown. That is, the <b>-A</b>, <b>-B</b>, and <b>-C</b> options are
+ignored.
</P>
<P>
-<b>-q</b>
-Work quietly, that is, display nothing except error messages.
-The exit status indicates whether or not any matches were found.
+<b>-q</b>, <b>--quiet</b>
+Work quietly, that is, display nothing except error messages. The exit
+status indicates whether or not any matches were found.
</P>
<P>
-<b>-r</b>
+<b>-r</b>, <b>--recursive</b>
If any given path is a directory, recursively scan the files it contains,
-taking note of any <b>--include</b> and <b>--exclude</b> settings. Without
-<b>-r</b> a directory is scanned as a normal file.
+taking note of any <b>--include</b> and <b>--exclude</b> settings. By default, a
+directory is read as a normal file; in some operating systems this gives an
+immediate end-of-file. This option is a shorthand for setting the <b>-d</b>
+option to "recurse".
</P>
<P>
-<b>-s</b>
+<b>-s</b>, <b>--no-messages</b>
Suppress error messages about non-existent or unreadable files. Such files are
quietly skipped. However, the return code is still 2, even if matches were
found in other files.
</P>
<P>
-<b>-u</b>
+<b>-u</b>, <b>--utf-8</b>
Operate in UTF-8 mode. This option is available only if PCRE has been compiled
-with UTF-8 support. Both the pattern and each subject line must be valid
-strings of UTF-8 characters.
+with UTF-8 support. Both patterns and subject lines must be valid strings of
+UTF-8 characters.
</P>
<P>
-<b>-V</b>
+<b>-V</b>, <b>--version</b>
Write the version numbers of <b>pcregrep</b> and the PCRE library that is being
used to the standard error stream.
</P>
<P>
-<b>-v</b>
-Invert the sense of the match, so that lines which do <i>not</i> match the
-pattern are the ones that are found.
+<b>-v</b>, <b>--invert-match</b>
+Invert the sense of the match, so that lines which do <i>not</i> match any of
+the patterns are the ones that are found.
</P>
<P>
-<b>-w</b>
-Force the pattern to match only whole words. This is equivalent to having \b
+<b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
+Force the patterns to match only whole words. This is equivalent to having \b
at the start and end of the pattern.
</P>
<P>
-<b>-x</b>
-Force the pattern to be anchored (it must start matching at the beginning of
-the line) and in addition, require it to match the entire line. This is
+<b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
+Force the patterns to be anchored (each must start matching at the beginning of
+a line) and in addition, require them to match entire lines. This is
equivalent to having ^ and $ characters at the start and end of each
-alternative branch in the regular expression.
+alternative branch in every pattern.
</P>
-<br><a name="SEC4" href="#TOC1">LONG OPTIONS</a><br>
+<br><a name="SEC4" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
<P>
-Long forms of all the options are available, as in GNU grep. They are shown in
-the following table:
-<pre>
- -A --after-context
- -B --before-context
- -C --context
- -c --count
- --exclude (no short form)
- -f --file
- -h --no-filename
- --help (no short form)
- -i --ignore-case
- --include (no short form)
- -L --files-without-match
- -l --files-with-matches
- --label (no short form)
- -n --line-number
- -r --recursive
- -q --quiet
- -s --no-messages
- -u --utf-8
- -V --version
- -v --invert-match
- -x --line-regex
- -x --line-regexp
-</PRE>
-</P>
-<br><a name="SEC5" href="#TOC1">OPTIONS WITH DATA</a><br>
+The environment variables <b>LC_ALL</b> and <b>LC_CTYPE</b> are examined, in that
+order, for a locale. The first one that is set is used. This can be overridden
+by the <b>--locale</b> option. If no locale is set, the PCRE library's default
+(usually the "C" locale) is used.
+</P>
+<br><a name="SEC5" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
+<P>
+The majority of short and long forms of <b>pcregrep</b>'s options are the same
+as in the GNU <b>grep</b> program. Any long option of the form
+<b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b>
+(PCRE terminology). However, the <b>--locale</b>, <b>-M</b>, <b>--multiline</b>,
+<b>-u</b>, and <b>--utf-8</b> options are specific to <b>pcregrep</b>.
+</P>
+<br><a name="SEC6" href="#TOC1">OPTIONS WITH DATA</a><br>
<P>
There are four different ways in which an option with data can be specified.
If a short form option is used, the data may follow immediately, or in the next
@@ -242,22 +348,42 @@ command line item. For example:
-f /some/file
</pre>
If a long form option is used, the data may appear in the same command line
-item, separated by an = character, or it may appear in the next command line
-item. For example:
+item, separated by an equals character, or (with one exception) it may appear
+in the next command line item. For example:
<pre>
--file=/some/file
--file /some/file
-
-</PRE>
+</pre>
+Note, however, that if you want to supply a file name beginning with ~ as data
+in a shell command, and have the shell expand ~ to a home directory, you must
+separate the file name from the option, because the shell does not treat ~
+specially unless it is at the start of an item.
+</P>
+<P>
+The exception to the above is the <b>--colour</b> (or <b>--color</b>) option,
+for which the data is optional. If this option does have data, it must be given
+in the first form, using an equals character. Otherwise it will be assumed that
+it has no data.
+</P>
+<br><a name="SEC7" href="#TOC1">MATCHING ERRORS</a><br>
+<P>
+It is possible to supply a regular expression that takes a very long time to
+fail to match certain lines. Such patterns normally involve nested indefinite
+repeats, for example: (a+)*\d when matched against a line of a's with no final
+digit. The PCRE matching function has a resource limit that causes it to abort
+in these circumstances. If this happens, <b>pcregrep</b> outputs an error
+message and the line that caused the problem to the standard error stream. If
+there are more than 20 such errors, <b>pcregrep</b> gives up.
</P>
-<br><a name="SEC6" href="#TOC1">DIAGNOSTICS</a><br>
+<br><a name="SEC8" href="#TOC1">DIAGNOSTICS</a><br>
<P>
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
for syntax errors and non-existent or inaccessible files (even if matches were
-found in other files). Using the <b>-s</b> option to suppress error messages
-about inaccessble files does not affect the return code.
+found in other files) or too many matching errors. Using the <b>-s</b> option to
+suppress error messages about inaccessible files does not affect the return
+code.
</P>
-<br><a name="SEC7" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC9" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
@@ -266,9 +392,9 @@ University Computing Service
Cambridge CB2 3QG, England.
</P>
<P>
-Last updated: 16 May 2005
+Last updated: 23 January 2006
<br>
-Copyright &copy; 1997-2005 University of Cambridge.
+Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
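The "OPTIONS WITH DATA" text in the pcregrep.html changes above describes four ways of attaching data to an option, plus the --colour exception. The following is a minimal sketch of that rule in Python, not pcregrep's actual parser. The option tables and the function name parse_data_options are hypothetical and exist only for this illustration.

```python
# Hypothetical tables for this sketch only; they are not pcregrep's option list.
SHORT_WITH_DATA = {"f": "file"}            # short letter -> long name
LONG_WITH_DATA = {"file": False,           # long name -> is the data optional?
                  "colour": True,
                  "color": True}


def parse_data_options(argv):
    """Return (option_name, data) pairs; data is None when it was omitted."""
    result = []
    i = 0
    while i < len(argv):
        item = argv[i]
        if item.startswith("--"):
            name, sep, data = item[2:].partition("=")
            if name in LONG_WITH_DATA:
                if sep:
                    # Form: --file=/some/file
                    result.append((name, data))
                elif LONG_WITH_DATA[name]:
                    # Optional data is accepted only with "=", so none is taken.
                    result.append((name, None))
                else:
                    # Form: --file /some/file
                    if i + 1 >= len(argv):
                        raise ValueError("missing data for --" + name)
                    i += 1
                    result.append((name, argv[i]))
        elif len(item) > 1 and item[0] == "-" and item[1] in SHORT_WITH_DATA:
            name = SHORT_WITH_DATA[item[1]]
            if len(item) > 2:
                # Form: -f/some/file
                result.append((name, item[2:]))
            else:
                # Form: -f /some/file
                if i + 1 >= len(argv):
                    raise ValueError("missing data for -" + item[1])
                i += 1
                result.append((name, argv[i]))
        i += 1
    return result
```

In this sketch, parse_data_options(["--colour", "always"]) records --colour with no data, which mirrors the statement that data for that option must be given with an equals character.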
diff --git a/doc/html/pcrepartial.html b/doc/html/pcrepartial.html
index 0dbfc02..acb93c2 100644
--- a/doc/html/pcrepartial.html
+++ b/doc/html/pcrepartial.html
@@ -197,9 +197,29 @@ Because of this phenomenon, it does not usually make sense to end a pattern
that is going to be matched in this way with a variable repeat.
</P>
<P>
-Last updated: 28 February 2005
+4. Patterns that contain alternatives at the top level which do not all
+start with the same pattern item may not work as expected. For example,
+consider this pattern:
+<pre>
+ 1234|3789
+</pre>
+If the first part of the subject is "ABC123", a partial match of the first
+alternative is found at offset 3. There is no partial match for the second
+alternative, because such a match does not start at the same point in the
+subject string. Attempting to continue with the string "789" does not yield a
+match because only those alternatives that match at one point in the subject
+are remembered. The problem arises because the start of the second alternative
+matches within the first alternative. There is no problem with anchored
+patterns or patterns such as:
+<pre>
+ 1234|ABCD
+</pre>
+where no string can be a partial match for both alternatives.
+</P>
+<P>
+Last updated: 16 January 2006
<br>
-Copyright &copy; 1997-2005 University of Cambridge.
+Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
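The 1234|3789 case added to pcrepartial.html above can be traced with a small Python sketch. It handles only literal alternatives and ignores complete matches inside the subject, so it is an illustration of the behaviour described, not of how PCRE implements partial matching. The names partial_state and continue_match are hypothetical.

```python
def partial_state(alternatives, subject):
    """Find the leftmost offset where the end of subject partially matches.

    Returns (offset, [(alternative, chars_already_matched), ...]) or None.
    Only the alternatives that are live at that one offset are remembered.
    """
    for start in range(len(subject)):
        tail = subject[start:]
        live = [(alt, len(tail)) for alt in alternatives
                if len(tail) < len(alt) and alt.startswith(tail)]
        if live:
            return start, live
    return None


def continue_match(live, more):
    """Try to complete a remembered alternative with the next data block."""
    for alt, done in live:
        if more.startswith(alt[done:]):
            return alt
    return None


offset, live = partial_state(["1234", "3789"], "ABC123")
# Only "1234" is remembered, at offset 3; "3789" would need to start at
# offset 5, which is a different point in the subject.
```

Continuing with "789" fails in this sketch because the second alternative was never remembered, while continuing with "4" completes the first alternative.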
diff --git a/doc/html/pcrepattern.html b/doc/html/pcrepattern.html
index 0f77b32..6df9ed8 100644
--- a/doc/html/pcrepattern.html
+++ b/doc/html/pcrepattern.html
@@ -175,7 +175,7 @@ represents:
\t tab (hex 09)
\ddd character with octal code ddd, or backreference
\xhh character with hex code hh
- \x{hhh..} character with hex code hhh... (UTF-8 mode only)
+ \x{hhh..} character with hex code hhh..
</pre>
The precise effect of \cx is as follows: if x is a lower case letter, it
is converted to upper case. Then bit 6 of the character (hex 40) is inverted.
@@ -184,18 +184,18 @@ Thus \cz becomes hex 1A, but \c{ becomes hex 3B, while \c; becomes hex
</P>
<P>
After \x, from zero to two hexadecimal digits are read (letters can be in
-upper or lower case). In UTF-8 mode, any number of hexadecimal digits may
-appear between \x{ and }, but the value of the character code must be less
-than 2**31 (that is, the maximum hexadecimal value is 7FFFFFFF). If characters
-other than hexadecimal digits appear between \x{ and }, or if there is no
-terminating }, this form of escape is not recognized. Instead, the initial
-\x will be interpreted as a basic hexadecimal escape, with no following
-digits, giving a character whose value is zero.
+upper or lower case). Any number of hexadecimal digits may appear between \x{
+and }, but the value of the character code must be less than 256 in non-UTF-8
+mode, and less than 2**31 in UTF-8 mode (that is, the maximum hexadecimal value
+is 7FFFFFFF). If characters other than hexadecimal digits appear between \x{
+and }, or if there is no terminating }, this form of escape is not recognized.
+Instead, the initial \x will be interpreted as a basic hexadecimal escape,
+with no following digits, giving a character whose value is zero.
</P>
<P>
Characters whose value is less than 256 can be defined by either of the two
-syntaxes for \x when PCRE is in UTF-8 mode. There is no difference in the
-way they are handled. For example, \xdc is exactly the same as \x{dc}.
+syntaxes for \x. There is no difference in the way they are handled. For
+example, \xdc is exactly the same as \x{dc}.
</P>
<P>
After \0 up to two further octal digits are read. In both cases, if there
@@ -285,36 +285,117 @@ greater than 128 are used for accented letters, and these are matched by \w.
<P>
In UTF-8 mode, characters with values greater than 128 never match \d, \s, or
\w, and always match \D, \S, and \W. This is true even when Unicode
-character property support is available.
+character property support is available. The use of locales with Unicode is
+discouraged.
<a name="uniextseq"></a></P>
<br><b>
Unicode character properties
</b><br>
<P>
When PCRE is built with Unicode character property support, three additional
-escape sequences to match generic character types are available when UTF-8 mode
+escape sequences to match character properties are available when UTF-8 mode
is selected. They are:
<pre>
- \p{<i>xx</i>} a character with the <i>xx</i> property
- \P{<i>xx</i>} a character without the <i>xx</i> property
- \X an extended Unicode sequence
-</pre>
-The property names represented by <i>xx</i> above are limited to the
-Unicode general category properties. Each character has exactly one such
-property, specified by a two-letter abbreviation. For compatibility with Perl,
-negation can be specified by including a circumflex between the opening brace
-and the property name. For example, \p{^Lu} is the same as \P{Lu}.
-</P>
-<P>
-If only one letter is specified with \p or \P, it includes all the properties
-that start with that letter. In this case, in the absence of negation, the
-curly brackets in the escape sequence are optional; these two examples have
-the same effect:
+ \p{<i>xx</i>} a character with the <i>xx</i> property
+ \P{<i>xx</i>} a character without the <i>xx</i> property
+ \X an extended Unicode sequence
+</pre>
+The property names represented by <i>xx</i> above are limited to the Unicode
+script names, the general category properties, and "Any", which matches any
+character (including newline). Other properties such as "InMusicalSymbols" are
+not currently supported by PCRE. Note that \P{Any} does not match any
+characters, so always causes a match failure.
+</P>
+<P>
+Sets of Unicode characters are defined as belonging to certain scripts. A
+character from one of these sets can be matched using a script name. For
+example:
+<pre>
+ \p{Greek}
+ \P{Han}
+</pre>
+Those that are not part of an identified script are lumped together as
+"Common". The current list of scripts is:
+</P>
+<P>
+Arabic,
+Armenian,
+Bengali,
+Bopomofo,
+Braille,
+Buginese,
+Buhid,
+Canadian_Aboriginal,
+Cherokee,
+Common,
+Coptic,
+Cypriot,
+Cyrillic,
+Deseret,
+Devanagari,
+Ethiopic,
+Georgian,
+Glagolitic,
+Gothic,
+Greek,
+Gujarati,
+Gurmukhi,
+Han,
+Hangul,
+Hanunoo,
+Hebrew,
+Hiragana,
+Inherited,
+Kannada,
+Katakana,
+Kharoshthi,
+Khmer,
+Lao,
+Latin,
+Limbu,
+Linear_B,
+Malayalam,
+Mongolian,
+Myanmar,
+New_Tai_Lue,
+Ogham,
+Old_Italic,
+Old_Persian,
+Oriya,
+Osmanya,
+Runic,
+Shavian,
+Sinhala,
+Syloti_Nagri,
+Syriac,
+Tagalog,
+Tagbanwa,
+Tai_Le,
+Tamil,
+Telugu,
+Thaana,
+Thai,
+Tibetan,
+Tifinagh,
+Ugaritic,
+Yi.
+</P>
+<P>
+Each character has exactly one general category property, specified by a
+two-letter abbreviation. For compatibility with Perl, negation can be specified
+by including a circumflex between the opening brace and the property name. For
+example, \p{^Lu} is the same as \P{Lu}.
+</P>
+<P>
+If only one letter is specified with \p or \P, it includes all the general
+category properties that start with that letter. In this case, in the absence
+of negation, the curly brackets in the escape sequence are optional; these two
+examples have the same effect:
<pre>
\p{L}
\pL
</pre>
-The following property codes are supported:
+The following general category property codes are supported:
<pre>
C Other
Cc Control
@@ -360,8 +441,19 @@ The following property codes are supported:
Zp Paragraph separator
Zs Space separator
</pre>
-Extended properties such as "Greek" or "InMusicalSymbols" are not supported by
-PCRE.
+The special property L& is also supported: it matches a character that has
+the Lu, Ll, or Lt property, in other words, a letter that is not classified as
+a modifier or "other".
+</P>
+<P>
+The long synonyms for these properties that Perl supports (such as \p{Letter})
+are not supported by PCRE. Nor is it permitted to prefix any of these
+properties with "Is".
+</P>
+<P>
+No character that is in the Unicode table has the Cn (unassigned) property.
+Instead, this property is assumed for any code point that is not in the
+Unicode table.
</P>
<P>
Specifying caseless matching does not affect these escape sequences. For
@@ -1360,14 +1452,19 @@ number, provided that it occurs inside that subpattern. (If not, it is a
(?R) is a recursive call of the entire regular expression.
</P>
<P>
-For example, this PCRE pattern solves the nested parentheses problem (assume
-the PCRE_EXTENDED option is set so that white space is ignored):
+A recursive subpattern call is always treated as an atomic group. That is, once
+it has matched some of the subject string, it is never re-entered, even if
+it contains untried alternatives and there is a subsequent matching failure.
+</P>
+<P>
+This PCRE pattern solves the nested parentheses problem (assume the
+PCRE_EXTENDED option is set so that white space is ignored):
<pre>
\( ( (?&#62;[^()]+) | (?R) )* \)
</pre>
First it matches an opening parenthesis. Then it matches any number of
substrings which can either be a sequence of non-parentheses, or a recursive
-match of the pattern itself (that is a correctly parenthesized substring).
+match of the pattern itself (that is, a correctly parenthesized substring).
Finally there is a closing parenthesis.
</P>
<P>
@@ -1450,6 +1547,12 @@ is used, it does match "sense and responsibility" as well as the other two
strings. Such references must, however, follow the subpattern to which they
refer.
</P>
+<P>
+Like recursive subpatterns, a "subroutine" call is always treated as an atomic
+group. That is, once it has matched some of the subject string, it is never
+re-entered, even if it contains untried alternatives and there is a subsequent
+matching failure.
+</P>
<br><a name="SEC20" href="#TOC1">CALLOUTS</a><br>
<P>
Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl
@@ -1486,9 +1589,9 @@ description of the interface to the callout function is given in the
documentation.
</P>
<P>
-Last updated: 28 February 2005
+Last updated: 24 January 2006
<br>
-Copyright &copy; 1997-2005 University of Cambridge.
+Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
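Two of the numeric rules in the pcrepattern.html text above lend themselves to a quick check: the \cx conversion (lower case letters are upper-cased, then bit 6, hex 40, is inverted) and the range limits for \x{hhh..}. The Python sketch below restates those rules; the function names are hypothetical and the code does not call PCRE.

```python
def control_escape(x: str) -> int:
    # Rule from the text: a lower case letter is converted to upper case,
    # then bit 6 (hex 40) of the character is inverted.
    if "a" <= x <= "z":
        x = x.upper()
    return ord(x) ^ 0x40


def braced_hex_in_range(digits: str, utf8: bool) -> bool:
    # Rule from the text: the value must be less than 256 in non-UTF-8 mode
    # and less than 2**31 in UTF-8 mode.
    value = int(digits, 16)
    limit = 2 ** 31 if utf8 else 256
    return value < limit


print(hex(control_escape("z")))   # \cz
print(hex(control_escape("{")))   # \c{
print(braced_hex_in_range("dc", utf8=False))
```

Because the inversion is an exclusive OR with hex 40, applying control_escape to ";" (hex 3B) gives hex 7B, the reverse of the \c{ case.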
diff --git a/doc/html/pcreposix.html b/doc/html/pcreposix.html
index 53ea2aa..9bf36ca 100644
--- a/doc/html/pcreposix.html
+++ b/doc/html/pcreposix.html
@@ -59,10 +59,10 @@ call the native ones, it is also necessary to add <b>-lpcre</b>.
</P>
<P>
I have implemented only those option bits that can be reasonably mapped to PCRE
-native options. In addition, the options REG_EXTENDED and REG_NOSUB are defined
-with the value zero. They have no effect, but since programs that are written
-to the POSIX interface often use them, this makes it easier to slot in PCRE as
-a replacement library. Other POSIX options are not even defined.
+native options. In addition, the option REG_EXTENDED is defined with the value
+zero. This has no effect, but since programs that are written to the POSIX
+interface often use it, this makes it easier to slot in PCRE as a replacement
+library. Other POSIX options are not even defined.
</P>
<P>
When PCRE is called via these functions, it is only the API that is POSIX-like
@@ -89,7 +89,7 @@ The function <b>regcomp()</b> is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
to a <b>regex_t</b> structure that is used as a base for storing information
-about the compiled expression.
+about the compiled regular expression.
</P>
<P>
The argument <i>cflags</i> is either zero, or contains one or more of the bits
@@ -97,19 +97,35 @@ defined by the following macros:
<pre>
REG_DOTALL
</pre>
-The PCRE_DOTALL option is set when the expression is passed for compilation to
-the native function. Note that REG_DOTALL is not part of the POSIX standard.
+The PCRE_DOTALL option is set when the regular expression is passed for
+compilation to the native function. Note that REG_DOTALL is not part of the
+POSIX standard.
<pre>
REG_ICASE
</pre>
-The PCRE_CASELESS option is set when the expression is passed for compilation
-to the native function.
+The PCRE_CASELESS option is set when the regular expression is passed for
+compilation to the native function.
<pre>
REG_NEWLINE
</pre>
-The PCRE_MULTILINE option is set when the expression is passed for compilation
-to the native function. Note that this does <i>not</i> mimic the defined POSIX
-behaviour for REG_NEWLINE (see the following section).
+The PCRE_MULTILINE option is set when the regular expression is passed for
+compilation to the native function. Note that this does <i>not</i> mimic the
+defined POSIX behaviour for REG_NEWLINE (see the following section).
+<pre>
+ REG_NOSUB
+</pre>
+The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
+for compilation to the native function. In addition, when a pattern that is
+compiled with this flag is passed to <b>regexec()</b> for matching, the
+<i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
+are returned.
+<pre>
+ REG_UTF8
+</pre>
+The PCRE_UTF8 option is set when the regular expression is passed for
+compilation to the native function. This causes the pattern itself and all data
+strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
+is not part of the POSIX standard.
</P>
<P>
In the absence of these flags, no options are passed to the native function.
@@ -177,15 +193,20 @@ The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
</P>
<P>
-The portion of the string that was matched, and also any captured substrings,
-are returned via the <i>pmatch</i> argument, which points to an array of
-<i>nmatch</i> structures of type <i>regmatch_t</i>, containing the members
-<i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first character of
-each substring and the offset to the first character after the end of each
-substring, respectively. The 0th element of the vector relates to the entire
-portion of <i>string</i> that was matched; subsequent elements relate to the
-capturing subpatterns of the regular expression. Unused entries in the array
-have both structure members set to -1.
+If the pattern was compiled with the REG_NOSUB flag, no data about any matched
+strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
+<b>regexec()</b> are ignored.
+</P>
+<P>
+Otherwise, the portion of the string that was matched, and also any captured
+substrings, are returned via the <i>pmatch</i> argument, which points to an
+array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
+members <i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first
+character of each substring and the offset to the first character after the end
+of each substring, respectively. The 0th element of the vector relates to the
+entire portion of <i>string</i> that was matched; subsequent elements relate to
+the capturing subpatterns of the regular expression. Unused entries in the
+array have both structure members set to -1.
</P>
<P>
A successful match yields a zero return; various error codes are defined in the
@@ -215,9 +236,9 @@ University Computing Service,
Cambridge CB2 3QG, England.
</P>
<P>
-Last updated: 28 February 2005
+Last updated: 16 January 2006
<br>
-Copyright &copy; 1997-2005 University of Cambridge.
+Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
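The layout of the pmatch array described in the pcreposix.html text above (entry 0 for the whole match, later entries for capturing subpatterns, unused entries set to -1) can be mimicked with a short Python sketch. Python's own re module stands in for the matcher here, which is an assumption made purely for illustration; the function posix_style_pmatch is hypothetical and is not part of the PCRE POSIX wrapper.

```python
import re


def posix_style_pmatch(pattern, string, nmatch):
    """Build a list of (rm_so, rm_eo) pairs shaped like a regmatch_t array."""
    m = re.search(pattern, string)
    if m is None:
        return None
    pmatch = []
    for i in range(nmatch):
        if i <= m.re.groups:
            # span() gives the offset of the first character and the offset
            # just past the end; it is (-1, -1) for a group that did not match.
            pmatch.append(m.span(i))
        else:
            # Entries beyond the last capturing subpattern are unused.
            pmatch.append((-1, -1))
    return pmatch


print(posix_style_pmatch(r"(a+)(b)?", "xxaac", 4))
```

With REG_NOSUB the text says nmatch and pmatch are ignored altogether, so a caller would not build or inspect such an array in that case.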
diff --git a/doc/html/pcreprecompile.html b/doc/html/pcreprecompile.html
index 4cf8add..00de3b0 100644
--- a/doc/html/pcreprecompile.html
+++ b/doc/html/pcreprecompile.html
@@ -127,9 +127,14 @@ advertised), you will have to recompile them for release 5.0. However, from now
on, it should be possible to make changes in a compatible manner.
</P>
<P>
-Last updated: 28 February 2005
+Notwithstanding the above, if you have any saved patterns in UTF-8 mode that
+use \p or \P that were compiled with any release up to and including 6.4, you
+will have to recompile them for release 6.5 and above.
+</P>
+<P>
+Last updated: 01 February 2006
<br>
-Copyright &copy; 1997-2005 University of Cambridge.
+Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
diff --git a/doc/html/pcretest.html b/doc/html/pcretest.html
index c43c8cb..8e97655 100644
--- a/doc/html/pcretest.html
+++ b/doc/html/pcretest.html
@@ -84,6 +84,10 @@ used to call PCRE. None of the other options has any effect when <b>-p</b> is
set.
</P>
<P>
+<b>-q</b>
+Do not output the version number of <b>pcretest</b> at the start of execution.
+</P>
+<P>
<b>-t</b>
Run each compile, study, and match many times with a timer, and output
resulting time per compile or match (in milliseconds). Do not set <b>-m</b> with
@@ -291,7 +295,8 @@ recognized:
\Gname call pcre_get_named_substring() for substring "name" after a successful match (name termin-
ated by next non-alphanumeric character)
\L call pcre_get_substringlist() after a successful match
- \M discover the minimum MATCH_LIMIT setting
+ \M discover the minimum MATCH_LIMIT and
+ MATCH_LIMIT_RECURSION settings
\N pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b>
\Odd set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits)
\P pass the PCRE_PARTIAL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
@@ -308,13 +313,16 @@ an empty line as data, since a real empty line terminates the data input.
</P>
<P>
If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with
-different values in the <i>match_limit</i> field of the <b>pcre_extra</b> data
-structure, until it finds the minimum number that is needed for
-<b>pcre_exec()</b> to complete. This number is a measure of the amount of
-recursion and backtracking that takes place, and checking it out can be
-instructive. For most simple matches, the number is quite small, but for
-patterns with very large numbers of matching possibilities, it can become large
-very quickly with increasing length of subject string.
+different values in the <i>match_limit</i> and <i>match_limit_recursion</i>
+fields of the <b>pcre_extra</b> data structure, until it finds the minimum
+numbers for each parameter that allow <b>pcre_exec()</b> to complete. The
+<i>match_limit</i> number is a measure of the amount of backtracking that takes
+place, and checking it out can be instructive. For most simple matches, the
+number is quite small, but for patterns with very large numbers of matching
+possibilities, it can become large very quickly with increasing length of
+subject string. The <i>match_limit_recursion</i> number is a measure of how much
+stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed
+to complete the match attempt.
</P>
<P>
When \O is used, the value specified may be higher or lower than the size set
@@ -323,8 +331,9 @@ the call of <b>pcre_exec()</b> for the line in which it appears.
</P>
<P>
If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper
-API to be used, only \B and \Z have any effect, causing REG_NOTBOL and
-REG_NOTEOL to be passed to <b>regexec()</b> respectively.
+API to be used, the only option-setting sequences that have any effect are \B
+and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
+<b>regexec()</b>.
</P>
<P>
The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
@@ -572,9 +581,9 @@ University Computing Service,
Cambridge CB2 3QG, England.
</P>
<P>
-Last updated: 28 February 2005
+Last updated: 18 January 2006
<br>
-Copyright &copy; 1997-2005 University of Cambridge.
+Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
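The pcretest.html text above says that \M makes pcretest call pcre_exec() several times with different limit values until it finds the minimum that lets the match complete. The text does not say how the values are chosen. The Python sketch below uses a binary search, which is an assumption of this example and not a description of pcretest's code. The callback completes(limit) is hypothetical: it stands for "run the match with this limit and report whether it finished", and the search assumes that once a limit is large enough, every larger limit also works.

```python
def minimum_limit(completes, upper=10_000_000):
    """Return the smallest limit in [1, upper] for which completes() is true.

    Returns None if the match does not complete even at the upper bound.
    """
    if not completes(upper):
        return None
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if completes(mid):
            hi = mid          # mid is enough; the minimum is mid or lower
        else:
            lo = mid + 1      # mid is too small
    return lo


# Stand-in for a real match attempt: pretend 37 is the true minimum.
print(minimum_limit(lambda limit: limit >= 37))
```

The same search could be run once for match_limit and once for match_limit_recursion, since the text treats them as separate parameters.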