Diffstat (limited to 'doc/html/pcre2test.html')
-rw-r--r-- doc/html/pcre2test.html 197
1 files changed, 117 insertions, 80 deletions
diff --git a/doc/html/pcre2test.html b/doc/html/pcre2test.html
index 6097f02..537985d 100644
--- a/doc/html/pcre2test.html
+++ b/doc/html/pcre2test.html
@@ -486,7 +486,7 @@ the start of a modifier list. For example:
<pre>
abc\=notbol,notempty
</pre>
-If the subject string is empty and \= is followed by whitespace, the line is
+If the subject string is empty and \= is followed by whitespace, the line is
treated as a comment line, and is not used for matching. For example:
<pre>
\= This is a comment.
@@ -538,7 +538,7 @@ for a description of their effects.
no_utf_check set PCRE2_NO_UTF_CHECK
ucp set PCRE2_UCP
ungreedy set PCRE2_UNGREEDY
- use_offset_limit set PCRE2_USE_OFFSET_LIMIT
+ use_offset_limit set PCRE2_USE_OFFSET_LIMIT
utf set PCRE2_UTF
</pre>
As well as turning on the PCRE2_UTF option, the <b>utf</b> modifier causes all
@@ -564,7 +564,7 @@ about the pattern:
jitfast use JIT fast path
jitverify verify JIT use
locale=&#60;name&#62; use this locale
- max_pattern_length=&#60;n&#62; set the maximum pattern length
+ max_pattern_length=&#60;n&#62; set the maximum pattern length
memory show memory used
newline=&#60;type&#62; set newline type
null_context compile with a NULL context
@@ -649,9 +649,9 @@ by the item that follows it in the pattern.
Passing a NULL context
</b><br>
<P>
-Normally, <b>pcre2test</b> passes a context block to <b>pcre2_compile()</b>. If
-the <b>null_context</b> modifier is set, however, NULL is passed. This is for
-testing that <b>pcre2_compile()</b> behaves correctly in this case (it uses
+Normally, <b>pcre2test</b> passes a context block to <b>pcre2_compile()</b>. If
+the <b>null_context</b> modifier is set, however, NULL is passed. This is for
+testing that <b>pcre2_compile()</b> behaves correctly in this case (it uses
default values).
</P>
<br><b>
@@ -675,9 +675,9 @@ Generating long repetitive patterns
</b><br>
<P>
Some tests use long patterns that are very repetitive. Instead of creating a
-very long input line for such a pattern, you can use a special repetition
-feature, similar to the one described for subject lines above. If the
-<b>expand</b> modifier is present on a pattern, parts of the pattern that have
+very long input line for such a pattern, you can use a special repetition
+feature, similar to the one described for subject lines above. If the
+<b>expand</b> modifier is present on a pattern, parts of the pattern that have
the form
<pre>
\[&#60;characters&#62;]{&#60;count&#62;}
@@ -689,13 +689,13 @@ by decimal digits and "}" is found later in the pattern. If not, the characters
remain in the pattern unaltered.
</P>
<P>
-If part of an expanded pattern looks like an expansion, but is really part of
-the actual pattern, unwanted expansion can be avoided by giving two values in
-the quantifier. For example, \[AB]{6000,6000} is not recognized as an
+If part of an expanded pattern looks like an expansion, but is really part of
+the actual pattern, unwanted expansion can be avoided by giving two values in
+the quantifier. For example, \[AB]{6000,6000} is not recognized as an
expansion item.
</P>
<P>
-If the <b>info</b> modifier is set on an expanded pattern, the result of the
+If the <b>info</b> modifier is set on an expanded pattern, the result of the
expansion is included in the information that is output.
</P>
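As a rough illustration of the expansion rule described in this hunk, the following Python sketch mimics it outside of pcre2test. The function name expand_pattern and the regular expression are hypothetical, and the sketch assumes that the repeated characters contain no "]"; it is not the pcre2test implementation.

```python
import re

# Hypothetical helper, not part of pcre2test. It recognizes items of the
# form \[<characters>]{<count>} with a single decimal count. Assumption:
# <characters> does not itself contain "]".
_EXPAND_ITEM = re.compile(r"\\\[([^\]]*)\]\{(\d+)\}")

def expand_pattern(pattern: str) -> str:
    # Replace each recognized item with its characters repeated <count> times.
    # Anything that does not fit the form is left in the pattern unaltered.
    return _EXPAND_ITEM.sub(lambda m: m.group(1) * int(m.group(2)), pattern)

print(expand_pattern(r"x\[AB]{3}y"))        # xABABABy
print(expand_pattern(r"\[AB]{6000,6000}"))  # \[AB]{6000,6000} (unchanged)
```

The second call shows why a two-value quantifier avoids unwanted expansion: the item no longer matches the single-count form, so it is passed through as written.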
<br><b>
@@ -812,9 +812,9 @@ suite.
Limiting the pattern length
</b><br>
<P>
-The <b>max_pattern_length</b> modifier sets a limit, in code units, to the
-length of pattern that <b>pcre2_compile()</b> will accept. Breaching the limit
-causes a compilation error. The default is the largest number a PCRE2_SIZE
+The <b>max_pattern_length</b> modifier sets a limit, in code units, to the
+length of pattern that <b>pcre2_compile()</b> will accept. Breaching the limit
+causes a compilation error. The default is the largest number a PCRE2_SIZE
variable can hold (essentially unlimited).
</P>
<br><b>
@@ -836,13 +836,13 @@ modifiers set options for the <b>regcomp()</b> function:
ucp REG_UCP ) the POSIX standard
utf REG_UTF8 )
</pre>
-The <b>regerror_buffsize</b> modifier specifies a size for the error buffer that
+The <b>regerror_buffsize</b> modifier specifies a size for the error buffer that
is passed to <b>regerror()</b> in the event of a compilation error. For example:
<pre>
/abc/posix,regerror_buffsize=20
</pre>
-This provides a means of testing the behaviour of <b>regerror()</b> when the
-buffer is too small for the error message. If this modifier has not been set, a
+This provides a means of testing the behaviour of <b>regerror()</b> when the
+buffer is too small for the error message. If this modifier has not been set, a
large buffer is used.
</P>
<P>
@@ -892,14 +892,18 @@ are applied to every subject line that is processed with that pattern. They may
not appear in <b>#pattern</b> commands. These modifiers do not affect the
compilation process.
<pre>
- aftertext show text after match
- allaftertext show text after captures
- allcaptures show all captures
- allusedtext show all consulted text
- /g global global matching
- mark show mark values
- replace=&#60;string&#62; specify a replacement string
- startchar show starting character when relevant
+ aftertext show text after match
+ allaftertext show text after captures
+ allcaptures show all captures
+ allusedtext show all consulted text
+ /g global global matching
+ mark show mark values
+ replace=&#60;string&#62; specify a replacement string
+ startchar show starting character when relevant
+ substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
+ substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+ substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+ substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
</pre>
These modifiers may not appear in a <b>#pattern</b> command. If you want them as
defaults, set them in a <b>#subject</b> command.
@@ -964,33 +968,38 @@ information. Some of them may also be specified on a pattern line (see above),
in which case they apply to every subject line that is matched against that
pattern.
<pre>
- aftertext show text after match
- allaftertext show text after captures
- allcaptures show all captures
- allusedtext show all consulted text (non-JIT only)
- altglobal alternative global matching
- callout_capture show captures at callout time
- callout_data=&#60;n&#62; set a value to pass via callouts
- callout_fail=&#60;n&#62;[:&#60;m&#62;] control callout failure
- callout_none do not supply a callout function
- copy=&#60;number or name&#62; copy captured substring
- dfa use <b>pcre2_dfa_match()</b>
- find_limits find match and recursion limits
- get=&#60;number or name&#62; extract captured substring
- getall extract all captured substrings
- /g global global matching
- jitstack=&#60;n&#62; set size of JIT stack
- mark show mark values
- match_limit=&#60;n&#62; set a match limit
- memory show memory usage
- null_context match with a NULL context
- offset=&#60;n&#62; set starting offset
- offset_limit=&#60;n&#62; set offset limit
- ovector=&#60;n&#62; set size of output vector
- recursion_limit=&#60;n&#62; set a recursion limit
- replace=&#60;string&#62; specify a replacement string
- startchar show startchar when relevant
- zero_terminate pass the subject as zero-terminated
+ aftertext show text after match
+ allaftertext show text after captures
+ allcaptures show all captures
+ allusedtext show all consulted text (non-JIT only)
+ altglobal alternative global matching
+ callout_capture show captures at callout time
+ callout_data=&#60;n&#62; set a value to pass via callouts
+ callout_fail=&#60;n&#62;[:&#60;m&#62;] control callout failure
+ callout_none do not supply a callout function
+ copy=&#60;number or name&#62; copy captured substring
+ dfa use <b>pcre2_dfa_match()</b>
+ find_limits find match and recursion limits
+ get=&#60;number or name&#62; extract captured substring
+ getall extract all captured substrings
+ /g global global matching
+ jitstack=&#60;n&#62; set size of JIT stack
+ mark show mark values
+ match_limit=&#60;n&#62; set a match limit
+ memory show memory usage
+ null_context match with a NULL context
+ offset=&#60;n&#62; set starting offset
+ offset_limit=&#60;n&#62; set offset limit
+ ovector=&#60;n&#62; set size of output vector
+ recursion_limit=&#60;n&#62; set a recursion limit
+ replace=&#60;string&#62; specify a replacement string
+ startchar show startchar when relevant
+ startoffset=&#60;n&#62; same as offset=&#60;n&#62;
+ substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
+ substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+ substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+ substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
+ zero_terminate pass the subject as zero-terminated
</pre>
The effects of these modifiers are described in the following sections.
</P>
@@ -1129,19 +1138,34 @@ Testing the substitution function
</b><br>
<P>
If the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is
-called instead of one of the matching functions. Unlike subject strings,
-<b>pcre2test</b> does not process replacement strings for escape sequences. In
-UTF mode, a replacement string is checked to see if it is a valid UTF-8 string.
-If so, it is correctly converted to a UTF string of the appropriate code unit
-width. If it is not a valid UTF-8 string, the individual code units are copied
-directly. This provides a means of passing an invalid UTF-8 string for testing
-purposes.
-</P>
-<P>
-If the <b>global</b> modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
-<b>pcre2_substitute()</b>. After a successful substitution, the modified string
-is output, preceded by the number of replacements. This may be zero if there
-were no matches. Here is a simple example of a substitution test:
+called instead of one of the matching functions. Note that replacement strings
+cannot contain commas, because a comma signifies the end of a modifier. This is
+not thought to be an issue in a test program.
+</P>
+<P>
+Unlike subject strings, <b>pcre2test</b> does not process replacement strings
+for escape sequences. In UTF mode, a replacement string is checked to see if it
+is a valid UTF-8 string. If so, it is correctly converted to a UTF string of
+the appropriate code unit width. If it is not a valid UTF-8 string, the
+individual code units are copied directly. This provides a means of passing an
+invalid UTF-8 string for testing purposes.
+</P>
+<P>
+The following modifiers set options (in addition to the normal match options)
+for <b>pcre2_substitute()</b>:
+<pre>
+ global PCRE2_SUBSTITUTE_GLOBAL
+ substitute_extended PCRE2_SUBSTITUTE_EXTENDED
+ substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+ substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+ substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY
+
+</PRE>
+</P>
+<P>
+After a successful substitution, the modified string is output, preceded by the
+number of replacements. This may be zero if there were no matches. Here is a
+simple example of a substitution test:
<pre>
/abc/replace=xxx
=abc=abc=
@@ -1149,12 +1173,12 @@ were no matches. Here is a simple example of a substitution test:
=abc=abc=\=global
2: =xxx=xxx=
</pre>
-Subject and replacement strings should be kept relatively short for
-substitution tests, as fixed-size buffers are used. To make it easy to test for
-buffer overflow, if the replacement string starts with a number in square
-brackets, that number is passed to <b>pcre2_substitute()</b> as the size of the
-output buffer, with the replacement string starting at the next character. Here
-is an example that tests the edge case:
+Subject and replacement strings should be kept relatively short (fewer than 256
+characters) for substitution tests, as fixed-size buffers are used. To make it
+easy to test for buffer overflow, if the replacement string starts with a
+number in square brackets, that number is passed to <b>pcre2_substitute()</b> as
+the size of the output buffer, with the replacement string starting at the next
+character. Here is an example that tests the edge case:
<pre>
/abc/
123abc123\=replace=[10]XYZ
@@ -1162,6 +1186,19 @@ is an example that tests the edge case:
123abc123\=replace=[9]XYZ
Failed: error -47: no more memory
</pre>
+The default action of <b>pcre2_substitute()</b> is to return
+PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the
+PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
+<b>substitute_overflow_length</b> modifier), <b>pcre2_substitute()</b> continues
+to go through the motions of matching and substituting, in order to compute the
+size of buffer that is required. When this happens, <b>pcre2test</b> shows the
+required buffer length (which includes space for the trailing zero) as part of
+the error message. For example:
+<pre>
+ /abc/substitute_overflow_length
+ 123abc123\=replace=[9]XYZ
+ Failed: error -47: no more memory: 10 code units are needed
+</pre>
A replacement string is ignored with POSIX and DFA matching. Specifying partial
matching provokes an error return ("bad option value") from
<b>pcre2_substitute()</b>.
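The buffer-size arithmetic in the examples above can be checked with a small Python simulation. This is only a sketch: split_size_prefix and simulate_substitute are hypothetical names, Python's re module stands in for PCRE2, only the first match is replaced (no global modifier), and the subject is assumed to be ASCII so that one character equals one code unit.

```python
import re

def split_size_prefix(replacement):
    # Split "[n]text" into (n, "text"); return (None, replacement) if there
    # is no leading number in square brackets.
    m = re.match(r"\[(\d+)\]", replacement)
    if m is None:
        return None, replacement
    return int(m.group(1)), replacement[m.end():]

def simulate_substitute(pattern, subject, replacement, overflow_length=False):
    # Hypothetical stand-in for a pcre2test substitution test.
    size, text = split_size_prefix(replacement)
    # A function replacement keeps Python from interpreting escapes in text.
    result, count = re.subn(pattern, lambda m: text, subject, count=1)
    needed = len(result) + 1  # one extra code unit for the trailing zero
    if size is not None and needed > size:
        message = "Failed: error -47: no more memory"
        if overflow_length:
            message += f": {needed} code units are needed"
        return message
    return f"{count}: {result}"

print(simulate_substitute("abc", "123abc123", "[10]XYZ"))
print(simulate_substitute("abc", "123abc123", "[9]XYZ"))
print(simulate_substitute("abc", "123abc123", "[9]XYZ", overflow_length=True))
```

The result "123XYZ123" is 9 code units long, so 10 are needed once the trailing zero is counted. This is why a buffer of 10 succeeds and a buffer of 9 fails in the edge-case example.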
@@ -1236,10 +1273,10 @@ matching starts. Its value is a number of code units, not characters.
Setting an offset limit
</b><br>
<P>
-The <b>offset_limit</b> modifier sets a limit for unanchored matches. If a match
-cannot be found starting at or before this offset in the subject, a "no match"
-return is given. The data value is a number of code units, not characters. When
-this modifier is used, the <b>use_offset_limit</b> modifier must have been set
+The <b>offset_limit</b> modifier sets a limit for unanchored matches. If a match
+cannot be found starting at or before this offset in the subject, a "no match"
+return is given. The data value is a number of code units, not characters. When
+this modifier is used, the <b>use_offset_limit</b> modifier must have been set
for the pattern; if not, an error is generated.
</P>
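The effect of an offset limit on an unanchored match can be pictured with the following Python sketch. It uses Python's re module rather than PCRE2, the function name match_with_offset_limit is hypothetical, and offsets here are Python string indices, which coincide with code units only for ASCII subjects.

```python
import re

def match_with_offset_limit(pattern, subject, offset_limit):
    # Hypothetical illustration: accept the leftmost match only if it starts
    # at or before offset_limit; otherwise report "no match" as None.
    m = re.search(pattern, subject)
    if m is None or m.start() > offset_limit:
        return None
    return m

print(match_with_offset_limit("abc", "xxabc", 2) is not None)  # True
print(match_with_offset_limit("abc", "xxabc", 1) is not None)  # False
```

In the second call the only possible match starts at offset 2, which is beyond the limit of 1, so the sketch reports no match.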
<br><b>
@@ -1281,8 +1318,8 @@ Passing a NULL context
Normally, <b>pcre2test</b> passes a context block to <b>pcre2_match()</b>,
<b>pcre2_dfa_match()</b> or <b>pcre2_jit_match()</b>. If the <b>null_context</b>
modifier is set, however, NULL is passed. This is for testing that the matching
-functions behave correctly in this case (they use default values). This
-modifier cannot be used with the <b>find_limits</b> modifier or when testing the
+functions behave correctly in this case (they use default values). This
+modifier cannot be used with the <b>find_limits</b> modifier or when testing the
substitution function.
</P>
<br><a name="SEC12" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
@@ -1623,7 +1660,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 05 November 2015
+Last updated: 12 December 2015
<br>
Copyright &copy; 1997-2015 University of Cambridge.
<br>