Diffstat (limited to 'doc/html/pcre2test.html')
-rw-r--r-- | doc/html/pcre2test.html | 197 |
1 files changed, 117 insertions, 80 deletions
diff --git a/doc/html/pcre2test.html b/doc/html/pcre2test.html index 6097f02..537985d 100644 --- a/doc/html/pcre2test.html +++ b/doc/html/pcre2test.html @@ -486,7 +486,7 @@ the start of a modifier list. For example: <pre> abc\=notbol,notempty </pre> -If the subject string is empty and \= is followed by whitespace, the line is +If the subject string is empty and \= is followed by whitespace, the line is treated as a comment line, and is not used for matching. For example: <pre> \= This is a comment. @@ -538,7 +538,7 @@ for a description of their effects. no_utf_check set PCRE2_NO_UTF_CHECK ucp set PCRE2_UCP ungreedy set PCRE2_UNGREEDY - use_offset_limit set PCRE2_USE_OFFSET_LIMIT + use_offset_limit set PCRE2_USE_OFFSET_LIMIT utf set PCRE2_UTF </pre> As well as turning on the PCRE2_UTF option, the <b>utf</b> modifier causes all @@ -564,7 +564,7 @@ about the pattern: jitfast use JIT fast path jitverify verify JIT use locale=<name> use this locale - max_pattern_length=<n> set the maximum pattern length + max_pattern_length=<n> set the maximum pattern length memory show memory used newline=<type> set newline type null_context compile with a NULL context @@ -649,9 +649,9 @@ by the item that follows it in the pattern. Passing a NULL context </b><br> <P> -Normally, <b>pcre2test</b> passes a context block to <b>pcre2_compile()</b>. If -the <b>null_context</b> modifier is set, however, NULL is passed. This is for -testing that <b>pcre2_compile()</b> behaves correctly in this case (it uses +Normally, <b>pcre2test</b> passes a context block to <b>pcre2_compile()</b>. If +the <b>null_context</b> modifier is set, however, NULL is passed. This is for +testing that <b>pcre2_compile()</b> behaves correctly in this case (it uses default values). </P> <br><b> @@ -675,9 +675,9 @@ Generating long repetitive patterns </b><br> <P> Some tests use long patterns that are very repetitive. 
Instead of creating a -very long input line for such a pattern, you can use a special repetition -feature, similar to the one described for subject lines above. If the -<b>expand</b> modifier is present on a pattern, parts of the pattern that have +very long input line for such a pattern, you can use a special repetition +feature, similar to the one described for subject lines above. If the +<b>expand</b> modifier is present on a pattern, parts of the pattern that have the form <pre> \[<characters>]{<count>} @@ -689,13 +689,13 @@ by decimal digits and "}" is found later in the pattern. If not, the characters remain in the pattern unaltered. </P> <P> -If part of an expanded pattern looks like an expansion, but is really part of -the actual pattern, unwanted expansion can be avoided by giving two values in -the quantifier. For example, \[AB]{6000,6000} is not recognized as an +If part of an expanded pattern looks like an expansion, but is really part of +the actual pattern, unwanted expansion can be avoided by giving two values in +the quantifier. For example, \[AB]{6000,6000} is not recognized as an expansion item. </P> <P> -If the <b>info</b> modifier is set on an expanded pattern, the result of the +If the <b>info</b> modifier is set on an expanded pattern, the result of the expansion is included in the information that is output. </P> <br><b> @@ -812,9 +812,9 @@ suite. Limiting the pattern length </b><br> <P> -The <b>max_pattern_length</b> modifier sets a limit, in code units, to the -length of pattern that <b>pcre2_compile()</b> will accept. Breaching the limit -causes a compilation error. The default is the largest number a PCRE2_SIZE +The <b>max_pattern_length</b> modifier sets a limit, in code units, to the +length of pattern that <b>pcre2_compile()</b> will accept. Breaching the limit +causes a compilation error. The default is the largest number a PCRE2_SIZE variable can hold (essentially unlimited). 
</P> <br><b> @@ -836,13 +836,13 @@ modifiers set options for the <b>regcomp()</b> function: ucp REG_UCP ) the POSIX standard utf REG_UTF8 ) </pre> -The <b>regerror_buffsize</b> modifier specifies a size for the error buffer that +The <b>regerror_buffsize</b> modifier specifies a size for the error buffer that is passed to <b>regerror()</b> in the event of a compilation error. For example: <pre> /abc/posix,regerror_buffsize=20 </pre> -This provides a means of testing the behaviour of <b>regerror()</b> when the -buffer is too small for the error message. If this modifier has not been set, a +This provides a means of testing the behaviour of <b>regerror()</b> when the +buffer is too small for the error message. If this modifier has not been set, a large buffer is used. </P> <P> @@ -892,14 +892,18 @@ are applied to every subject line that is processed with that pattern. They may not appear in <b>#pattern</b> commands. These modifiers do not affect the compilation process. <pre> - aftertext show text after match - allaftertext show text after captures - allcaptures show all captures - allusedtext show all consulted text - /g global global matching - mark show mark values - replace=<string> specify a replacement string - startchar show starting character when relevant + aftertext show text after match + allaftertext show text after captures + allcaptures show all captures + allusedtext show all consulted text + /g global global matching + mark show mark values + replace=<string> specify a replacement string + startchar show starting character when relevant + substitute_extended use PCRE2_SUBSTITUTE_EXTENDED + substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH + substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET + substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY </pre> These modifiers may not appear in a <b>#pattern</b> command. If you want them as defaults, set them in a <b>#subject</b> command. @@ -964,33 +968,38 @@ information. 
Some of them may also be specified on a pattern line (see above), in which case they apply to every subject line that is matched against that pattern. <pre> - aftertext show text after match - allaftertext show text after captures - allcaptures show all captures - allusedtext show all consulted text (non-JIT only) - altglobal alternative global matching - callout_capture show captures at callout time - callout_data=<n> set a value to pass via callouts - callout_fail=<n>[:<m>] control callout failure - callout_none do not supply a callout function - copy=<number or name> copy captured substring - dfa use <b>pcre2_dfa_match()</b> - find_limits find match and recursion limits - get=<number or name> extract captured substring - getall extract all captured substrings - /g global global matching - jitstack=<n> set size of JIT stack - mark show mark values - match_limit=<n> set a match limit - memory show memory usage - null_context match with a NULL context - offset=<n> set starting offset - offset_limit=<n> set offset limit - ovector=<n> set size of output vector - recursion_limit=<n> set a recursion limit - replace=<string> specify a replacement string - startchar show startchar when relevant - zero_terminate pass the subject as zero-terminated + aftertext show text after match + allaftertext show text after captures + allcaptures show all captures + allusedtext show all consulted text (non-JIT only) + altglobal alternative global matching + callout_capture show captures at callout time + callout_data=<n> set a value to pass via callouts + callout_fail=<n>[:<m>] control callout failure + callout_none do not supply a callout function + copy=<number or name> copy captured substring + dfa use <b>pcre2_dfa_match()</b> + find_limits find match and recursion limits + get=<number or name> extract captured substring + getall extract all captured substrings + /g global global matching + jitstack=<n> set size of JIT stack + mark show mark values + match_limit=<n> set a match 
limit + memory show memory usage + null_context match with a NULL context + offset=<n> set starting offset + offset_limit=<n> set offset limit + ovector=<n> set size of output vector + recursion_limit=<n> set a recursion limit + replace=<string> specify a replacement string + startchar show startchar when relevant + startoffset=<n> same as offset=<n> + substitute_extended use PCRE2_SUBSTITUTE_EXTENDED + substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH + substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET + substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY + zero_terminate pass the subject as zero-terminated </pre> The effects of these modifiers are described in the following sections. </P> @@ -1129,19 +1138,34 @@ Testing the substitution function </b><br> <P> If the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is -called instead of one of the matching functions. Unlike subject strings, -<b>pcre2test</b> does not process replacement strings for escape sequences. In -UTF mode, a replacement string is checked to see if it is a valid UTF-8 string. -If so, it is correctly converted to a UTF string of the appropriate code unit -width. If it is not a valid UTF-8 string, the individual code units are copied -directly. This provides a means of passing an invalid UTF-8 string for testing -purposes. -</P> -<P> -If the <b>global</b> modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to -<b>pcre2_substitute()</b>. After a successful substitution, the modified string -is output, preceded by the number of replacements. This may be zero if there -were no matches. Here is a simple example of a substitution test: +called instead of one of the matching functions. Note that replacement strings +cannot contain commas, because a comma signifies the end of a modifier. This is +not thought to be an issue in a test program. +</P> +<P> +Unlike subject strings, <b>pcre2test</b> does not process replacement strings +for escape sequences.
In UTF mode, a replacement string is checked to see if it +is a valid UTF-8 string. If so, it is correctly converted to a UTF string of +the appropriate code unit width. If it is not a valid UTF-8 string, the +individual code units are copied directly. This provides a means of passing an +invalid UTF-8 string for testing purposes. +</P> +<P> +The following modifiers set options (in addition to the normal match options) +for <b>pcre2_substitute()</b>: +<pre> + global PCRE2_SUBSTITUTE_GLOBAL + substitute_extended PCRE2_SUBSTITUTE_EXTENDED + substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH + substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET + substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY + +</PRE> +</P> +<P> +After a successful substitution, the modified string is output, preceded by the +number of replacements. This may be zero if there were no matches. Here is a +simple example of a substitution test: <pre> /abc/replace=xxx =abc=abc= @@ -1149,12 +1173,12 @@ were no matches. Here is a simple example of a substitution test: =abc=abc=\=global 2: =xxx=xxx= </pre> -Subject and replacement strings should be kept relatively short for -substitution tests, as fixed-size buffers are used. To make it easy to test for -buffer overflow, if the replacement string starts with a number in square -brackets, that number is passed to <b>pcre2_substitute()</b> as the size of the -output buffer, with the replacement string starting at the next character. Here -is an example that tests the edge case: +Subject and replacement strings should be kept relatively short (fewer than 256 +characters) for substitution tests, as fixed-size buffers are used. To make it +easy to test for buffer overflow, if the replacement string starts with a +number in square brackets, that number is passed to <b>pcre2_substitute()</b> as +the size of the output buffer, with the replacement string starting at the next +character.
Here is an example that tests the edge case: <pre> /abc/ 123abc123\=replace=[10]XYZ @@ -1162,6 +1186,19 @@ is an example that tests the edge case: 123abc123\=replace=[9]XYZ Failed: error -47: no more memory </pre> +The default action of <b>pcre2_substitute()</b> is to return +PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the +PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the +<b>substitute_overflow_length</b> modifier), <b>pcre2_substitute()</b> continues +to go through the motions of matching and substituting, in order to compute the +size of buffer that is required. When this happens, <b>pcre2test</b> shows the +required buffer length (which includes space for the trailing zero) as part of +the error message. For example: +<pre> + /abc/substitute_overflow_length + 123abc123\=replace=[9]XYZ + Failed: error -47: no more memory: 10 code units are needed +</pre> A replacement string is ignored with POSIX and DFA matching. Specifying partial matching provokes an error return ("bad option value") from <b>pcre2_substitute()</b>. @@ -1236,10 +1273,10 @@ matching starts. Its value is a number of code units, not characters. Setting an offset limit </b><br> <P> -The <b>offset_limit</b> modifier sets a limit for unanchored matches. If a match -cannot be found starting at or before this offset in the subject, a "no match" -return is given. The data value is a number of code units, not characters. When -this modifier is used, the <b>use_offset_limit</b> modifier must have been set +The <b>offset_limit</b> modifier sets a limit for unanchored matches. If a match +cannot be found starting at or before this offset in the subject, a "no match" +return is given. The data value is a number of code units, not characters. When +this modifier is used, the <b>use_offset_limit</b> modifier must have been set for the pattern; if not, an error is generated. 
</P> <br><b> @@ -1281,8 +1318,8 @@ Passing a NULL context Normally, <b>pcre2test</b> passes a context block to <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b> or <b>pcre2_jit_match()</b>. If the <b>null_context</b> modifier is set, however, NULL is passed. This is for testing that the matching -functions behave correctly in this case (they use default values). This -modifier cannot be used with the <b>find_limits</b> modifier or when testing the +functions behave correctly in this case (they use default values). This +modifier cannot be used with the <b>find_limits</b> modifier or when testing the substitution function. </P> <br><a name="SEC12" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br> @@ -1623,7 +1660,7 @@ Cambridge, England. </P> <br><a name="SEC21" href="#TOC1">REVISION</a><br> <P> -Last updated: 05 November 2015 +Last updated: 12 December 2015 <br> Copyright © 1997-2015 University of Cambridge. <br>
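Illustrative note on the "Generating long repetitive patterns" hunk above: the documented expansion rule can be approximated in a few lines of Python. This is only a rough sketch of the behaviour the text describes, not pcre2test's actual implementation (which is written in C and may treat edge cases differently). The names expand_pattern and _EXPANSION are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical approximation of the documented "expand" rule:
# an item of the form \[<characters>]{<count>} is replaced by
# <characters> repeated <count> times. A quantifier with two values,
# such as {6000,6000}, does not match and is left untouched.
_EXPANSION = re.compile(r"\\\[([^\]]+)\]\{(\d+)\}")

def expand_pattern(pattern: str) -> str:
    """Return the pattern with every expansion item replaced by its repetition."""
    return _EXPANSION.sub(lambda m: m.group(1) * int(m.group(2)), pattern)

if __name__ == "__main__":
    print(expand_pattern(r"\[AB]{3}x"))         # expansion item is replaced
    print(expand_pattern(r"\[AB]{6000,6000}"))  # two values: left unaltered
```

Under these assumptions, a pattern line such as /\[AB]{3}x/expand would be compiled as if it had been written /ABABABx/, while \[AB]{6000,6000} stays in the pattern as ordinary regex syntax.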