From 4939d7de20b030c6161dbc9cb45e7973cf77d5a1 Mon Sep 17 00:00:00 2001
From: ph10
For non-UTF testing of wide characters, the utf8_input modifier can be @@ -153,8 +153,13 @@ the 32-bit library has been built, this is the default. If the 32-bit library has not been built, this option causes an error.
+-ac +Behave as if each pattern has the auto_callout modifier, that is, insert +automatic callouts into every pattern that is compiled. +
+-b -Behave as if each pattern has the /fullbincode modifier; the full +Behave as if each pattern has the fullbincode modifier; the full internal binary form of the pattern is output after compilation.
@@ -220,7 +225,7 @@ Output a brief summary these options and then exit.
-i -Behave as if each pattern has the /info modifier; information about the +Behave as if each pattern has the info modifier; information about the compiled pattern is given after compilation.
@@ -582,7 +587,7 @@ for a description of their effects. As well as turning on the PCRE2_UTF option, the utf modifier causes all non-printing characters in output strings to be printed using the \x{hh...} notation. Otherwise, those less than 0x100 are output in hex without the curly -brackets. Setting utf in 16-bit or 32-bit mode also causes pattern and +brackets. Setting utf in 16-bit or 32-bit mode also causes pattern and subject strings to be translated to UTF-16 or UTF-32, respectively, before being passed to library functions.
@@ -615,8 +620,8 @@ about the pattern: pushcopy push a copy onto the stack stackguard=<number> test the stackguard feature tables=[0|1|2] select internal tables - use_length do not zero-terminate the pattern - utf8_input treat input as UTF-8 + use_length do not zero-terminate the pattern + utf8_input treat input as UTF-8 The effects of these modifiers are described in the following sections. @@ -705,7 +710,7 @@ Specifying the pattern's length By default, patterns are passed to the compiling functions as zero-terminated strings. When using the POSIX wrapper API, there is no other option. However, when using PCRE2's native API, patterns can be passed by length instead of -being zero-terminated. The use_length modifier causes this to happen. +being zero-terminated. The use_length modifier causes this to happen. Using a length happens automatically (whether or not use_length is set) when hex is set, because patterns specified in hexadecimal may contain binary zeros. @@ -733,17 +738,17 @@ the delimiter within a substring. The hex and expand modifiers are mutually exclusive.-The POSIX API cannot be used with patterns specified in hexadecimal because -they may contain binary zeros, which conflicts with regcomp()'s -requirement for a zero-terminated string. Such patterns are always passed to +The POSIX API cannot be used with patterns specified in hexadecimal because +they may contain binary zeros, which conflicts with regcomp()'s +requirement for a zero-terminated string. Such patterns are always passed to pcre2_compile() as a string with a length, not as zero-terminated.
-In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and -translated to UTF-16 or UTF-32 when the utf modifier is set. For testing +In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and +translated to UTF-16 or UTF-32 when the utf modifier is set. For testing the 16-bit and 32-bit libraries in non-UTF mode, the utf8_input modifier can be used. It is mutually exclusive with utf. Input lines are interpreted as UTF-8 as a means of specifying wide characters. More details are @@ -806,7 +811,7 @@ modes are to be compiled: 2 compile JIT code for soft partial matching 4 compile JIT code for hard partial matching -The possible values for the /jit modifier are therefore: +The possible values for the jit modifier are therefore:
0 disable JIT 1 normal matching only @@ -852,14 +857,14 @@ code was actually used in the match. Setting a locale
-The /locale modifier must specify the name of a locale, for example: +The locale modifier must specify the name of a locale, for example:
/pattern/locale=fr_FRThe given locale is set, pcre2_maketables() is called to build a set of character tables for the locale, and this is then passed to pcre2_compile() when compiling the regular expression. The same tables -are used when matching the following subject lines. The /locale modifier +are used when matching the following subject lines. The locale modifier applies only to the pattern on which it appears, but can be given in a #pattern command if a default is needed. Setting a locale and alternate character tables are mutually exclusive. @@ -868,7 +873,7 @@ character tables are mutually exclusive. Showing pattern memory
-The /memory modifier causes the size in bytes of the memory used to hold +The memory modifier causes the size in bytes of the memory used to hold the compiled pattern to be output. This does not include the size of the pcre2_code block; it is just the actual compiled data. If the pattern is subsequently passed to the JIT compiler, the size of the JIT compiled code is @@ -937,7 +942,7 @@ an error. Testing the stack guard feature
-The /stackguard modifier is used to test the use of +The stackguard modifier is used to test the use of pcre2_set_compile_recursion_guard(), a function that is provided to enable stack availability to be checked during compilation (see the pcre2api @@ -952,7 +957,7 @@ be aborted. Using alternative character tables
-The value specified for the /tables modifier must be one of the digits 0, +The value specified for the tables modifier must be one of the digits 0, 1, or 2. It causes a specific set of built-in character tables to be passed to pcre2_compile(). This is used in the PCRE2 tests to check behaviour with different character tables. The digit specifies the tables as follows: @@ -1042,7 +1047,7 @@ The partial matching modifiers are provided with abbreviations because they appear frequently in tests.
-If the /posix modifier was present on the pattern, causing the POSIX +If the posix modifier was present on the pattern, causing the POSIX wrapper API to be used, the only option-setting modifiers that have any effect are notbol, notempty, and noteol, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec(). @@ -1064,6 +1069,7 @@ pattern. altglobal alternative global matching callout_capture show captures at callout time callout_data=<n> set a value to pass via callouts + callout_error=<n>[:<m>] control callout error callout_fail=<n>[:<m>] control callout failure callout_none do not supply a callout function copy=<number or name> copy captured substring @@ -1159,15 +1165,22 @@ Testing callouts
A callout function is supplied when pcre2test calls the library matching functions, unless callout_none is specified. If callout_capture is -set, the current captured groups are output when a callout occurs. +set, the current captured groups are output when a callout occurs. The default +return from the callout function is zero, which allows matching to continue.
The callout_fail modifier can be given one or two numbers. If there is -only one number, 1 is returned instead of 0 when a callout of that number is -reached. If two numbers are given, 1 is returned when callout <n> is reached -for the <m>th time. Note that callouts with string arguments are always given -the number zero. See "Callouts" below for a description of the output when a -callout it taken. +only one number, 1 is returned instead of 0 (causing matching to backtrack) +when a callout of that number is reached. If two numbers (<n>:<m>) are given, 1 +is returned when callout <n> is reached and there have been at least <m> +callouts. The callout_error modifier is similar, except that +PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be +aborted. If both these modifiers are set for the same callout number, +callout_error takes precedence. +
++Note that callouts with string arguments are always given the number zero. See +"Callouts" below for a description of the output when a callout is taken.
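The return-value rules described in the added text above can be modelled as a small decision function. This is only a sketch of the documented behaviour, not pcre2test's actual source: the type `callout_trigger`, the helpers `trigger_fires()` and `sketch_callout_return()`, and the stand-in constant `SKETCH_ERROR_CALLOUT` are hypothetical names introduced for illustration. It also assumes that an omitted <m> behaves like zero and that the callout count includes the current callout.

```c
/* Stand-in for PCRE2_ERROR_CALLOUT. Real code would include pcre2.h and use
   the library's constant; the value here is an assumption for illustration. */
#define SKETCH_ERROR_CALLOUT (-37)

/* Hypothetical holder for a <n>[:<m>] modifier value. */
typedef struct {
    int set;             /* non-zero if the modifier was given */
    unsigned number;     /* <n>: the callout number to act on */
    unsigned min_count;  /* <m>: minimum number of callouts; 0 if omitted */
} callout_trigger;

/* True when callout <n> is reached and at least <m> callouts have occurred. */
static int trigger_fires(const callout_trigger *t,
                         unsigned callout_number, unsigned callout_count)
{
    return t->set &&
           callout_number == t->number &&
           callout_count >= t->min_count;
}

/* Model of the value a callout function returns under the documented rules:
   callout_error takes precedence over callout_fail, and zero (continue
   matching) is the default. */
int sketch_callout_return(const callout_trigger *fail,
                          const callout_trigger *error,
                          unsigned callout_number, unsigned callout_count)
{
    if (trigger_fires(error, callout_number, callout_count))
        return SKETCH_ERROR_CALLOUT;  /* abort the whole match */
    if (trigger_fires(fail, callout_number, callout_count))
        return 1;                     /* force a backtrack at this point */
    return 0;                         /* let matching continue */
}
```

Testing the error trigger before the failure trigger is what gives callout_error its precedence when both modifiers name the same callout number.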
The callout_data modifier can be given an unsigned or a negative number. @@ -1180,7 +1193,7 @@ Finding all matches in a string
Searching for all possible matches within a subject can be requested by the -global or /altglobal modifier. After finding a match, the matching +global or altglobal modifier. After finding a match, the matching function is called again to search the remainder of the subject. The difference between global and altglobal is that the former uses the start_offset argument to pcre2_match() or pcre2_dfa_match() @@ -1480,7 +1493,7 @@ unset substring is shown as "<unset>", as for the second data line. If the strings contain any non-printing characters, they are output as \xhh escapes if the value is less than 256 and UTF mode is not set. Otherwise they are output as \x{hh...} escapes. See below for the definition of non-printing -characters. If the /aftertext modifier is set, the output for substring +characters. If the aftertext modifier is set, the output for substring 0 is followed by the the rest of the subject string, identified by "0+" like this:
@@ -1673,7 +1686,7 @@ therefore shown as hex escapes.When pcre2test is outputting text that is a matched part of a subject string, it behaves in the same way, unless a different locale has been set for -the pattern (using the /locale modifier). In this case, the +the pattern (using the locale modifier). In this case, the isprint() function is used to distinguish printing and non-printing characters.
@@ -1766,7 +1779,7 @@ Cambridge, England.
REVISION
-Last updated: 04 November 2016 +Last updated: 28 December 2016
Copyright © 1997-2016 University of Cambridge.
-- cgit v1.2.1