From 4939d7de20b030c6161dbc9cb45e7973cf77d5a1 Mon Sep 17 00:00:00 2001
From: ph10
Date: Mon, 16 Jan 2017 17:40:47 +0000
Subject: File tidies for 10.23-RC1

git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@655 6239d852-aaf2-0410-a92c-79f79f948069
---
 doc/html/pcre2test.html | 71 +++++++++++++++++++++++++++++--------------------
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/doc/html/pcre2test.html b/doc/html/pcre2test.html
index dc1b1dd..ee41e43 100644
--- a/doc/html/pcre2test.html
+++ b/doc/html/pcre2test.html
@@ -114,7 +114,7 @@ to the library. For subject lines, backslash escapes can be used. In addition, when the utf modifier (see "Setting compilation options" below) is set, the pattern and any following subject lines are interpreted as
-UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate.
+UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate.

For non-UTF testing of wide characters, the utf8_input modifier can be @@ -153,8 +153,13 @@ the 32-bit library has been built, this is the default. If the 32-bit library has not been built, this option causes an error.

+-ac +Behave as if each pattern has the auto_callout modifier, that is, insert +automatic callouts into every pattern that is compiled. +

+

-b -Behave as if each pattern has the /fullbincode modifier; the full +Behave as if each pattern has the fullbincode modifier; the full internal binary form of the pattern is output after compilation.

@@ -220,7 +225,7 @@ Output a brief summary these options and then exit.

-i -Behave as if each pattern has the /info modifier; information about the +Behave as if each pattern has the info modifier; information about the compiled pattern is given after compilation.

@@ -582,7 +587,7 @@ for a description of their effects. As well as turning on the PCRE2_UTF option, the utf modifier causes all non-printing characters in output strings to be printed using the \x{hh...} notation. Otherwise, those less than 0x100 are output in hex without the curly -brackets. Setting utf in 16-bit or 32-bit mode also causes pattern and +brackets. Setting utf in 16-bit or 32-bit mode also causes pattern and subject strings to be translated to UTF-16 or UTF-32, respectively, before being passed to library functions.

@@ -615,8 +620,8 @@ about the pattern: pushcopy push a copy onto the stack stackguard=<number> test the stackguard feature tables=[0|1|2] select internal tables - use_length do not zero-terminate the pattern - utf8_input treat input as UTF-8 + use_length do not zero-terminate the pattern + utf8_input treat input as UTF-8 The effects of these modifiers are described in the following sections.

@@ -705,7 +710,7 @@ Specifying the pattern's length By default, patterns are passed to the compiling functions as zero-terminated strings. When using the POSIX wrapper API, there is no other option. However, when using PCRE2's native API, patterns can be passed by length instead of -being zero-terminated. The use_length modifier causes this to happen. +being zero-terminated. The use_length modifier causes this to happen. Using a length happens automatically (whether or not use_length is set) when hex is set, because patterns specified in hexadecimal may contain binary zeros. @@ -733,17 +738,17 @@ the delimiter within a substring. The hex and expand modifiers are mutually exclusive.

-The POSIX API cannot be used with patterns specified in hexadecimal because -they may contain binary zeros, which conflicts with regcomp()'s -requirement for a zero-terminated string. Such patterns are always passed to +The POSIX API cannot be used with patterns specified in hexadecimal because +they may contain binary zeros, which conflicts with regcomp()'s +requirement for a zero-terminated string. Such patterns are always passed to pcre2_compile() as a string with a length, not as zero-terminated.


Specifying wide characters in 16-bit and 32-bit modes

-In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and -translated to UTF-16 or UTF-32 when the utf modifier is set. For testing +In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and +translated to UTF-16 or UTF-32 when the utf modifier is set. For testing the 16-bit and 32-bit libraries in non-UTF mode, the utf8_input modifier can be used. It is mutually exclusive with utf. Input lines are interpreted as UTF-8 as a means of specifying wide characters. More details are @@ -806,7 +811,7 @@ modes are to be compiled: 2 compile JIT code for soft partial matching 4 compile JIT code for hard partial matching -The possible values for the /jit modifier are therefore: +The possible values for the jit modifier are therefore:

   0  disable JIT
   1  normal matching only
@@ -852,14 +857,14 @@ code was actually used in the match.
 Setting a locale
 

-The /locale modifier must specify the name of a locale, for example: +The locale modifier must specify the name of a locale, for example:

   /pattern/locale=fr_FR
 
The given locale is set, pcre2_maketables() is called to build a set of character tables for the locale, and this is then passed to pcre2_compile() when compiling the regular expression. The same tables -are used when matching the following subject lines. The /locale modifier +are used when matching the following subject lines. The locale modifier applies only to the pattern on which it appears, but can be given in a #pattern command if a default is needed. Setting a locale and alternate character tables are mutually exclusive. @@ -868,7 +873,7 @@ character tables are mutually exclusive. Showing pattern memory

-The /memory modifier causes the size in bytes of the memory used to hold +The memory modifier causes the size in bytes of the memory used to hold the compiled pattern to be output. This does not include the size of the pcre2_code block; it is just the actual compiled data. If the pattern is subsequently passed to the JIT compiler, the size of the JIT compiled code is @@ -937,7 +942,7 @@ an error. Testing the stack guard feature

-The /stackguard modifier is used to test the use of +The stackguard modifier is used to test the use of pcre2_set_compile_recursion_guard(), a function that is provided to enable stack availability to be checked during compilation (see the pcre2api @@ -952,7 +957,7 @@ be aborted. Using alternative character tables

-The value specified for the /tables modifier must be one of the digits 0, +The value specified for the tables modifier must be one of the digits 0, 1, or 2. It causes a specific set of built-in character tables to be passed to pcre2_compile(). This is used in the PCRE2 tests to check behaviour with different character tables. The digit specifies the tables as follows: @@ -1042,7 +1047,7 @@ The partial matching modifiers are provided with abbreviations because they appear frequently in tests.

-If the /posix modifier was present on the pattern, causing the POSIX +If the posix modifier was present on the pattern, causing the POSIX wrapper API to be used, the only option-setting modifiers that have any effect are notbol, notempty, and noteol, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to regexec(). @@ -1064,6 +1069,7 @@ pattern. altglobal alternative global matching callout_capture show captures at callout time callout_data=<n> set a value to pass via callouts + callout_error=<n>[:<m>] control callout error callout_fail=<n>[:<m>] control callout failure callout_none do not supply a callout function copy=<number or name> copy captured substring @@ -1159,15 +1165,22 @@ Testing callouts

A callout function is supplied when pcre2test calls the library matching functions, unless callout_none is specified. If callout_capture is -set, the current captured groups are output when a callout occurs. +set, the current captured groups are output when a callout occurs. The default +return from the callout function is zero, which allows matching to continue.

The callout_fail modifier can be given one or two numbers. If there is -only one number, 1 is returned instead of 0 when a callout of that number is -reached. If two numbers are given, 1 is returned when callout <n> is reached -for the <m>th time. Note that callouts with string arguments are always given -the number zero. See "Callouts" below for a description of the output when a -callout it taken. +only one number, 1 is returned instead of 0 (causing matching to backtrack) +when a callout of that number is reached. If two numbers (<n>:<m>) are given, 1 +is returned when callout <n> is reached and there have been at least <m> +callouts. The callout_error modifier is similar, except that +PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be +aborted. If both these modifiers are set for the same callout number, +callout_error takes precedence. +

+

+Note that callouts with string arguments are always given the number zero. See +"Callouts" below for a description of the output when a callout it taken.

The callout_data modifier can be given an unsigned or a negative number. @@ -1180,7 +1193,7 @@ Finding all matches in a string

Searching for all possible matches within a subject can be requested by the -global or /altglobal modifier. After finding a match, the matching +global or altglobal modifier. After finding a match, the matching function is called again to search the remainder of the subject. The difference between global and altglobal is that the former uses the start_offset argument to pcre2_match() or pcre2_dfa_match() @@ -1480,7 +1493,7 @@ unset substring is shown as "<unset>", as for the second data line. If the strings contain any non-printing characters, they are output as \xhh escapes if the value is less than 256 and UTF mode is not set. Otherwise they are output as \x{hh...} escapes. See below for the definition of non-printing -characters. If the /aftertext modifier is set, the output for substring +characters. If the aftertext modifier is set, the output for substring 0 is followed by the the rest of the subject string, identified by "0+" like this:

@@ -1673,7 +1686,7 @@ therefore shown as hex escapes.
 

When pcre2test is outputting text that is a matched part of a subject string, it behaves in the same way, unless a different locale has been set for -the pattern (using the /locale modifier). In this case, the +the pattern (using the locale modifier). In this case, the isprint() function is used to distinguish printing and non-printing characters.

@@ -1766,7 +1779,7 @@ Cambridge, England.


REVISION

-Last updated: 04 November 2016 +Last updated: 28 December 2016
Copyright © 1997-2016 University of Cambridge.
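
The callout_fail and callout_error text added by this patch can be read as a small decision rule. The sketch below is not pcre2test code; it is a Python model of that rule under stated assumptions: "at least <m> callouts" is taken to mean the total number of callouts seen so far, a missing <m> is treated as 0, and PCRE2_ERROR_CALLOUT is a symbolic stand-in rather than the real constant from pcre2.h. The function name callout_return and its parameters are hypothetical.

```python
from typing import Optional, Tuple, Union

# Symbolic stand-in for the library's error code (assumption: the real
# numeric value lives in pcre2.h and is not reproduced here).
PCRE2_ERROR_CALLOUT = "PCRE2_ERROR_CALLOUT"

# A modifier setting is (n, m); m is 0 when only <n> was given (assumption).
Spec = Optional[Tuple[int, int]]


def callout_return(number: int, callouts_so_far: int,
                   fail: Spec = None, error: Spec = None) -> Union[int, str]:
    """Model the value the test callout function returns for one callout.

    number          -- the callout number that was reached
    callouts_so_far -- total callouts seen so far, including this one
    fail, error     -- hypothetical (n, m) settings for callout_fail
                       and callout_error
    """
    def triggered(spec: Spec) -> bool:
        return (spec is not None
                and number == spec[0]
                and callouts_so_far >= spec[1])

    # callout_error takes precedence when both apply to the same callout.
    if triggered(error):
        return PCRE2_ERROR_CALLOUT  # aborts the whole match
    if triggered(fail):
        return 1                    # forces a backtrack
    return 0                        # default: matching continues
```

For example, with fail=(2, 3) the model returns 0 for callout 2 until three callouts have occurred, and 1 from then on; adding error=(2, 3) changes that result to PCRE2_ERROR_CALLOUT.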