author    ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15>  2011-09-11 14:31:21 +0000
committer ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15>  2011-09-11 14:31:21 +0000
commit    872e41011c69ee598dbdd32444dcde8fa30a23ee
tree      bbc0b9c2afdae0e564bc94b160ebf1a9fbe1744f /doc/pcretest.txt
parent    3e3345effab1548229f5cf368f19ace0b64d782b
Final source and document tidies for 8.20-RC1.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@691 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'doc/pcretest.txt')
-rw-r--r-- doc/pcretest.txt 207
1 files changed, 118 insertions, 89 deletions
diff --git a/doc/pcretest.txt b/doc/pcretest.txt
index a7c42fa..999ee0c 100644
--- a/doc/pcretest.txt
+++ b/doc/pcretest.txt
@@ -71,25 +71,27 @@ COMMAND LINE OPTIONS
-S size On Unix-like systems, set the size of the run-time stack to
size megabytes.
- -s Behave as if each pattern has the /S modifier; in other
- words, force each pattern to be studied. If the /I or /D
- option is present on a pattern (requesting output about the
- compiled pattern), information about the result of studying
- is not included when studying is caused only by -s and nei-
- ther -i nor -d is present on the command line. This behaviour
- means that the output from tests that are run with and with-
- out -s should be identical, except when options that output
- information about the actual running of a match are set. The
- -M, -t, and -tm options, which give information about
- resources used, are likely to produce different output with
- and without -s. Output may also differ if the /C option is
- present on an individual pattern. This uses callouts to trace
- the the matching process, and this may be different between
- studied and non-studied patterns. If the pattern contains
- (*MARK) items there may also be differences, for the same
- reason. The -s command line option can be overridden for spe-
- cific patterns that should never be studied (see the /S
- option below).
+ -s or -s+ Behave as if each pattern has the /S modifier; in other
+ words, force each pattern to be studied. If -s+ is used, the
+ PCRE_STUDY_JIT_COMPILE flag is passed to pcre_study(), caus-
+ ing just-in-time optimization to be set up if it is avail-
+ able. If the /I or /D option is present on a pattern
+ (requesting output about the compiled pattern), information
+ about the result of studying is not included when studying is
+ caused only by -s and neither -i nor -d is present on the
+ command line. This behaviour means that the output from tests
+ that are run with and without -s should be identical, except
+ when options that output information about the actual running
+ of a match are set. The -M, -t, and -tm options, which give
+ information about resources used, are likely to produce dif-
+ ferent output with and without -s. Output may also differ if
+ the /C option is present on an individual pattern. This uses
+ callouts to trace the matching process, and this may be
+ different between studied and non-studied patterns. If the
+ pattern contains (*MARK) items there may also be differences,
+ for the same reason. The -s command line option can be over-
+ ridden for specific patterns that should never be studied
+ (see the /S pattern modifier below).
-t Run each compile, study, and match many times with a timer,
and output resulting time per compile or match (in millisec-
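The precedence between the -s and -s+ command line options and the per-pattern /S modifier (described here and under PATTERN MODIFIERS) can be summarized in a small C sketch. The enum and function names are hypothetical and the code is not taken from pcretest. It also assumes that a single /S on a pattern overrides -s+ rather than combining with it, which the text does not state.

```c
#include <stdbool.h>

/* Hypothetical model of the study decision for one pattern; the names
   are illustrative and this is not pcretest's own code. */
typedef enum {
    STUDY_NONE,   /* pcre_study() is not called                         */
    STUDY_PLAIN,  /* pcre_study() is called without JIT                 */
    STUDY_JIT     /* pcre_study() is called with PCRE_STUDY_JIT_COMPILE */
} study_mode;

/* cmdline: STUDY_NONE (no option), STUDY_PLAIN (-s) or STUDY_JIT (-s+).
   s_count: how many /S modifiers the pattern carries.
   s_plus:  true if the pattern's /S was written as /S+.               */
study_mode resolve_study(study_mode cmdline, int s_count, bool s_plus)
{
    if (s_count >= 2)
        return STUDY_NONE;      /* /S twice: never study, even with -s or -s+ */
    if (s_count == 1)           /* assumption: a single /S overrides -s+ */
        return s_plus ? STUDY_JIT : STUDY_PLAIN;
    return cmdline;             /* no /S: follow -s or -s+ */
}
```

For instance, with -s+ on the command line, a pattern carrying /SS stays unstudied, while patterns without any /S are studied with JIT requested.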
@@ -245,74 +247,86 @@ PATTERN MODIFIERS
subject contains multiple copies of the same substring. If the + modi-
fier appears twice, the same action is taken for captured substrings.
In each case the remainder is output on the following line with a plus
- character following the capture number.
+ character following the capture number. Note that this modifier must
+ not immediately follow the /S modifier because /S+ has another meaning.
- The /= modifier requests that the values of all potential captured
- parentheses be output after a match by pcre_exec(). By default, only
+ The /= modifier requests that the values of all potential captured
+ parentheses be output after a match by pcre_exec(). By default, only
those up to the highest one actually used in the match are output (cor-
- responding to the return code from pcre_exec()). Values in the offsets
- vector corresponding to higher numbers should be set to -1, and these
- are output as "<unset>". This modifier gives a way of checking that
+ responding to the return code from pcre_exec()). Values in the offsets
+ vector corresponding to higher numbers should be set to -1, and these
+ are output as "<unset>". This modifier gives a way of checking that
this is happening.
- The /B modifier is a debugging feature. It requests that pcretest out-
- put a representation of the compiled byte code after compilation. Nor-
- mally this information contains length and offset values; however, if
- /Z is also present, this data is replaced by spaces. This is a special
+ The /B modifier is a debugging feature. It requests that pcretest out-
+ put a representation of the compiled byte code after compilation. Nor-
+ mally this information contains length and offset values; however, if
+ /Z is also present, this data is replaced by spaces. This is a special
feature for use in the automatic test scripts; it ensures that the same
output is generated for different internal link sizes.
- The /D modifier is a PCRE debugging feature, and is equivalent to /BI,
+ The /D modifier is a PCRE debugging feature, and is equivalent to /BI,
that is, both the /B and the /I modifiers.
The /F modifier causes pcretest to flip the byte order of the fields in
- the compiled pattern that contain 2-byte and 4-byte numbers. This
- facility is for testing the feature in PCRE that allows it to execute
+ the compiled pattern that contain 2-byte and 4-byte numbers. This
+ facility is for testing the feature in PCRE that allows it to execute
patterns that were compiled on a host with a different endianness. This
- feature is not available when the POSIX interface to PCRE is being
- used, that is, when the /P pattern modifier is specified. See also the
+ feature is not available when the POSIX interface to PCRE is being
+ used, that is, when the /P pattern modifier is specified. See also the
section about saving and reloading compiled patterns below.
- The /I modifier requests that pcretest output information about the
- compiled pattern (whether it is anchored, has a fixed first character,
- and so on). It does this by calling pcre_fullinfo() after compiling a
- pattern. If the pattern is studied, the results of that are also out-
+ The /I modifier requests that pcretest output information about the
+ compiled pattern (whether it is anchored, has a fixed first character,
+ and so on). It does this by calling pcre_fullinfo() after compiling a
+ pattern. If the pattern is studied, the results of that are also out-
put.
- The /K modifier requests pcretest to show names from backtracking con-
- trol verbs that are returned from calls to pcre_exec(). It causes
- pcretest to create a pcre_extra block if one has not already been cre-
+ The /K modifier requests pcretest to show names from backtracking con-
+ trol verbs that are returned from calls to pcre_exec(). It causes
+ pcretest to create a pcre_extra block if one has not already been cre-
ated by a call to pcre_study(), and to set the PCRE_EXTRA_MARK flag and
the mark field within it, every time that pcre_exec() is called. If the
- variable that the mark field points to is non-NULL for a match, non-
+ variable that the mark field points to is non-NULL for a match, non-
match, or partial match, pcretest prints the string to which it points.
For a match, this is shown on a line by itself, tagged with "MK:". For
a non-match it is added to the message.
- The /L modifier must be followed directly by the name of a locale, for
+ The /L modifier must be followed directly by the name of a locale, for
example,
/pattern/Lfr_FR
For this reason, it must be the last modifier. The given locale is set,
- pcre_maketables() is called to build a set of character tables for the
- locale, and this is then passed to pcre_compile() when compiling the
- regular expression. Without an /L (or /T) modifier, NULL is passed as
+ pcre_maketables() is called to build a set of character tables for the
+ locale, and this is then passed to pcre_compile() when compiling the
+ regular expression. Without an /L (or /T) modifier, NULL is passed as
the tables pointer; that is, /L applies only to the expression on which
it appears.
- The /M modifier causes the size of memory block used to hold the com-
+ The /M modifier causes the size of memory block used to hold the com-
piled pattern to be output.
- If the /S modifier appears once, it causes pcre_study() to be called
- after the expression has been compiled, and the results used when the
- expression is matched. If /S appears twice, it suppresses studying,
+ If the /S modifier appears once, it causes pcre_study() to be called
+ after the expression has been compiled, and the results used when the
+ expression is matched. If /S appears twice, it suppresses studying,
even if it was requested externally by the -s command line option. This
- makes it possible to specify that certain patterns are always studied,
+ makes it possible to specify that certain patterns are always studied,
and others are never studied, independently of -s. This feature is used
in the test files in a few cases where the output is different when the
pattern is studied.
+ If the /S modifier is immediately followed by a + character, the call
+ to pcre_study() is made with the PCRE_STUDY_JIT_COMPILE option,
+ requesting just-in-time optimization support if it is available. Note
+ that there is also a /+ modifier; it must not be given immediately
+ after /S because this will be misinterpreted. If JIT studying is suc-
+ cessful, it will automatically be used when pcre_exec() is run, except
+ when incompatible run-time options are specified. These include the
+ partial matching options; a complete list is given in the pcrejit docu-
+ mentation. See also the \J escape sequence below for a way of setting
+ the size of the JIT stack.
+
The /T modifier must be followed by a single digit. It causes a spe-
cific set of built-in character tables to be passed to pcre_compile().
It is used in the standard PCRE tests to check behaviour with different
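The warning that /+ must not immediately follow /S comes down to how the modifier letters are scanned: a + directly after S is taken as part of S. The C sketch below models only that rule. The struct and function names are hypothetical, and the scanner is much simpler than pcretest's. It recognizes only S, + and L, skipping every other character and stopping at L because the rest of the string is a locale name.

```c
#include <stdbool.h>

/* Hypothetical, simplified scan of the modifier letters after the closing
   delimiter (e.g. "iS+" in /abc/iS+). Not pcretest's parser. */
typedef struct {
    int  s_count;     /* number of S modifiers                      */
    bool s_jit;       /* an S was immediately followed by '+'       */
    int  plus_count;  /* stand-alone '+' (show remainder) modifiers */
} modifiers;

modifiers scan_modifiers(const char *p)
{
    modifiers m = {0, false, 0};
    for (; *p != '\0'; p++) {
        if (*p == 'L')
            break;                      /* the rest is a locale name */
        if (*p == 'S') {
            m.s_count++;
            if (p[1] == '+') {          /* a '+' right after S belongs to S */
                m.s_jit = true;
                p++;                    /* skip it so it is not counted as /+ */
            }
        } else if (*p == '+') {
            m.plus_count++;
        }
    }
    return m;
}
```

Under this rule, /abc/S+ requests JIT studying and never reaches the /+ behaviour, whereas /abc/+S gives both the /+ behaviour and plain studying.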
@@ -392,6 +406,8 @@ DATA LINES
\Gname call pcre_get_named_substring() for substring
"name" after a successful match (name termin-
ated by next non-alphanumeric character)
+ \Jdd set up a JIT stack of dd kilobytes maximum (any
+ number of digits)
\L call pcre_get_substringlist() after a
successful match
\M discover the minimum MATCH_LIMIT and
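The \Jdd escape takes a size in kilobytes, with any number of digits. A minimal C sketch of turning the digits that follow \J into a byte count is shown below. The function name is hypothetical and there is no overflow check. Passing the size on to PCRE's JIT stack allocation (pcre_jit_stack_alloc() in the 8.20 API) is deliberately left out.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: p points just past "\J" in a data line. Reads the
   decimal digits there (any number of them) as kilobytes and returns the
   equivalent number of bytes; returns 0 if no digit follows. */
size_t jit_stack_bytes(const char *p)
{
    size_t kb = 0;
    while (isdigit((unsigned char)*p)) {
        kb = kb * 10 + (size_t)(*p - '0');   /* no overflow check in this sketch */
        p++;
    }
    return kb * 1024;
}
```

For example, \J1024 would ask for a JIT stack of at most 1024 kilobytes, compared with the 32K default mentioned in the text.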
@@ -444,17 +460,27 @@ DATA LINES
way of passing an empty line as data, since a real empty line termi-
nates the data input.
- If \M is present, pcretest calls pcre_exec() several times, with dif-
- ferent values in the match_limit and match_limit_recursion fields of
- the pcre_extra data structure, until it finds the minimum numbers for
- each parameter that allow pcre_exec() to complete. The match_limit num-
- ber is a measure of the amount of backtracking that takes place, and
- checking it out can be instructive. For most simple matches, the number
- is quite small, but for patterns with very large numbers of matching
- possibilities, it can become large very quickly with increasing length
- of subject string. The match_limit_recursion number is a measure of how
- much stack (or, if PCRE is compiled with NO_RECURSE, how much heap)
- memory is needed to complete the match attempt.
+ The \J escape provides a way of setting the maximum stack size that is
+ used by the just-in-time optimization code. It is ignored if JIT opti-
+ mization is not being used. Providing a stack that is larger than the
+ default 32K is necessary only for very complicated patterns.
+
+ If \M is present, pcretest calls pcre_exec() several times, with dif-
+ ferent values in the match_limit and match_limit_recursion fields of
+ the pcre_extra data structure, until it finds the minimum numbers for
+ each parameter that allow pcre_exec() to complete without error.
+ Because this is testing a specific feature of the normal interpretive
+ pcre_exec() execution, the use of any JIT optimization that might have
+ been set up by the /S+ qualifier or the -s+ option is disabled.
+
+ The match_limit number is a measure of the amount of backtracking that
+ takes place, and checking it out can be instructive. For most simple
+ matches, the number is quite small, but for patterns with very large
+ numbers of matching possibilities, it can become large very quickly
+ with increasing length of subject string. The match_limit_recursion
+ number is a measure of how much stack (or, if PCRE is compiled with
+ NO_RECURSE, how much heap) memory is needed to complete the match
+ attempt.
When \O is used, the value specified may be higher or lower than the
size set by the -O command line option (or defaulted to 45); \O applies
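The \M description implies a search for the smallest limit at which pcre_exec() still completes. One common way to run such a search is to double a trial limit until the match completes, then binary-search between the last failing and the first succeeding value. The C sketch below does this over a hypothetical callback that stands in for "call pcre_exec() with this match_limit (or match_limit_recursion) and report whether it finished without hitting the limit". It assumes success is monotonic in the limit (if a limit works, any larger one does too), and it is not pcretest's own code.

```c
/* Hypothetical stand-in for one pcre_exec() attempt with a given limit:
   returns nonzero if the attempt completed without hitting the limit. */
typedef int (*completes_fn)(unsigned long limit, void *ctx);

/* Smallest limit (>= 1) for which completes() succeeds, or 0 if none up to
   max_limit does. Limit 0 is treated as failing. Assumes max_limit >= 1. */
unsigned long min_limit(completes_fn completes, void *ctx, unsigned long max_limit)
{
    unsigned long lo = 0;   /* largest limit known to fail */
    unsigned long hi = 1;   /* current trial upper bound   */

    /* Grow the bound by doubling until the attempt completes. */
    while (!completes(hi, ctx)) {
        lo = hi;
        if (hi >= max_limit)
            return 0;
        hi = (hi > max_limit / 2) ? max_limit : hi * 2;
    }
    /* Invariant: completes(hi) succeeds and lo fails; narrow the gap. */
    while (hi - lo > 1) {
        unsigned long mid = lo + (hi - lo) / 2;
        if (completes(mid, ctx))
            hi = mid;
        else
            lo = mid;
    }
    return hi;
}

/* Example predicate for trying the search: succeeds once the limit
   reaches the threshold that ctx points to. */
int reaches_threshold(unsigned long limit, void *ctx)
{
    return limit >= *(unsigned long *)ctx;
}
```

The same search serves both match_limit and match_limit_recursion; only what the callback sets differs.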
@@ -720,19 +746,20 @@ SAVING AND RELOADING COMPILED PATTERNS
/pattern/im >/some/file
See the pcreprecompile documentation for a discussion about saving and
- re-using compiled patterns.
+ re-using compiled patterns. Note that if the pattern was successfully
+ studied with JIT optimization, the JIT data cannot be saved.
- The data that is written is binary. The first eight bytes are the
- length of the compiled pattern data followed by the length of the
- optional study data, each written as four bytes in big-endian order
- (most significant byte first). If there is no study data (either the
+ The data that is written is binary. The first eight bytes are the
+ length of the compiled pattern data followed by the length of the
+ optional study data, each written as four bytes in big-endian order
+ (most significant byte first). If there is no study data (either the
pattern was not studied, or studying did not return any data), the sec-
- ond length is zero. The lengths are followed by an exact copy of the
- compiled pattern. If there is additional study data, this follows imme-
- diately after the compiled pattern. After writing the file, pcretest
- expects to read a new pattern.
+ ond length is zero. The lengths are followed by an exact copy of the
+ compiled pattern. If there is additional study data, this (excluding
+ any JIT data) follows immediately after the compiled pattern. After
+ writing the file, pcretest expects to read a new pattern.
- A saved pattern can be reloaded into pcretest by specifying < and a
+ A saved pattern can be reloaded into pcretest by specifying < and a
file name instead of a pattern. The name of the file must not contain a
< character, as otherwise pcretest will interpret the line as a pattern
delimited by < characters. For example:
@@ -741,32 +768,34 @@ SAVING AND RELOADING COMPILED PATTERNS
Compiled pattern loaded from /some/file
No study data
- When the pattern has been loaded, pcretest proceeds to read data lines
- in the usual way.
+ If the pattern was previously studied with the JIT optimization, the
+ JIT information cannot be saved and restored, and so is lost. When the
+ pattern has been loaded, pcretest proceeds to read data lines in the
+ usual way.
- You can copy a file written by pcretest to a different host and reload
- it there, even if the new host has opposite endianness to the one on
- which the pattern was compiled. For example, you can compile on an i86
+ You can copy a file written by pcretest to a different host and reload
+ it there, even if the new host has opposite endianness to the one on
+ which the pattern was compiled. For example, you can compile on an i86
machine and run on a SPARC machine.
- File names for saving and reloading can be absolute or relative, but
- note that the shell facility of expanding a file name that starts with
+ File names for saving and reloading can be absolute or relative, but
+ note that the shell facility of expanding a file name that starts with
a tilde (~) is not available.
- The ability to save and reload files in pcretest is intended for test-
- ing and experimentation. It is not intended for production use because
- only a single pattern can be written to a file. Furthermore, there is
- no facility for supplying custom character tables for use with a
- reloaded pattern. If the original pattern was compiled with custom
- tables, an attempt to match a subject string using a reloaded pattern
- is likely to cause pcretest to crash. Finally, if you attempt to load
+ The ability to save and reload files in pcretest is intended for test-
+ ing and experimentation. It is not intended for production use because
+ only a single pattern can be written to a file. Furthermore, there is
+ no facility for supplying custom character tables for use with a
+ reloaded pattern. If the original pattern was compiled with custom
+ tables, an attempt to match a subject string using a reloaded pattern
+ is likely to cause pcretest to crash. Finally, if you attempt to load
a file that is not in the correct format, the result is undefined.
SEE ALSO
- pcre(3), pcreapi(3), pcrecallout(3), pcrematching(3), pcrepartial(d),
- pcrepattern(3), pcreprecompile(3).
+ pcre(3), pcreapi(3), pcrecallout(3), pcrejit(3), pcrematching(3),
+ pcrepartial(3), pcrepattern(3), pcreprecompile(3).
AUTHOR
@@ -778,5 +807,5 @@ AUTHOR
REVISION
- Last updated: 01 August 2011
+ Last updated: 26 August 2011
Copyright (c) 1997-2011 University of Cambridge.
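The saved-file layout described in the SAVING AND RELOADING COMPILED PATTERNS part of this diff begins with two four-byte big-endian lengths (compiled pattern, then study data, the latter zero if there is none), followed by the pattern and any non-JIT study data. The C sketch below decodes that header. The function names are hypothetical. Building each length from its bytes with shifts gives the same result on little-endian and big-endian hosts, so the header reads the same everywhere. Flipping the fields inside the compiled pattern itself is a separate matter handled by PCRE, and the /F modifier exists to test it.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode 4 bytes stored most significant byte first. The shifts make the
   result independent of the host's own byte order. */
static uint32_t read_be32(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Hypothetical helper: split the first eight bytes of a saved file into the
   compiled-pattern length and the study-data length (0 if none).
   Returns 0 on success, -1 if fewer than eight bytes are available. */
int parse_saved_header(const unsigned char *buf, size_t len,
                       uint32_t *pattern_len, uint32_t *study_len)
{
    if (len < 8)
        return -1;
    *pattern_len = read_be32(buf);
    *study_len   = read_be32(buf + 4);
    return 0;
}
```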