author | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2011-09-11 14:31:21 +0000 |
---|---|---|
committer | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2011-09-11 14:31:21 +0000 |
commit | 872e41011c69ee598dbdd32444dcde8fa30a23ee (patch) | |
tree | bbc0b9c2afdae0e564bc94b160ebf1a9fbe1744f /doc/pcretest.txt | |
parent | 3e3345effab1548229f5cf368f19ace0b64d782b (diff) | |
Final source and document tidies for 8.20-RC1.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@691 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'doc/pcretest.txt')
-rw-r--r-- | doc/pcretest.txt | 207 |
1 files changed, 118 insertions, 89 deletions
diff --git a/doc/pcretest.txt b/doc/pcretest.txt index a7c42fa..999ee0c 100644 --- a/doc/pcretest.txt +++ b/doc/pcretest.txt @@ -71,25 +71,27 @@ COMMAND LINE OPTIONS -S size On Unix-like systems, set the size of the run-time stack to size megabytes. - -s Behave as if each pattern has the /S modifier; in other - words, force each pattern to be studied. If the /I or /D - option is present on a pattern (requesting output about the - compiled pattern), information about the result of studying - is not included when studying is caused only by -s and nei- - ther -i nor -d is present on the command line. This behaviour - means that the output from tests that are run with and with- - out -s should be identical, except when options that output - information about the actual running of a match are set. The - -M, -t, and -tm options, which give information about - resources used, are likely to produce different output with - and without -s. Output may also differ if the /C option is - present on an individual pattern. This uses callouts to trace - the the matching process, and this may be different between - studied and non-studied patterns. If the pattern contains - (*MARK) items there may also be differences, for the same - reason. The -s command line option can be overridden for spe- - cific patterns that should never be studied (see the /S - option below). + -s or -s+ Behave as if each pattern has the /S modifier; in other + words, force each pattern to be studied. If -s+ is used, the + PCRE_STUDY_JIT_COMPILE flag is passed to pcre_study(), caus- + ing just-in-time optimization to be set up if it is avail- + able. If the /I or /D option is present on a pattern + (requesting output about the compiled pattern), information + about the result of studying is not included when studying is + caused only by -s and neither -i nor -d is present on the + command line. 
This behaviour means that the output from tests + that are run with and without -s should be identical, except + when options that output information about the actual running + of a match are set. The -M, -t, and -tm options, which give + information about resources used, are likely to produce dif- + ferent output with and without -s. Output may also differ if + the /C option is present on an individual pattern. This uses + callouts to trace the the matching process, and this may be + different between studied and non-studied patterns. If the + pattern contains (*MARK) items there may also be differences, + for the same reason. The -s command line option can be over- + ridden for specific patterns that should never be studied + (see the /S pattern modifier below). -t Run each compile, study, and match many times with a timer, and output resulting time per compile or match (in millisec- @@ -245,74 +247,86 @@ PATTERN MODIFIERS subject contains multiple copies of the same substring. If the + modi- fier appears twice, the same action is taken for captured substrings. In each case the remainder is output on the following line with a plus - character following the capture number. + character following the capture number. Note that this modifier must + not immediately follow the /S modifier because /S+ has another meaning. - The /= modifier requests that the values of all potential captured - parentheses be output after a match by pcre_exec(). By default, only + The /= modifier requests that the values of all potential captured + parentheses be output after a match by pcre_exec(). By default, only those up to the highest one actually used in the match are output (cor- - responding to the return code from pcre_exec()). Values in the offsets - vector corresponding to higher numbers should be set to -1, and these - are output as "<unset>". This modifier gives a way of checking that + responding to the return code from pcre_exec()). 
Values in the offsets + vector corresponding to higher numbers should be set to -1, and these + are output as "<unset>". This modifier gives a way of checking that this is happening. - The /B modifier is a debugging feature. It requests that pcretest out- - put a representation of the compiled byte code after compilation. Nor- - mally this information contains length and offset values; however, if - /Z is also present, this data is replaced by spaces. This is a special + The /B modifier is a debugging feature. It requests that pcretest out- + put a representation of the compiled byte code after compilation. Nor- + mally this information contains length and offset values; however, if + /Z is also present, this data is replaced by spaces. This is a special feature for use in the automatic test scripts; it ensures that the same output is generated for different internal link sizes. - The /D modifier is a PCRE debugging feature, and is equivalent to /BI, + The /D modifier is a PCRE debugging feature, and is equivalent to /BI, that is, both the /B and the /I modifiers. The /F modifier causes pcretest to flip the byte order of the fields in - the compiled pattern that contain 2-byte and 4-byte numbers. This - facility is for testing the feature in PCRE that allows it to execute + the compiled pattern that contain 2-byte and 4-byte numbers. This + facility is for testing the feature in PCRE that allows it to execute patterns that were compiled on a host with a different endianness. This - feature is not available when the POSIX interface to PCRE is being - used, that is, when the /P pattern modifier is specified. See also the + feature is not available when the POSIX interface to PCRE is being + used, that is, when the /P pattern modifier is specified. See also the section about saving and reloading compiled patterns below. 
- The /I modifier requests that pcretest output information about the - compiled pattern (whether it is anchored, has a fixed first character, - and so on). It does this by calling pcre_fullinfo() after compiling a - pattern. If the pattern is studied, the results of that are also out- + The /I modifier requests that pcretest output information about the + compiled pattern (whether it is anchored, has a fixed first character, + and so on). It does this by calling pcre_fullinfo() after compiling a + pattern. If the pattern is studied, the results of that are also out- put. - The /K modifier requests pcretest to show names from backtracking con- - trol verbs that are returned from calls to pcre_exec(). It causes - pcretest to create a pcre_extra block if one has not already been cre- + The /K modifier requests pcretest to show names from backtracking con- + trol verbs that are returned from calls to pcre_exec(). It causes + pcretest to create a pcre_extra block if one has not already been cre- ated by a call to pcre_study(), and to set the PCRE_EXTRA_MARK flag and the mark field within it, every time that pcre_exec() is called. If the - variable that the mark field points to is non-NULL for a match, non- + variable that the mark field points to is non-NULL for a match, non- match, or partial match, pcretest prints the string to which it points. For a match, this is shown on a line by itself, tagged with "MK:". For a non-match it is added to the message. - The /L modifier must be followed directly by the name of a locale, for + The /L modifier must be followed directly by the name of a locale, for example, /pattern/Lfr_FR For this reason, it must be the last modifier. The given locale is set, - pcre_maketables() is called to build a set of character tables for the - locale, and this is then passed to pcre_compile() when compiling the - regular expression. 
Without an /L (or /T) modifier, NULL is passed as + pcre_maketables() is called to build a set of character tables for the + locale, and this is then passed to pcre_compile() when compiling the + regular expression. Without an /L (or /T) modifier, NULL is passed as the tables pointer; that is, /L applies only to the expression on which it appears. - The /M modifier causes the size of memory block used to hold the com- + The /M modifier causes the size of memory block used to hold the com- piled pattern to be output. - If the /S modifier appears once, it causes pcre_study() to be called - after the expression has been compiled, and the results used when the - expression is matched. If /S appears twice, it suppresses studying, + If the /S modifier appears once, it causes pcre_study() to be called + after the expression has been compiled, and the results used when the + expression is matched. If /S appears twice, it suppresses studying, even if it was requested externally by the -s command line option. This - makes it possible to specify that certain patterns are always studied, + makes it possible to specify that certain patterns are always studied, and others are never studied, independently of -s. This feature is used in the test files in a few cases where the output is different when the pattern is studied. + If the /S modifier is immediately followed by a + character, the call + to pcre_study() is made with the PCRE_STUDY_JIT_COMPILE option, + requesting just-in-time optimization support if it is available. Note + that there is also a /+ modifier; it must not be given immediately + after /S because this will be misinterpreted. If JIT studying is suc- + cessful, it will automatically be used when pcre_exec() is run, except + when incompatible run-time options are specified. These include the + partial matching options; a complete list is given in the pcrejit docu- + mentation. See also the \J escape sequence below for a way of setting + the size of the JIT stack. 
+ The /T modifier must be followed by a single digit. It causes a spe- cific set of built-in character tables to be passed to pcre_compile(). It is used in the standard PCRE tests to check behaviour with different @@ -392,6 +406,8 @@ DATA LINES \Gname call pcre_get_named_substring() for substring "name" after a successful match (name termin- ated by next non-alphanumeric character) + \Jdd set up a JIT stack of dd kilobytes maximum (any + number of digits) \L call pcre_get_substringlist() after a successful match \M discover the minimum MATCH_LIMIT and @@ -444,17 +460,27 @@ DATA LINES way of passing an empty line as data, since a real empty line termi- nates the data input. - If \M is present, pcretest calls pcre_exec() several times, with dif- - ferent values in the match_limit and match_limit_recursion fields of - the pcre_extra data structure, until it finds the minimum numbers for - each parameter that allow pcre_exec() to complete. The match_limit num- - ber is a measure of the amount of backtracking that takes place, and - checking it out can be instructive. For most simple matches, the number - is quite small, but for patterns with very large numbers of matching - possibilities, it can become large very quickly with increasing length - of subject string. The match_limit_recursion number is a measure of how - much stack (or, if PCRE is compiled with NO_RECURSE, how much heap) - memory is needed to complete the match attempt. + The \J escape provides a way of setting the maximum stack size that is + used by the just-in-time optimization code. It is ignored if JIT opti- + mization is not being used. Providing a stack that is larger than the + default 32K is necessary only for very complicated patterns. 
+ + If \M is present, pcretest calls pcre_exec() several times, with dif- + ferent values in the match_limit and match_limit_recursion fields of + the pcre_extra data structure, until it finds the minimum numbers for + each parameter that allow pcre_exec() to complete without error. + Because this is testing a specific feature of the normal interpretive + pcre_exec() execution, the use of any JIT optimization that might have + been set up by the /S+ qualifier of -s+ option is disabled. + + The match_limit number is a measure of the amount of backtracking that + takes place, and checking it out can be instructive. For most simple + matches, the number is quite small, but for patterns with very large + numbers of matching possibilities, it can become large very quickly + with increasing length of subject string. The match_limit_recursion + number is a measure of how much stack (or, if PCRE is compiled with + NO_RECURSE, how much heap) memory is needed to complete the match + attempt. When \O is used, the value specified may be higher or lower than the size set by the -O command line option (or defaulted to 45); \O applies @@ -720,19 +746,20 @@ SAVING AND RELOADING COMPILED PATTERNS /pattern/im >/some/file See the pcreprecompile documentation for a discussion about saving and - re-using compiled patterns. + re-using compiled patterns. Note that if the pattern was successfully + studied with JIT optimization, the JIT data cannot be saved. - The data that is written is binary. The first eight bytes are the - length of the compiled pattern data followed by the length of the - optional study data, each written as four bytes in big-endian order - (most significant byte first). If there is no study data (either the + The data that is written is binary. The first eight bytes are the + length of the compiled pattern data followed by the length of the + optional study data, each written as four bytes in big-endian order + (most significant byte first). 
If there is no study data (either the pattern was not studied, or studying did not return any data), the sec- - ond length is zero. The lengths are followed by an exact copy of the - compiled pattern. If there is additional study data, this follows imme- - diately after the compiled pattern. After writing the file, pcretest - expects to read a new pattern. + ond length is zero. The lengths are followed by an exact copy of the + compiled pattern. If there is additional study data, this (excluding + any JIT data) follows immediately after the compiled pattern. After + writing the file, pcretest expects to read a new pattern. - A saved pattern can be reloaded into pcretest by specifying < and a + A saved pattern can be reloaded into pcretest by specifying < and a file name instead of a pattern. The name of the file must not contain a < character, as otherwise pcretest will interpret the line as a pattern delimited by < characters. For example: @@ -741,32 +768,34 @@ SAVING AND RELOADING COMPILED PATTERNS Compiled pattern loaded from /some/file No study data - When the pattern has been loaded, pcretest proceeds to read data lines - in the usual way. + If the pattern was previously studied with the JIT optimization, the + JIT information cannot be saved and restored, and so is lost. When the + pattern has been loaded, pcretest proceeds to read data lines in the + usual way. - You can copy a file written by pcretest to a different host and reload - it there, even if the new host has opposite endianness to the one on - which the pattern was compiled. For example, you can compile on an i86 + You can copy a file written by pcretest to a different host and reload + it there, even if the new host has opposite endianness to the one on + which the pattern was compiled. For example, you can compile on an i86 machine and run on a SPARC machine. 
- File names for saving and reloading can be absolute or relative, but - note that the shell facility of expanding a file name that starts with + File names for saving and reloading can be absolute or relative, but + note that the shell facility of expanding a file name that starts with a tilde (~) is not available. - The ability to save and reload files in pcretest is intended for test- - ing and experimentation. It is not intended for production use because - only a single pattern can be written to a file. Furthermore, there is - no facility for supplying custom character tables for use with a - reloaded pattern. If the original pattern was compiled with custom - tables, an attempt to match a subject string using a reloaded pattern - is likely to cause pcretest to crash. Finally, if you attempt to load + The ability to save and reload files in pcretest is intended for test- + ing and experimentation. It is not intended for production use because + only a single pattern can be written to a file. Furthermore, there is + no facility for supplying custom character tables for use with a + reloaded pattern. If the original pattern was compiled with custom + tables, an attempt to match a subject string using a reloaded pattern + is likely to cause pcretest to crash. Finally, if you attempt to load a file that is not in the correct format, the result is undefined. SEE ALSO - pcre(3), pcreapi(3), pcrecallout(3), pcrematching(3), pcrepartial(d), - pcrepattern(3), pcreprecompile(3). + pcre(3), pcreapi(3), pcrecallout(3), pcrejit, pcrematching(3), pcrepar- + tial(d), pcrepattern(3), pcreprecompile(3). AUTHOR @@ -778,5 +807,5 @@ AUTHOR REVISION - Last updated: 01 August 2011 + Last updated: 26 August 2011 Copyright (c) 1997-2011 University of Cambridge. |
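
Note on the saved-file layout: the SAVING AND RELOADING COMPILED PATTERNS hunk above describes the file as two four-byte big-endian lengths (compiled pattern, then optional study data) followed by the compiled pattern and then any study data, with JIT data never included. The sketch below is a minimal illustration of that layout only. The helper names `build_saved_pattern` and `split_saved_pattern` are hypothetical and not part of PCRE or pcretest, treating the lengths as unsigned is an assumption, and the byte strings in the usage are dummy placeholders rather than real compiled patterns.

```python
import struct

# Two 4-byte lengths in big-endian order, as the documentation describes.
# Assumption: the lengths are treated as unsigned here.
HEADER = struct.Struct(">II")


def build_saved_pattern(compiled: bytes, study: bytes = b"") -> bytes:
    """Lay out header + compiled pattern + optional study data (hypothetical helper)."""
    return HEADER.pack(len(compiled), len(study)) + compiled + study


def split_saved_pattern(blob: bytes):
    """Split a saved-file image into (compiled, study) byte strings (hypothetical helper)."""
    if len(blob) < HEADER.size:
        raise ValueError("file too short for the 8-byte length header")
    pattern_len, study_len = HEADER.unpack_from(blob, 0)
    end_pattern = HEADER.size + pattern_len
    end_study = end_pattern + study_len
    if len(blob) < end_study:
        raise ValueError("file is shorter than the lengths in its header")
    # A study length of zero means the pattern was not studied,
    # or studying returned no data.
    return blob[HEADER.size:end_pattern], blob[end_pattern:end_study]


if __name__ == "__main__":
    image = build_saved_pattern(b"\x01\x02\x03", b"\xaa")  # dummy bytes
    compiled, study = split_saved_pattern(image)
    print(len(compiled), len(study))
```

Because the header is fixed at big-endian, a reader like this behaves the same on hosts of either endianness. It says nothing about the byte order of the fields inside the compiled pattern itself, which is what the /F modifier exercises.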