author | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2007-12-17 14:46:11 +0000 |
---|---|---|
committer | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2007-12-17 14:46:11 +0000 |
commit | f34b50f089383917ab527f3ab5f9c0a8144bbf0d (patch) | |
tree | 4d41d677b615b7a567ecb0d65a407790ce35ca93 /doc/pcretest.txt | |
parent | bcaa82a45b01cc5cf8689180e20514e5e14bb36f (diff) | |
Add .gz and .bz2 optional support to pcregrep.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@286 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'doc/pcretest.txt')
-rw-r--r-- | doc/pcretest.txt | 144 |
1 files changed, 80 insertions, 64 deletions
diff --git a/doc/pcretest.txt b/doc/pcretest.txt
index 6883a26..889e38d 100644
--- a/doc/pcretest.txt
+++ b/doc/pcretest.txt
@@ -415,11 +415,27 @@ DEFAULT OUTPUT FROM PCRETEST
        data> xyz
        No match
 
-       If the strings contain any non-printing characters, they are output as
-       \0x escapes, or as \x{...} escapes if the /8 modifier was present on
-       the pattern. See below for the definition of non-printing characters.
-       If the pattern has the /+ modifier, the output for substring 0 is fol-
-       lowed by the the rest of the subject string, identified by "0+" like
+       Note that unset capturing substrings that are not followed by one that
+       is set are not returned by pcre_exec(), and are not shown by pcretest.
+       In the following example, there are two capturing substrings, but when
+       the first data line is matched, the second, unset substring is not
+       shown. An "internal" unset substring is shown as "<unset>", as for the
+       second data line.
+
+         re> /(a)|(b)/
+       data> a
+        0: a
+        1: a
+       data> b
+        0: b
+        1: <unset>
+        2: b
+
+       If the strings contain any non-printing characters, they are output as
+       \0x escapes, or as \x{...} escapes if the /8 modifier was present on
+       the pattern. See below for the definition of non-printing characters.
+       If the pattern has the /+ modifier, the output for substring 0 is fol-
+       lowed by the the rest of the subject string, identified by "0+" like
        this:
 
          re> /cat/+
@@ -427,7 +443,7 @@ DEFAULT OUTPUT FROM PCRETEST
         0: cat
         0+ aract
 
-       If the pattern has the /g or /G modifier, the results of successive
+       If the pattern has the /g or /G modifier, the results of successive
        matching attempts are output in sequence, like this:
 
          re> /\Bi(\w\w)/g
@@ -441,24 +457,24 @@ DEFAULT OUTPUT FROM PCRETEST
 
        "No match" is output only if the first match attempt fails.
 
-       If any of the sequences \C, \G, or \L are present in a data line that
-       is successfully matched, the substrings extracted by the convenience
+       If any of the sequences \C, \G, or \L are present in a data line that
+       is successfully matched, the substrings extracted by the convenience
        functions are output with C, G, or L after the string number instead of
        a colon. This is in addition to the normal full list. The string length
-       (that is, the return from the extraction function) is given in paren-
+       (that is, the return from the extraction function) is given in paren-
        theses after each string for \C and \G.
 
        Note that whereas patterns can be continued over several lines (a plain
        ">" prompt is used for continuations), data lines may not. However new-
-       lines can be included in data by means of the \n escape (or \r, \r\n,
+       lines can be included in data by means of the \n escape (or \r, \r\n,
        etc., depending on the newline sequence setting).
 
 
 OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
 
-       When the alternative matching function, pcre_dfa_exec(), is used (by
-       means of the \D escape sequence or the -dfa command line option), the
-       output consists of a list of all the matches that start at the first
+       When the alternative matching function, pcre_dfa_exec(), is used (by
+       means of the \D escape sequence or the -dfa command line option), the
+       output consists of a list of all the matches that start at the first
        point in the subject where there is at least one match. For example:
 
          re> /(tang|tangerine|tan)/
@@ -467,8 +483,8 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
         1: tang
         2: tan
 
-       (Using the normal matching function on this data finds only "tang".)
-       The longest matching string is always given first (and numbered zero).
+       (Using the normal matching function on this data finds only "tang".)
+       The longest matching string is always given first (and numbered zero).
        If /g is present on the pattern, the search for further matches resumes
        at the end of the longest match. For example:
 
@@ -482,16 +498,16 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
         1: tan
         0: tan
 
-       Since the matching function does not support substring capture, the
-       escape sequences that are concerned with captured substrings are not
+       Since the matching function does not support substring capture, the
+       escape sequences that are concerned with captured substrings are not
        relevant.
 
 
 RESTARTING AFTER A PARTIAL MATCH
 
        When the alternative matching function has given the PCRE_ERROR_PARTIAL
-       return, indicating that the subject partially matched the pattern, you
-       can restart the match with additional subject data by means of the \R
+       return, indicating that the subject partially matched the pattern, you
+       can restart the match with additional subject data by means of the \R
        escape sequence. For example:
 
          re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@@ -500,30 +516,30 @@ RESTARTING AFTER A PARTIAL MATCH
        data> n05\R\D
         0: n05
 
-       For further information about partial matching, see the pcrepartial
+       For further information about partial matching, see the pcrepartial
        documentation.
 
 
 CALLOUTS
 
-       If the pattern contains any callout requests, pcretest's callout func-
-       tion is called during matching. This works with both matching func-
+       If the pattern contains any callout requests, pcretest's callout func-
+       tion is called during matching. This works with both matching func-
        tions. By default, the called function displays the callout number, the
-       start and current positions in the text at the callout time, and the
+       start and current positions in the text at the callout time, and the
        next pattern item to be tested. For example, the output
 
        --->pqrabcdef
          0    ^  ^     \d
 
-       indicates that callout number 0 occurred for a match attempt starting
-       at the fourth character of the subject string, when the pointer was at
-       the seventh character of the data, and when the next pattern item was
-       \d. Just one circumflex is output if the start and current positions
+       indicates that callout number 0 occurred for a match attempt starting
+       at the fourth character of the subject string, when the pointer was at
+       the seventh character of the data, and when the next pattern item was
+       \d. Just one circumflex is output if the start and current positions
        are the same.
 
        Callouts numbered 255 are assumed to be automatic callouts, inserted as
-       a result of the /C pattern modifier. In this case, instead of showing
-       the callout number, the offset in the pattern, preceded by a plus, is
+       a result of the /C pattern modifier. In this case, instead of showing
+       the callout number, the offset in the pattern, preceded by a plus, is
        output. For example:
 
          re> /\d?[A-E]\*/C
@@ -535,86 +551,86 @@ CALLOUTS
        +10 ^ ^
         0: E*
 
-       The callout function in pcretest returns zero (carry on matching) by
-       default, but you can use a \C item in a data line (as described above)
+       The callout function in pcretest returns zero (carry on matching) by
+       default, but you can use a \C item in a data line (as described above)
        to change this.
 
-       Inserting callouts can be helpful when using pcretest to check compli-
-       cated regular expressions. For further information about callouts, see
+       Inserting callouts can be helpful when using pcretest to check compli-
+       cated regular expressions. For further information about callouts, see
        the pcrecallout documentation.
 
 
 NON-PRINTING CHARACTERS
 
-       When pcretest is outputting text in the compiled version of a pattern,
-       bytes other than 32-126 are always treated as non-printing characters
+       When pcretest is outputting text in the compiled version of a pattern,
+       bytes other than 32-126 are always treated as non-printing characters
        are are therefore shown as hex escapes.
 
-       When pcretest is outputting text that is a matched part of a subject
-       string, it behaves in the same way, unless a different locale has been
-       set for the pattern (using the /L modifier). In this case, the
+       When pcretest is outputting text that is a matched part of a subject
+       string, it behaves in the same way, unless a different locale has been
+       set for the pattern (using the /L modifier). In this case, the
        isprint() function to distinguish printing and non-printing characters.
 
 
 SAVING AND RELOADING COMPILED PATTERNS
 
-       The facilities described in this section are not available when the
+       The facilities described in this section are not available when the
        POSIX inteface to PCRE is being used, that is, when the /P pattern mod-
        ifier is specified.
 
        When the POSIX interface is not in use, you can cause pcretest to write
-       a compiled pattern to a file, by following the modifiers with > and a
+       a compiled pattern to a file, by following the modifiers with > and a
        file name. For example:
 
          /pattern/im >/some/file
 
-       See the pcreprecompile documentation for a discussion about saving and
+       See the pcreprecompile documentation for a discussion about saving and
        re-using compiled patterns.
 
-       The data that is written is binary. The first eight bytes are the
-       length of the compiled pattern data followed by the length of the
-       optional study data, each written as four bytes in big-endian order
-       (most significant byte first). If there is no study data (either the
+       The data that is written is binary. The first eight bytes are the
+       length of the compiled pattern data followed by the length of the
+       optional study data, each written as four bytes in big-endian order
+       (most significant byte first). If there is no study data (either the
        pattern was not studied, or studying did not return any data), the sec-
-       ond length is zero. The lengths are followed by an exact copy of the
+       ond length is zero. The lengths are followed by an exact copy of the
        compiled pattern. If there is additional study data, this follows imme-
-       diately after the compiled pattern. After writing the file, pcretest
+       diately after the compiled pattern. After writing the file, pcretest
        expects to read a new pattern.
 
        A saved pattern can be reloaded into pcretest by specifing < and a file
-       name instead of a pattern. The name of the file must not contain a <
-       character, as otherwise pcretest will interpret the line as a pattern
+       name instead of a pattern. The name of the file must not contain a <
+       character, as otherwise pcretest will interpret the line as a pattern
        delimited by < characters. For example:
 
          re> </some/file
        Compiled regex loaded from /some/file
        No study data
 
-       When the pattern has been loaded, pcretest proceeds to read data lines
+       When the pattern has been loaded, pcretest proceeds to read data lines
        in the usual way.
 
-       You can copy a file written by pcretest to a different host and reload
-       it there, even if the new host has opposite endianness to the one on
-       which the pattern was compiled. For example, you can compile on an i86
+       You can copy a file written by pcretest to a different host and reload
+       it there, even if the new host has opposite endianness to the one on
+       which the pattern was compiled. For example, you can compile on an i86
        machine and run on a SPARC machine.
 
-       File names for saving and reloading can be absolute or relative, but
-       note that the shell facility of expanding a file name that starts with
+       File names for saving and reloading can be absolute or relative, but
+       note that the shell facility of expanding a file name that starts with
        a tilde (~) is not available.
 
-       The ability to save and reload files in pcretest is intended for test-
-       ing and experimentation. It is not intended for production use because
-       only a single pattern can be written to a file. Furthermore, there is
-       no facility for supplying custom character tables for use with a
-       reloaded pattern. If the original pattern was compiled with custom
-       tables, an attempt to match a subject string using a reloaded pattern
-       is likely to cause pcretest to crash. Finally, if you attempt to load
+       The ability to save and reload files in pcretest is intended for test-
+       ing and experimentation. It is not intended for production use because
+       only a single pattern can be written to a file. Furthermore, there is
+       no facility for supplying custom character tables for use with a
+       reloaded pattern. If the original pattern was compiled with custom
+       tables, an attempt to match a subject string using a reloaded pattern
+       is likely to cause pcretest to crash. Finally, if you attempt to load
        a file that is not in the correct format, the result is undefined.
 
 
 SEE ALSO
 
-       pcre(3), pcreapi(3), pcrecallout(3), pcrematching(3), pcrepartial(d),
+       pcre(3), pcreapi(3), pcrecallout(3), pcrematching(3), pcrepartial(d),
        pcrepattern(3), pcreprecompile(3).
 
 
@@ -627,5 +643,5 @@ AUTHOR
 
 REVISION
 
-       Last updated: 11 September 2007
+       Last updated: 19 November 2007
        Copyright (c) 1997-2007 University of Cambridge.
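
The save-file layout described under SAVING AND RELOADING COMPILED PATTERNS in this diff (two four-byte big-endian lengths, then the compiled pattern, then any study data) can be sketched as a small reader. This is only an illustration in Python, not code from PCRE or pcretest: the function name split_saved_pattern is hypothetical, and treating the two lengths as unsigned is an assumption, since the text says only "four bytes in big-endian order".

```python
import struct


def split_saved_pattern(blob: bytes):
    """Split the bytes of a pcretest save file into (pattern, study_data).

    Hypothetical helper: it follows the layout in the documentation text,
    and assumes the two length fields are unsigned 32-bit integers.
    """
    if len(blob) < 8:
        raise ValueError("file too short for the 8-byte length header")
    # First length: compiled pattern; second length: optional study data.
    pattern_len, study_len = struct.unpack(">II", blob[:8])
    body = blob[8:]
    if len(body) < pattern_len + study_len:
        raise ValueError("file is shorter than the lengths in its header")
    pattern = body[:pattern_len]
    # A study length of zero means no study data was written.
    study = body[pattern_len:pattern_len + study_len]
    return pattern, study


# Synthetic bytes for demonstration only (not a real compiled pattern).
demo = struct.pack(">II", 5, 0) + b"\x01\x02\x03\x04\x05"
pattern, study = split_saved_pattern(demo)
print(len(pattern), len(study))
```

Because the header is always written most significant byte first, the same reader works regardless of the endianness of the host that produced the file. The compiled pattern itself is an opaque copy and is not interpreted here.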