author | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2009-10-19 14:38:48 +0000 |
---|---|---|
committer | ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> | 2009-10-19 14:38:48 +0000 |
commit | 6613bb987ce876549e6bfd94e62ce0d909879ff2 (patch) | |
tree | db4118c102fb750976d5c2081a2c30e1a0dc2c7e /doc/html | |
parent | 606118f31912c2fbd660221f878db223287d3c5a (diff) | |
download | pcre-6613bb987ce876549e6bfd94e62ce0d909879ff2.tar.gz |
Final doc and source tidies for 8.00
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@469 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'doc/html')
-rw-r--r-- | doc/html/pcrepartial.html | 20 |
-rw-r--r-- | doc/html/pcrepattern.html | 44 |
2 files changed, 34 insertions, 30 deletions
diff --git a/doc/html/pcrepartial.html b/doc/html/pcrepartial.html
index 459464f..040ac88 100644
--- a/doc/html/pcrepartial.html
+++ b/doc/html/pcrepartial.html
@@ -165,7 +165,7 @@ so returns that when PCRE_PARTIAL_HARD is set.
 </P>
 <br><a name="SEC4" href="#TOC1">PARTIAL MATCHING AND WORD BOUNDARIES</a><br>
 <P>
-If a pattern ends with one of sequences \w or \W, which test for word
+If a pattern ends with one of sequences \b or \B, which test for word
 boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive
 results. Consider this pattern:
 <pre>
@@ -269,7 +269,7 @@ Consider an unanchored pattern that matches dates:
   data> The date is 23ja\P
   Partial match: 23ja
 </pre>
-The this stage, an application could discard the text preceding "23ja", add on
+At this stage, an application could discard the text preceding "23ja", add on
 text from the next segment, and call <b>pcre_exec()</b> again. Unlike
 <b>pcre_dfa_exec()</b>, the entire matching string must always be available, and
 the complete matching process occurs for each call, so more memory and more
@@ -347,7 +347,8 @@ matching multi-segment data. The example above then behaves differently:
 <P>
 4. Patterns that contain alternatives at the top level which do not all
 start with the same pattern item may not work as expected when
-<b>pcre_dfa_exec()</b> is used. For example, consider this pattern:
+PCRE_DFA_RESTART is used with <b>pcre_dfa_exec()</b>. For example, consider this
+pattern:
 <pre>
   1234|3789
 </pre>
@@ -363,7 +364,7 @@ patterns or patterns such as:
   1234|ABCD
 </pre>
 where no string can be a partial match for both alternatives. This is not a
-problem if \fPpcre_exec()\fP is used, because the entire match has to be rerun
+problem if <b>pcre_exec()</b> is used, because the entire match has to be rerun
 each time:
 <pre>
     re> /1234|3789/
@@ -371,8 +372,13 @@ each time:
   Partial match: 123
   data> 1237890
    0: 3789
-
-</PRE>
+</pre>
+Of course, instead of using PCRE_DFA_PARTIAL, the same technique of re-running
+the entire match can also be used with <b>pcre_dfa_exec()</b>. Another
+possibility is to work with two buffers. If a partial match at offset <i>n</i>
+in the first buffer is followed by "no match" when PCRE_DFA_RESTART is used on
+the second buffer, you can then try a new match starting at offset <i>n+1</i> in
+the first buffer.
 </P>
 <br><a name="SEC10" href="#TOC1">AUTHOR</a><br>
 <P>
@@ -385,7 +391,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC11" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 29 September 2009
+Last updated: 19 October 2009
 <br>
 Copyright © 1997-2009 University of Cambridge.
 <br>
diff --git a/doc/html/pcrepattern.html b/doc/html/pcrepattern.html
index 619024a..192014f 100644
--- a/doc/html/pcrepattern.html
+++ b/doc/html/pcrepattern.html
@@ -2050,27 +2050,24 @@ ways the + and * repeats can carve up the subject, and all have to be tested
 before failure can be reported.
 </P>
 <P>
-At the end of a match, the values set for any capturing subpatterns are those
-from the outermost level of the recursion at which the subpattern value is set.
-If you want to obtain intermediate values, a callout function can be used (see
-below and the
+At the end of a match, the values of capturing parentheses are those from
+the outermost level. If you want to obtain intermediate values, a callout
+function can be used (see below and the
 <a href="pcrecallout.html"><b>pcrecallout</b></a>
 documentation). If the pattern above is matched against
 <pre>
   (ab(cd)ef)
 </pre>
-the value for the capturing parentheses is "ef", which is the last value taken
-on at the top level. If additional parentheses are added, giving
-<pre>
-  \( ( ( [^()]++ | (?R) )* ) \)
-     ^                     ^
-     ^                     ^
-</pre>
-the string they capture is "ab(cd)ef", the contents of the top level
-parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE
-has to obtain extra memory to store data during a recursion, which it does by
-using <b>pcre_malloc</b>, freeing it via <b>pcre_free</b> afterwards. If no
-memory can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
+the value for the inner capturing parentheses (numbered 2) is "ef", which is
+the last value taken on at the top level. If a capturing subpattern is not
+matched at the top level, its final value is unset, even if it is (temporarily)
+set at a deeper level.
+</P>
+<P>
+If there are more than 15 capturing parentheses in a pattern, PCRE has to
+obtain extra memory to store data during a recursion, which it does by using
+<b>pcre_malloc</b>, freeing it via <b>pcre_free</b> afterwards. If no memory can
+be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
 </P>
 <P>
 Do not confuse the (?R) item with the condition (R), which tests for recursion.
@@ -2183,10 +2180,11 @@ is used, it does match "sense and responsibility" as well as the other two
 strings. Another example is given in the discussion of DEFINE above.
 </P>
 <P>
-Like recursive subpatterns, a "subroutine" call is always treated as an atomic
+Like recursive subpatterns, a subroutine call is always treated as an atomic
 group. That is, once it has matched some of the subject string, it is never
 re-entered, even if it contains untried alternatives and there is a subsequent
-matching failure.
+matching failure. Any capturing parentheses that are set during the subroutine
+call revert to their previous values afterwards.
 </P>
 <P>
 When a subpattern is used as a subroutine, processing options such as
@@ -2267,10 +2265,10 @@ failing negative assertion, they cause an error if encountered by
 <b>pcre_dfa_exec()</b>.
 </P>
 <P>
-If any of these verbs are used in an assertion subpattern, their effect is
-confined to that subpattern; it does not extend to the surrounding pattern.
-Note that assertion subpatterns are processed as anchored at the point where
-they are tested.
+If any of these verbs are used in an assertion or subroutine subpattern
+(including recursive subpatterns), their effect is confined to that subpattern;
+it does not extend to the surrounding pattern. Note that such subpatterns are
+processed as anchored at the point where they are tested.
 </P>
 <P>
 The new verbs make use of what was previously invalid syntax: an opening
@@ -2388,7 +2386,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC28" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 04 October 2009
+Last updated: 18 October 2009
 <br>
 Copyright © 1997-2009 University of Cambridge.
 <br>
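Note on the PCRE_DFA_RESTART caveat in the pcrepartial.html hunks: the difference between restarting a partial match and re-running the whole match can be sketched with a toy model of the pattern 1234|3789. This is only an illustration under stated assumptions. It does not call PCRE, the function names `rerun_match` and `restart_match` are hypothetical, and the model handles only literal alternatives.

```python
# Toy model (NOT the PCRE API): literal alternatives standing in for 1234|3789.
ALTERNATIVES = ["1234", "3789"]


def rerun_match(subject, alternatives=ALTERNATIVES):
    """Scan the whole subject from scratch, as a full re-run would.

    Returns ("full", text), ("partial", text) or ("none", "").
    A complete match is preferred; otherwise the first partial match
    that runs up to the end of the subject is reported.
    """
    partial = None
    for start in range(len(subject)):
        rest = subject[start:]
        for alt in alternatives:
            if rest.startswith(alt):
                return ("full", alt)
        # rest is a prefix of some alternative: a partial match at this offset
        if partial is None and any(alt.startswith(rest) for alt in alternatives):
            partial = rest
    if partial is not None:
        return ("partial", partial)
    return ("none", "")


def restart_match(partial_text, next_segment, alternatives=ALTERNATIVES):
    """Continue only the alternatives that were alive in the partial match.

    This mimics the idea of a restart: matching resumes where the earlier
    partial match stopped, and start offsets inside the earlier segment
    are never reconsidered.
    """
    for alt in alternatives:
        if alt.startswith(partial_text):
            remaining = alt[len(partial_text):]
            if next_segment.startswith(remaining):
                return ("full", alt)
            if remaining.startswith(next_segment):
                return ("partial", partial_text + next_segment)
    return ("none", "")


if __name__ == "__main__":
    print(rerun_match("123"))              # partial match on the first segment
    print(restart_match("123", "7890"))    # the restart misses 3789
    print(rerun_match("123" + "7890"))     # the full re-run finds 3789
```

In this model the restart fails on the second segment because only the alternative that began at "1" is carried forward, while a re-run over the joined text also tries the start offset at "3". The two-buffer approach described in the hunk amounts to retrying from offset n+1 of the first buffer after such a failure.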