author: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2009-10-19 14:38:48 +0000
committer: ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2009-10-19 14:38:48 +0000
commit: 6613bb987ce876549e6bfd94e62ce0d909879ff2 (patch)
tree: db4118c102fb750976d5c2081a2c30e1a0dc2c7e /doc/html
parent: 606118f31912c2fbd660221f878db223287d3c5a (diff)
download: pcre-6613bb987ce876549e6bfd94e62ce0d909879ff2.tar.gz
Final doc and source tidies for 8.00
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@469 2f5784b3-3f2a-0410-8824-cb99058d5e15
Diffstat (limited to 'doc/html')
-rw-r--r-- doc/html/pcrepartial.html 20
-rw-r--r-- doc/html/pcrepattern.html 44
2 files changed, 34 insertions, 30 deletions
diff --git a/doc/html/pcrepartial.html b/doc/html/pcrepartial.html
index 459464f..040ac88 100644
--- a/doc/html/pcrepartial.html
+++ b/doc/html/pcrepartial.html
@@ -165,7 +165,7 @@ so returns that when PCRE_PARTIAL_HARD is set.
</P>
<br><a name="SEC4" href="#TOC1">PARTIAL MATCHING AND WORD BOUNDARIES</a><br>
<P>
-If a pattern ends with one of sequences \w or \W, which test for word
+If a pattern ends with one of sequences \b or \B, which test for word
boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive
results. Consider this pattern:
<pre>
@@ -269,7 +269,7 @@ Consider an unanchored pattern that matches dates:
data&#62; The date is 23ja\P
Partial match: 23ja
</pre>
-The this stage, an application could discard the text preceding "23ja", add on
+At this stage, an application could discard the text preceding "23ja", add on
text from the next segment, and call <b>pcre_exec()</b> again. Unlike
<b>pcre_dfa_exec()</b>, the entire matching string must always be available, and
the complete matching process occurs for each call, so more memory and more
@@ -347,7 +347,8 @@ matching multi-segment data. The example above then behaves differently:
<P>
4. Patterns that contain alternatives at the top level which do not all
start with the same pattern item may not work as expected when
-<b>pcre_dfa_exec()</b> is used. For example, consider this pattern:
+PCRE_DFA_RESTART is used with <b>pcre_dfa_exec()</b>. For example, consider this
+pattern:
<pre>
1234|3789
</pre>
@@ -363,7 +364,7 @@ patterns or patterns such as:
1234|ABCD
</pre>
where no string can be a partial match for both alternatives. This is not a
-problem if \fPpcre_exec()\fP is used, because the entire match has to be rerun
+problem if <b>pcre_exec()</b> is used, because the entire match has to be rerun
each time:
<pre>
re&#62; /1234|3789/
@@ -371,8 +372,13 @@ each time:
Partial match: 123
data&#62; 1237890
0: 3789
-
-</PRE>
+</pre>
+Of course, instead of using PCRE_DFA_PARTIAL, the same technique of re-running
+the entire match can also be used with <b>pcre_dfa_exec()</b>. Another
+possibility is to work with two buffers. If a partial match at offset <i>n</i>
+in the first buffer is followed by "no match" when PCRE_DFA_RESTART is used on
+the second buffer, you can then try a new match starting at offset <i>n+1</i> in
+the first buffer.
</P>
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br>
<P>
@@ -385,7 +391,7 @@ Cambridge CB2 3QH, England.
</P>
<br><a name="SEC11" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 29 September 2009
+Last updated: 19 October 2009
<br>
Copyright &copy; 1997-2009 University of Cambridge.
<br>
diff --git a/doc/html/pcrepattern.html b/doc/html/pcrepattern.html
index 619024a..192014f 100644
--- a/doc/html/pcrepattern.html
+++ b/doc/html/pcrepattern.html
@@ -2050,27 +2050,24 @@ ways the + and * repeats can carve up the subject, and all have to be tested
before failure can be reported.
</P>
<P>
-At the end of a match, the values set for any capturing subpatterns are those
-from the outermost level of the recursion at which the subpattern value is set.
-If you want to obtain intermediate values, a callout function can be used (see
-below and the
+At the end of a match, the values of capturing parentheses are those from
+the outermost level. If you want to obtain intermediate values, a callout
+function can be used (see below and the
<a href="pcrecallout.html"><b>pcrecallout</b></a>
documentation). If the pattern above is matched against
<pre>
(ab(cd)ef)
</pre>
-the value for the capturing parentheses is "ef", which is the last value taken
-on at the top level. If additional parentheses are added, giving
-<pre>
- \( ( ( [^()]++ | (?R) )* ) \)
- ^ ^
- ^ ^
-</pre>
-the string they capture is "ab(cd)ef", the contents of the top level
-parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE
-has to obtain extra memory to store data during a recursion, which it does by
-using <b>pcre_malloc</b>, freeing it via <b>pcre_free</b> afterwards. If no
-memory can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
+the value for the inner capturing parentheses (numbered 2) is "ef", which is
+the last value taken on at the top level. If a capturing subpattern is not
+matched at the top level, its final value is unset, even if it is (temporarily)
+set at a deeper level.
+</P>
+<P>
+If there are more than 15 capturing parentheses in a pattern, PCRE has to
+obtain extra memory to store data during a recursion, which it does by using
+<b>pcre_malloc</b>, freeing it via <b>pcre_free</b> afterwards. If no memory can
+be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
</P>
<P>
Do not confuse the (?R) item with the condition (R), which tests for recursion.
@@ -2183,10 +2180,11 @@ is used, it does match "sense and responsibility" as well as the other two
strings. Another example is given in the discussion of DEFINE above.
</P>
<P>
-Like recursive subpatterns, a "subroutine" call is always treated as an atomic
+Like recursive subpatterns, a subroutine call is always treated as an atomic
group. That is, once it has matched some of the subject string, it is never
re-entered, even if it contains untried alternatives and there is a subsequent
-matching failure.
+matching failure. Any capturing parentheses that are set during the subroutine
+call revert to their previous values afterwards.
</P>
<P>
When a subpattern is used as a subroutine, processing options such as
@@ -2267,10 +2265,10 @@ failing negative assertion, they cause an error if encountered by
<b>pcre_dfa_exec()</b>.
</P>
<P>
-If any of these verbs are used in an assertion subpattern, their effect is
-confined to that subpattern; it does not extend to the surrounding pattern.
-Note that assertion subpatterns are processed as anchored at the point where
-they are tested.
+If any of these verbs are used in an assertion or subroutine subpattern
+(including recursive subpatterns), their effect is confined to that subpattern;
+it does not extend to the surrounding pattern. Note that such subpatterns are
+processed as anchored at the point where they are tested.
</P>
<P>
The new verbs make use of what was previously invalid syntax: an opening
@@ -2388,7 +2386,7 @@ Cambridge CB2 3QH, England.
</P>
<br><a name="SEC28" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 04 October 2009
+Last updated: 18 October 2009
<br>
Copyright &copy; 1997-2009 University of Cambridge.
<br>
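
Note on the pcrepartial.html hunk: the two-buffer retry it describes (a partial match at offset n in the first buffer, then "no match" on restart, then a new attempt from offset n+1) can be sketched without PCRE at all. The following is a toy model only. It treats 1234|3789 as two literal strings, and the helper names scan, resume and match_two_buffers are hypothetical, not part of any PCRE API. It does not reproduce PCRE's partial-matching semantics beyond this one scenario, and it does not search the second buffer on its own.

```python
# Toy model of the two-buffer retry; literal stand-ins for the pattern 1234|3789.
ALTERNATIVES = ("1234", "3789")


def scan(buf, start):
    """Find the first complete or partial match at or after `start`.

    Returns (kind, offset, live_alternatives), or None if nothing matches.
    """
    for pos in range(start, len(buf)):
        rest = buf[pos:]
        for alt in ALTERNATIVES:
            if rest.startswith(alt):
                return ("match", pos, [alt])
        # The buffer ran out while some alternative was still matching.
        live = [alt for alt in ALTERNATIVES if alt.startswith(rest)]
        if live:
            return ("partial", pos, live)
    return None


def resume(live, consumed, second):
    """Model a restart: continue only the alternatives that were still live."""
    for alt in live:
        if second.startswith(alt[consumed:]):
            return alt
    return None


def match_two_buffers(first, second):
    """Return (offset_in_first, alternative), or None if no match is found."""
    start = 0
    while True:
        found = scan(first, start)
        if found is None:
            return None
        kind, pos, live = found
        if kind == "match":
            return (pos, live[0])
        alt = resume(live, len(first) - pos, second)
        if alt is not None:
            return (pos, alt)
        # Restart gave "no match": try again from offset n+1 in the first buffer.
        start = pos + 1


print(match_two_buffers("ABC123", "7890"))
```

With the first buffer "ABC123" and the second "7890", the partial match "123" at offset 3 cannot be continued, so the sketch retries from offset 4 and finds the match for 3789 that begins at offset 5 and straddles both buffers.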