author    ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2018-04-27 16:48:35 +0000
committer ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2018-04-27 16:48:35 +0000
commit    4ebfdd3679ae46341b25f1aa3ba95480d6c514d1
tree      d5ae62b471ade56cf26b5de716bdbfbf93ca7bf0 /doc/html
parent    9a167eac7981483a4b1636e1ac3497965cecc8d7
Re-factor pcre2_dfa_match() to use the heap instead of the stack for workspace
vectors when doing recursive function calls.

git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@932 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/html')
-rw-r--r-- doc/html/NON-AUTOTOOLS-BUILD.txt 19
-rw-r--r-- doc/html/README.txt 12
-rw-r--r-- doc/html/pcre2_dfa_match.html 6
-rw-r--r-- doc/html/pcre2api.html 69
-rw-r--r-- doc/html/pcre2build.html 9
-rw-r--r-- doc/html/pcre2callout.html 14
-rw-r--r-- doc/html/pcre2pattern.html 39
-rw-r--r-- doc/html/pcre2perform.html 18
-rw-r--r-- doc/html/pcre2test.html 58
9 files changed, 145 insertions, 99 deletions
diff --git a/doc/html/NON-AUTOTOOLS-BUILD.txt b/doc/html/NON-AUTOTOOLS-BUILD.txt
index 0775794..0bf4507 100644
--- a/doc/html/NON-AUTOTOOLS-BUILD.txt
+++ b/doc/html/NON-AUTOTOOLS-BUILD.txt
@@ -10,6 +10,7 @@ This document contains the following sections:
Calling conventions in Windows environments
Comments about Win32 builds
Building PCRE2 on Windows with CMake
+ Building PCRE2 on Windows with Visual Studio
Testing with RunTest.bat
Building PCRE2 on native z/OS and z/VM
@@ -328,6 +329,18 @@ cache can be deleted by selecting "File > Delete Cache".
most recent build configuration is targeted by the tests. A summary of
test results is presented. Complete test output is subsequently
available for review in Testing\Temporary under your build dir.
+
+
+BUILDING PCRE2 ON WINDOWS WITH VISUAL STUDIO
+
+The code currently cannot be compiled without a stdint.h header, which is
+available only in Visual Studio 2010 and later. However, this
+portable and permissively-licensed implementation of the header worked without
+issue:
+
+ http://www.azillionmonkeys.com/qed/pstdint.h
+
+Just rename it to stdint.h and drop it into the top level of the build tree.
TESTING WITH RUNTEST.BAT
@@ -382,6 +395,6 @@ Everything in that location, source and executable, is in EBCDIC and native
z/OS file formats. The port provides an API for LE languages such as COBOL and
for the z/OS and z/VM versions of the Rexx languages.
-===============================
-Last Updated: 13 September 2017
-===============================
+===========================
+Last Updated: 19 April 2018
+===========================
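For builds that rely on the renamed pstdint.h, a small check like the one below can confirm that the header supplies exact-width types of the expected sizes. This is a hypothetical sketch, not part of the PCRE2 distribution; the names check_u8, check_u16, check_u32 and stdint_looks_sane() are made up for illustration. It uses the negative-array-size trick instead of _Static_assert because older Visual Studio C compilers, the case this section covers, do not support C11.

```c
/* Hypothetical sanity check for a dropped-in stdint.h (not part of PCRE2). */
#include <stdint.h>

/* C89-compatible compile-time checks: a negative array size fails to compile. */
typedef char check_u8 [(sizeof(uint8_t)  == 1) ? 1 : -1];
typedef char check_u16[(sizeof(uint16_t) == 2) ? 1 : -1];
typedef char check_u32[(sizeof(uint32_t) == 4) ? 1 : -1];

/* Run-time check that unsigned exact-width arithmetic wraps as expected. */
int stdint_looks_sane(void)
{
  uint32_t all_ones = (uint32_t)0 - 1;   /* converts to 2^32 - 1 */
  uint16_t u16 = (uint16_t)0xffff;
  return all_ones == 0xffffffffu && u16 == 0xffffu;
}
```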
diff --git a/doc/html/README.txt b/doc/html/README.txt
index 66b756b..e4729ac 100644
--- a/doc/html/README.txt
+++ b/doc/html/README.txt
@@ -241,9 +241,11 @@ library. They are also documented in the pcre2build man page.
discussion in the pcre2api man page (search for pcre2_set_match_limit).
. There is a separate counter that limits the depth of nested backtracking
- during a matching process, which indirectly limits the amount of heap memory
- that is used. This also has a default of ten million, which is essentially
- "unlimited". You can change the default by setting, for example,
+ (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
+ matching process, which indirectly limits the amount of heap memory that is
+ used, and in the case of pcre2_dfa_match() the amount of stack as well. This
+ counter also has a default of ten million, which is essentially "unlimited".
+ You can change the default by setting, for example,
--with-match-limit-depth=5000
@@ -251,7 +253,7 @@ library. They are also documented in the pcre2build man page.
pcre2_set_depth_limit).
. You can also set an explicit limit on the amount of heap memory used by
- the pcre2_match() interpreter:
+ the pcre2_match() and pcre2_dfa_match() interpreters:
--with-heap-limit=500
@@ -885,4 +887,4 @@ The distribution should contain the files listed below.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 25 February 2018
+Last updated: 27 April 2018
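The heap limit set by --with-heap-limit (and at run time by pcre2_set_heap_limit()) is given in kilobytes, so --with-heap-limit=500 allows about 500 * 1024 = 512000 bytes, assuming 1 kilobyte = 1024 bytes. The sketch below models enforcing such a budget on heap allocations. It is not PCRE2's allocator: heap_budget, budget_alloc() and budget_free() are hypothetical names, and returning NULL stands in for the PCRE2_ERROR_HEAPLIMIT error.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical heap budget: bytes in use, checked against a limit in kilobytes. */
typedef struct {
  uint32_t limit_kb;   /* e.g. 500, as with --with-heap-limit=500 */
  size_t   used;       /* bytes currently allocated through this budget */
} heap_budget;

/* Returns NULL if the request would exceed the limit (the model's stand-in
   for PCRE2_ERROR_HEAPLIMIT) or if malloc() itself fails. A limit of zero
   therefore refuses every non-empty allocation. */
void *budget_alloc(heap_budget *b, size_t size)
{
  size_t limit_bytes = (size_t)b->limit_kb * 1024;
  void *p;
  if (size > limit_bytes || b->used > limit_bytes - size) return NULL;
  p = malloc(size);
  if (p != NULL) b->used += size;
  return p;
}

/* Releases memory obtained from budget_alloc() and returns it to the budget. */
void budget_free(heap_budget *b, void *p, size_t size)
{
  free(p);
  b->used -= size;
}
```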
diff --git a/doc/html/pcre2_dfa_match.html b/doc/html/pcre2_dfa_match.html
index 36d7976..8702cca 100644
--- a/doc/html/pcre2_dfa_match.html
+++ b/doc/html/pcre2_dfa_match.html
@@ -46,9 +46,9 @@ just once (except when processing lookaround assertions). This function is
<i>wscount</i> Number of elements in the vector
</pre>
For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
-up a callout function or specify the match and/or the recursion depth limits.
-The <i>length</i> and <i>startoffset</i> values are code units, not characters.
-The options are:
+up a callout function or specify the heap limit, the match limit, or the
+recursion depth limit. The <i>length</i> and <i>startoffset</i> values are code
+units, not characters. The options are:
<pre>
PCRE2_ANCHORED Match only at the first position
PCRE2_ENDANCHORED Pattern can match only at end of subject
diff --git a/doc/html/pcre2api.html b/doc/html/pcre2api.html
index ba3b2ca..7498afb 100644
--- a/doc/html/pcre2api.html
+++ b/doc/html/pcre2api.html
@@ -951,14 +951,15 @@ offset limit. In other words, whichever limit comes first is used.
<br>
The <i>heap_limit</i> parameter specifies, in units of kilobytes, the maximum
amount of heap memory that <b>pcre2_match()</b> may use to hold backtracking
-information when running an interpretive match. This limit does not apply to
-matching with the JIT optimization, which has its own memory control
-arrangements (see the
+information when running an interpretive match. This limit also applies to
+<b>pcre2_dfa_match()</b>, which may use the heap when processing patterns with a
+lot of nested pattern recursion or lookarounds or atomic groups. This limit
+does not apply to matching with the JIT optimization, which has its own memory
+control arrangements (see the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
-documentation for more details), nor does it apply to <b>pcre2_dfa_match()</b>.
-If the limit is reached, the negative error code PCRE2_ERROR_HEAPLIMIT is
-returned. The default limit is set when PCRE2 is built; the default default is
-very large and is essentially "unlimited".
+documentation for more details). If the limit is reached, the negative error
+code PCRE2_ERROR_HEAPLIMIT is returned. The default limit is set when PCRE2 is
+built; the default default is very large and is essentially "unlimited".
</P>
<P>
A value for the heap limit may also be supplied by an item at the start of a
@@ -978,6 +979,12 @@ Heap memory is used only if the initial vector is too small. If the heap limit
is set to a value less than 21 (in particular, zero) no heap memory will be
used. In this case, only patterns that do not have a lot of nested backtracking
can be successfully processed.
+</P>
+<P>
+Similarly, for <b>pcre2_dfa_match()</b>, a vector on the system stack is used
+when processing pattern recursions, lookarounds, or atomic groups, and only if
+this is not big enough is heap memory used. In this case, too, setting a value
+of zero disables the use of the heap.
<br>
<br>
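The stack-first, heap-fallback arrangement described here can be modelled in a few lines. This is a simplified sketch, not PCRE2's code: STACK_WORKSPACE_INTS, process_with_workspace() and the WS_* result codes are hypothetical, and summing the input stands in for real matching work. As the text says for the real functions, a heap limit of zero means the heap is never used, so only work that fits the on-stack vector can succeed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define STACK_WORKSPACE_INTS 64   /* hypothetical size of the on-stack vector */

/* Hypothetical result codes for this model only. */
enum { WS_OK = 0, WS_HEAPLIMIT = -1, WS_NOMEMORY = -2 };

/* Copies n ints into a workspace and sums them. The workspace is a local
   (stack) array when n fits, otherwise a heap block whose size is checked
   against heap_limit_kb first. *used_heap reports which path was taken. */
int process_with_workspace(const int *input, size_t n, uint32_t heap_limit_kb,
                           long *sum, int *used_heap)
{
  int stack_ws[STACK_WORKSPACE_INTS];
  int *ws = stack_ws;
  size_t i;

  *used_heap = 0;
  if (n > STACK_WORKSPACE_INTS) {
    size_t bytes = n * sizeof(int);
    if (bytes > (size_t)heap_limit_kb * 1024) return WS_HEAPLIMIT;  /* 0 => never */
    ws = malloc(bytes);
    if (ws == NULL) return WS_NOMEMORY;
    *used_heap = 1;
  }

  memcpy(ws, input, n * sizeof(int));
  *sum = 0;
  for (i = 0; i < n; i++) *sum += ws[i];

  if (ws != stack_ws) free(ws);   /* only heap workspace is freed */
  return WS_OK;
}
```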
<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
@@ -1035,11 +1042,22 @@ backtracking.
<P>
The depth limit is not relevant, and is ignored, when matching is done using
JIT compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which
-uses it to limit the depth of internal recursive function calls that implement
-atomic groups, lookaround assertions, and pattern recursions. This is,
-therefore, an indirect limit on the amount of system stack that is used. A
-recursive pattern such as /(.)(?1)/, when matched to a very long string using
-<b>pcre2_dfa_match()</b>, can use a great deal of stack.
+uses it to limit the depth of nested internal recursive function calls that
+implement atomic groups, lookaround assertions, and pattern recursions. This
+limits, indirectly, the amount of system stack that is used. It was more useful
+in versions before 10.32, when stack memory was used for local workspace
+vectors for recursive function calls. From version 10.32, only local variables
+are allocated on the stack, and as each call uses only a few hundred bytes, even
+a small stack can support quite a lot of recursion.
+</P>
+<P>
+If the depth of internal recursive function calls is great enough, local
+workspace vectors are allocated on the heap from version 10.32 onwards, so the
+depth limit also indirectly limits the amount of heap memory that is used. A
+recursive pattern such as /(.(?2))((?1)|)/, when matched to a very long string
+using <b>pcre2_dfa_match()</b>, can use a great deal of memory. However, it is
+probably better to limit heap usage directly by calling
+<b>pcre2_set_heap_limit()</b>.
</P>
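The depth limit bounds how many nested internal calls can be active at once. The sketch below, using hypothetical names (walk(), nesting_depth(), NEST_*), recurses once per '(' in a string as a stand-in for the nested calls that pcre2_dfa_match() makes for pattern recursions, lookarounds, and atomic groups, and gives up with an error when the nesting exceeds the limit. It illustrates the idea only; it is not how PCRE2 counts depth internally.

```c
#include <stddef.h>
#include <stdint.h>

enum { NEST_OK = 0, NEST_DEPTHLIMIT = -1, NEST_UNBALANCED = -2 };

/* Each '(' triggers a nested call; ')' returns to the caller. *pos advances
   through s, depth is the current nesting level, and *max_seen records the
   deepest level reached. Each call keeps only a few locals on the stack. */
static int walk(const char *s, size_t *pos, uint32_t depth,
                uint32_t depth_limit, uint32_t *max_seen)
{
  if (depth > depth_limit) return NEST_DEPTHLIMIT;
  if (depth > *max_seen) *max_seen = depth;
  while (s[*pos] != '\0') {
    char c = s[(*pos)++];
    if (c == '(') {
      int rc = walk(s, pos, depth + 1, depth_limit, max_seen);
      if (rc != NEST_OK) return rc;
    } else if (c == ')') {
      return depth > 0 ? NEST_OK : NEST_UNBALANCED;
    }
  }
  return depth == 0 ? NEST_OK : NEST_UNBALANCED;
}

/* Entry point: walks the whole string starting at nesting level 0. */
int nesting_depth(const char *s, uint32_t depth_limit, uint32_t *max_seen)
{
  size_t pos = 0;
  *max_seen = 0;
  return walk(s, &pos, 0, depth_limit, max_seen);
}
```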
<P>
The default value for the depth limit can be set when PCRE2 is built; the
@@ -1096,15 +1114,16 @@ and the 2-bit and 4-bit indicate 16-bit and 32-bit support, respectively.
PCRE2_CONFIG_DEPTHLIMIT
</pre>
The output is a uint32_t integer that gives the default limit for the depth of
-nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions
-and lookarounds in <b>pcre2_dfa_match()</b>. Further details are given with
-<b>pcre2_set_depth_limit()</b> above.
+nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions,
+lookarounds, and atomic groups in <b>pcre2_dfa_match()</b>. Further details are
+given with <b>pcre2_set_depth_limit()</b> above.
<pre>
PCRE2_CONFIG_HEAPLIMIT
</pre>
The output is a uint32_t integer that gives, in kilobytes, the default limit
-for the amount of heap memory used by <b>pcre2_match()</b>. Further details are
-given with <b>pcre2_set_heap_limit()</b> above.
+for the amount of heap memory used by <b>pcre2_match()</b> or
+<b>pcre2_dfa_match()</b>. Further details are given with
+<b>pcre2_set_heap_limit()</b> above.
<pre>
PCRE2_CONFIG_JIT
</pre>
@@ -3510,17 +3529,7 @@ capture.
Calls to the convenience functions that extract substrings by name
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
DFA match. The convenience functions that extract substrings by number never
-return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are
-slightly different:
-<pre>
- PCRE2_ERROR_UNAVAILABLE
-</pre>
-The ovector is not big enough to include a slot for the given substring number.
-<pre>
- PCRE2_ERROR_UNSET
-</pre>
-There is a slot in the ovector for this substring, but there were insufficient
-matches to fill it.
+return PCRE2_ERROR_NOSUBSTRING.
</P>
<P>
The matched strings are stored in the ovector in reverse order of length; that
@@ -3594,9 +3603,9 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 31 December 2017
+Last updated: 27 April 2018
<br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
diff --git a/doc/html/pcre2build.html b/doc/html/pcre2build.html
index edf24e8..c9d9324 100644
--- a/doc/html/pcre2build.html
+++ b/doc/html/pcre2build.html
@@ -295,9 +295,10 @@ change this by a setting such as
--with-heap-limit=500
</pre>
which limits the amount of heap to 500 kilobytes. This limit applies only to
-interpretive matching in pcre2_match(). It does not apply when JIT (which has
-its own memory arrangements) is used, nor does it apply to
-<b>pcre2_dfa_match()</b>.
+interpretive matching in <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, which
+may also use the heap for internal workspace when processing complicated
+patterns. This limit does not apply when JIT (which has its own memory
+arrangements) is used.
</P>
<P>
You can also explicitly limit the depth of nested backtracking in the
@@ -573,7 +574,7 @@ Cambridge, England.
</P>
<br><a name="SEC25" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 25 February 2018
+Last updated: 26 April 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>
diff --git a/doc/html/pcre2callout.html b/doc/html/pcre2callout.html
index 2adf21a..4ff1673 100644
--- a/doc/html/pcre2callout.html
+++ b/doc/html/pcre2callout.html
@@ -310,10 +310,12 @@ PCRE2_UNSET.
</P>
<P>
For DFA matching, the <i>offset_vector</i> field points to the ovector that was
-passed to the matching function in the match data block, but it holds no useful
-information at callout time because <b>pcre2_dfa_match()</b> does not support
-substring capturing. The value of <i>capture_top</i> is always 1 and the value
-of <i>capture_last</i> is always 0 for DFA matching.
+passed to the matching function in the match data block for callouts at the top
+level, but to an internal ovector during the processing of pattern recursions,
+lookarounds, and atomic groups. However, these ovectors hold no useful
+information because <b>pcre2_dfa_match()</b> does not support substring
+capturing. The value of <i>capture_top</i> is always 1 and the value of
+<i>capture_last</i> is always 0 for DFA matching.
</P>
<P>
The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
@@ -461,9 +463,9 @@ Cambridge, England.
</P>
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 22 December 2017
+Last updated: 26 April 2018
<br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html
index c495cba..1131c2a 100644
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@@ -173,12 +173,12 @@ the application to apply the JIT optimization by calling
Setting match resource limits
</b><br>
<P>
-The pcre2_match() function contains a counter that is incremented every time it
-goes round its main loop. The caller of <b>pcre2_match()</b> can set a limit on
-this counter, which therefore limits the amount of computing resource used for
-a match. The maximum depth of nested backtracking can also be limited; this
-indirectly restricts the amount of heap memory that is used, but there is also
-an explicit memory limit that can be set.
+The <b>pcre2_match()</b> function contains a counter that is incremented every
+time it goes round its main loop. The caller of <b>pcre2_match()</b> can set a
+limit on this counter, which therefore limits the amount of computing resource
+used for a match. The maximum depth of nested backtracking can also be limited;
+this indirectly restricts the amount of heap memory that is used, but there is
+also an explicit memory limit that can be set.
</P>
<P>
These facilities are provided to catch runaway matches that are provoked by
@@ -195,20 +195,22 @@ where d is any number of decimal digits. However, the value of the setting must
be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
for it to have any effect. In other words, the pattern writer can lower the
limits set by the programmer, but not raise them. If there is more than one
-setting of one of these limits, the lower value is used.
+setting of one of these limits, the lower value is used. The heap limit is
+specified in kilobytes.
</P>
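The rule that an in-pattern setting can lower, but not raise, the caller's limit, with the lowest of several settings winning, amounts to taking a minimum. A hypothetical helper expressing it (effective_limit() is not a PCRE2 function):

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the limit that takes effect, given the caller's value and the
   values of any in-pattern settings such as (*LIMIT_HEAP=d). A pattern can
   only lower the caller's value, and the lowest setting wins. */
uint32_t effective_limit(uint32_t caller_limit,
                         const uint32_t *pattern_settings, size_t n)
{
  uint32_t limit = caller_limit;
  size_t i;
  for (i = 0; i < n; i++)
    if (pattern_settings[i] < limit) limit = pattern_settings[i];
  return limit;
}
```

For example, a caller limit of 10000 combined with (*LIMIT_HEAP=1000) and (*LIMIT_HEAP=200) in the pattern gives 200, whereas (*LIMIT_HEAP=50000) leaves the caller's 10000 in force.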
<P>
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
still recognized for backwards compatibility.
</P>
<P>
-The heap limit applies only when the <b>pcre2_match()</b> interpreter is used
-for matching. It does not apply to JIT or DFA matching. The match limit is used
-(but in a different way) when JIT is being used, or when
-<b>pcre2_dfa_match()</b> is called, to limit computing resource usage by those
-matching functions. The depth limit is ignored by JIT but is relevant for DFA
-matching, which uses function recursion for recursions within the pattern. In
-this case, the depth limit controls the amount of system stack that is used.
+The heap limit applies only when the <b>pcre2_match()</b> or
+<b>pcre2_dfa_match()</b> interpreters are used for matching. It does not apply
+to JIT. The match limit is used (but in a different way) when JIT is being
+used, or when <b>pcre2_dfa_match()</b> is called, to limit computing resource
+usage by those matching functions. The depth limit is ignored by JIT but is
+relevant for DFA matching, which uses function recursion for recursions within
+the pattern and for lookaround assertions and atomic groups. In this case, the
+depth limit controls the depth of such recursion.
<a name="newlines"></a></P>
<br><b>
Newline conventions
@@ -2818,11 +2820,6 @@ matched at the top level, its final captured value is unset, even if it was
(temporarily) set at a deeper level during the matching process.
</P>
<P>
-If there are more than 15 capturing parentheses in a pattern, PCRE2 has to
-obtain extra memory from the heap to store data during a recursion. If no
-memory can be obtained, the match fails with the PCRE2_ERROR_NOMEMORY error.
-</P>
-<P>
Do not confuse the (?R) item with the condition (R), which tests for recursion.
Consider this pattern, which matches text in angle brackets, allowing for
arbitrary nesting. Only digits are allowed in nested brackets (that is, when
@@ -3479,9 +3476,9 @@ Cambridge, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 12 September 2017
+Last updated: 25 April 2018
<br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
diff --git a/doc/html/pcre2perform.html b/doc/html/pcre2perform.html
index 28f4f73..7ff3b87 100644
--- a/doc/html/pcre2perform.html
+++ b/doc/html/pcre2perform.html
@@ -93,9 +93,17 @@ may also reduce the memory requirements.
<P>
In contrast to <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b> does use recursive
function calls, but only for processing atomic groups, lookaround assertions,
-and recursion within the pattern. Too much nested recursion may cause stack
-issues. The "match depth" parameter can be used to limit the depth of function
-recursion in <b>pcre2_dfa_match()</b>.
+and recursion within the pattern. The original version of the code used to
+allocate quite large internal workspace vectors on the stack, which caused some
+problems for some patterns in environments with small stacks. From release
+10.32 the code for <b>pcre2_dfa_match()</b> has been re-factored to use heap
+memory when necessary for internal workspace when recursing, though recursive
+function calls are still used.
+</P>
+<P>
+The depth limit (<b>pcre2_set_depth_limit()</b>) can be used to limit the depth
+of function recursion, and the heap limit (<b>pcre2_set_heap_limit()</b>) to
+limit heap memory, in <b>pcre2_dfa_match()</b>.
</P>
<br><a name="SEC4" href="#TOC1">PROCESSING TIME</a><br>
<P>
@@ -244,9 +252,9 @@ Cambridge, England.
</P>
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 08 April 2017
+Last updated: 25 April 2018
<br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
diff --git a/doc/html/pcre2test.html b/doc/html/pcre2test.html
index 7d98d90..d6e5345 100644
--- a/doc/html/pcre2test.html
+++ b/doc/html/pcre2test.html
@@ -1199,7 +1199,7 @@ pattern.
get=&#60;number or name&#62; extract captured substring
getall extract all captured substrings
/g global global matching
- heap_limit=&#60;n&#62; set a limit on heap memory
+ heap_limit=&#60;n&#62; set a limit on heap memory (Kbytes)
jitstack=&#60;n&#62; set size of JIT stack
mark show mark values
match_limit=&#60;n&#62; set a match limit
@@ -1438,20 +1438,17 @@ Finding minimum limits
<P>
If the <b>find_limits</b> modifier is present on a subject line, <b>pcre2test</b>
calls the relevant matching function several times, setting different values in
-the match context via <b>pcre2_set_heap_limit(), \fBpcre2_set_match_limit()</b>,
-or <b>pcre2_set_depth_limit()</b> until it finds the minimum values for each
-parameter that allows the match to complete without error.
+the match context via <b>pcre2_set_heap_limit()</b>,
+<b>pcre2_set_match_limit()</b>, or <b>pcre2_set_depth_limit()</b> until it finds
+the minimum value for each parameter that allows the match to complete without
+error. If JIT is being used, only the match limit is relevant.
</P>
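Assuming that raising a limit never turns a successful match into a failing one, the minimum can be found by binary search over a predicate that runs the match with a given limit. In the sketch below, find_min_limit(), limit_ok_fn and threshold_ok() are hypothetical, and pcre2test's own search strategy may differ.

```c
#include <stdint.h>

/* Hypothetical predicate: nonzero if a match run with this limit completes
   without a limit error. Assumed monotone in the limit. */
typedef int (*limit_ok_fn)(uint32_t limit, void *ctx);

/* Binary search for the smallest limit in [lo, hi] accepted by ok().
   Returns 1 and stores it in *min_limit, or 0 if even hi is not enough. */
int find_min_limit(uint32_t lo, uint32_t hi, limit_ok_fn ok, void *ctx,
                   uint32_t *min_limit)
{
  if (!ok(hi, ctx)) return 0;
  while (lo < hi) {
    uint32_t mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */
    if (ok(mid, ctx)) hi = mid;          /* mid suffices: answer <= mid */
    else lo = mid + 1;                   /* mid too small: answer > mid */
  }
  *min_limit = lo;
  return 1;
}

/* Demonstration predicate: succeeds once limit reaches *(uint32_t *)ctx. */
int threshold_ok(uint32_t limit, void *ctx)
{
  return limit >= *(const uint32_t *)ctx;
}
```

If the pattern contains a lower in-pattern cap such as (*LIMIT_MATCH=...) that is below the true minimum, the predicate fails for every caller value and the search reports failure, which is the situation the note on in-pattern limits warns about.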
<P>
-If JIT is being used, only the match limit is relevant. If DFA matching is
-being used, only the depth limit is relevant.
-</P>
-<P>
-The <i>match_limit</i> number is a measure of the amount of backtracking
-that takes place, and learning the minimum value can be instructive. For most
-simple matches, the number is quite small, but for patterns with very large
-numbers of matching possibilities, it can become large very quickly with
-increasing length of subject string.
+When using this modifier, the pattern should not contain any limit settings
+such as (*LIMIT_MATCH=...) within it. If such a setting is present and is
+lower than the minimum matching value, the minimum value cannot be found
+because <b>pcre2_set_match_limit()</b> etc. are only able to reduce the value of
+an in-pattern limit; they cannot increase it.
</P>
<P>
For non-DFA matching, the minimum <i>depth_limit</i> number is a measure of how
@@ -1460,6 +1457,22 @@ searched). In the case of DFA matching, <i>depth_limit</i> controls the depth of
recursive calls of the internal function that is used for handling pattern
recursion, lookaround assertions, and atomic groups.
</P>
+<P>
+For non-DFA matching, the <i>match_limit</i> number is a measure of the amount
+of backtracking that takes place, and learning the minimum value can be
+instructive. For most simple matches, the number is quite small, but for
+patterns with very large numbers of matching possibilities, it can become large
+very quickly with increasing length of subject string. In the case of DFA
+matching, <i>match_limit</i> controls the total number of calls, both recursive
+and non-recursive, to the internal matching function, thus controlling the
+overall amount of computing resource that is used.
+</P>
+<P>
+For both kinds of matching, the <i>heap_limit</i> number (which is in kilobytes)
+limits the amount of heap memory used for matching. A value of zero disables
+the use of any heap memory; many simple pattern matches can be done without
+using the heap, so this is not an unreasonable setting.
+</P>
<br><b>
Showing MARK names
</b><br>
@@ -1476,13 +1489,14 @@ Showing memory usage
<P>
The <b>memory</b> modifier causes <b>pcre2test</b> to log the sizes of all heap
memory allocation and freeing calls that occur during a call to
-<b>pcre2_match()</b>. These occur only when a match requires a bigger vector
-than the default for remembering backtracking points. In many cases there will
-be no heap memory used and therefore no additional output. No heap memory is
-allocated during matching with <b>pcre2_dfa_match</b> or with JIT, so in those
-cases the <b>memory</b> modifier never has any effect. For this modifier to
-work, the <b>null_context</b> modifier must not be set on both the pattern and
-the subject, though it can be set on one or the other.
+<b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>. These occur only when a match
+requires a bigger vector than the default for remembering backtracking points
+(<b>pcre2_match()</b>) or for internal workspace (<b>pcre2_dfa_match()</b>). In
+many cases there will be no heap memory used and therefore no additional
+output. No heap memory is allocated during matching with JIT, so in that case
+the <b>memory</b> modifier never has any effect. For this modifier to work, the
+<b>null_context</b> modifier must not be set on both the pattern and the
+subject, though it can be set on one or the other.
</P>
<br><b>
Setting a starting offset
@@ -1982,9 +1996,9 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 21 December 2017
+Last updated: 25 April 2018
<br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.