author ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2018-06-17 14:13:28 +0000
committer ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069> 2018-06-17 14:13:28 +0000
commit 1326caa549bd96e614b91db87fffee2a4de07dfc
tree 8aca4b7bd292cbc509c930d29f14acdb0241091b /doc/html/pcre2pattern.html
parent a2e7b9bd05a1b3eed13b4b94b7d32b592642cfcc
Typos in documentation and comments noted by Jason Hood.
git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@936 6239d852-aaf2-0410-a92c-79f79f948069
Diffstat (limited to 'doc/html/pcre2pattern.html')
-rw-r--r-- doc/html/pcre2pattern.html 106
1 file changed, 53 insertions, 53 deletions
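Several hunks below edit the paragraphs that explain how PCRE2 handles a backslash followed by digits outside a character class. These paragraphs describe when PCRE2 treats the sequence as a backreference and when it treats it as an octal character code. The Python function below is a minimal sketch of that documented rule, included as a reading aid. It is not PCRE2's own code. The name `classify_backslash_digits` and its return tuple are hypothetical. The sketch ignores character classes and does not check whether a referenced group actually exists.

```python
OCTAL_DIGITS = "01234567"
DECIMAL_DIGITS = "0123456789"


def classify_backslash_digits(text: str, prior_groups: int) -> tuple[str, int, int]:
    """Hypothetical sketch of the documented rule, outside a character class.

    `text` is the pattern text immediately after the backslash and must start
    with a digit; `prior_groups` is the number of capturing left parentheses
    seen so far. Returns (kind, value, digits_consumed), where kind is
    "backreference" (value = group number) or "octal" (value = char code).
    """
    if not text or text[0] not in DECIMAL_DIGITS:
        raise ValueError("text must start with a decimal digit")

    # \0 is always octal: the 0 plus up to two further octal digits.
    if text[0] == "0":
        end = 1
        while end < 3 and end < len(text) and text[end] in OCTAL_DIGITS:
            end += 1
        return ("octal", int(text[:end], 8), end)

    # Otherwise read the digit and all following digits as a decimal number.
    end = 0
    while end < len(text) and text[end] in DECIMAL_DIGITS:
        end += 1
    number = int(text[:end])

    # Backreference if less than 10, starting with 8 or 9, or if at least
    # that many capturing left parentheses precede it.
    if number < 10 or text[0] in "89" or number <= prior_groups:
        return ("backreference", number, end)

    # Otherwise up to three octal digits form a character code; any
    # remaining digits are ordinary literals.
    end = 0
    while end < 3 and end < len(text) and text[end] in OCTAL_DIGITS:
        end += 1
    return ("octal", int(text[:end], 8), end)


# The escapes used as examples in the documentation, with no prior groups.
for escape in ("040", "40", "7", "11", "0113", "113", "377", "81"):
    print(f"\\{escape}:", classify_backslash_digits(escape, prior_groups=0))
```

For example, `classify_backslash_digits("0113", 0)` returns `("octal", 9, 3)`. Only three characters are consumed, so the trailing "3" is left as a literal, which agrees with the `\0113` line in the documentation. Passing `prior_groups=40` turns `\40` into a backreference instead of a space.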
diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html
index 1131c2a..bd71551 100644
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@@ -31,7 +31,7 @@ please consult the man page, in case the conversion went wrong.
<li><a name="TOC16" href="#SEC16">NAMED SUBPATTERNS</a>
<li><a name="TOC17" href="#SEC17">REPETITION</a>
<li><a name="TOC18" href="#SEC18">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>
-<li><a name="TOC19" href="#SEC19">BACK REFERENCES</a>
+<li><a name="TOC19" href="#SEC19">BACKREFERENCES</a>
<li><a name="TOC20" href="#SEC20">ASSERTIONS</a>
<li><a name="TOC21" href="#SEC21">CONDITIONAL SUBPATTERNS</a>
<li><a name="TOC22" href="#SEC22">COMMENTS</a>
@@ -196,7 +196,7 @@ be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
for it to have any effect. In other words, the pattern writer can lower the
limits set by the programmer, but not raise them. If there is more than one
setting of one of these limits, the lower value is used. The heap limit is
-specified in kilobytes.
+specified in kibibytes (units of 1024 bytes).
</P>
<P>
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
@@ -342,7 +342,7 @@ In particular, if you want to match a backslash, you write \\.
</P>
<P>
In a UTF mode, only ASCII numbers and letters have any special meaning after a
-backslash. All other characters (in particular, those whose codepoints are
+backslash. All other characters (in particular, those whose code points are
greater than 127) are treated as literals.
</P>
<P>
@@ -390,7 +390,7 @@ these escapes are as follows:
\r carriage return (hex 0D)
\t tab (hex 09)
\0dd character with octal code 0dd
- \ddd character with octal code ddd, or back reference
+ \ddd character with octal code ddd, or backreference
\o{ddd..} character with octal code ddd..
\xhh character with hex code hh
\x{hhh..} character with hex code hhh.. (default mode)
@@ -438,13 +438,13 @@ follows is itself an octal digit.
The escape \o must be followed by a sequence of octal digits, enclosed in
braces. An error occurs if this is not the case. This escape is a recent
addition to Perl; it provides a way of specifying character code points as octal
-numbers greater than 0777, and it also allows octal numbers and back references
+numbers greater than 0777, and it also allows octal numbers and backreferences
to be unambiguously specified.
</P>
<P>
For greater clarity and unambiguity, it is best to avoid following \ by a
digit greater than zero. Instead, use \o{} or \x{} to specify character
-numbers, and \g{} to specify back references. The following paragraphs
+numbers, and \g{} to specify backreferences. The following paragraphs
describe the old, ambiguous syntax.
</P>
<P>
@@ -455,7 +455,7 @@ and Perl has changed over time, causing PCRE2 also to change.
Outside a character class, PCRE2 reads the digit and any following digits as a
decimal number. If the number is less than 10, begins with the digit 8 or 9, or
if there are at least that many previous capturing left parentheses in the
-expression, the entire sequence is taken as a <i>back reference</i>. A
+expression, the entire sequence is taken as a <i>backreference</i>. A
description of how this works is given
<a href="#backreferences">later,</a>
following the discussion of
@@ -470,13 +470,13 @@ for themselves. For example, outside a character class:
<pre>
\040 is another way of writing an ASCII space
\40 is the same, provided there are fewer than 40 previous capturing subpatterns
- \7 is always a back reference
- \11 might be a back reference, or another way of writing a tab
+ \7 is always a backreference
+ \11 might be a backreference, or another way of writing a tab
\011 is always a tab
\0113 is a tab followed by the character "3"
- \113 might be a back reference, otherwise the character with octal code 113
- \377 might be a back reference, otherwise the value 255 (decimal)
- \81 is always a back reference
+ \113 might be a backreference, otherwise the character with octal code 113
+ \377 might be a backreference, otherwise the value 255 (decimal)
+ \81 is always a backreference
</pre>
Note that octal values of 100 or greater that are specified using this syntax
must not be introduced by a leading zero, because no more than three octal
@@ -512,10 +512,10 @@ limited to certain values, as follows:
8-bit non-UTF mode no greater than 0xff
16-bit non-UTF mode no greater than 0xffff
32-bit non-UTF mode no greater than 0xffffffff
- All UTF modes no greater than 0x10ffff and a valid codepoint
+ All UTF modes no greater than 0x10ffff and a valid code point
</pre>
-Invalid Unicode codepoints are all those in the range 0xd800 to 0xdfff (the
-so-called "surrogate" codepoints). The check for these can be disabled by the
+Invalid Unicode code points are all those in the range 0xd800 to 0xdfff (the
+so-called "surrogate" code points). The check for these can be disabled by the
caller of <b>pcre2_compile()</b> by setting the option
PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
</P>
@@ -544,12 +544,12 @@ is set, \U matches a "U" character, and \u can be used to define a character
by code point, as described above.
</P>
<br><b>
-Absolute and relative back references
+Absolute and relative backreferences
</b><br>
<P>
The sequence \g followed by a signed or unsigned number, optionally enclosed
-in braces, is an absolute or relative back reference. A named back reference
-can be coded as \g{name}. Back references are discussed
+in braces, is an absolute or relative backreference. A named backreference
+can be coded as \g{name}. Backreferences are discussed
<a href="#backreferences">later,</a>
following the discussion of
<a href="#subpattern">parenthesized subpatterns.</a>
@@ -563,7 +563,7 @@ a number enclosed either in angle brackets or single quotes, is an alternative
syntax for referencing a subpattern as a "subroutine". Details are discussed
<a href="#onigurumasubroutines">later.</a>
Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
-synonymous. The former is a back reference; the latter is a
+synonymous. The former is a backreference; the latter is a
<a href="#subpatternsassubroutines">subroutine</a>
call.
<a name="genericchartypes"></a></P>
@@ -694,7 +694,7 @@ line, U+0085). Because this is an atomic group, the two-character sequence is
treated as a single unit that cannot be split.
</P>
<P>
-In other modes, two additional characters whose codepoints are greater than 255
+In other modes, two additional characters whose code points are greater than 255
are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029).
Unicode support is not needed for these characters to be recognized.
</P>
@@ -729,8 +729,8 @@ Unicode character properties
When PCRE2 is built with Unicode support (the default), three additional escape
sequences that match characters with specific properties are available. In
8-bit non-UTF-8 mode, these sequences are of course limited to testing
-characters whose codepoints are less than 256, but they do work in this mode.
-In 32-bit non-UTF mode, codepoints greater than 0x10ffff (the Unicode limit)
+characters whose code points are less than 256, but they do work in this mode.
+In 32-bit non-UTF mode, code points greater than 0x10ffff (the Unicode limit)
may be encountered. These are all treated as being in the Common script and
with an unassigned type. The extra escape sequences are:
<pre>
@@ -1037,7 +1037,7 @@ joiner" characters. Characters with the "mark" property always have the
modifier). Extending characters are allowed before the modifier.
</P>
<P>
-7. Do not break within emoji zwj sequences (zero-width jointer followed by
+7. Do not break within emoji zwj sequences (zero-width joiner followed by
"glue after ZWJ" or "base glue after ZWJ").
</P>
<P>
@@ -1731,7 +1731,7 @@ numbers underneath show in which buffer the captured content will be stored.
/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
# 1 2 2 3 2 3 4
</pre>
-A back reference to a numbered subpattern uses the most recent value that is
+A backreference to a numbered subpattern uses the most recent value that is
set for that number by any subpattern. The following pattern matches "abcabc"
or "defdef":
<pre>
@@ -1771,7 +1771,7 @@ have different names, but PCRE2 does not.
In PCRE2, a subpattern can be named in one of three ways: (?&#60;name&#62;...) or
(?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. References to capturing
parentheses from other parts of the pattern, such as
-<a href="#backreferences">back references,</a>
+<a href="#backreferences">backreferences,</a>
<a href="#recursion">recursion,</a>
and
<a href="#conditions">conditions,</a>
@@ -1811,7 +1811,7 @@ for the first (and in this example, the only) subpattern of that name that
matched. This saves searching to find which numbered subpattern it was.
</P>
<P>
-If you make a back reference to a non-unique named subpattern from elsewhere in
+If you make a backreference to a non-unique named subpattern from elsewhere in
the pattern, the subpatterns to which the name refers are checked in the order
in which they appear in the overall pattern. The first one that is set is used
for the reference. For example, this pattern matches both "foofoo" and
@@ -1859,7 +1859,7 @@ items:
the \R escape sequence
an escape such as \d or \pL that matches a single character
a character class
- a back reference
+ a backreference
a parenthesized subpattern (including most assertions)
a subroutine call to a subpattern (recursive or otherwise)
</pre>
@@ -1980,7 +1980,7 @@ alternatively, using ^ to indicate anchoring explicitly.
</P>
<P>
However, there are some cases where the optimization cannot be used. When .*
-is inside capturing parentheses that are the subject of a back reference
+is inside capturing parentheses that are the subject of a backreference
elsewhere in the pattern, a match at the start may fail where a later one
succeeds. Consider, for example:
<pre>
@@ -2121,30 +2121,30 @@ an atomic group, like this:
</pre>
sequences of non-digits cannot be broken, and failure happens quickly.
<a name="backreferences"></a></P>
-<br><a name="SEC19" href="#TOC1">BACK REFERENCES</a><br>
+<br><a name="SEC19" href="#TOC1">BACKREFERENCES</a><br>
<P>
Outside a character class, a backslash followed by a digit greater than 0 (and
-possibly further digits) is a back reference to a capturing subpattern earlier
+possibly further digits) is a backreference to a capturing subpattern earlier
(that is, to its left) in the pattern, provided there have been that many
previous capturing left parentheses.
</P>
<P>
However, if the decimal number following the backslash is less than 8, it is
-always taken as a back reference, and causes an error only if there are not
+always taken as a backreference, and causes an error only if there are not
that many capturing left parentheses in the entire pattern. In other words, the
parentheses that are referenced need not be to the left of the reference for
-numbers less than 8. A "forward back reference" of this type can make sense
+numbers less than 8. A "forward backreference" of this type can make sense
when a repetition is involved and the subpattern to the right has participated
in an earlier iteration.
</P>
<P>
-It is not possible to have a numerical "forward back reference" to a subpattern
+It is not possible to have a numerical "forward backreference" to a subpattern
whose number is 8 or more using this syntax because a sequence such as \50 is
interpreted as a character defined in octal. See the subsection entitled
"Non-printing characters"
<a href="#digitsafterbackslash">above</a>
for further details of the handling of digits following a backslash. There is
-no such problem when named parentheses are used. A back reference to any
+no such problem when named parentheses are used. A backreference to any
subpattern is possible using named parentheses (see below).
</P>
<P>
@@ -2175,7 +2175,7 @@ of forward reference can be useful in patterns that repeat. Perl does not
support the use of + in this way.
</P>
<P>
-A back reference matches whatever actually matched the capturing subpattern in
+A backreference matches whatever actually matched the capturing subpattern in
the current subject string, rather than anything matching the subpattern
itself (see
<a href="#subpatternsassubroutines">"Subpatterns as subroutines"</a>
@@ -2185,7 +2185,7 @@ below for a way of doing that). So the pattern
</pre>
matches "sense and sensibility" and "response and responsibility", but not
"sense and responsibility". If caseful matching is in force at the time of the
-back reference, the case of letters is relevant. For example,
+backreference, the case of letters is relevant. For example,
<pre>
((?i)rah)\s+\1
</pre>
@@ -2193,10 +2193,10 @@ matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original
capturing subpattern is matched caselessly.
</P>
<P>
-There are several different ways of writing back references to named
+There are several different ways of writing backreferences to named
subpatterns. The .NET syntax \k{name} and the Perl syntax \k&#60;name&#62; or
\k'name' are supported, as is the Python syntax (?P=name). Perl 5.10's unified
-back reference syntax, in which \g can be used for both numeric and named
+backreference syntax, in which \g can be used for both numeric and named
references, is also supported. We could rewrite the above example in any of
the following ways:
<pre>
@@ -2209,30 +2209,30 @@ A subpattern that is referenced by name may appear in the pattern before or
after the reference.
</P>
<P>
-There may be more than one back reference to the same subpattern. If a
-subpattern has not actually been used in a particular match, any back
-references to it always fail by default. For example, the pattern
+There may be more than one backreference to the same subpattern. If a
+subpattern has not actually been used in a particular match, any backreferences
+to it always fail by default. For example, the pattern
<pre>
(a|(bc))\2
</pre>
always fails if it starts to match "a" rather than "bc". However, if the
-PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a back reference to an
+PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a backreference to an
unset value matches an empty string.
</P>
<P>
Because there may be many capturing parentheses in a pattern, all digits
-following a backslash are taken as part of a potential back reference number.
+following a backslash are taken as part of a potential backreference number.
If the pattern continues with a digit character, some delimiter must be used to
-terminate the back reference. If the PCRE2_EXTENDED option is set, this can be
+terminate the backreference. If the PCRE2_EXTENDED option is set, this can be
white space. Otherwise, the \g{ syntax or an empty comment (see
<a href="#comments">"Comments"</a>
below) can be used.
</P>
<br><b>
-Recursive back references
+Recursive backreferences
</b><br>
<P>
-A back reference that occurs inside the parentheses to which it refers fails
+A backreference that occurs inside the parentheses to which it refers fails
when the subpattern is first used, so, for example, (a\1) never matches.
However, such references can be useful inside repeated subpatterns. For
example, the pattern
@@ -2240,14 +2240,14 @@ example, the pattern
(a|b\1)+
</pre>
matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of
-the subpattern, the back reference matches the character string corresponding
+the subpattern, the backreference matches the character string corresponding
to the previous iteration. In order for this to work, the pattern must be such
-that the first iteration does not need to match the back reference. This can be
+that the first iteration does not need to match the backreference. This can be
done using alternation, as in the example above, or by a quantifier with a
minimum of zero.
</P>
<P>
-Back references of this type cause the group that they reference to be treated
+Backreferences of this type cause the group that they reference to be treated
as an
<a href="#atomicgroup">atomic group.</a>
Once the whole group has been matched, a subsequent matching failure cannot
@@ -2397,10 +2397,10 @@ that is, a "subroutine" call into a group that is already active,
is not supported.
</P>
<P>
-Perl does not support back references in lookbehinds. PCRE2 does support them,
+Perl does not support backreferences in lookbehinds. PCRE2 does support them,
but only if certain conditions are met. The PCRE2_MATCH_UNSET_BACKREF option
must not be set, there must be no use of (?| in the pattern (it creates
-duplicate subpattern numbers), and if the back reference is by name, the name
+duplicate subpattern numbers), and if the backreference is by name, the name
must be unique. Of course, the referenced subpattern must itself be of fixed
length. The following pattern matches words containing at least two characters
that begin and end with the same character:
@@ -2882,7 +2882,7 @@ in PCRE2 these values can be referenced. Consider this pattern:
^(.)(\1|a(?2))
</pre>
This pattern matches "bab". The first capturing parentheses match "b", then in
-the second group, when the back reference \1 fails to match "b", the second
+the second group, when the backreference \1 fails to match "b", the second
alternative matches "a" and then recurses. In the recursion, \1 does now match
"b" and so the whole match succeeds. This match used to fail in Perl, but in
later versions (I tried 5.024) it now works.
@@ -2943,7 +2943,7 @@ plus or a minus sign it is taken as a relative reference. For example:
(abc)(?i:\g&#60;-1&#62;)
</pre>
Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
-synonymous. The former is a back reference; the latter is a subroutine call.
+synonymous. The former is a backreference; the latter is a subroutine call.
</P>
<br><a name="SEC26" href="#TOC1">CALLOUTS</a><br>
<P>
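Several changed paragraphs describe backreference behaviour. A backreference matches the captured text rather than the subpattern, it is case-sensitive when caseful matching is in force, it can refer to a named group, and a reference to an unset group fails. These rules can be tried out with Python's standard `re` module. Treat this as an assumption of convenience: `re` is not PCRE2, although it behaves the same way for the simple patterns below. It does not accept PCRE2 forms such as \g{...} or \k&lt;name&gt;. It also rejects a reference inside the group it refers to, so the recursive example (a|b\1)+ cannot be shown this way.

```python
import re

# A backreference matches the text the group captured, not the group's pattern.
sense = re.compile(r"(sens|respons)e and \1ibility")
print(bool(sense.fullmatch("sense and sensibility")))        # True
print(bool(sense.fullmatch("response and responsibility")))  # True
print(bool(sense.fullmatch("sense and responsibility")))     # False

# The group matches caselessly, but \1 is matched casefully.
# Python 3.11+ only accepts global inline flags at the start of a pattern,
# so the scoped form (?i:...) stands in for the documented ((?i)rah).
rah = re.compile(r"((?i:rah))\s+\1")
print(bool(rah.fullmatch("rah rah")))  # True
print(bool(rah.fullmatch("RAH RAH")))  # True
print(bool(rah.fullmatch("RAH rah")))  # False

# Python-style named group and named backreference, also accepted by PCRE2.
named = re.compile(r"(?P<p1>sens|respons)e and (?P=p1)ibility")
print(bool(named.fullmatch("response and responsibility")))  # True

# A reference to a group that did not take part in the match fails, as in
# PCRE2 when PCRE2_MATCH_UNSET_BACKREF is not set.
unset = re.compile(r"(a|(bc))\2")
print(unset.match("a"))            # None
print(bool(unset.match("bcbc")))   # True
```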