author    Wolfgang Laun <Wolfgang.Laun@alcatel.at> 2007-02-04 17:26:14 +0100
committer Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2007-02-05 11:29:08 +0000
commit    0d017f4d564175907ce6698d1a162341a850ea9d
tree      e478dfe1bce939c5ac517de2490f5c2cdad7a98b /pod/perlre.pod
parent    59497074c2431b6cdcfd89466093c079af4a7bca
minor improvements for perlre.pod
From: "Wolfgang Laun" <wolfgang.laun@gmail.com>
Message-ID: <17de7ee80702040726v23f54266g3c352d353a30c430@mail.gmail.com>
p4raw-id: //depot/perl@30126
Diffstat (limited to 'pod/perlre.pod')
-rw-r--r-- pod/perlre.pod 191
1 files changed, 99 insertions, 92 deletions
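Several reworded passages in this patch describe run-time behaviour: the greedy, non-greedy ("not greedily") and possessive quantifier tables, a named capture buffer delimited with apostrophes and mixed with a numeric backreference, and the scoping of (?i) to its enclosing group. The following is a minimal Perl sketch of those behaviours. It assumes perl 5.10 or later, since possessive quantifiers, named buffers and say() all require it. The sample strings and variable names are illustrative only.

```perl
use strict;
use warnings;
use feature 'say';    # say() needs 5.10, as do possessive quantifiers and named buffers

my $s = 'aXbXc';

# Greedy: .* grabs everything, then backtracks to the last X.
my ($greedy) = $s =~ /^(.*)X/;
# Non-greedy: .*? grabs as little as possible, stopping at the first X.
my ($lazy) = $s =~ /^(.*?)X/;
# Possessive: .*+ gives nothing back, so no X is left for the pattern to match.
my $possessive = $s =~ /^.*+X/ ? 1 : 0;

say "greedy=$greedy lazy=$lazy possessive=$possessive";
# prints: greedy=aXb lazy=a possessive=0

# Named buffer delimited by apostrophes; \1 refers to the same buffer by number.
my $doubled = 'hello' =~ /(?'char'.)\1/ ? $+{char} : undef;
say "first doubled character: $doubled";    # prints: first doubled character: l

# (?i) is scoped to its group, so \1 must repeat the captured text exactly.
my $re    = qr/ ( (?i) blah ) \s+ \1 /x;
my $exact = 'BLAH BLAH' =~ $re ? 1 : 0;    # same case twice: matches
my $mixed = 'BLAH blah' =~ $re ? 1 : 0;    # case differs: no match
say "exact=$exact mixed=$mixed";           # prints: exact=1 mixed=0
```

The possessive case fails outright because the anchored pattern cannot be retried at a later position, and .*+ never returns the trailing characters that X would need.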
diff --git a/pod/perlre.pod b/pod/perlre.pod
index d886d094a7..d913c8074a 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -16,6 +16,9 @@ operations, plus various examples of the same, see discussions of
C<m//>, C<s///>, C<qr//> and C<??> in L<perlop/"Regexp Quote-Like
Operators">.
+
+=head2 Modifiers
+
Matching operations can have various modifiers. Modifiers
that relate to the interpretation of the regular expression inside
are listed below. Modifiers that alter the way a regular expression
@@ -84,7 +87,7 @@ X</x>
=head3 Metacharacters
-The patterns used in Perl pattern matching derive from supplied in
+The patterns used in Perl pattern matching evolved from the ones supplied in
the Version 8 regex routines. (The routines are derived
(distantly) from Henry Spencer's freely redistributable reimplementation
of the V8 routines.) See L<Version 8 Regular Expressions> for
@@ -149,24 +152,24 @@ many times as possible (given a particular starting location) while still
allowing the rest of the pattern to match. If you want it to match the
minimum number of times possible, follow the quantifier with a "?". Note
that the meanings don't change, just the "greediness":
-X<metacharacter> X<greedy> X<greedyness>
+X<metacharacter> X<greedy> X<greediness>
X<?> X<*?> X<+?> X<??> X<{n}?> X<{n,}?> X<{n,m}?>
- *? Match 0 or more times
- +? Match 1 or more times
- ?? Match 0 or 1 time
- {n}? Match exactly n times
- {n,}? Match at least n times
- {n,m}? Match at least n but not more than m times
+ *? Match 0 or more times, not greedily
+ +? Match 1 or more times, not greedily
+ ?? Match 0 or 1 time, not greedily
+ {n}? Match exactly n times, not greedily
+ {n,}? Match at least n times, not greedily
+ {n,m}? Match at least n but not more than m times, not greedily
By default, when a quantified subpattern does not allow the rest of the
overall pattern to match, Perl will backtrack. However, this behaviour is
-sometimes undesirable. Thus Perl provides the "possesive" quantifier form
+sometimes undesirable. Thus Perl provides the "possessive" quantifier form
as well.
- *+ Match 0 or more times and give nothing back
- ++ Match 1 or more times and give nothing back
- ?+ Match 0 or 1 time and give nothing back
+ *+ Match 0 or more times and give nothing back
+ ++ Match 1 or more times and give nothing back
+ ?+ Match 0 or 1 time and give nothing back
{n}+ Match exactly n times and give nothing back (redundant)
{n,}+ Match at least n times and give nothing back
{n,m}+ Match at least n but not more than m times and give nothing back
@@ -183,7 +186,7 @@ string" problem can be most efficiently performed when written as:
/"(?:[^"\\]++|\\.)*+"/
-as we know that if the final quote does not match, bactracking will not
+as we know that if the final quote does not match, backtracking will not
help. See the independent subexpression C<< (?>...) >> for more details;
possessive quantifiers are just syntactic sugar for that construct. For
instance the above example could also be written as follows:
@@ -194,7 +197,7 @@ instance the above example could also be written as follows:
Because patterns are processed as double quoted strings, the following
also work:
-X<\t> X<\n> X<\r> X<\f> X<\a> X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>
+X<\t> X<\n> X<\r> X<\f> X<\e> X<\a> X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>
X<\0> X<\c> X<\N> X<\x>
\t tab (HT, TAB)
@@ -203,10 +206,10 @@ X<\0> X<\c> X<\N> X<\x>
\f form feed (FF)
\a alarm (bell) (BEL)
\e escape (think troff) (ESC)
- \033 octal char (think of a PDP-11)
- \x1B hex char
- \x{263a} wide hex char (Unicode SMILEY)
- \c[ control char
+ \033 octal char (example: ESC)
+ \x1B hex char (example: ESC)
+ \x{263a} wide hex char (example: Unicode SMILEY)
+ \cK control char (example: VT)
\N{name} named char
\l lowercase next char (think vi)
\u uppercase next char (think vi)
@@ -227,9 +230,9 @@ You'll need to write something like C<m/\Quser\E\@\Qhost/>.
=head3 Character classes
In addition, Perl defines the following:
-X<metacharacter>
X<\w> X<\W> X<\s> X<\S> X<\d> X<\D> X<\X> X<\p> X<\P> X<\C>
-X<word> X<whitespace>
+X<\g> X<\k> X<\N> X<\K> X<\v> X<\V>
+X<word> X<whitespace> X<character class> X<backreference>
\w Match a "word" character (alphanumeric plus "_")
\W Match a non-"word" character
@@ -265,12 +268,13 @@ to match a string of Perl-identifier characters (which isn't the same
as matching an English word). If C<use locale> is in effect, the list
of alphabetic characters generated by C<\w> is taken from the current
locale. See L<perllocale>. You may use C<\w>, C<\W>, C<\s>, C<\S>,
-C<\d>, and C<\D> within character classes, but if you try to use them
-as endpoints of a range, that's not a range, the "-" is understood
-literally. If Unicode is in effect, C<\s> matches also "\x{85}",
-"\x{2028}, and "\x{2029}", see L<perlunicode> for more details about
-C<\pP>, C<\PP>, and C<\X>, and L<perluniintro> about Unicode in general.
-You can define your own C<\p> and C<\P> properties, see L<perlunicode>.
+C<\d>, and C<\D> within character classes, but they aren't usable
+as either end of a range. If any of them precedes or follows a "-",
+the "-" is understood literally. If Unicode is in effect, C<\s> matches
+also "\x{85}", "\x{2028}, and "\x{2029}". See L<perlunicode> for more
+details about C<\pP>, C<\PP>, C<\X> and the possibility of defining
+your own C<\p> and C<\P> properties, and L<perluniintro> about Unicode
+in general.
X<\w> X<\W> X<word>
The POSIX character class syntax
@@ -278,7 +282,7 @@ X<character class>
[:class:]
-is also available. Note that the C<[> and C<]> braces are I<literal>;
+is also available. Note that the C<[> and C<]> brackets are I<literal>;
they must always be used within a character class expression.
# this is correct:
@@ -317,7 +321,7 @@ A GNU extension equivalent to C<[ \t]>, "all horizontal whitespace".
=item [2]
Not exactly equivalent to C<\s> since the C<[[:space:]]> includes
-also the (very rare) "vertical tabulator", "\ck", chr(11).
+also the (very rare) "vertical tabulator", "\cK" or chr(11) in ASCII.
=item [3]
@@ -331,7 +335,7 @@ whole character class. For example:
[01[:alpha:]%]
-matches zero, one, any alphabetic character, and the percentage sign.
+matches zero, one, any alphabetic character, and the percent sign.
The following equivalences to Unicode \p{} constructs and equivalent
backslash character classes (if available), will hold:
@@ -342,7 +346,7 @@ X<character class> X<\p> X<\p{}>
alpha IsAlpha
alnum IsAlnum
ascii IsASCII
- blank IsSpace
+ blank
cntrl IsCntrl
digit IsDigit \d
graph IsGraph
@@ -371,7 +375,7 @@ X<cntrl>
Any control character. Usually characters that don't produce output as
such but instead control the terminal somehow: for example newline and
backspace are control characters. All characters with ord() less than
-32 are most often classified as control characters (assuming ASCII,
+32 are usually classified as control characters (assuming ASCII,
the ISO Latin character sets, and Unicode), as is the character with
the ord() value of 127 (C<DEL>).
@@ -422,7 +426,7 @@ X<regular expression, zero-width assertion>
X<\b> X<\B> X<\A> X<\Z> X<\z> X<\G>
\b Match a word boundary
- \B Match a non-(word boundary)
+ \B Match except at a word boundary
\A Match only at beginning of string
\Z Match only at end of string, or before newline at the end
\z Match only at end of string
@@ -469,9 +473,10 @@ loop. Take care when using patterns that include C<\G> in an alternation.
=head3 Capture buffers
-The bracketing construct C<( ... )> creates capture buffers. To
-refer to the digit'th buffer use \<digit> within the
-match. Outside the match use "$" instead of "\". (The
+The bracketing construct C<( ... )> creates capture buffers. To refer
+to the current contents of a buffer later on, within the same pattern,
+use \1 for the first, \2 for the second, and so on.
+Outside the match use "$" instead of "\". (The
\<digit> notation works in certain circumstances outside
the match. See the warning below about \1 vs $1 for details.)
Referring back to another part of the match is called a
@@ -492,7 +497,7 @@ backreferences.
X<\g{1}> X<\g{-1}> X<\g{name}> X<relative backreference> X<named backreference>
In order to provide a safer and easier way to construct patterns using
-backrefs, in Perl 5.10 the C<\g{N}> notation is provided. The curly
+backreferences, Perl 5.10 provides the C<\g{N}> notation. The curly
brackets are optional, however omitting them is less safe as the meaning
of the pattern can be changed by text (such as digits) following it.
When N is a positive integer the C<\g{N}> notation is exactly equivalent
@@ -517,17 +522,16 @@ and would match the same as C</(Y) ( (X) \3 \1 )/x>.
Additionally, as of Perl 5.10 you may use named capture buffers and named
backreferences. The notation is C<< (?<name>...) >> to declare and C<< \k<name> >>
-to reference. You may also use single quotes instead of angle brackets to quote the
-name; and you may use the bracketed C<< \g{name} >> back reference syntax.
-The only difference between named capture buffers and unnamed ones is
-that multiple buffers may have the same name and that the contents of
-named capture buffers are available via the C<%+> hash. When multiple
-groups share the same name C<$+{name}> and C<< \k<name> >> refer to the
-leftmost defined group, thus it's possible to do things with named capture
-buffers that would otherwise require C<(??{})> code to accomplish. Named
-capture buffers are numbered just as normal capture buffers are and may be
-referenced via the magic numeric variables or via numeric backreferences
-as well as by name.
+to reference. You may also use apostrophes instead of angle brackets to delimit the
+name; and you may use the bracketed C<< \g{name} >> backreference syntax.
+It's possible to refer to a named capture buffer by absolute and relative number as well.
+Outside the pattern, a named capture buffer is available via the C<%+> hash.
+When different buffers within the same pattern have the same name, C<$+{name}>
+and C<< \k<name> >> refer to the leftmost defined group. (Thus it's possible
+to do things with named capture buffers that would otherwise require C<(??{})>
+code to accomplish.)
+X<named capture buffer> X<regular expression, named capture buffer>
+X<%+> X<$+{name}> X<\k{name}>
Examples:
@@ -539,7 +543,7 @@ Examples:
/(?<char>.)\k<char>/ # ... a different way
and print "'$+{char}' is the first doubled character\n";
- /(?<char>.)\1/ # ... mix and match
+ /(?'char'.)\1/ # ... mix and match
and print "'$1' is the first doubled character\n";
if (/Time: (..):(..):(..)/) { # parse out values
@@ -567,7 +571,7 @@ X<$+> X<$^N> X<$&> X<$`> X<$'>
X<$1> X<$2> X<$3> X<$4> X<$5> X<$6> X<$7> X<$8> X<$9>
-B<NOTE>: failed matches in Perl do not reset the match variables,
+B<NOTE>: Failed matches in Perl do not reset the match variables,
which makes it easier to write code that tests for a series of more
specific cases and remembers the best match.
@@ -655,10 +659,10 @@ One or more embedded pattern-match modifiers, to be turned on (or
turned off, if preceded by C<->) for the remainder of the pattern or
the remainder of the enclosing pattern group (if any). This is
particularly useful for dynamic patterns, such as those read in from a
-configuration file, read in as an argument, are specified in a table
-somewhere, etc. Consider the case that some of which want to be case
-sensitive and some do not. The case insensitive ones need to include
-merely C<(?i)> at the front of the pattern. For example:
+configuration file, taken from an argument, or specified in a table
+somewhere. Consider the case where some patterns want to be case
+sensitive and some do not: The case insensitive ones merely need to
+include C<(?i)> at the front of the pattern. For example:
$pattern = "foobar";
if ( /$pattern/i ) { }
@@ -672,9 +676,9 @@ These modifiers are restored at the end of the enclosing group. For example,
( (?i) blah ) \s+ \1
-will match a repeated (I<including the case>!) word C<blah> in any
-case, assuming C<x> modifier, and no C<i> modifier outside this
-group.
+will match C<blah> in any case, some spaces, and an exact (I<including the case>!)
+repetition of the previous word, assuming the C</x> modifier, and no C</i>
+modifier outside this group.
Note that the C<k> modifier is special in that it can only be enabled,
not disabled, and that its presence anywhere in a pattern has a global
@@ -783,17 +787,17 @@ only for fixed-width look-behind.
X<< (?<NAME>) >> X<(?'NAME')> X<named capture> X<capture>
A named capture buffer. Identical in every respect to normal capturing
-parens C<()> but for the additional fact that C<%+> may be used after
+parentheses C<()> but for the additional fact that C<%+> may be used after
a succesful match to refer to a named buffer. See C<perlvar> for more
details on the C<%+> hash.
If multiple distinct capture buffers have the same name then the
$+{NAME} will refer to the leftmost defined buffer in the match.
-The forms C<(?'NAME'pattern)> and C<(?<NAME>pattern)> are equivalent.
+The forms C<(?'NAME'pattern)> and C<< (?<NAME>pattern) >> are equivalent.
B<NOTE:> While the notation of this construct is the same as the similar
-function in .NET regexes, the behavior is not, in Perl the buffers are
+function in .NET regexes, the behavior is not. In Perl the buffers are
numbered sequentially regardless of being named or not. Thus in the
pattern
@@ -808,8 +812,8 @@ its Unicode extension (see L<utf8>),
though it isn't extended by the locale (see L<perllocale>).
B<NOTE:> In order to make things easier for programmers with experience
-with the Python or PCRE regex engines the pattern C<< (?PE<lt>NAMEE<gt>pattern) >>
-maybe be used instead of C<< (?<NAME>pattern) >>; however this form does not
+with the Python or PCRE regex engines, the pattern C<< (?PE<lt>NAMEE<gt>pattern) >>
+may be used instead of C<< (?<NAME>pattern) >>; however this form does not
support the use of single quotes as a delimiter for the name. This is
only available in Perl 5.10 or later.
@@ -822,14 +826,14 @@ the group is designated by name and not number. If multiple groups
have the same name then it refers to the leftmost defined group in
the current match.
-It is an error to refer to a name not defined by a C<(?<NAME>)>
+It is an error to refer to a name not defined by a C<< (?<NAME>) >>
earlier in the pattern.
Both forms are equivalent.
B<NOTE:> In order to make things easier for programmers with experience
-with the Python or PCRE regex engines the pattern C<< (?P=NAME) >>
-maybe be used instead of C<< \k<NAME> >> in Perl 5.10 or later.
+with the Python or PCRE regex engines, the pattern C<< (?P=NAME) >>
+may be used instead of C<< \k<NAME> >> in Perl 5.10 or later.
=item C<(?{ code })>
X<(?{})> X<regex, code in> X<regexp, code in> X<regular expression, code in>
@@ -873,7 +877,7 @@ C<local>ization are undone, so that
# location.
>x;
-will set C<$res = 4>. Note that after the match, $cnt returns to the globally
+will set C<$res = 4>. Note that after the match, C<$cnt> returns to the globally
introduced value, because the scopes that restrict C<local> operators
are unwound.
@@ -900,7 +904,7 @@ perilous C<use re 'eval'> pragma has been used (see L<re>), or the
variables contain results of C<qr//> operator (see
L<perlop/"qr/STRING/imosx">).
-This restriction is because of the wide-spread and remarkably convenient
+This restriction is due to the wide-spread and remarkably convenient
custom of using run-time determined strings as patterns. For example:
$re = <>;
@@ -915,7 +919,7 @@ so you should only do so if you are also using taint checking.
Better yet, use the carefully constrained evaluation within a Safe
compartment. See L<perlsec> for details about both these mechanisms.
-Because perl's regex engine is not currently re-entrant, interpolated
+Because Perl's regex engine is currently not re-entrant, interpolated
code may not invoke the regex engine either directly with C<m//> or C<s///>),
or indirectly with functions such as C<split>.
@@ -1036,7 +1040,7 @@ for later use:
}
B<Note> that this pattern does not behave the same way as the equivalent
-PCRE or Python construct of the same form. In perl you can backtrack into
+PCRE or Python construct of the same form. In Perl you can backtrack into
a recursed group, in PCRE and Python the recursed into group is treated
as atomic. Also, modifiers are resolved at compile time, so constructs
like (?i:(?1)) or (?:(?i)(?1)) do not affect how the sub-pattern will
@@ -1045,8 +1049,8 @@ be processed.
=item C<(?&NAME)>
X<(?&NAME)>
-Recurse to a named subpattern. Identical to (?PARNO) except that the
-parenthesis to recurse to is determined by name. If multiple parens have
+Recurse to a named subpattern. Identical to C<(?PARNO)> except that the
+parenthesis to recurse to is determined by name. If multiple parentheses have
the same name, then it recurses to the leftmost.
It is an error to refer to a name that is not declared somewhere in the
@@ -1054,7 +1058,7 @@ pattern.
B<NOTE:> In order to make things easier for programmers with experience
with the Python or PCRE regex engines the pattern C<< (?P>NAME) >>
-maybe be used instead of C<< (?&NAME) >> as of Perl 5.10.
+may be used instead of C<< (?&NAME) >> in Perl 5.10 or later.
=item C<(?(condition)yes-pattern|no-pattern)>
X<(?()>
@@ -1147,7 +1151,7 @@ An example of how this might be used is as follows:
)/x
Note that capture buffers matched inside of recursion are not accessible
-after the recursion returns, so the extra layer of capturing buffers are
+after the recursion returns, so the extra layer of capturing buffers is
necessary. Thus C<$+{NAME_PAT}> would not be defined even though
C<$+{NAME}> would be.
@@ -1260,7 +1264,7 @@ to inside of one of these constructs. The following equivalences apply:
=head2 Special Backtracking Control Verbs
B<WARNING:> These patterns are experimental and subject to change or
-removal in a future version of perl. Their usage in production code should
+removal in a future version of Perl. Their usage in production code should
be noted to avoid problems during upgrades.
These special patterns are generally of the form C<(*VERB:ARG)>. Unless
@@ -1308,7 +1312,7 @@ continues in B, which may also backtrack as necessary; however, should B
not match, then no further backtracking will take place, and the pattern
will fail outright at the current starting position.
-As a shortcut, X<\v> is exactly equivalent to C<(*PRUNE)>.
+As a shortcut, C<\v> is exactly equivalent to C<(*PRUNE)>.
The following example counts all the possible matching strings in a
pattern (without actually matching any of them).
@@ -1361,7 +1365,7 @@ of this pattern. This effectively means that the regex engine "skips" forward
to this position on failure and tries to match again, (assuming that
there is sufficient room to match).
-As a shortcut X<\V> is exactly equivalent to C<(*SKIP)>.
+As a shortcut C<\V> is exactly equivalent to C<(*SKIP)>.
The name of the C<(*SKIP:NAME)> pattern has special significance. If a
C<(*MARK:NAME)> was encountered while matching, then it is that position
@@ -1498,7 +1502,7 @@ for production code.
This pattern matches nothing and causes the end of successful matching at
the point at which the C<(*ACCEPT)> pattern was encountered, regardless of
whether there is actually more to match in the string. When inside of a
-nested pattern, such as recursion or a dynamically generated subbpattern
+nested pattern, such as recursion, or in a subpattern dynamically generated
via C<(??{})>, only the innermost pattern is ended immediately.
If the C<(*ACCEPT)> is inside of capturing buffers then the buffers are
@@ -1508,7 +1512,7 @@ For instance:
'AB' =~ /(A (A|B(*ACCEPT)|C) D)(E)/x;
will match, and C<$1> will be C<AB> and C<$2> will be C<B>, C<$3> will not
-be set. If another branch in the inner parens were matched, such as in the
+be set. If another branch in the inner parentheses were matched, such as in the
string 'ACDE', then the C<D> and C<E> would have to be matched as well.
=back
@@ -1521,11 +1525,11 @@ X<backtrack> X<backtracking>
NOTE: This section presents an abstract approximation of regular
expression behavior. For a more rigorous (and complicated) view of
the rules involved in selecting a match among possible alternatives,
-see L<Combining pieces together>.
+see L<Combining RE Pieces>.
A fundamental feature of regular expression matching involves the
notion called I<backtracking>, which is currently used (when needed)
-by all regular expression quantifiers, namely C<*>, C<*?>, C<+>,
+by all regular non-possessive expression quantifiers, namely C<*>, C<*?>, C<+>,
C<+?>, C<{n,m}>, and C<{n,m}?>. Backtracking is often optimized
internally, but the general principle outlined here is valid.
@@ -1573,7 +1577,7 @@ and the first "bar" thereafter.
if ( /foo(.*?)bar/ ) { print "got <$1>\n" }
got <d is under the >
-Here's another example: let's say you'd like to match a number at the end
+Here's another example. Let's say you'd like to match a number at the end
of a string, and you also want to keep the preceding part of the match.
So you write this:
@@ -1698,9 +1702,9 @@ using the vertical bar. C</ab/> means match "a" AND (then) match "b",
although the attempted matches are made at different positions because "a"
is not a zero-width assertion, but a one-width assertion.
-B<WARNING>: particularly complicated regular expressions can take
+B<WARNING>: Particularly complicated regular expressions can take
exponential time to solve because of the immense number of possible
-ways they can use backtracking to try match. For example, without
+ways they can use backtracking to try for a match. For example, without
internal optimizations done by the regular expression engine, this will
take a painfully long time to run:
@@ -1732,9 +1736,12 @@ Any single character matches itself, unless it is a I<metacharacter>
with a special meaning described here or above. You can cause
characters that normally function as metacharacters to be interpreted
literally by prefixing them with a "\" (e.g., "\." matches a ".", not any
-character; "\\" matches a "\"). A series of characters matches that
-series of characters in the target string, so the pattern C<blurfl>
-would match "blurfl" in the target string.
+character; "\\" matches a "\"). This escape mechanism is also required
+for the character used as the pattern delimiter.
+
+A series of characters matches that series of characters in the target
+string, so the pattern C<blurfl> would match "blurfl" in the target
+string.
You can specify a character class, by enclosing a list of characters
in C<[]>, which will match any character from the list. If the
@@ -1755,7 +1762,7 @@ a range, the "-" is understood literally.
Note also that the whole range idea is rather unportable between
character sets--and even within character sets they may cause results
you probably didn't expect. A sound principle is to use only ranges
-that begin from and end at either alphabets of equal case ([a-e],
+that begin from and end at either alphabetics of equal case ([a-e],
[A-E]), or digits ([0-9]). Anything else is unsafe. If in doubt,
spell out the character sets in full.
@@ -1800,7 +1807,7 @@ match "0x1234 0x4321", but not "0x1234 01234", because subpattern
1 matched "0x", even though the rule C<0|0x> could potentially match
the leading 0 in the second number.
-=head2 Warning on \1 vs $1
+=head2 Warning on \1 Instead of $1
Some people get too used to writing things like:
@@ -1825,7 +1832,7 @@ C<${1}000>. The operation of interpolation should not be confused
with the operation of matching a backreference. Certainly they mean two
different things on the I<left> side of the C<s///>.
-=head2 Repeated patterns matching zero-length substring
+=head2 Repeated Patterns Matching a Zero-length Substring
B<WARNING>: Difficult material (and prose) ahead. This section needs a rewrite.
@@ -1838,7 +1845,7 @@ loops using regular expressions, with something as innocuous as:
'foo' =~ m{ ( o? )* }x;
-The C<o?> can match at the beginning of C<'foo'>, and since the position
+The C<o?> matches at the beginning of C<'foo'>, and since the position
in the string is not moved by the match, C<o?> would match again and again
because of the C<*> modifier. Another common way to create a similar cycle
is with the looping modifier C<//g>:
@@ -1901,7 +1908,7 @@ the matched string, and is reset by each assignment to pos().
Zero-length matches at the end of the previous match are ignored
during C<split>.
-=head2 Combining pieces together
+=head2 Combining RE Pieces
Each of the elementary pieces of regular expressions which were described
before (such as C<ab> or C<\Z>) could match at most one substring
@@ -2002,13 +2009,13 @@ One more rule is needed to understand how a match is determined for the
whole regular expression: a match at an earlier position is always better
than a match at a later position.
-=head2 Creating custom RE engines
+=head2 Creating Custom RE Engines
Overloaded constants (see L<overload>) provide a simple way to extend
the functionality of the RE engine.
Suppose that we want to enable a new RE escape-sequence C<\Y|> which
-matches at boundary between whitespace characters and non-whitespace
+matches at a boundary between whitespace characters and non-whitespace
characters. Note that C<(?=\S)(?<!\S)|(?!\S)(?<=\S)> matches exactly
at these positions, so we want to have each C<\Y|> in the place of the
more complicated version. We can create a module C<customre> to do