author Ivan Tubert-Brohman <itub@cpan.org> 2005-10-12 15:20:18 -0400
committer Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2005-10-13 11:20:23 +0000
commit d74e8afc9309529cf5c6c4390fc311850865d506
tree e2e6f5cb76495c762f9de01020f6d7eae39011dd
parent fab416db1cda0a357b1699b6efa75dd50332ea26
POD index entries with X<>
Message-ID: <434D9A32.4050305@cpan.org>
p4raw-id: //depot/perl@25748
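
For readers unfamiliar with the construct: X<...> is POD's formatting code for an index entry. Most formatters render it as nothing, so the lines this patch adds change the index but not the visible text. As a rough sketch only (the helper name index_entries is hypothetical and not part of this commit, and it assumes simple single-bracket entries with no nested ">"), the entries added next to a heading can be listed like this:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the commit: return the index terms
# found in every X<...> entry of a chunk of POD source.
# Assumption: only single-bracket entries without a nested ">" are handled.
sub index_entries {
    my ($pod) = @_;
    my @terms = $pod =~ /X<([^>]+)>/g;   # list context: all captures
    return @terms;
}

# Sample text taken from the "=item m" hunk of this patch.
my $pod = join "\n",
    '=item m',
    'X</m> X<regex, multiline> X<regexp, multiline> X<regular expression, multiline>',
    '',
    'Treat string as multiple lines.',
    '';

# Print one index term per line.
print "$_\n" for index_entries($pod);
```

Run against the sample, this prints the four terms from the "=item m" hunk, one per line, starting with /m.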
Diffstat (limited to 'pod/perlre.pod')
-rw-r--r-- pod/perlre.pod 61
1 files changed, 61 insertions, 0 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 39110ffc95..23a7b0fa71 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -1,4 +1,5 @@
=head1 NAME
+X<regular expression> X<regex> X<regexp>
perlre - Perl regular expressions
@@ -24,6 +25,8 @@ L<perlop/"Gory details of parsing quoted constructs">.
=over 4
=item i
+X</i> X<regex, case-insensitive> X<regexp, case-insensitive>
+X<regular expression, case-insensitive>
Do case-insensitive pattern matching.
@@ -31,12 +34,15 @@ If C<use locale> is in effect, the case map is taken from the current
locale. See L<perllocale>.
=item m
+X</m> X<regex, multiline> X<regexp, multiline> X<regular expression, multiline>
Treat string as multiple lines. That is, change "^" and "$" from matching
the start or end of the string to matching the start or end of any
line anywhere within the string.
=item s
+X</s> X<regex, single-line> X<regexp, single-line>
+X<regular expression, single-line>
Treat string as single line. That is, change "." to match any character
whatsoever, even a newline, which normally it would not match.
@@ -46,6 +52,7 @@ while still allowing "^" and "$" to match, respectively, just after
and just before newlines within the string.
=item x
+X</x>
Extend your pattern's legibility by permitting whitespace and comments.
@@ -70,6 +77,7 @@ more readable. Note that you have to be careful not to include the
pattern delimiter in the comment--perl has no way of knowing you did
not intend to close the pattern early. See the C-comment deletion code
in L<perlop>.
+X</x>
=head2 Regular Expressions
@@ -81,6 +89,9 @@ details.
In particular the following metacharacters have their standard I<egrep>-ish
meanings:
+X<metacharacter>
+X<\> X<^> X<.> X<$> X<|> X<(> X<()> X<[> X<[]>
+
\ Quote the next metacharacter
^ Match the beginning of the line
@@ -100,12 +111,15 @@ newline within the string, and "$" will match before any newline. At the
cost of a little more overhead, you can do this by using the /m modifier
on the pattern match operator. (Older programs did this by setting C<$*>,
but this practice has been removed in perl 5.9.)
+X<^> X<$> X</m>
To simplify multi-line substitutions, the "." character never matches a
newline unless you use the C</s> modifier, which in effect tells Perl to pretend
the string is a single line--even if it isn't.
+X<.> X</s>
The following standard quantifiers are recognized:
+X<metacharacter> X<quantifier> X<*> X<+> X<?> X<{n}> X<{n,}> X<{n,m}>
* Match 0 or more times
+ Match 1 or more times
@@ -129,6 +143,8 @@ many times as possible (given a particular starting location) while still
allowing the rest of the pattern to match. If you want it to match the
minimum number of times possible, follow the quantifier with a "?". Note
that the meanings don't change, just the "greediness":
+X<metacharacter> X<greedy> X<greedyness>
+X<?> X<*?> X<+?> X<??> X<{n}?> X<{n,}?> X<{n,m}?>
*? Match 0 or more times
+? Match 1 or more times
@@ -139,6 +155,8 @@ that the meanings don't change, just the "greediness":
Because patterns are processed as double quoted strings, the following
also work:
+X<\t> X<\n> X<\r> X<\f> X<\a> X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>
+X<\0> X<\c> X<\N> X<\x>
\t tab (HT, TAB)
\n newline (LF, NL)
@@ -168,6 +186,9 @@ while escaping will cause the literal string C<\$> to be matched.
You'll need to write something like C<m/\Quser\E\@\Qhost/>.
In addition, Perl defines the following:
+X<metacharacter>
+X<\w> X<\W> X<\s> X<\S> X<\d> X<\D> X<\X> X<\p> X<\P> X<\C>
+X<word> X<whitespace>
\w Match a "word" character (alphanumeric plus "_")
\W Match a non-"word" character
@@ -196,13 +217,18 @@ literally. If Unicode is in effect, C<\s> matches also "\x{85}",
"\x{2028}, and "\x{2029}", see L<perlunicode> for more details about
C<\pP>, C<\PP>, and C<\X>, and L<perluniintro> about Unicode in general.
You can define your own C<\p> and C<\P> properties, see L<perlunicode>.
+X<\w> X<\W> X<word>
The POSIX character class syntax
+X<character class>
[:class:]
is also available. The available classes and their backslash
equivalents (if available) are as follows:
+X<character class>
+X<alpha> X<alnum> X<ascii> X<blank> X<cntrl> X<digit> X<graph>
+X<lower> X<print> X<punct> X<space> X<upper> X<word> X<xdigit>
alpha
alnum
@@ -246,6 +272,7 @@ matches zero, one, any alphabetic character, and the percentage sign.
The following equivalences to Unicode \p{} constructs and equivalent
backslash character classes (if available), will hold:
+X<character class> X<\p> X<\p{}>
[:...:] \p{...} backslash
@@ -276,6 +303,7 @@ The assumedly non-obviously named classes are:
=over 4
=item cntrl
+X<cntrl>
Any control character. Usually characters that don't produce output as
such but instead control the terminal somehow: for example newline and
@@ -285,18 +313,22 @@ the ISO Latin character sets, and Unicode), as is the character with
the ord() value of 127 (C<DEL>).
=item graph
+X<graph>
Any alphanumeric or punctuation (special) character.
=item print
+X<print>
Any alphanumeric or punctuation (special) character or the space character.
=item punct
+X<punct>
Any punctuation (special) character.
=item xdigit
+X<xdigit>
Any hexadecimal digit. Though this may feel silly ([0-9A-Fa-f] would
work just fine) it is included for completeness.
@@ -305,6 +337,7 @@ work just fine) it is included for completeness.
You can negate the [::] character classes by prefixing the class name
with a '^'. This is a Perl extension. For example:
+X<character class, negation>
POSIX traditional Unicode
@@ -318,6 +351,10 @@ only supported within a character class. The POSIX character classes
use them will cause an error.
Perl defines the following zero-width assertions:
+X<zero-width assertion> X<assertion> X<regex, zero-width assertion>
+X<regexp, zero-width assertion>
+X<regular expression, zero-width assertion>
+X<\b> X<\B> X<\A> X<\Z> X<\z> X<\G>
\b Match a word boundary
\B Match a non-(word boundary)
@@ -338,6 +375,7 @@ won't match multiple times when the C</m> modifier is used, while
"^" and "$" will match at every internal line boundary. To match
the actual end of the string and not ignore an optional trailing
newline, use C<\z>.
+X<\b> X<\A> X<\Z> X<\z> X</m>
The C<\G> assertion can be used to chain global matches (using
C<m//g>), as described in L<perlop/"Regexp Quote-Like Operators">.
@@ -350,6 +388,7 @@ supported when anchored to the start of the pattern; while it
is permitted to use it elsewhere, as in C</(?<=\G..)./g>, some
such uses (C</.\G/g>, for example) currently cause problems, and
it is recommended that you avoid such usage for now.
+X<\G>
The bracketing construct C<( ... )> creates capture buffers. To
refer to the digit'th buffer use \<digit> within the
@@ -358,6 +397,8 @@ match. Outside the match use "$" instead of "\". (The
the match. See the warning below about \1 vs $1 for details.)
Referring back to another part of the match is called a
I<backreference>.
+X<regex, capture buffer> X<regexp, capture buffer>
+X<regular expression, capture buffer> X<backreference>
There is no limit to the number of captured substrings that you may
use. However Perl also uses \10, \11, etc. as aliases for \010,
@@ -393,11 +434,15 @@ after the matched string. And C<$^N> contains whatever was matched by
the most-recently closed group (submatch). C<$^N> can be used in
extended patterns (see below), for example to assign a submatch to a
variable.
+X<$+> X<$^N> X<$&> X<$`> X<$'>
The numbered match variables ($1, $2, $3, etc.) and the related punctuation
set (C<$+>, C<$&>, C<$`>, C<$'>, and C<$^N>) are all dynamically scoped
until the end of the enclosing block or until the next successful
match, whichever comes first. (See L<perlsyn/"Compound Statements">.)
+X<$+> X<$^N> X<$&> X<$`> X<$'>
+X<$1> X<$2> X<$3> X<$4> X<$5> X<$6> X<$7> X<$8> X<$9>
+
B<NOTE>: failed matches in Perl do not reset the match variables,
which makes it easier to write code that tests for a series of more
@@ -416,6 +461,7 @@ if you can, but if you can't (and some algorithms really appreciate
them), once you've used them once, use them at will, because you've
already paid the price. As of 5.005, C<$&> is not so costly as the
other two.
+X<$&> X<$`> X<$'>
Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
C<\w>, C<\n>. Unlike some other regular expression languages, there
@@ -463,6 +509,7 @@ expressions, and 2) whenever you see one, you should stop and
=over 10
=item C<(?#text)>
+X<(?#)>
A comment. The text is ignored. If the C</x> modifier enables
whitespace formatting, a simple C<#> will suffice. Note that Perl closes
@@ -470,6 +517,7 @@ the comment as soon as it sees a C<)>, so there is no way to put a literal
C<)> in the comment.
=item C<(?imsx-imsx)>
+X<(?)>
One or more embedded pattern-match modifiers, to be turned on (or
turned off, if preceded by C<->) for the remainder of the pattern or
@@ -497,6 +545,7 @@ case, assuming C<x> modifier, and no C<i> modifier outside this
group.
=item C<(?:pattern)>
+X<(?:)>
=item C<(?imsx-imsx:pattern)>
@@ -522,11 +571,13 @@ is equivalent to the more verbose
/(?:(?s-i)more.*than).*million/i
=item C<(?=pattern)>
+X<(?=)> X<look-ahead, positive> X<lookahead, positive>
A zero-width positive look-ahead assertion. For example, C</\w+(?=\t)/>
matches a word followed by a tab, without including the tab in C<$&>.
=item C<(?!pattern)>
+X<(?!)> X<look-ahead, negative> X<lookahead, negative>
A zero-width negative look-ahead assertion. For example C</foo(?!bar)/>
matches any occurrence of "foo" that isn't followed by "bar". Note
@@ -546,18 +597,21 @@ Sometimes it's still easier just to say:
For look-behind see below.
=item C<(?<=pattern)>
+X<(?<=)> X<look-behind, positive> X<lookbehind, positive>
A zero-width positive look-behind assertion. For example, C</(?<=\t)\w+/>
matches a word that follows a tab, without including the tab in C<$&>.
Works only for fixed-width look-behind.
=item C<(?<!pattern)>
+X<(?<!)> X<look-behind, negative> X<lookbehind, negative>
A zero-width negative look-behind assertion. For example C</(?<!bar)foo/>
matches any occurrence of "foo" that does not follow "bar". Works
only for fixed-width look-behind.
=item C<(?{ code })>
+X<(?{})> X<regex, code in> X<regexp, code in> X<regular expression, code in>
B<WARNING>: This extended regular expression feature is considered
highly experimental, and may be changed or deleted without notice.
@@ -632,6 +686,9 @@ Better yet, use the carefully constrained evaluation within a Safe
compartment. See L<perlsec> for details about both these mechanisms.
=item C<(??{ code })>
+X<(??{})>
+X<regex, postponed> X<regexp, postponed> X<regular expression, postponed>
+X<regex, recursive> X<regexp, recursive> X<regular expression, recursive>
B<WARNING>: This extended regular expression feature is considered
highly experimental, and may be changed or deleted without notice.
@@ -659,6 +716,7 @@ The following pattern matches a parenthesized group:
}x;
=item C<< (?>pattern) >>
+X<backtrack> X<backtracking>
B<WARNING>: This extended regular expression feature is considered
highly experimental, and may be changed or deleted without notice.
@@ -752,6 +810,7 @@ Which one you pick depends on which of these expressions better reflects
the above specification of comments.
=item C<(?(condition)yes-pattern|no-pattern)>
+X<(?()>
=item C<(?(condition)yes-pattern)>
@@ -775,6 +834,7 @@ themselves.
=back
=head2 Backtracking
+X<backtrack> X<backtracking>
NOTE: This section presents an abstract approximation of regular
expression behavior. For a more rigorous (and complicated) view of
@@ -981,6 +1041,7 @@ where side-effects of look-ahead I<might> have influenced the
following match, see L<C<< (?>pattern) >>>.
=head2 Version 8 Regular Expressions
+X<regular expression, version 8> X<regex, version 8> X<regexp, version 8>
In case you're not familiar with the "regular" Version 8 regex
routines, here are the pattern-matching rules not described above.