author | Rafael Garcia-Suarez <rgarciasuarez@gmail.com> | 2003-08-10 12:32:47 +0000 |
---|---|---|
committer | Rafael Garcia-Suarez <rgarciasuarez@gmail.com> | 2003-08-10 12:32:47 +0000 |
commit | 30487ceba2ac5c35e693d7aba544e73d6a7dc3f0 (patch) | |
tree | 4e78c95184dc1cd5061efc525dbe69fbac045e7a /pod/perlreref.pod | |
parent | c90eafb3a162eee5a1341ef84b7efe8ad94c3fca (diff) | |
download | perl-30487ceba2ac5c35e693d7aba544e73d6a7dc3f0.tar.gz |
Add the perlreref manpage, by Iain Truskett
(regular expressions quick reference.)
Regenerate the table of contents.
p4raw-id: //depot/perl@20593
Diffstat (limited to 'pod/perlreref.pod')
-rw-r--r-- | pod/perlreref.pod | 284 |
1 files changed, 284 insertions, 0 deletions
diff --git a/pod/perlreref.pod b/pod/perlreref.pod
new file mode 100644
index 0000000000..8aad32719a
--- /dev/null
+++ b/pod/perlreref.pod
@@ -0,0 +1,284 @@
+=head1 NAME
+
+perlreref - Perl Regular Expressions Reference
+
+=head1 DESCRIPTION
+
+This is a quick reference to Perl's regular expressions.
+For full information see L<perlre> and L<perlop>, as well
+as the L<references|/"SEE ALSO"> section in this document.
+
+=head1 OPERATORS
+
+  =~ determines to which variable the regex is applied.
+     In its absence, C<$_> is used.
+
+        $var =~ /foo/;
+
+  m/pattern/igmsoxc searches a string for a pattern match,
+     applying the given options.
+
+        i  case-Insensitive
+        g  Global - all occurrences
+        m  Multiline mode - ^ and $ match internal lines
+        s  match as a Single line - . matches \n
+        o  compile pattern Once
+        x  eXtended legibility - free whitespace and comments
+        c  don't reset pos on fails when using /g
+
+     If C<pattern> is an empty string, the last I<successfully> match
+     regex is used. Delimiters other than C</> may be used for both this
+     operator and the following ones.
+
+  qr/pattern/imsox lets you store a regex in a variable,
+     or pass one around. Modifiers as for C<m//> and are stored
+     within the regex.
+
+  s/pattern/replacement/igmsoxe substitutes matches of
+     C<pattern> with C<replacement>. Modifiers as for C<m//>
+     with addition of C<e>:
+
+        e  Evaluate replacement as an expression
+
+     'e' may be specified multiple times. 'replacement' is interpreted
+     as a double quoted string unless a single-quote (') is the delimiter.
+
+  ?pattern? is like C<m/pattern/> but matches only once. No alternate
+     delimiters can be used. Must be reset with 'reset'.
+
+=head1 SYNTAX
+
+   \       Escapes the character(s) immediately following it
+   .       Matches any single character except a newline (unless /s is used)
+   ^       Matches at the beginning of the string (or line, if /m is used)
+   $       Matches at the end of the string (or line, if /m is used)
+   *       Matches the preceding element 0 or more times
+   +       Matches the preceding element 1 or more times
+   ?       Matches the preceding element 0 or 1 times
+   {...}   Specifies a range of occurrences for the element preceding it
+   [...]   Matches any one of the characters contained within the brackets
+   (...)   Groups regular expressions
+   |       Matches either the expression preceding or following it
+   \1, \2 ...  The text from the Nth group
+
+=head2 ESCAPE SEQUENCES
+
+These work as in normal strings.
+
+   \a       Alarm (beep)
+   \e       Escape
+   \f       Formfeed
+   \n       Newline
+   \r       Carriage return
+   \t       Tab
+   \038     Any octal ASCII value
+   \x7f     Any hexadecimal ASCII value
+   \x{263a} A wide hexadecimal value
+   \cx      Control-x
+   \N{name} A named character
+
+   \l  Lowercase until next character
+   \u  Uppercase until next character
+   \L  Lowercase until \E
+   \U  Uppercase until \E
+   \Q  Disable pattern metacharacters until \E
+   \E  End case modification
+
+This one works differently from normal strings:
+
+   \b  An assertion, not backspace, except in a character class
+
+=head2 CHARACTER CLASSES
+
+   [amy]    Match 'a', 'm' or 'y'
+   [f-j]    Dash specifies "range"
+   [f-j-]   Dash escaped or at start or end means 'dash'
+   [^f-j]   Caret indicates "match char any _except_ these"
+
+The following work within or without a character class:
+
+   \d      A digit, same as [0-9]
+   \D      A nondigit, same as [^0-9]
+   \w      A word character (alphanumeric), same as [a-zA-Z_0-9]
+   \W      A non-word character, [^a-zA-Z_0-9]
+   \s      A whitespace character, same as [ \t\n\r\f]
+   \S      A non-whitespace character, [^ \t\n\r\f]
+   \C      Match a byte (with Unicode. '.' matches char)
+   \pP     Match P-named (Unicode) property
+   \p{...} Match Unicode property with long name
+   \PP     Match non-P
+   \P{...} Match lack of Unicode property with long name
+   \X      Match extended unicode sequence
+
+POSIX character classes and their Unicode and Perl equivalents:
+
+   alnum   IsAlnum              Alphanumeric
+   alpha   IsAlpha              Alphabetic
+   ascii   IsASCII              Any ASCII char
+   blank   IsSpace  [ \t]       Horizontal whitespace (GNU)
+   cntrl   IsCntrl              Control characters
+   digit   IsDigit  \d          Digits
+   graph   IsGraph              Alphanumeric and punctuation
+   lower   IsLower              Lowercase chars (locale aware)
+   print   IsPrint              Alphanumeric, punct, and space
+   punct   IsPunct              Punctuation
+   space   IsSpace  [\s\ck]     Whitespace
+           IsSpacePerl \s       Perl's whitespace definition
+   upper   IsUpper              Uppercase chars (locale aware)
+   word    IsWord   \w          Alphanumeric plus _ (Perl)
+   xdigit  IsXDigit [\dA-Fa-f]  Hexadecimal digit
+
+Within a character class:
+
+    POSIX       traditional   Unicode
+    [:digit:]       \d        \p{IsDigit}
+    [:^digit:]      \D        \P{IsDigit}
+
+=head2 ANCHORS
+
+All are zero-width assertions.
+
+   ^  Match string start (or line, if /m is used)
+   $  Match string end (or line, if /m is used) or before newline
+   \b Match word boundary (between \w and \W)
+   \B Match except at word boundary
+   \A Match string start (regardless of /m)
+   \Z Match string end (preceding optional newline)
+   \z Match absolute string end
+   \G Match where previous m//g left off
+   \c Suppresses resetting of search position when used with /g.
+      Without \c, search pattern is reset to the beginning of the string
+
+=head2 QUANTIFIERS
+
+Quantifiers are greedy by default --- match the B<longest> leftmost.
+
+   Maximal Minimal Allowed range
+   ------- ------- -------------
+   {n,m}   {n,m}?  Must occur at least n times but no more than m times
+   {n,}    {n,}?   Must occur at least n times
+   {n}     {n}?    Must match exactly n times
+   *       *?      0 or more times (same as {0,})
+   +       +?      1 or more times (same as {1,})
+   ?       ??      0 or 1 time (same as {0,1})
+
+=head2 EXTENDED CONSTRUCTS
+
+   (?#text)         A comment
+   (?:...)          Cluster without capturing
+   (?imxs-imsx:...) Enable/disable option (as per m//)
+   (?=...)          Zero-width positive lookahead assertion
+   (?!...)          Zero-width negative lookahead assertion
+   (?<...)          Zero-width positive lookbehind assertion
+   (?<!...)         Zero-width negative lookbehind assertion
+   (?>...)          Grab what we can, prohibit backtracking
+   (?{ code })      Embedded code, return value becomes $^R
+   (??{ code })     Dynamic regex, return value used as regex
+   (?(cond)yes|no)  cond being int corresponding to capturing parens
+   (?(cond)yes)        or a lookaround/eval zero-width assertion
+
+=head1 VARIABLES
+
+   $_    Default variable for operators to use
+   $*    Enable multiline matching (deprecated; not in 5.8.1+)
+
+   $&    Entire matched string
+   $`    Everything prior to matched string
+   $'    Everything after to matched string
+
+The use of those last three will slow down B<all> regex use
+within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
+to see equivalent expressions that won't cause slow down.
+See also L<Devel::SawAmpersand>.
+
+   $1, $2 ...  hold the Xth captured expr
+   $+    Last parenthesized pattern match
+   $^N   Holds the most recently closed capture
+   $^R   Holds the result of the last (?{...}) expr
+   @-    Offsets of starts of groups. [0] holds start of whole match
+   @+    Offsets of ends of groups. [0] holds end of whole match
+
+Capture groups are numbered according to their I<opening> paren.
+
+=head1 FUNCTIONS
+
+   lc          Lowercase a string
+   lcfirst     Lowercase first char of a string
+   uc          Uppercase a string
+   ucfirst     Titlecase first char of a string
+   pos         Return or set current match position
+   quotemeta   Quote metacharacters
+   reset       Reset ?pattern? status
+   study       Analyze string for optimizing matching
+
+   split       Use regex to split a string into parts
+
+=head1 AUTHOR
+
+Iain Truskett.
+
+This document may be distributed under the same terms as Perl itself.
+
+=head1 SEE ALSO
+
+=over 4
+
+=item *
+
+L<perlretut> for a tutorial on regular expressions.
+
+=item *
+
+L<perlrequick> for a rapid tutorial.
+
+=item *
+
+L<perlre> for more details.
+
+=item *
+
+L<perlvar> for details on the variables.
+
+=item *
+
+L<perlop> for details on the operators.
+
+=item *
+
+L<perlfunc> for details on the functions.
+
+=item *
+
+L<perlfaq6> for FAQs on regular expressions.
+
+=item *
+
+The L<re> module to alter behaviour and aid
+debugging.
+
+=item *
+
+L<perldebug/"Debugging regular expressions">
+
+=item *
+
+L<perluniintro>, L<perlunicode>, L<charnames> and L<locale>
+for details on regexes and internationalisation.
+
+=item *
+
+I<Mastering Regular Expressions> by Jeffrey Friedl
+(F<http://regex.info/>) for a thorough grounding and
+reference on the topic.
+
+=back
+
+=head1 THANKS
+
+David P.C. Wollmann,
+Richard Soderberg,
+Sean M. Burke,
+Tom Christiansen,
+and
+Jeffrey Goff
+for useful advice.
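
Reader's note, not part of the patch: a minimal sketch of how a few entries from the quick reference behave in practice, namely greedy versus minimal quantifiers, the @- and @+ offset arrays, \G combined with the /g and /c modifiers, and s///e. The variable names and sample strings are made up for illustration and do not come from the commit.

```perl
use strict;
use warnings;

# Greedy vs. minimal quantifiers (see the QUANTIFIERS table).
my $text = 'aXbXc';
my ($greedy)  = $text =~ /^(.*)X/;     # longest match: 'aXb'
my ($minimal) = $text =~ /^(.*?)X/;    # shortest match: 'a'

# @- and @+ hold the start and end offsets of the whole match ([0])
# and of each capture group ([1], [2], ...).
my $line = 'key=value';
my @offsets;
if ($line =~ /(\w+)=(\w+)/) {
    @offsets = ($-[0], $+[0], $-[2], $+[2]);    # (0, 9, 4, 9)
}

# m//g in scalar context with \G resumes where the previous match ended.
# The /c modifier keeps pos() in place after a failed match.
my $input = '12 34 x';
my @numbers;
while ($input =~ /\G\s*(\d+)/gc) {
    push @numbers, $1;
}
my $stopped_at = pos($input);    # 5, just after '34'

# s///e evaluates the replacement as a Perl expression.
(my $doubled = 'a1 b2') =~ s/(\d+)/$1 * 2/ge;    # 'a2 b4'

print "$greedy $minimal @offsets @numbers $stopped_at $doubled\n";
```

Without /c on the loop's match, the failed third attempt would reset pos($input) to undef, so the stopping position could not be read back afterwards.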