author Karl Williamson <khw@khw-desktop.(none)> 2010-03-27 22:43:32 -0600
committer Rafael Garcia-Suarez <rgs@consttype.org> 2010-03-28 15:57:17 +0200
commit d0b161077624458de6e6b915c2c6d48c04aca5e4
tree e1dabdf375482bfb3d9d6e288b0248d74f050b09
parent 272d2fccdd8295527af922a2af84ef7205338f65
Remove duplicate information and refer to other pods
Things were getting out of sync.
-rw-r--r-- pod/perlre.pod 157
1 files changed, 28 insertions, 129 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod
index 014f921d01..12a111903e 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -166,7 +166,7 @@ X<metacharacter> X<quantifier> X<*> X<+> X<?> X<{n}> X<{n,}> X<{n,m}>
as a regular character. In particular, the lower bound
is not optional.) The "*" quantifier is equivalent to C<{0,}>, the "+"
quantifier to C<{1,}>, and the "?" quantifier to C<{0,1}>. n and m are limited
-to integral values less than a preset limit defined when perl is built.
+to non-negative integral values less than a preset limit defined when perl is built.
This is usually 32766 on the most common platforms. The actual limit can
be seen in the error message generated by code such as this:
@@ -223,7 +223,7 @@ instance the above example could also be written as follows:
Because patterns are processed as double quoted strings, the following
also work:
X<\t> X<\n> X<\r> X<\f> X<\e> X<\a> X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>
-X<\0> X<\c> X<\N> X<\x>
+X<\0> X<\c> X<\N{}> X<\x>
\t tab (HT, TAB)
\n newline (LF, NL)
@@ -256,9 +256,7 @@ You'll need to write something like C<m/\Quser\E\@\Qhost/>.
=head3 Character Classes and other Special Escapes
In addition, Perl defines the following:
-X<\w> X<\W> X<\s> X<\S> X<\d> X<\D> X<\X> X<\p> X<\P> X<\C>
-X<\g> X<\k> X<\N> X<\K> X<\v> X<\V> X<\h> X<\H>
-X<word> X<whitespace> X<character class> X<backreference>
+X<\g> X<\k> X<\K> X<backreference>
\w Match a "word" character (alphanumeric plus "_")
\W Match a non-"word" character
@@ -288,29 +286,10 @@ X<word> X<whitespace> X<character class> X<backreference>
\H Not horizontal whitespace
\R Linebreak
-A C<\w> matches a single alphanumeric character (an alphabetic
-character, or a decimal digit) or C<_>, not a whole word. Use C<\w+>
-to match a string of Perl-identifier characters (which isn't the same
-as matching an English word). If C<use locale> is in effect, the list
-of alphabetic characters generated by C<\w> is taken from the current
-locale. See L<perllocale>. You may use C<\w>, C<\W>, C<\s>, C<\S>,
-C<\d>, and C<\D> within character classes, but they aren't usable
-as either end of a range. If any of them precedes or follows a "-",
-the "-" is understood literally. If Unicode is in effect, C<\s> matches
-also "\x{85}", "\x{2028}", and "\x{2029}". See L<perlunicode> for more
-details about C<\pP>, C<\PP>, C<\X> and the possibility of defining
-your own C<\p> and C<\P> properties, and L<perluniintro> about Unicode
-in general.
-X<\w> X<\W> X<word>
-
-C<\R> will atomically match a linebreak, including the network line-ending
-"\x0D\x0A". Specifically, X<\R> is exactly equivalent to
-
- (?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])
-
-B<Note:> C<\R> has no special meaning inside of a character class;
-use C<\v> instead (vertical whitespace).
-X<\R>
+See L<perlrecharclass/Backslashed sequences> for details on
+on C<\w>, C<\W>, C<\s>, C<\S>, C<\d>, C<\D>, C<\p>, C<\P>, C<\N>, C<\v>, C<\V>,
+C<\h>, and C<\H>.
+See L<perlrebackslash/Misc> for details on C<\R> and C<\X>.
Note that C<\N> has two meanings. When of the form C<\N{NAME}>, it matches the
character whose name is C<NAME>; and similarly when of the form
@@ -331,113 +310,33 @@ they must always be used within a character class expression.
# this is not, and will generate a warning:
$string =~ /[:alpha:]/;
-The following table shows the mapping of POSIX character class
-names, common escapes, literal escape sequences and their equivalent
-Unicode style property names.
-X<character class> X<\p> X<\p{}>
-X<alpha> X<alnum> X<ascii> X<blank> X<cntrl> X<digit> X<graph>
-X<lower> X<print> X<punct> X<space> X<upper> X<word> X<xdigit>
-
-B<Note:> up to Perl 5.10 the property names used were shared with
-standard Unicode properties, this was changed in Perl 5.11, see
-L<perl5110delta> for details.
-
- POSIX Esc Class Property Note
- --------------------------------------------------------
- alnum [0-9A-Za-z] IsPosixAlnum
- alpha [A-Za-z] IsPosixAlpha
- ascii [\000-\177] IsASCII
- blank [\011 ] IsPosixBlank [1]
- cntrl [\0-\37\177] IsPosixCntrl
- digit \d [0-9] IsPosixDigit
- graph [!-~] IsPosixGraph
- lower [a-z] IsPosixLower
- print [ -~] IsPosixPrint
- punct [!-/:-@[-`{-~] IsPosixPunct
- space [\11-\15 ] IsPosixSpace [2]
- \s [\11\12\14\15 ] IsPerlSpace [2]
- upper [A-Z] IsPosixUpper
- word \w [0-9A-Z_a-z] IsPerlWord [3]
- xdigit [0-9A-Fa-f] IsXDigit
-
-=over
-
-=item [1]
-
-A GNU extension equivalent to C<[ \t]>, "all horizontal whitespace".
-
-=item [2]
-
-Note that C<\s> and C<[[:space:]]> are B<not> equivalent as C<[[:space:]]>
-includes also the (very rare) "vertical tabulator", "\cK" or chr(11) in
-ASCII.
-
-=item [3]
-
-A Perl extension, see above.
-
-=back
-
-For example use C<[:upper:]> to match all the uppercase characters.
-Note that the C<[]> are part of the C<[::]> construct, not part of the
-whole character class. For example:
-
- [01[:alpha:]%]
-
-matches zero, one, any alphabetic character, and the percent sign.
-
-The other named classes are:
-
-=over 4
-
-=item cntrl
-X<cntrl>
-
-Any control character. Usually characters that don't produce output as
-such but instead control the terminal somehow: for example newline and
-backspace are control characters. All characters with ord() less than
-32 are usually classified as control characters (assuming ASCII,
-the ISO Latin character sets, and Unicode), as is the character with
-the ord() value of 127 (C<DEL>).
-
-=item graph
-X<graph>
-
-Any alphanumeric or punctuation (special) character.
-
-=item print
-X<print>
-
-Any alphanumeric or punctuation (special) character or the space character.
-
-=item punct
-X<punct>
-
-Any punctuation (special) character.
-
-=item xdigit
-X<xdigit>
-
-Any hexadecimal digit. Though this may feel silly ([0-9A-Fa-f] would
-work just fine) it is included for completeness.
-
-=back
+The following Posix-style character classes are available:
+
+ [[:alpha:]] Any alphabetical character.
+ [[:alnum:]] Any alphanumerical character.
+ [[:ascii:]] Any character in the ASCII character set.
+ [[:blank:]] A GNU extension, equal to a space or a horizontal tab
+ [[:cntrl:]] Any control character.
+ [[:digit:]] Any decimal digit, equivalent to "\d".
+ [[:graph:]] Any printable character, excluding a space.
+ [[:lower:]] Any lowercase character.
+ [[:print:]] Any printable character, including a space.
+ [[:punct:]] Any graphical character excluding "word" characters.
+ [[:space:]] Any whitespace character. "\s" plus the vertical tab ("\cK").
+ [[:upper:]] Any uppercase character.
+ [[:word:]] A Perl extension, equivalent to "\w".
+ [[:xdigit:]] Any hexadecimal digit.
You can negate the [::] character classes by prefixing the class name
-with a '^'. This is a Perl extension. For example:
-X<character class, negation>
-
- POSIX traditional Unicode
+with a '^'. This is a Perl extension.
- [[:^digit:]] \D \P{IsPosixDigit}
- [[:^space:]] \S \P{IsPosixSpace}
- [[:^word:]] \W \P{IsPerlWord}
-
-Perl respects the POSIX standard in that POSIX character classes are
-only supported within a character class. The POSIX character classes
+The POSIX character classes
[.cc.] and [=cc=] are recognized but B<not> supported and trying to
use them will cause an error.
+Details on POSIX character classes are in
+L<perlrecharclass/Posix Character Classes>.
+
=head3 Assertions
Perl defines the following zero-width assertions: