author | Steve Purkis <Steve.Purkis@multimap.com> | 2006-01-20 07:35:06 -0500 |
---|---|---|
committer | Nicholas Clark <nick@ccl4.org> | 2006-02-01 19:30:52 +0000 |
commit | 5496314a41c61bc06e565c745abc1dc795ce4db3 (patch) | |
tree | 097a6aff6e4191485f244fd82d801bf5b13e444c /pod | |
parent | 70fb64f63d6cf0a6c7ededf95d88e9321d4efe68 (diff) | |
download | perl-5496314a41c61bc06e565c745abc1dc795ce4db3.tar.gz |
[[:...:]] is equivalent to \p{...}, not [:...:], tweaked from
Subject: Re: [:...:] and \p{...} character class equivalence in utf8 regexps
Message-Id: <0DAE5956-3ECC-4692-A0C9-C62C8F790C97@multimap.com>
Date: Fri, 20 Jan 2006 12:35:06 -0500
p4raw-id: //depot/perl@27042
Diffstat (limited to 'pod')
-rw-r--r-- | pod/perlre.pod | 25 |
1 files changed, 17 insertions, 8 deletions
diff --git a/pod/perlre.pod b/pod/perlre.pod
index f24e97157b..32a7e6fcf7 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -224,8 +224,17 @@ X<character class>
 
     [:class:]
 
-is also available. The available classes and their backslash
-equivalents (if available) are as follows:
+is also available. Note that the C<[> and C<]> braces are I<literal>;
+they must always be used within a character class expression.
+
+    # this is correct:
+    $string =~ /[[:alpha:]]/;
+
+    # this is not, and will generate a warning:
+    $string =~ /[:alpha:]/;
+
+The available classes and their backslash equivalents (if available) are
+as follows:
 X<character class>
 X<alpha> X<alnum> X<ascii> X<blank> X<cntrl> X<digit> X<graph>
 X<lower> X<print> X<punct> X<space> X<upper> X<word> X<xdigit>
@@ -274,7 +283,7 @@ The following equivalences to Unicode \p{} constructs and equivalent
 backslash character classes (if available), will hold:
 X<character class> X<\p> X<\p{}>
 
-    [:...:]     \p{...}     backslash
+    [[:...:]]   \p{...}     backslash
 
     alpha       IsAlpha
     alnum       IsAlnum
@@ -292,7 +301,7 @@ X<character class> X<\p> X<\p{}>
     word        IsWord
     xdigit      IsXDigit
 
-For example C<[:lower:]> and C<\p{IsLower}> are equivalent.
+For example C<[[:lower:]]> and C<\p{IsLower}> are equivalent.
 
 If the C<utf8> pragma is not used but the C<locale> pragma is, the
 classes correlate with the usual isalpha(3) interface (except for
@@ -339,11 +348,11 @@ You can negate the [::] character classes by prefixing the class name
 with a '^'. This is a Perl extension. For example:
 X<character class, negation>
 
-    POSIX         traditional  Unicode
+    POSIX           traditional  Unicode
 
-    [:^digit:]    \D           \P{IsDigit}
-    [:^space:]    \S           \P{IsSpace}
-    [:^word:]     \W           \P{IsWord}
+    [[:^digit:]]    \D           \P{IsDigit}
+    [[:^space:]]    \S           \P{IsSpace}
+    [[:^word:]]     \W           \P{IsWord}
 
 Perl respects the POSIX standard in that POSIX character classes are
 only supported within a character class. The POSIX character classes
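
The corrected notation in this patch can be checked with a short script. The following is only a minimal sketch, not part of the commit: the helper same_match, the sample characters, and the variable names are all assumptions made for illustration. It compares the bracketed POSIX forms against the equivalents listed in the patched tables on a few ASCII characters, and shows that the bare form [:alpha:] is treated as an ordinary character class rather than as the POSIX class.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when both patterns agree on whether
# $str matches, and 0 when exactly one of them matches.
sub same_match {
    my ($str, $re_a, $re_b) = @_;
    return (($str =~ $re_a) xor ($str =~ $re_b)) ? 0 : 1;
}

# Illustrative ASCII-only samples (an assumption), so that locale and
# utf8 settings do not affect the comparison.
my @samples = ('a', 'Z', '7', ' ', '_', '-');

# Bracketed POSIX forms paired with equivalents from the patched tables.
my @pairs = (
    [ qr/[[:lower:]]/,  qr/\p{IsLower}/ ],
    [ qr/[[:^digit:]]/, qr/\D/          ],
    [ qr/[[:^space:]]/, qr/\S/          ],
    [ qr/[[:^word:]]/,  qr/\W/          ],
);

# Count every (pattern pair, sample) combination where the two forms disagree.
my $mismatches = 0;
for my $pair (@pairs) {
    for my $ch (@samples) {
        $mismatches++ unless same_match($ch, @$pair);
    }
}
print "mismatches: $mismatches\n";

# The bracketed form is the POSIX class and matches any alphabetic character.
my $bracketed_xyz = ('xyz' =~ /[[:alpha:]]/) ? 1 : 0;

# The bare form is an ordinary class containing ':', 'a', 'l', 'p' and 'h',
# so it does not match 'xyz'. The warning is silenced for this block only.
my $bare_xyz;
{
    no warnings 'regexp';
    $bare_xyz = ('xyz' =~ /[:alpha:]/) ? 1 : 0;
}
print "bracketed: $bracketed_xyz, bare: $bare_xyz\n";
```

The samples deliberately avoid characters such as vertical tab, where \s and [[:space:]] have differed between Perl versions, so the comparison says nothing about those edge cases.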