author | Karl Williamson <public@khwilliamson.com> | 2012-02-05 16:17:54 -0700 |
---|---|---|
committer | Karl Williamson <public@khwilliamson.com> | 2012-02-09 10:13:59 -0700 |
commit | 8129baca5dd762540c807db6ddf8d2e9fa4121b2 (patch) | |
tree | 4d31863c0b1e9786d230244c280e4af88757a20c | |
parent | 67addccf238c3d67d84f7dc1f5b4a2e791bf68da (diff) | |
download | perl-8129baca5dd762540c807db6ddf8d2e9fa4121b2.tar.gz |
perlrebackslash, perlrecharclass: Note locale effects
This adds text to specify what happens under 'use locale'.
-rw-r--r-- | pod/perlrebackslash.pod | 3 |
-rw-r--r-- | pod/perlrecharclass.pod | 22 |
2 files changed, 18 insertions, 7 deletions
diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index cc72a1f14e..98435e5b8c 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -618,6 +618,9 @@ C<\R> can match a sequence of more than one character, it cannot be put
 inside a bracketed character class; C</[\R]/> is an error; use C<\v>
 instead. C<\R> was introduced in perl 5.10.0.
 
+Note that this does not respect any locale that might be in effect; it
+matches according to the platform's native character set.
+
 Mnemonic: none really. C<\R> was picked because PCRE already uses C<\R>,
 and more importantly because Unicode recommends such a regular expression
 metacharacter, and suggests C<\R> as its notation.
diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index f50699b951..06d206b2f8 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -252,24 +252,30 @@ Which rules apply are determined as described in L<perlre/Which character set mo
 Any character not matched by C<\s> is matched by C<\S>.
 
 C<\h> matches any character considered horizontal whitespace;
-this includes the space and tab characters and several others
+this includes the platform's space and tab characters and several others
 listed in the table below. C<\H> matches any character
-not considered horizontal whitespace.
+not considered horizontal whitespace. They use the platform's native
+character set, and do not consider any locale that may otherwise be in
+use.
 
 C<\v> matches any character considered vertical whitespace;
-this includes the carriage return and line feed characters (newline)
+this includes the platform's carriage return and line feed characters (newline)
 plus several other characters, all listed in the table below.
 C<\V> matches any character not considered vertical whitespace.
+They use the platform's native character set, and do not consider any
+locale that may otherwise be in use.
 
 C<\R> matches anything that can be considered a newline under Unicode
 rules. It's not a character class, as it can match a multi-character
 sequence. Therefore, it cannot be used inside a bracketed character
-class; use C<\v> instead (vertical whitespace).
+class; use C<\v> instead (vertical whitespace). It uses the platform's
+native character set, and does not consider any locale that may
+otherwise be in use.
 Details are discussed in L<perlrebackslash>.
 
 Note that unlike C<\s> (and C<\d> and C<\w>), C<\h> and C<\v> always match
-the same characters, without regard to other factors, such as whether the
-source string is in UTF-8 format.
+the same characters, without regard to other factors, such as the active
+locale or whether the source string is in UTF-8 format.
 
 One might think that C<\s> is equivalent to C<[\h\v]>. This is not true.
 The difference is that the vertical tab (C<"\x0b">) is not matched by
@@ -777,7 +783,9 @@ The POSIX class matches the same as its Full-range counterpart.
 
 =item if locale rules are in effect ...
 
-The POSIX class matches according to the locale.
+The POSIX class matches according to the locale, except that
+C<word> uses the platform's native underscore character, no matter what
+the locale is.
 
 =item if Unicode rules are in effect or if on an EBCDIC platform ...
 
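The behavior the patched text describes can be checked with a short script. The sketch below is not part of the commit. It assumes an ASCII platform and perl 5.10 or later, and the helper name `hv_flags` and the sample code points are illustrative choices. It compares C<\h> and C<\v> on native and UTF-8-upgraded copies of the same characters, then contrasts C<\R> with C<\v> on a CRLF pair.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: returns a two-digit string such as "10", whose
# digits say whether the single character $ch matches \h and \v.
sub hv_flags {
    my ($ch) = @_;
    return join '',
        ($ch =~ /\A\h\z/ ? 1 : 0),
        ($ch =~ /\A\v\z/ ? 1 : 0);
}

# Sample code points (ASCII platform assumed): tab, space, line feed,
# vertical tab, carriage return, NEL, no-break space, and a letter.
my @samples = ("\x09", "\x20", "\x0A", "\x0B", "\x0D", "\x85", "\xA0", "A");

for my $ch (@samples) {
    my $upgraded = $ch;
    utf8::upgrade($upgraded);    # same characters, UTF-8 internal format
    printf "U+%04X  native=%s  utf8=%s\n",
        ord($ch), hv_flags($ch), hv_flags($upgraded);
}

# \R can match the two-character CRLF sequence as one newline, whereas
# \v is a character class and matches CR and LF one at a time.
my $crlf = "\x0D\x0A";
say "\\R matches CRLF as a unit"   if $crlf =~ /\A\R\z/;
say "\\v matches CR and LF singly" if $crlf !~ /\A\v\z/ && $crlf =~ /\A\v\v\z/;
```

If the documentation holds, the `native` and `utf8` columns agree on every row. The script does not change the locale, so it exercises only the UTF-8 half of the claim. Adding `use locale` and a `setlocale` call would test the other half, but which locales are installed varies by system.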