author Karl Williamson <public@khwilliamson.com> 2012-02-05 16:17:54 -0700
committer Karl Williamson <public@khwilliamson.com> 2012-02-09 10:13:59 -0700
commit 8129baca5dd762540c807db6ddf8d2e9fa4121b2 (patch)
tree 4d31863c0b1e9786d230244c280e4af88757a20c
parent 67addccf238c3d67d84f7dc1f5b4a2e791bf68da (diff)
download perl-8129baca5dd762540c807db6ddf8d2e9fa4121b2.tar.gz
perlrebackslash, perlrecharclass: Note locale effects
This adds text to specify what happens under 'use locale'.
-rw-r--r-- pod/perlrebackslash.pod 3
-rw-r--r-- pod/perlrecharclass.pod 22
2 files changed, 18 insertions, 7 deletions
diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index cc72a1f14e..98435e5b8c 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -618,6 +618,9 @@ C<\R> can match a sequence of more than one character, it cannot be put
inside a bracketed character class; C</[\R]/> is an error; use C<\v>
instead. C<\R> was introduced in perl 5.10.0.
+Note that this does not respect any locale that might be in effect; it
+matches according to the platform's native character set.
+
Mnemonic: none really. C<\R> was picked because PCRE already uses C<\R>,
and more importantly because Unicode recommends such a regular expression
metacharacter, and suggests C<\R> as its notation.
diff --git a/pod/perlrecharclass.pod b/pod/perlrecharclass.pod
index f50699b951..06d206b2f8 100644
--- a/pod/perlrecharclass.pod
+++ b/pod/perlrecharclass.pod
@@ -252,24 +252,30 @@ Which rules apply are determined as described in L<perlre/Which character set mo
Any character not matched by C<\s> is matched by C<\S>.
C<\h> matches any character considered horizontal whitespace;
-this includes the space and tab characters and several others
+this includes the platform's space and tab characters and several others
listed in the table below. C<\H> matches any character
-not considered horizontal whitespace.
+not considered horizontal whitespace. They use the platform's native
+character set, and do not consider any locale that may otherwise be in
+use.
C<\v> matches any character considered vertical whitespace;
-this includes the carriage return and line feed characters (newline)
+this includes the platform's carriage return and line feed characters (newline)
plus several other characters, all listed in the table below.
C<\V> matches any character not considered vertical whitespace.
+They use the platform's native character set, and do not consider any
+locale that may otherwise be in use.
C<\R> matches anything that can be considered a newline under Unicode
rules. It's not a character class, as it can match a multi-character
sequence. Therefore, it cannot be used inside a bracketed character
-class; use C<\v> instead (vertical whitespace).
+class; use C<\v> instead (vertical whitespace). It uses the platform's
+native character set, and does not consider any locale that may
+otherwise be in use.
Details are discussed in L<perlrebackslash>.
Note that unlike C<\s> (and C<\d> and C<\w>), C<\h> and C<\v> always match
-the same characters, without regard to other factors, such as whether the
-source string is in UTF-8 format.
+the same characters, without regard to other factors, such as the active
+locale or whether the source string is in UTF-8 format.
One might think that C<\s> is equivalent to C<[\h\v]>. This is not true.
The difference is that the vertical tab (C<"\x0b">) is not matched by
@@ -777,7 +783,9 @@ The POSIX class matches the same as its Full-range counterpart.
=item if locale rules are in effect ...
-The POSIX class matches according to the locale.
+The POSIX class matches according to the locale, except that
+C<word> uses the platform's native underscore character, no matter what
+the locale is.
=item if Unicode rules are in effect or if on an EBCDIC platform ...