author Karl Williamson <public@khwilliamson.com> 2010-12-01 16:33:54 -0700
committer Father Chrysostomos <sprout@cpan.org> 2010-12-01 18:23:45 -0800
commit 20db750130061015fab1ffed94ff374c2bd38af3
tree f5852919978cf5cb1d80e098e92a3eef5972abf1 /pod/perlunifaq.pod
parent 4ee7c0eabacb52cfaad975a33feeb842bbf347b3
Document Unicode doc fix
Diffstat (limited to 'pod/perlunifaq.pod')
-rw-r--r-- pod/perlunifaq.pod 42
1 files changed, 21 insertions, 21 deletions
diff --git a/pod/perlunifaq.pod b/pod/perlunifaq.pod
index 877e4d15e6..9fd2b38056 100644
--- a/pod/perlunifaq.pod
+++ b/pod/perlunifaq.pod
@@ -138,27 +138,27 @@ concern, and you can just C<eval> dumped data as always.
=head2 Why do some characters not uppercase or lowercase correctly?
-It seemed like a good idea at the time, to keep the semantics the same for
-standard strings, when Perl got Unicode support. The plan is to fix this
-in the future, and the casing component has in fact mostly been fixed, but we
-have to deal with the fact that Perl treats equal strings differently,
-depending on the internal state.
-
-First the casing. Just put a C<use feature 'unicode_strings'> near the
-beginning of your program. Within its lexical scope, C<uc>, C<lc>, C<ucfirst>,
-C<lcfirst>, and the regular expression escapes C<\U>, C<\L>, C<\u>, C<\l> use
-Unicode semantics for changing case regardless of whether the UTF8 flag is on
-or not. However, if you pass strings to subroutines in modules outside the
-pragma's scope, they currently likely won't behave this way, and you have to
-try one of the solutions below. There is another exception as well: if you
-have furnished your own casing functions to override the default, these will
-not be called unless the UTF8 flag is on)
-
-This remains a problem for the regular expression constructs
-C</.../i>, C<(?i:...)>, and C</[[:posix:]]/>.
-
-To force Unicode semantics, you can upgrade the internal representation to
-by doing C<utf8::upgrade($string)>. This can be used
+Starting in Perl 5.14 (and partially in Perl 5.12), just put a
+C<use feature 'unicode_strings'> near the beginning of your program.
+Within its lexical scope you shouldn't have this problem. It is also
+automatically enabled under C<use feature ':5.12'>, or when using C<-E> on
+the command line, in Perl 5.12 or higher.
+
+The rationale for requiring this is to avoid breaking older programs that
+rely on the way things worked before Unicode came along. Those older
+programs knew only about the ASCII character set, and so may not work
+properly for additional characters. When a string is encoded in UTF-8,
+Perl assumes that the program is prepared to deal with Unicode, but when
+the string isn't, Perl assumes that only ASCII is wanted (unless it is an
+EBCDIC platform), and so characters that are not ASCII are not recognized
+as the Unicode characters they would otherwise be.
+C<use feature 'unicode_strings'> tells Perl to treat all characters as
+Unicode, whether the string is encoded in UTF-8 or not, thus avoiding
+the problem.
+
+However, on earlier Perls, or if you pass strings to subroutines outside
+the feature's scope, you can force Unicode semantics by changing the
+internal encoding to UTF-8 by calling C<utf8::upgrade($string)>. This can be used
safely on any string, as it checks and does not change strings that have
already been upgraded.
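The following is a minimal Perl sketch, not part of the commit, illustrating the behaviour the new text describes: C<uc> on a non-UTF-8 string with and without C<use feature 'unicode_strings'>, and the C<utf8::upgrade> workaround. It assumes Perl 5.14 or later on an ASCII (not EBCDIC) platform; the subroutine names C<upper_legacy> and C<upper_unicode> are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Sketch only: assumes Perl 5.14+ on an ASCII platform.
# "\xE9" is LATIN SMALL LETTER E WITH ACUTE. Written this way it is
# stored as a single native byte, i.e. the UTF8 flag is off.
my $e_acute = "\xE9";

# Hypothetical helper: uc() compiled outside the feature's scope.
# 'no feature' guards against the feature having been enabled by -E
# or a feature bundle, so this shows the old behaviour.
sub upper_legacy {
    no feature 'unicode_strings';
    return uc $_[0];
}

# Hypothetical helper: uc() compiled inside the feature's scope, so it
# applies Unicode rules whether or not the string is stored as UTF-8.
sub upper_unicode {
    use feature 'unicode_strings';
    return uc $_[0];
}

# utf8::upgrade changes only the internal representation; the string
# still holds the same single character, so it compares equal.
my $upgraded = $e_acute;
utf8::upgrade($upgraded);
utf8::upgrade($upgraded);    # a second call is harmless: already upgraded

# %vX prints each character's code point in hex.
printf "legacy:   %vX\n", upper_legacy($e_acute);     # legacy:   E9
printf "feature:  %vX\n", upper_unicode($e_acute);    # feature:  C9
printf "upgraded: %vX\n", upper_legacy($upgraded);    # upgraded: C9
printf "same string: %s\n", ($upgraded eq $e_acute ? "yes" : "no");  # yes
```

Because the feature is lexically scoped, C<upper_legacy> ignores any C<use feature> in its caller; only upgrading the string itself changes its result, which is the situation the final paragraph of the new text addresses.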