If Unicode keys are entered into a hash, a bit is turned on.

If the bit is on, then when the keys are fetched from the hash (%h, each %h, keys %h), the Unicodified versions of the keys are returned if needed. This solution errs on the side of over-Unicodifying; the old solution erred on the side of under-Unicodifying. As long as the hash keys can be a mix of byte and Unicode strings, a perfect fit is hard to come by.

p4raw-id: //depot/perl@15407
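
The effect on fetched keys can be observed with a small script. The following is a minimal sketch, not part of this change: key_report is a hypothetical helper name, and whether the plain "abc" key comes back flagged as a character string depends on the Perl version in use, so no output is claimed for it here.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the patch): for each key of a hash,
# report its length in characters and whether Perl currently holds the
# fetched key as a character (UTF-8 flagged) string.
sub key_report {
    my ($hash) = @_;
    return map {
        +{
            key     => $_,
            length  => length($_),
            is_utf8 => (utf8::is_utf8($_) ? 1 : 0),
        }
    } sort keys %$hash;
}

# A "mixed" hash: one byte-string key, one key above code point 255.
my %h = (
    "abc"      => 1,
    "\x{263A}" => 2,    # WHITE SMILING FACE
);

for my $r (key_report(\%h)) {
    # Print code points rather than the raw key to avoid
    # "Wide character" warnings on an unconfigured STDOUT.
    my $codes = join ' ', map { sprintf 'U+%04X', ord } split //, $r->{key};
    printf "%-24s length=%d is_utf8=%d\n", $codes, $r->{length}, $r->{is_utf8};
}
```

If uniform behaviour is needed regardless of how a given Perl stores its keys, one option is to call utf8::upgrade() on a copy of each fetched key before using it in a regular expression.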
author: Jarkko Hietaniemi <jhi@iki.fi> 2002-03-22 04:07:13 +0000
committer: Jarkko Hietaniemi <jhi@iki.fi> 2002-03-22 04:07:13 +0000
commit: 574c8022b1fdc7312bf9a5af037c8f777b60b6db (patch)
tree: 06b4317b44c20a0a8683822193a3359385f3c9bf /pod/perlunicode.pod
parent: 3fbcfac442ddabdaab668242ba16ca26c5edd56c (diff)
1 files changed, 12 insertions, 20 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 4cb83252f0..9ba32ee3e0 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -113,8 +113,8 @@ Character semantics have the following effects:
 
 =item *
 
-Strings and patterns may contain characters that have an ordinal value
-larger than 255.
+Strings (including hash keys) and regular expression patterns may
+contain characters that have an ordinal value larger than 255.
 
 If you use a Unicode editor to edit your program, Unicode characters
 may occur directly within the literal strings in one of the various
@@ -128,18 +128,20 @@ hexadecimal, into the curlies. For instance, a smiley face is C<\x{263A}>.
 This works only for characters with a code 0x100 and above.
 
 Additionally, if you
+
    use charnames ':full';
+
 you can use the C<\N{...}> notation, putting the official Unicode character
 name within the curlies. For example, C<\N{WHITE SMILING FACE}>.
 This works for all characters that have names.
 
 =item *
 
-If an appropriate L<encoding> is specified,
-identifiers within the Perl script may contain Unicode alphanumeric
-characters, including ideographs.  (You are currently on your own when
-it comes to using the canonical forms of characters--Perl doesn't
-(yet) attempt to canonicalize variable names for you.)
+If an appropriate L<encoding> is specified, identifiers within the
+Perl script may contain Unicode alphanumeric characters, including
+ideographs.  (You are currently on your own when it comes to using the
+canonical forms of characters--Perl doesn't (yet) attempt to
+canonicalize variable names for you.)
 
 =item *
 
@@ -846,8 +848,7 @@ B<any subsequent file open>, is UTF-8.
 
 Perl tries really hard to work both with Unicode and the old byte
 oriented world: most often this is nice, but sometimes this causes
-problems.  See L</BUGS> for example how sometimes using locales
-with Unicode can help with these problems.
+problems.
 
 =back
 
@@ -959,19 +960,10 @@ Use of locales with Unicode data may lead to odd results.  Currently
 there is some attempt to apply 8-bit locale info to characters in the
 range 0..255, but this is demonstrably incorrect for locales that use
 characters above that range when mapped into Unicode.  It will also
-tend to run slower.  Avoidance of locales is strongly encouraged,
-with one known expection, see the next paragraph.
-
-If the keys of a hash are "mixed", that is, some keys are Unicode,
-while some keys are "byte", the keys may behave differently in regular
-expressions since the definition of character classes like C</\w/>
-is different for byte strings and character strings.  This problem can
-sometimes be helped by using an appropriate locale (see L<perllocale>).
-Another way is to force all the strings to be character encoded by
-using utf8::upgrade() (see L<utf8>).
+tend to run slower.  Use of locales with Unicode is discouraged.
 
 Some functions are slower when working on UTF-8 encoded strings than
-on byte encoded strings. All functions that need to hop over
+on byte encoded strings.  All functions that need to hop over
 characters such as length(), substr() or index() can work B<much>
 faster when the underlying data are byte-encoded. Witness the
 following benchmark: