Re: [PATCH] (Re: [PATCH] unicode/utf8 pod)

Message-ID: <20070304150019.GN4723@c4.convolution.nl>
p4raw-id: //depot/perl@30493
author: Juerd Waalboer <#####@juerd.nl> 2007-03-04 17:00:19 +0100
committer: H.Merijn Brand <h.m.brand@xs4all.nl> 2007-03-07 13:23:23 +0000
commit: 2575c402a8f9be55f848bdfb219afbf912c50ac1 (patch)
tree: c21a19c42deaa2dba098c38d74338a7c01328c28 /pod/perlpacktut.pod
parent: 2a6a970fa1b36c99c83fd3fdd48253c1b567db9b (diff)
download: perl-2575c402a8f9be55f848bdfb219afbf912c50ac1.tar.gz
1 files changed, 18 insertions, 6 deletions
diff --git a/pod/perlpacktut.pod b/pod/perlpacktut.pod
index 1cb127e0b9..d907b1805c 100644
--- a/pod/perlpacktut.pod
+++ b/pod/perlpacktut.pod
@@ -633,24 +633,36 @@ The UTF-8 encoding avoids this by storing the most common (from a western
 point of view) characters in a single byte while encoding the rarer
 ones in three or more bytes.
 
-So what has this got to do with C<pack>? Well, if you want to convert
-between a Unicode number and its UTF-8 representation you can do so by
-using template code C<U>. As an example, let's produce the UTF-8
-representation of the Euro currency symbol (code number 0x20AC):
+Perl uses UTF-8, internally, for most Unicode strings.
+
+So what has this got to do with C<pack>? Well, if you want to compose a
+Unicode string (that is internally encoded as UTF-8), you can do so by
+using template code C<U>. As an example, let's produce the Euro currency
+symbol (code number 0x20AC):
 
    $UTF8{Euro} = pack( 'U', 0x20AC );
+   # Equivalent to: $UTF8{Euro} = "\x{20ac}";
 
-Inspecting C<$UTF8{Euro}> shows that it contains 3 bytes: "\xe2\x82\xac". The
-round trip can be completed with C<unpack>:
+Inspecting C<$UTF8{Euro}> shows that it contains 3 bytes:
+"\xe2\x82\xac". However, it contains only 1 character, number 0x20AC.
+The round trip can be completed with C<unpack>:
 
    $Unicode{Euro} = unpack( 'U', $UTF8{Euro} );
 
+Unpacking using the C<U> template code also works on UTF-8 encoded byte
+strings.
+
 Usually you'll want to pack or unpack UTF-8 strings:
 
    # pack and unpack the Hebrew alphabet
    my $alefbet = pack( 'U*', 0x05d0..0x05ea );
    my @hebrew = unpack( 'U*', $utf );
 
+Please note: in the general case, you're better off using
+Encode::decode_utf8 to decode a UTF-8 encoded byte string to a Perl
+unicode string, and Encode::encode_utf8 to encode a Perl unicode string
+to UTF-8 bytes. These functions provide means of handling invalid byte
+sequences and generally have a friendlier interface.
 
 =head2 Another Portable Binary Encoding
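
Note on the patched text: the following is a minimal sketch, not part of the commit, of the behaviour the new paragraphs describe. It assumes Perl 5.10 or later with the core Encode module. The variable names ($euro, $bytes, $decoded, $from_bytes) are illustrative only. The Hebrew example unpacks $alefbet rather than the $utf that appears in the context line above, on the assumption that $alefbet was the intended variable; $utf is not defined anywhere in the snippet shown.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Compose a one-character Unicode string from its code point.
my $euro = pack( 'U', 0x20AC );

# length() counts characters; the UTF-8 encoded form is a separate byte string.
my $bytes = encode_utf8($euro);
printf "characters: %d, encoded bytes: %d\n", length($euro), length($bytes);

# Round trip: recover the code point from the character string.
my $code = unpack( 'U', $euro );
printf "code point: U+%04X\n", $code;

# The patch notes that 'U' also unpacks a UTF-8 encoded byte string.
my $from_bytes = unpack( 'U', $bytes );

# The route the patch recommends for general use: decode bytes with Encode.
my $decoded = decode_utf8($bytes);
print "decode_utf8 round trip ok\n" if $decoded eq $euro;

# Pack and unpack a range of code points (the Hebrew block from the tutorial).
my $alefbet = pack( 'U*', 0x05d0 .. 0x05ea );
my @hebrew  = unpack( 'U*', $alefbet );
printf "unpacked %d code points\n", scalar @hebrew;
```

The pack/unpack calls are fine for composing or inspecting a few code points. For decoding input of unknown quality, the Encode functions are the safer choice, as the patch itself says, because they offer ways to handle invalid byte sequences.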