author: Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2005-03-13 21:14:36 +0000
committer: Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2005-03-13 21:14:36 +0000
commit: a8cf0b1d6622369ce66f477c7ed494f6ff1fd35c
tree: 863a963650279dcc742fdab9f3cb8d49fcc678ad /pod/perl592delta.pod
parent: d5c61f7c3478189627500a82494061b415064f59
Document pack changes in perldelta
p4raw-id: //depot/perl@24035
Diffstat (limited to 'pod/perl592delta.pod')
-rw-r--r-- pod/perl592delta.pod 28
1 files changed, 28 insertions, 0 deletions
diff --git a/pod/perl592delta.pod b/pod/perl592delta.pod
index 2002132d9f..fc251a872a 100644
--- a/pod/perl592delta.pod
+++ b/pod/perl592delta.pod
@@ -10,6 +10,34 @@ differences between 5.8.0 and 5.9.1.
=head1 Incompatible Changes
+=head2 Packing and UTF-8 strings
+
+The semantics of pack() and unpack() regarding UTF-8-encoded data have been
+clarified. B<The character mode is now the default.> Notably, code that
+uses C<pack("a*", $string)> to see through the encoding of a string will
+now simply return $string.
+
+To be consistent with pack(), the C<C0> in unpack() templates indicates
+that the data is to be processed in character mode, i.e. character by
+character; conversely, C<U0> in unpack() indicates UTF-8 mode, where
+the packed string is processed in its UTF-8-encoded Unicode form on a byte
+by byte basis. This is the reverse of the behavior in perl 5.8.X.
+
+Moreover, C<C0> and C<U0> can also be used in pack() templates to specify
+character mode and byte mode, respectively.
+
+C<C0> and C<U0> in the middle of a pack format now switch to the specified
+encoding mode, honoring parens grouping. Previously, parens were ignored.
+
+Also, there is a new pack() character format, C<W>, which is intended to
+replace the old C<C>. C<C> is kept for unsigned chars coded on eight bits.
+C<W> represents unsigned character values, which can be greater than 255.
+It is therefore more robust when dealing with potentially UTF-8-encoded
+data (as C<C> will wrap values outside the range 0..255).
+
+In practice, that means that pack formats are now encoding-neutral, except
+for C<C>.
+
=head1 Core Enhancements
=head1 Modules and Pragmata
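
The behavior documented in the hunk above can be tried with a short script. The following is a minimal sketch, not part of the commit; it assumes a perl that includes this change (5.9.2 or later, including 5.10 and newer), and the variable names are purely illustrative. It contrasts the new C<W> format with C<C>, and character mode (C<C0>) with UTF-8 mode (C<U0>) in unpack().

```perl
use strict;
use warnings;
no warnings 'pack';    # C is expected to warn when a value wraps; silence it here

# W packs an unsigned character value, which may be greater than 255.
my $wide = pack("W", 300);

# C is limited to eight bits, so 300 wraps to 300 % 256 == 44.
my $narrow = pack("C", 300);

printf "W: length=%d ord=%d\n", length($wide),   ord($wide);
printf "C: length=%d ord=%d\n", length($narrow), ord($narrow);

# A string holding one character above 255 (GREEK SMALL LETTER ALPHA).
my $string = "\x{3B1}";

# C0 selects character mode: the string is processed character by character.
my @chars = unpack("C0 W*", $string);

# U0 selects UTF-8 mode: the string is processed in its UTF-8-encoded
# form, byte by byte.
my @bytes = unpack("U0 C*", $string);

printf "character mode: %s\n", join(" ", map { sprintf "%X", $_ } @chars);
printf "UTF-8 mode:     %s\n", join(" ", map { sprintf "%X", $_ } @bytes);
```

On such a perl, the first two lines of output should show a single character with ordinal 300 for C<W> and ordinal 44 for C<C>. The character-mode line should show the single code point, and the UTF-8-mode line should show the individual bytes of its UTF-8 encoding.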