pod/perl592delta.pod


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107

=head1 NAME

perldelta - what is new for perl v5.9.2

=head1 DESCRIPTION

This document describes differences between the 5.9.1 and the 5.9.2
development releases. See L<perl590delta> and L<perl591delta> for the
differences between 5.8.0 and 5.9.1.

=head1 Incompatible Changes

=head2 Packing and UTF-8 strings

The semantics of pack() and unpack() regarding UTF-8-encoded data has been
changed. Processing is now by default character per character instead of
byte per byte on the underlying encoding. Notably, code that used things
like C<pack("a*", $string)> to see through the encoding of string will now
simply get back the original $string. Packed strings can also get upgraded
during processing when you store upgraded characters. You can get the old
behaviour by using C<use bytes>.

To be consistent with pack(), the C<C0> in unpack() templates indicates
that the data is to be processed in character mode, i.e. character by
character; at the contrary, C<U0> in unpack() indicates UTF-8 mode, where
the packed string is processed in its UTF-8-encoded Unicode form on a byte
by byte basis. This is reversed with regard to perl 5.8.X.

Moreover, C<C0> and C<U0> can also be used in pack() templates to specify
respectively character and byte modes.

C<C0> and C<U0> in the middle of a pack or unpack format now switch to the
specified encoding mode, honoring parens grouping. Previously, parens were
ignored.

Also, there is a new pack() character format, C<W>, which is intended to
replace the old C<C>. C<C> is kept for unsigned chars coded as bytes in
the strings internal representation. C<W> represents unsigned (logical)
character values, which can be greater than 255. It is therefore more
robust when dealing with potentially UTF-8-encoded data (as C<C> will wrap
values outside the range 0..255, and not respect the string encoding).

In practice, that means that pack formats are now encoding-neutral, except
C<C>.

For consistency, C<A> in unpack() format now trims all Unicode whitespace
from the end of the string. Before perl 5.9.2, it used to strip only the
classical ASCII space characters.

=head1 Core Enhancements

=head2 Regexp debug flags

A new variable, ${^RE_DEBUG_FLAGS}, controls what debug flags are in
effect for the regular expression engine when running under C<use re
"debug">. See L<re> for details.

=head1 Modules and Pragmata

=head1 Utility Changes

=head1 Documentation

=head1 Performance Enhancements

=head2 Trie optimization for regexp engine

The regexp engine is now able to factorize common prefixes and suffixes in
regular expressions. A new special variable, ${^RE_TRIE_MAXBUFF}, has been
added to fine tune this optimization.

=head1 Installation and Configuration Improvements

=head1 Selected Bug Fixes

=head1 New or Changed Diagnostics

=head1 Changed Internals

=head1 Known Problems

=head2 Platform Specific Problems

=head1 Reporting Bugs

If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://bugs.perl.org/ .  There may also be
information at http://www.perl.org/ , the Perl Home Page.

If you believe you have an unreported bug, please run the B<perlbug>
program included with your release.  Be sure to trim your bug down
to a tiny but sufficient test case.  Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.

=head1 SEE ALSO

The F<Changes> file for exhaustive details on what changed.

The F<INSTALL> file for how to build Perl.

The F<README> file for general stuff.

The F<Artistic> and F<Copying> files for copyright information.

=cut