=head1 NAME

perldelta - what is new for perl v5.9.2

=head1 DESCRIPTION

This document describes differences between the 5.9.1 and the 5.9.2
development releases. See L<perl590delta> and L<perl591delta> for the
differences between 5.8.0 and 5.9.1.

=head1 Incompatible Changes

=head2 Packing and UTF-8 strings

The semantics of pack() and unpack() regarding UTF-8-encoded data have been
changed. Processing is now by default character by character instead of
byte by byte on the underlying encoding. Notably, code that used things
like C<pack("a*", $string)> to see through the encoding of a string will now
simply get back the original $string. Packed strings can also get upgraded
during processing when you store upgraded characters. You can get the old
behaviour by using C<use bytes>.
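
As a minimal sketch of the difference (the variable names are
illustrative only, not taken from the perl sources), the following
compares the default character mode with the byte-oriented behaviour
restored by C<use bytes>:

    use strict;
    use warnings;

    # U+263A is a single character that perl stores internally as
    # three UTF-8-encoded bytes.
    my $string = "\x{263A}";

    # Character mode (the new default): the string comes back unchanged.
    my $packed = pack("a*", $string);
    printf "same string: %s, length %d\n",
        ($packed eq $string ? "yes" : "no"), length $packed;

    {
        # Lexically restore byte-by-byte processing for this block.
        use bytes;
        my $raw = pack("a*", $string);
        printf "length under 'use bytes': %d\n", length $raw;
    }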

To be consistent with pack(), the C<C0> in unpack() templates indicates
that the data is to be processed in character mode, i.e. character by
character; by contrast, C<U0> in unpack() indicates UTF-8 mode, where
the packed string is processed in its UTF-8-encoded Unicode form on a byte
by byte basis. This is the reverse of the behaviour in perl 5.8.X.

Moreover, C<C0> and C<U0> can also be used in pack() templates to select
character mode and byte mode, respectively.
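
The two modes can be compared with a short sketch like the one below.
The sample string and variable names are illustrative assumptions; the
C<%vX> format is used only to show the ordinals of what each unpack()
call returned.

    use strict;
    use warnings;

    my $string = "\x{263A}";    # one character, UTF-8 encoded internally

    # C0: character mode, the data is seen as a sequence of characters.
    my $chars = unpack("C0a*", $string);

    # U0: UTF-8 mode, the data is seen as the bytes of its UTF-8 encoding.
    my $bytes = unpack("U0a*", $string);

    # Print the ordinals of each result, separated by dots.
    printf "C0: %vX (%d element(s))\n", $chars, length $chars;
    printf "U0: %vX (%d element(s))\n", $bytes, length $bytes;

With the semantics described above, the C<C0> result should be the
single character itself, while the C<U0> result should be the individual
bytes of its UTF-8 encoding.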

C<C0> and C<U0> in the middle of a pack or unpack format now switch to the
specified encoding mode, and the switch is confined to the enclosing
parenthesized group. Previously, parentheses were ignored.

Also, there is a new pack() character format, C<W>, which is intended to
replace the old C<C>. C<C> is kept for unsigned chars coded as bytes in
the string's internal representation. C<W> represents unsigned (logical)
character values, which can be greater than 255. It is therefore more
robust when dealing with potentially UTF-8-encoded data (as C<C> will wrap
values outside the range 0..255, and not respect the string encoding).
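
A small sketch of the difference between C<W> and C<C> follows. The
values are arbitrary examples, and the C<pack> warning category is
switched off locally only to keep the output quiet when C<C> wraps.

    use strict;
    use warnings;

    # W stores logical character values, including those above 255.
    my $wide   = pack("W*", 0x41, 0x263A);
    my @values = unpack("W*", $wide);
    printf "W: %d characters, values %s\n",
        length $wide, join(",", map { sprintf "0x%X", $_ } @values);

    # C is limited to 0..255; larger values wrap around (and warn).
    my $wrapped = do {
        no warnings 'pack';
        pack("C", 300);
    };
    printf "C: 300 stored as %d\n", ord $wrapped;    # 300 % 256 == 44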

In practice, that means that pack formats are now encoding-neutral, except
C<C>.

For consistency, C<A> in unpack() templates now trims all Unicode whitespace
from the end of the string. Before perl 5.9.2, it used to strip only the
classical ASCII space characters.
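
As an illustration (a sketch with an arbitrarily chosen sample string),
a value padded with U+2003 EM SPACE, which is Unicode whitespace but not
ASCII, should now be trimmed by C<A>:

    use strict;
    use warnings;

    # Three letters followed by two EM SPACE characters (U+2003).
    my $padded = "abc\x{2003}\x{2003}";

    # A strips trailing whitespace, now including Unicode whitespace.
    my ($trimmed) = unpack("A*", $padded);

    printf "before: %d characters, after: %d characters\n",
        length $padded, length $trimmed;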

=head2 The -C option can no longer be used on the #! line

It wasn't working anyway.

=head1 Core Enhancements

=head2 Regexp debug flags

A new variable, ${^RE_DEBUG_FLAGS}, controls what debug flags are in
effect for the regular expression engine when running under C<use re
"debug">. See L<re> for details.

=head2 Byte-order modifiers for pack() and unpack()

There are two new byte-order modifiers, C<E<gt>> (big-endian) and C<E<lt>>
(little-endian), that can be appended to most pack() and unpack() template
characters and groups to force a certain byte-order for that type or group.
See L<perlfunc/pack> and L<perlpacktut> for details.
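
The following sketch shows the modifiers on a single template character
and on a group; the values are arbitrary and C<H*> is used only to
display the resulting bytes in hexadecimal.

    use strict;
    use warnings;

    my $value = 0x12345678;

    # The same 32-bit value in both byte orders.
    my $big    = pack("L>", $value);
    my $little = pack("L<", $value);
    printf "L>: %s\n", unpack("H*", $big);       # 12345678
    printf "L<: %s\n", unpack("H*", $little);    # 78563412

    # A modifier on a group applies to every type inside it.
    my $group = pack("(S L)>", 1, 2);
    printf "(S L)>: %s\n", unpack("H*", $group); # 000100000002

    # The same template reads the values back.
    my @back = unpack("(S L)>", $group);
    print "round trip: @back\n";                 # 1 2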

=head1 Modules and Pragmata

=head2 New modules

=over 4

=item *

C<Module::CoreList>, by Richard Clamp, is a small handy module that tells
you which versions of core modules ship with each version of Perl 5. It
comes with a command-line frontend, C<corelist>.

=back
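
A brief sketch of querying C<Module::CoreList> from code is shown below.
C<File::Spec> and the perl version C<5.008> are arbitrary choices for
illustration, and the exact answers depend on the data shipped with the
installed C<Module::CoreList>.

    use strict;
    use warnings;
    use Module::CoreList;

    my $module = "File::Spec";    # any core module name will do

    # Which perl release first included this module?
    my $first = Module::CoreList->first_release($module);
    print "$module first shipped with perl $first\n" if defined $first;

    # %Module::CoreList::version maps each perl version to a hash of
    # module names and the module versions bundled with that perl.
    my $bundled = $Module::CoreList::version{5.008}{$module};
    print "perl 5.008 shipped $module $bundled\n" if defined $bundled;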

=head2 Updated And Improved Modules and Pragmata

=head1 Utility Changes

=over 4

=item *

The C<corelist> utility is now installed with perl (see L</"New modules">
above).

=item *

C<h2ph> and C<h2xs> have been made a bit more robust with regard to
"modern" C code.

=back

=head1 Documentation

=head1 Performance Enhancements

=head2 Trie optimization for regexp engine

The regexp engine is now able to factor out common prefixes and suffixes in
regular expressions. A new special variable, ${^RE_TRIE_MAXBUF}, has been
added to fine-tune this optimization.
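
The sketch below shows where such tuning would go. The buffer value and
the pattern are illustrative assumptions; L<perlvar> is the authority on
the variable's default and on the meaning of particular values. The
optimization targets alternations that share literal prefixes, such as
this one.

    use strict;
    use warnings;

    # Assumption: a larger value trades memory for speed on large
    # alternations; see perlvar for the actual default and semantics.
    # Set it before the pattern is compiled.
    ${^RE_TRIE_MAXBUF} = 2 * 65536;

    # The alternatives share the literal prefix "foo".
    my $re = qr/^(?:foobar|foobaz|fooqux)$/;

    my @matched = grep { $_ =~ $re } qw(foobar fooqux foo);
    print "matched: @matched\n";

Matching results are the same whether or not the optimization is
applied; only the speed and memory use of the match are affected.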

=head1 Installation and Configuration Improvements

=head1 Selected Bug Fixes

=head1 New or Changed Diagnostics

The warning I<Newline in left-justified string> has been removed.

The error I<Too late for "-T" option> has been reformulated to be more
descriptive.

There is a new compilation error, I<Illegal declaration of subroutine>.

=head1 Changed Internals

=head1 Known Problems

=head2 Platform Specific Problems

=head1 Reporting Bugs

If you find what you think is a bug, you might check the articles
recently posted to the comp.lang.perl.misc newsgroup and the perl
bug database at http://bugs.perl.org/ .  There may also be
information at http://www.perl.org/ , the Perl Home Page.

If you believe you have an unreported bug, please run the B<perlbug>
program included with your release.  Be sure to trim your bug down
to a tiny but sufficient test case.  Your bug report, along with the
output of C<perl -V>, will be sent off to perlbug@perl.org to be
analysed by the Perl porting team.

=head1 SEE ALSO

The F<Changes> file for exhaustive details on what changed.

The F<INSTALL> file for how to build Perl.

The F<README> file for general stuff.

The F<Artistic> and F<Copying> files for copyright information.

=cut