author     Ricardo Signes <rjbs@cpan.org>  2012-03-05 21:25:31 -0500
committer  Ricardo Signes <rjbs@cpan.org>  2012-03-05 21:25:31 -0500
commit     2a9794a2818bab2d883a7bc5854aa6687ac635b0
tree       6a64fd83ddf2af4c7a5f81a3fdcd6f796b04ce11
parent     31cb72072b6370cf13dcfeb8defe9578960f6629
begin filling the 5.16.0 delta from 5.15.8 [rjbs/delta]

This is largely a copy and paste job.
-rw-r--r-- Porting/perl5160delta.pod 319
1 files changed, 318 insertions, 1 deletions
diff --git a/Porting/perl5160delta.pod b/Porting/perl5160delta.pod
index 71d86648ef..e55b6dfa30 100644
--- a/Porting/perl5160delta.pod
+++ b/Porting/perl5160delta.pod
@@ -160,6 +160,87 @@ $tied_variable)>.
=head2 Unicode Support
+=head3 Supports (I<almost>) Unicode 6.1
+
+Besides the addition of whole new scripts, and new characters in
+existing scripts, this new version of Unicode, as always, makes some
+changes to existing characters. One change that may trip up some
+applications is that the General Category of two characters in the
+Latin-1 range, PILCROW SIGN and SECTION SIGN, has been changed from
+Other_Symbol to Other_Punctuation. The same change has been made for
+a character in each of Tibetan, Ethiopic, and Aegean.
+The code points U+3248..U+324F (CIRCLED NUMBER TEN ON BLACK SQUARE
+through CIRCLED NUMBER EIGHTY ON BLACK SQUARE) have had their General
+Category changed from Other_Symbol to Other_Numeric. The Line Break
+property has changed for Hebrew and Japanese, and, as a consequence of
+other changes in 6.1, the Perl regular expression construct C<\X> now
+works differently for some characters in Thai and Lao.
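A quick way to see the reclassification, sketched under the assumption of a perl at v5.16 or later (so its Unicode tables are 6.1 or newer), is to test the affected code points against C<\p{}> General Category properties:

```perl
use strict;
use warnings;

# PILCROW SIGN (U+00B6) and SECTION SIGN (U+00A7) are Other_Punctuation (Po)
# as of Unicode 6.1; they were Other_Symbol (So) before.
my $pilcrow_is_po = "\x{B6}" =~ /\p{General_Category=Other_Punctuation}/;
my $section_is_po = "\x{A7}" =~ /\p{Gc=Po}/;

# U+3248 CIRCLED NUMBER TEN ON BLACK SQUARE moved from So to Other_Numeric (No).
my $circled_ten_is_no = "\x{3248}" =~ /\p{Gc=No}/;

printf "U+00B6 is Po: %s\n", $pilcrow_is_po     ? "yes" : "no";
printf "U+00A7 is Po: %s\n", $section_is_po     ? "yes" : "no";
printf "U+3248 is No: %s\n", $circled_ten_is_no ? "yes" : "no";
```

C<\p{}> always applies Unicode rules, so these checks behave the same whether or not the strings are stored as UTF-8 internally.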
+
+New aliases (synonyms) have been defined for many property values;
+these, along with the previously existing ones, are all cross-indexed in
+L<perluniprops>.
+
+The return value of C<charnames::viacode()> is affected by other
+changes:
+
+ Code point  Old Name               New Name
+   U+000A    LINE FEED (LF)         LINE FEED
+   U+000C    FORM FEED (FF)         FORM FEED
+   U+000D    CARRIAGE RETURN (CR)   CARRIAGE RETURN
+   U+0085    NEXT LINE (NEL)        NEXT LINE
+   U+008E    SINGLE-SHIFT 2         SINGLE-SHIFT-2
+   U+008F    SINGLE-SHIFT 3         SINGLE-SHIFT-3
+   U+0091    PRIVATE USE 1          PRIVATE USE-1
+   U+0092    PRIVATE USE 2          PRIVATE USE-2
+   U+2118    SCRIPT CAPITAL P       WEIERSTRASS ELLIPTIC FUNCTION
+
+Perl will accept any of these names as input, but
+C<charnames::viacode()> now returns the new name of each pair. The
+change for U+2118 is considered by Unicode to be a correction; that is,
+the original name was a mistake (but, again, it will remain valid
+forever to use it to refer to U+2118). Most of these changes, however,
+are fallout from the mistake Unicode 6.0 made in naming a character used
+in Japanese cell phones "BELL", which conflicts with the long-standing
+industry use of (and Unicode's recommendation to use) that name
+to mean the ASCII control character at U+0007. As a result, that name
+has been deprecated in Perl since v5.14, and any use of it raises a
+warning message (unless turned off). The name "ALERT" is now the
+preferred name for this code point, with "BEL" being an acceptable short
+form. The name for the new cell phone character, at code point U+1F514,
+remains undefined in this version of Perl (hence we don't quite
+implement all of Unicode 6.1), but starting in v5.18, BELL will mean
+this character, and not U+0007.
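The following sketch, assuming perl v5.16 or later, prints the names C<charnames::viacode()> reports for several code points from the table and checks that the old name for U+2118 and the preferred names for U+0007 are accepted on input. The names printed come from the running perl's Unicode tables.

```perl
use strict;
use warnings;
use charnames ':full';   # also makes charnames::viacode() and vianame() available

# viacode() reports the current name for each code point.
for my $cp (0x0A, 0x0C, 0x0D, 0x85, 0x2118) {
    printf "U+%04X  %s\n", $cp, charnames::viacode($cp);
}

# The old name SCRIPT CAPITAL P is still accepted on input.
my $script_p = charnames::vianame("SCRIPT CAPITAL P");

# U+0007 is reached through ALERT or its short form BEL, not BELL.
my $alert_ok = "\N{ALERT}" eq "\a" && "\N{BEL}" eq "\a";
```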
+
+Unicode has taken steps to make sure that this sort of mistake does not
+happen again. The Standard now includes all the generally accepted
+names and abbreviations for control characters, whereas previously it
+didn't (though there were recommended names for most of them, which Perl
+used). This means that most of those recommended names are now
+officially in the Standard. Unicode did not recommend names for the
+four code points listed above between U+008E and U+0092, and in
+standardizing them Unicode subtly changed the names that Perl had
+previously given them, by replacing the final blank in each name with a
+hyphen. Unicode also officially accepts names that Perl had deprecated,
+such as FILE SEPARATOR. Now the only deprecated name is BELL.
+Finally, Perl now uses the new official names instead of the old
+(now considered obsolete) names for the first four code points in the
+list above (the ones with parentheses in them).
+
+Now that the names have been placed in the Unicode standard, these kinds
+of changes should not happen again, though corrections, such as to
+U+2118, are still possible.
+
+Unicode also added some name abbreviations, which Perl now accepts:
+SP for SPACE;
+TAB for CHARACTER TABULATION;
+NEW LINE, END OF LINE, NL, and EOL for LINE FEED;
+LOCKING-SHIFT ONE for SHIFT OUT;
+LOCKING-SHIFT ZERO for SHIFT IN;
+and ZWNBSP for ZERO WIDTH NO-BREAK SPACE.
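A sketch, assuming perl v5.16 or later (where C<\N{...}> loads C<charnames> on its own), that resolves each of the new abbreviations and compares it with the character it names:

```perl
use strict;
use warnings;

# Each pair: [ string built from an abbreviation, the character it should name ].
my @pairs = (
    [ "\N{SP}",                 " "        ],   # SPACE
    [ "\N{TAB}",                "\t"       ],   # CHARACTER TABULATION
    [ "\N{NEW LINE}",           "\n"       ],   # LINE FEED
    [ "\N{END OF LINE}",        "\n"       ],
    [ "\N{NL}",                 "\n"       ],
    [ "\N{EOL}",                "\n"       ],
    [ "\N{LOCKING-SHIFT ONE}",  "\x0E"     ],   # SHIFT OUT
    [ "\N{LOCKING-SHIFT ZERO}", "\x0F"     ],   # SHIFT IN
    [ "\N{ZWNBSP}",             "\x{FEFF}" ],   # ZERO WIDTH NO-BREAK SPACE
);

my @mismatches = grep { $_->[0] ne $_->[1] } @pairs;
print @mismatches ? "some abbreviations did not match\n"
                  : "all abbreviations resolved as expected\n";
```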
+
+More details on this version of Unicode are provided in
+L<http://www.unicode.org/versions/Unicode6.1.0/>.
+
=head3 C<use charnames> is no longer needed for C<\N{I<name>}>
When C<\N{I<name>}> is encountered, the C<charnames> module is now
@@ -246,7 +327,35 @@ Also, single-character Unicode punctuation variables (like $‰) are now
supported [perl #69032]. They are also supported with C<our> and C<my>,
but that is a mistake that will be fixed before 5.16.
-=head2 The Unicode C<Script_Extensions> property is now supported.
+=head3 Improved ability to mix locales and Unicode, including UTF-8 locales
+
+An optional parameter has been added to C<use locale>:
+
+ use locale ':not_characters';
+
+which tells Perl to use all but the C<LC_CTYPE> and C<LC_COLLATE>
+portions of the current locale. Instead, the character set is assumed
+to be Unicode. This allows locales and Unicode to be seamlessly mixed,
+including the increasingly frequent UTF-8 locales. When using this
+hybrid form of locales, the C<:locale> layer to the L<open> pragma can
+be used to interface with the file system, and there are CPAN modules
+available for ARGV and environment variable conversions.
+
+Full details are in L<perllocale>.
+
+=head3 New function C<fc> and corresponding escape sequence C<\F> for Unicode foldcase
+
+Unicode foldcase is an extension to lowercase that gives better results
+when comparing two strings case-insensitively. It has long been used
+internally in regular expression C</i> matching. Now it is available
+explicitly through the new C<fc> function call (enabled by
+S<C<"use feature 'fc'">>, or C<use v5.16>, or explicitly callable via
+C<CORE::fc>) or through the new C<\F> sequence in double-quotish
+strings.
+
+Full details are in L<perlfunc/fc>.
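A minimal sketch of why folding beats lowercasing for caseless comparison, assuming perl v5.16 or later; C<use v5.16> turns on both the C<fc> feature and C<unicode_strings>:

```perl
use v5.16;    # enables the 'fc' and 'unicode_strings' features (and strict)
use warnings;

# U+00DF is LATIN SMALL LETTER SHARP S.
my ($x, $y) = ("Stra\x{DF}e", "STRASSE");

my $lc_equal = lc($x) eq lc($y);   # false: lc leaves U+00DF unchanged
my $fc_equal = fc($x) eq fc($y);   # true: fc folds U+00DF to "ss"

# \F in a double-quoted string foldcases the text up to \E.
my $folded = "\F$y\E";

# CORE::fc is callable even where the feature is not enabled.
my $kelvin = CORE::fc("\x{212A}");   # KELVIN SIGN folds to "k"
```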
+
+=head3 The Unicode C<Script_Extensions> property is now supported.
New in Unicode 6.0, this is an improved C<Script> property. Details
are in L<perlunicode/Scripts>.
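As a sketch (assuming perl v5.16 or later): U+0964 DEVANAGARI DANDA has C<Script=Common>, yet its C<Script_Extensions> value lists the scripts that use it, so the two properties disagree for it:

```perl
use strict;
use warnings;

my $danda = "\x{0964}";    # DEVANAGARI DANDA

my $sc_deva  = $danda =~ /\p{Script=Devanagari}/;             # false: Script is Common
my $scx_deva = $danda =~ /\p{Script_Extensions=Devanagari}/;  # true
```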
@@ -275,6 +384,12 @@ string. This cannot be fixed without changing its API. It is not
called from CPAN. The documentation now describes how to use it
safely.
+=head3 Added C<is_utf8_char_buf()>
+
+This function is designed to replace the deprecated L</is_utf8_char()>
+function. It includes an extra parameter to make sure it doesn't read
+past the end of the input buffer.
+
=head3 Other C<is_utf8_foo()> functions, as well as C<utf8_to_foo()>, etc.
Most of the other XS-callable functions that take UTF-8 encoded input
@@ -397,8 +512,20 @@ The C<__FILE__>, C<__LINE__> and C<__PACKAGE__> tokens can now be written
with an empty pair of parentheses after them. This makes them parse the
same way as C<time>, C<fork> and other built-in functions.
+=head2 C<_> in subroutine prototypes
+
+The C<_> character in subroutine prototypes is now allowed before C<@> or
+C<%>.
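A sketch of the newly allowed C<_@> prototype. The subroutine C<tag_with> is a hypothetical name used only for illustration; when called with no arguments it receives C<$_>, otherwise its arguments pass through unchanged:

```perl
use strict;
use warnings;

# Hypothetical helper: "_" takes the first argument, falling back to $_
# when none is passed; "@" slurps the rest.
sub tag_with(_@) {
    my ($first, @rest) = @_;
    return join "|", $first, @rest;
}

my $explicit = tag_with("a", "b", "c");

my $implicit;
for ("topic") { $implicit = tag_with() }   # $_ is "topic" here
```

The subroutine is declared before it is called so that perl applies the prototype at compile time.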
+
=head1 Security
+=head2 Use C<is_utf8_char_buf()> and not C<is_utf8_char()>
+
+The latter function is now deprecated because its API is insufficient to
+guarantee that it doesn't read (up to 12 bytes in the worst case) beyond
+the end of its input string. See
+L<is_utf8_char_buf()|/Added is_utf8_char_buf()>.
+
=head2 C<File::Glob::bsd_glob()> memory error with GLOB_ALTDIRFUNC (CVE-2011-2728).
Calling C<File::Glob::bsd_glob> with the unsupported flag
@@ -430,6 +557,12 @@ file most likely for applications to have used is
F<lib/unicore/ToDigit.pl>. L<Unicode::UCD/prop_invmap()> can be used to
get at its data instead.
+=head2 C<is_utf8_char()>
+
+This function is deprecated because it could read beyond the end of the
+input string. Use the new L<is_utf8_char_buf()|/Added is_utf8_char_buf()>
+instead.
+
=head1 Future Deprecations
This section serves as a notice of features that are I<likely> to be
@@ -536,6 +669,29 @@ sfio, stdio
=head1 Incompatible Changes
+=head2 Special blocks called in void context
+
+Special blocks (C<BEGIN>, C<CHECK>, C<INIT>, C<UNITCHECK>, C<END>) are now
+called in void context. This avoids wasteful copying of the result of the
+last statement [perl #108794].
+
+=head2 The C<overloading> pragma and regexp objects
+
+With C<no overloading>, regular expression objects returned by C<qr//> are
+now stringified as "Regexp=REGEXP(0xbe600d)" instead of the regular
+expression itself [perl #108780].
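A sketch of the difference, assuming perl v5.16 or later; the hexadecimal address in the raw form varies from run to run:

```perl
use strict;
use warnings;

my $re = qr/ab+c/;

my $normal = "$re";                          # the pattern text, "(?^:ab+c)"
my $raw    = do { no overloading; "$re" };   # "Regexp=REGEXP(0x...)"

print "$normal\n$raw\n";
```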
+
+=head2 Two XS typemap entries removed
+
+Two presumably unused XS typemap entries have been removed from the
+core typemap: T_DATAUNIT and T_CALLBACK. If you are, against all odds,
+a user of these, please see the instructions on how to regain them
+in L<perlxstypemap>.
+
+=head2 Unicode 6.1 has incompatibilities with Unicode 6.0
+
+These are detailed in L</Supports (almost) Unicode 6.1> above.
+
=head2 Borland compiler
All support for the Borland compiler has been dropped. The code had not
@@ -626,6 +782,48 @@ Code that depends on the caching behavior will break. As described in
L</Core Enhancements>, C<$$> is now writable, but it will be reset during a
fork.
+=head2 C<$$> and C<getppid()> no longer emulate POSIX semantics under LinuxThreads
+
+The POSIX emulation of C<$$> and C<getppid()> under the obsolete
+LinuxThreads implementation has been removed (the C<$$> emulation was
+actually removed in v5.15.0). This only impacts users of Linux 2.4 and
+users of Debian GNU/kFreeBSD up to and including 6.0, not the vast
+majority of Linux installations that use NPTL threads.
+
+This means that C<getppid()>, like C<$$>, is now always guaranteed to
+return the OS's idea of the current state of the process, not perl's
+cached version of it.
+
+See the documentation for L<$$|perlvar/$$> for details.
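A minimal sketch, assuming a Unix-like system with a native C<fork()>: the child checks that C<$$> is its own PID and that C<getppid()> reports the parent, both values being read from the OS:

```perl
use strict;
use warnings;

my $parent_pid = $$;
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: $$ is the child's PID; getppid() returns the parent's PID.
    my $ok = ($$ != $parent_pid) && (getppid() == $parent_pid);
    exit($ok ? 0 : 1);
}

waitpid($pid, 0);                    # parent stays alive until the child exits
my $child_ok = ($? >> 8) == 0;
print $child_ok ? "child saw the expected PIDs\n" : "unexpected PIDs\n";
```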
+
+=head2 C<< $< >>, C<< $> >>, C<$(> and C<$)> are no longer cached
+
+Similarly to the changes to C<$$> and C<getppid()>, the internal
+caching of C<< $< >>, C<< $> >>, C<$(> and C<$)> has been removed.
+
+When we cached these values, our idea of what they were would drift out
+of sync with reality if someone (e.g. someone embedding perl) called
+C<sete?[ug]id()> without updating C<PL_e?[ug]id>. Having to deal with
+this complexity wasn't worth it given how cheap the C<gete?[ug]id()>
+system call is.
+
+This change will break a handful of CPAN modules that use the XS-level
+C<PL_uid>, C<PL_gid>, C<PL_euid> or C<PL_egid> variables.
+
+The fix for those breakages is to use C<PerlProc_gete?[ug]id()> to
+retrieve them (e.g. C<PerlProc_getuid()>), and not to assign to
+C<PL_e?[ug]id> if you change the UID/GID/EUID/EGID. There is no longer
+any need to do so since perl will always retrieve the up-to-date
+version of those values from the OS.
+
+=head2 Which Non-ASCII characters get quoted by C<quotemeta> and C<\Q> has changed
+
+This is unlikely to result in a real problem, as Perl does not attach
+special meaning to any non-ASCII character, so it is currently
+irrelevant which are quoted or not. This change fixes bug [perl #77654] and
+brings Perl's behavior more into line with Unicode's recommendations.
+See L<perlfunc/quotemeta>.
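A sketch of the consistency fix, assuming perl v5.16 or later with C<unicode_strings> enabled (here via C<use v5.16>): the same text quoted in native 8-bit form and after C<utf8::upgrade()> gives the same result:

```perl
use v5.16;    # turns on the 'unicode_strings' feature
use warnings;

# U+00E9 LATIN SMALL LETTER E WITH ACUTE, once as a native 8-bit string
# and once upgraded to perl's internal UTF-8 representation.
my $native   = "caf\x{E9}.txt";
my $upgraded = $native;
utf8::upgrade($upgraded);

# The result no longer depends on the internal encoding.
my $same = quotemeta($native) eq quotemeta($upgraded);

# ASCII non-word characters are backslashed as before.
my $ascii = quotemeta("a.b*c");
```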
+
=head1 Performance Enhancements
=over
@@ -763,6 +961,13 @@ in v5.17.0.
L<arybase> -- this new module implements the C<$[> variable.
+=item *
+
+C<PerlIO::mmap> 0.010 has been added to the Perl core.
+
+The C<mmap> PerlIO layer is no longer implemented by perl itself, but has
+been moved out into the new L<PerlIO::mmap> module.
+
=back
=head2 Updated Modules and Pragmata
@@ -813,6 +1018,12 @@ Perl. It is still a work in progress.
This is a new OO tutorial. It focuses on basic OO concepts, and then recommends
that readers choose an OO framework from CPAN.
+=head3 L<perlxstypemap>
+
+The new manual describes the XS typemapping mechanism in unprecedented
+detail and combines new documentation with information extracted from
+L<perlxs> and the previously unofficial list of all core typemaps.
+
=head2 Changes to Existing Documentation
=head3 L<perlapi>
@@ -1455,6 +1666,21 @@ XXX
=head2 Platform-Specific Notes
+=head3 Cygwin
+
+=over 4
+
+=item *
+
+Since version 1.7, Cygwin supports native UTF-8 paths. If Perl is built
+under that environment, directory and filenames will be UTF-8 encoded.
+
+Cygwin does not initialize all original Win32 environment variables. See
+F<README.cygwin> for a discussion of C<Cygwin::sync_winenv()> and
+further links.
+
+=back
+
=head3 VMS
=over 4
@@ -1479,6 +1705,13 @@ otherwise-identical filename containing a dot in the same position
(e.g., t/test_pl as a directory and t/test.pl as a file). This problem
has been corrected.
+=item *
+
+The build on VMS now allows symbol names in Perl's C code to be longer
+than 31 characters. Symbols like
+C<Perl__it_was_the_best_of_times_it_was_the_worst_of_times> can now be
+created freely without causing the VMS linker to seize up.
+
=back
=head3 GNU/Hurd
@@ -1554,6 +1787,31 @@ All the C files that make up the Perl core have been converted to UTF-8.
=item *
+C</[[:ascii:]]/> and C</[[:blank:]]/> now use locale rules under
+C<use locale> when the platform supports that. Previously, they used
+the platform's native character set.
+
+=item *
+
+C</.*/g> would sometimes refuse to match at the end of a string that ends
+with "\n". This has been fixed [perl #109206].
+
+=item *
+
+C<m/[[:ascii:]]/i> and C</\p{ASCII}/i> now match identically (when not
+under a differing locale). This fixes a regression introduced in 5.14
+in which the first expression could match characters outside of ASCII,
+such as the KELVIN SIGN.
+
+=item *
+
+Starting with 5.12.0, Perl would get its internal bookkeeping muddled up
+after assigning C<${ qr// }> to a hash element and locking it with
+L<Hash::Util>. This could result in double frees, crashes or erratic
+behaviour. This has been fixed.
+
+=item *
+
The new (in 5.14.0) regular expression modifier C</a> when repeated like
C</aa> forbids the characters outside the ASCII range that match
characters inside that range from matching under C</i>. This did not
@@ -1604,6 +1862,13 @@ LATIN SMALL LIGATURE ST.
Fixed memory leak regression in regular expression compilation
under threading
+=item *
+
+A regression introduced in 5.13.6 has been fixed: in a regular
+expression, an inverted bracketed character class consisting solely of
+a Unicode property (e.g. C<[^\p{Alpha}]>) was not being inverted outside
+the Latin-1 range.
+
=back
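The two character-class fixes in this list can be checked with a short sketch, assuming perl v5.16 or later; U+212A KELVIN SIGN case-folds to ASCII C<k> but is not itself ASCII:

```perl
use strict;
use warnings;

my $kelvin = "\x{212A}";    # KELVIN SIGN

# Neither form matches the non-ASCII KELVIN SIGN, even under /i ...
my $posix_hit = $kelvin =~ /[[:ascii:]]/i;
my $prop_hit  = $kelvin =~ /\p{ASCII}/i;

# ... while an ordinary ASCII "k" matches both.
my $k_hits = ("k" =~ /[[:ascii:]]/i) && ("k" =~ /\p{ASCII}/i);

# An inverted class holding only a Unicode property is inverted above
# Latin-1 too: GREEK SMALL LETTER ALPHA is alphabetic, so it is excluded.
my $alpha_in_negated = "\x{3B1}" =~ /[^\p{Alpha}]/;
my $digit_in_negated = "7"       =~ /[^\p{Alpha}]/;
```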
=head2 Formats
@@ -1741,6 +2006,13 @@ Applying the C<:lvalue> attribute to an XSUB or to an aliased subroutine
stub with C<< sub foo :lvalue; >> syntax stopped working in Perl 5.12.
This has been fixed.
+=item *
+
+Method calls whose arguments were all surrounded with C<my()> or C<our()>
+(as in C<< $object->method(my($a,$b)) >>) used to force lvalue context on
+the subroutine. This would prevent lvalue methods from returning certain
+values.
+
=back
=head2 Fixes related to hashes
@@ -1849,6 +2121,28 @@ happened.
=item *
+C<stat _> no longer warns about unopened filehandles [perl #71002].
+
+=item *
+
+C<stat> on an unopened filehandle now warns consistently, instead of
+skipping the warning at times.
+
+=item *
+
+C<-t> now works when stacked with other filetest operators [perl #77388].
+
+=item *
+
+Stacked filetest operators now only call FETCH once on a tied argument.
+
+=item *
+
+C<~~> now correctly handles the precedence of Any~~Object, and is not tricked
+by an overloaded object on the left-hand side.
+
+=item *
+
Tying C<%^H>
Tying C<%^H> no longer causes perl to crash or ignore the contents of
@@ -1856,6 +2150,16 @@ C<%^H> when entering a compilation scope [perl #106282].
=item *
+C<quotemeta> now consistently quotes the same non-ASCII characters under
+C<use feature 'unicode_strings'>, regardless of whether the string is
+encoded in UTF-8 or not, hence fixing the last vestiges (we hope) of the
+infamous L<perlunicode/The "Unicode Bug"> [perl #77654].
+
+Which of these code points is quoted has changed, based on Unicode's
+recommendations. See L<perlfunc/quotemeta> for details.
+
+=item *
+
C<~> on vstrings
The bitwise complement operator (and possibly other operators, too) when
@@ -3175,6 +3479,19 @@ C<splice()> doesn't warn when truncating
You can now limit the size of an array using C<splice(@a,MAX_LEN)> without
worrying about warnings.
+=item *
+
+The C<SvPVutf8> C function no longer tries to modify its argument, which
+previously resulted in errors [perl #108994].
+
+=item *
+
+C<SvPVutf8> now works properly with magical variables.
+
+=item *
+
+C<SvPVbyte> now works properly on non-PVs.
+
=back
=head1 Known Problems