author | Ricardo Signes <rjbs@cpan.org> | 2012-03-05 21:25:31 -0500 |
---|---|---|
committer | Ricardo Signes <rjbs@cpan.org> | 2012-03-05 21:25:31 -0500 |
commit | 2a9794a2818bab2d883a7bc5854aa6687ac635b0 (patch) | |
tree | 6a64fd83ddf2af4c7a5f81a3fdcd6f796b04ce11 | |
parent | 31cb72072b6370cf13dcfeb8defe9578960f6629 (diff) | |
download | perl-rjbs/delta.tar.gz |
begin filling the 5.16.0 delta from 5.15.8 (rjbs/delta)
This is largely a copy and paste job.
-rw-r--r-- | Porting/perl5160delta.pod | 319 |
1 files changed, 318 insertions, 1 deletions
diff --git a/Porting/perl5160delta.pod b/Porting/perl5160delta.pod
index 71d86648ef..e55b6dfa30 100644
--- a/Porting/perl5160delta.pod
+++ b/Porting/perl5160delta.pod
@@ -160,6 +160,87 @@ $tied_variable)>.
 
 =head2 Unicode Support
 
+=head3 Supports (I<almost>) Unicode 6.1
+
+Besides the addition of whole new scripts, and new characters in
+existing scripts, this new version of Unicode, as always, makes some
+changes to existing characters. One change that may trip up some
+applications is that the General Category of two characters in the
+Latin-1 range, PILCROW SIGN and SECTION SIGN, has been changed from
+Other_Symbol to Other_Punctuation. The same change has been made for
+a character in each of Tibetan, Ethiopic, and Aegean.
+The code points U+3248..U+324F (CIRCLED NUMBER TEN ON BLACK SQUARE
+through CIRCLED NUMBER EIGHTY ON BLACK SQUARE) have had their General
+Category changed from Other_Symbol to Other_Numeric. The Line Break
+property has changes for Hebrew and Japanese; and as a consequence of
+other changes in 6.1, the Perl regular expression construct C<\X> now
+works differently for some characters in Thai and Lao.
+
+New aliases (synonyms) have been defined for many property values;
+these, along with the previously existing ones, are all cross indexed in
+L<perluniprops>.
+
+The return value of C<charnames::viacode()> is affected by other
+changes:
+
+ Code point  Old Name             New Name
+ U+000A      LINE FEED (LF)       LINE FEED
+ U+000C      FORM FEED (FF)       FORM FEED
+ U+000D      CARRIAGE RETURN (CR) CARRIAGE RETURN
+ U+0085      NEXT LINE (NEL)      NEXT LINE
+ U+008E      SINGLE-SHIFT 2       SINGLE-SHIFT-2
+ U+008F      SINGLE-SHIFT 3       SINGLE-SHIFT-3
+ U+0091      PRIVATE USE 1        PRIVATE USE-1
+ U+0092      PRIVATE USE 2        PRIVATE USE-2
+ U+2118      SCRIPT CAPITAL P     WEIERSTRASS ELLIPTIC FUNCTION
+
+Perl will accept any of these names as input, but
+C<charnames::viacode()> now returns the new name of each pair. The
+change for U+2118 is considered by Unicode to be a correction, that is
+the original name was a mistake (but again, it will remain forever valid
+to use it to refer to U+2118). But most of these changes are the
+fallout of the mistake Unicode 6.0 made in naming a character used in
+Japanese cell phones to be "BELL", which conflicts with the long
+standing industry use of (and Unicode's recommendation to use) that name
+to mean the ASCII control character at U+0007. As a result, that name
+has been deprecated in Perl since v5.14; and any use of it will raise a
+warning message (unless turned off). The name "ALERT" is now the
+preferred name for this code point, with "BEL" being an acceptable short
+form. The name for the new cell phone character, at code point U+1F514,
+remains undefined in this version of Perl (hence we don't quite
+implement all of Unicode 6.1), but starting in v5.18, BELL will mean
+this character, and not U+0007.
+
+Unicode has taken steps to make sure that this sort of mistake does not
+happen again. The Standard now includes all the generally accepted
+names and abbreviations for control characters, whereas previously it
+didn't (though there were recommended names for most of them, which Perl
+used). This means that most of those recommended names are now
+officially in the Standard. Unicode did not recommend names for the
+four code points listed above between U+008E and U+008F, and in
+standardizing them Unicode subtly changed the names that Perl had
+previously given them, by replacing the final blank in each name by a
+hyphen. Unicode also officially accepts names that Perl had deprecated,
+such as FILE SEPARATOR. Now the only deprecated name is BELL.
+Finally, Perl now uses the new official names instead of the old
+(now considered obsolete) names for the first four code points in the
+list above (the ones which have the parentheses in them).
+
+Now that the names have been placed in the Unicode standard, these kinds
+of changes should not happen again, though corrections, such as to
+U+2118, are still possible.
+
+Unicode also added some name abbreviations, which Perl now accepts:
+SP for SPACE;
+TAB for CHARACTER TABULATION;
+NEW LINE, END OF LINE, NL, and EOL for LINE FEED;
+LOCKING-SHIFT ONE for SHIFT OUT;
+LOCKING-SHIFT ZERO for SHIFT IN;
+and ZWNBSP for ZERO WIDTH NO-BREAK SPACE.
+
+More details on this version of Unicode are provided in
+L<http://www.unicode.org/versions/Unicode6.1.0/>.
+
 =head3 C<use charnames> is no longer needed for C<\N{I<name>}>
 
 When C<\N{I<name>}> is encountered, the C<charnames> module is now
@@ -246,7 +327,35 @@ Also, single-character Unicode punctuation variables (like $‰) are now
 supported [perl #69032]. They are also supported with C<our> and C<my>,
 but that is a mistake that will be fixed before 5.16.
 
-=head2 The Unicode C<Script_Extensions> property is now supported.
+=head3 Improved ability to mix locales and Unicode, including UTF-8 locales
+
+An optional parameter has been added to C<use locale>
+
+    use locale ':not_characters';
+
+which tells Perl to use all but the C<LC_CTYPE> and C<LC_COLLATE>
+portions of the current locale. Instead, the character set is assumed
+to be Unicode. This allows locales and Unicode to be seamlessly mixed,
+including the increasingly frequent UTF-8 locales. When using this
+hybrid form of locales, the C<:locale> layer to the L<open> pragma can
+be used to interface with the file system, and there are CPAN modules
+available for ARGV and environment variable conversions.
+
+Full details are in L<perllocale>.
+
+=head3 New function C<fc> and corresponding escape sequence C<\F> for Unicode foldcase
+
+Unicode foldcase is an extension to lowercase that gives better results
+when comparing two strings case-insensitively. It has long been used
+internally in regular expression C</i> matching. Now it is available
+explicitly through the new C<fc> function call (enabled by
+S<C<"use feature 'fc'">>, or C<use v5.16>, or explicitly callable via
+C<CORE::fc>) or through the new C<\F> sequence in double-quotish
+strings.
+
+Full details are in L<perlfunc/fc>.
+
+=head3 The Unicode C<Script_Extensions> property is now supported.
 
 New in Unicode 6.0, this is an improved C<Script> property. Details
 are in L<perlunicode/Scripts>.
@@ -275,6 +384,12 @@ string. This cannot be fixed without changing its API. It is not
 called from CPAN. The documentation now describes how to use it
 safely.
 
+=head3 Added C<is_utf8_char_buf()>
+
+This function is designed to replace the deprecated L</is_utf8_char()>
+function. It includes an extra parameter to make sure it doesn't read
+past the end of the input buffer.
+
 =head3 Other C<is_utf8_foo()> functions, as well as C<utf8_to_foo()>, etc.
 
 Most of the other XS-callable functions that take UTF-8 encoded input
@@ -397,8 +512,20 @@ The C<__FILE__>, C<__LINE__> and C<__PACKAGE__> tokens can now be written
 with an empty pair of parentheses after them. This makes them parse the
 same way as C<time>, C<fork> and other built-in functions.
 
+=head2 C<_> in subroutine prototypes
+
+The C<_> character in subroutine prototypes is now allowed before C<@> or
+C<%>.
+
 =head1 Security
 
+=head2 Use C<is_utf8_char_buf()> and not C<is_utf8_char()>
+
+The latter function is now deprecated because its API is insufficient to
+guarantee that it doesn't read (up to 12 bytes in the worst case) beyond
+the end of its input string. See
+L<is_utf8_char_buf()|/Added is_utf8_char_buf()>.
+
 =head2 C<File::Glob::bsd_glob()> memory error with GLOB_ALTDIRFUNC (CVE-2011-2728).
 
 Calling C<File::Glob::bsd_glob> with the unsupported flag
@@ -430,6 +557,12 @@ file most likely for applications to have used is
 F<lib/unicore/ToDigit.pl>. L<Unicode::UCD/prop_invmap()> can be used to
 get at its data instead.
 
+=head2 C<is_utf8_char()>
+
+This function is deprecated because it could read beyond the end of the
+input string. Use the new L<is_utf8_char_buf()|/Added is_utf8_char_buf()>
+instead.
+
 =head1 Future Deprecations
 
 This section serves as a notice of feature that are I<likely> to be
@@ -536,6 +669,29 @@ sfio, stdio
 
 =head1 Incompatible Changes
 
+=head2 Special blocks called in void context
+
+Special blocks (C<BEGIN>, C<CHECK>, C<INIT>, C<UNITCHECK>, C<END>) are now
+called in void context. This avoids wasteful copying of the result of the
+last statement [perl #108794].
+
+=head2 The C<overloading> pragma and regexp objects
+
+With C<no overloading>, regular expression objects returned by C<qr//> are
+now stringified as "Regexp=REGEXP(0xbe600d)" instead of the regular
+expression itself [perl #108780].
+
+=head2 Two XS typemap Entries removed
+
+Two presumably unused XS typemap entries have been removed from the
+core typemap: T_DATAUNIT and T_CALLBACK. If you are, against all odds,
+a user of these, please see the instructions on how to regain them
+in L<perlxstypemap>.
+
+=head2 Unicode 6.1 has incompatibilities with Unicode 6.0
+
+These are detailed in L</Supports (almost) Unicode 6.1> above.
+
 =head2 Borland compiler
 
 All support for the Borland compiler has been dropped. The code had not
@@ -626,6 +782,48 @@ Code that depends on the caching behavior will break. As described in
 L</Core Enhancements>, C<$$> is now writable, but it will be reset during a
 fork.
 
+=head2 C<$$> and C<getppid()> no longer emulate POSIX semantics under LinuxThreads
+
+The POSIX emulation of C<$$> and C<getppid()> under the obsolete
+LinuxThreads implementation has been removed (the C<$$> emulation was
+actually removed in v5.15.0). This only impacts users of Linux 2.4 and
+users of Debian GNU/kFreeBSD up to and including 6.0, not the vast
+majority of Linux installations that use NPTL threads.
+
+This means that C<getppid()> like C<$$> is now always guaranteed to
+return the OS's idea of the current state of the process, not perl's
+cached version of it.
+
+See the documentation for L<$$|perlvar/$$> for details.
+
+=head2 C<< $< >>, C<< $> >>, C<$(> and C<$)> are no longer cached
+
+Similarly to the changes to C<$$> and C<getppid()> the internal
+caching of C<< $< >>, C<< $> >>, C<$(> and C<$)> has been removed.
+
+When we cached these values our idea of what they were would drift out
+of sync with reality if someone (e.g. someone embedding perl) called
+sete?[ug]id() without updating C<PL_e?[ug]id>. Having to deal with
+this complexity wasn't worth it given how cheap the C<gete?[ug]id()>
+system call is.
+
+This change will break a handful of CPAN modules that use the XS-level
+C<PL_uid>, C<PL_gid>, C<PL_euid> or C<PL_egid> variables.
+
+The fix for those breakages is to use C<PerlProc_gete?[ug]id()> to
+retrieve them (e.g. C<PerlProc_getuid()>), and not to assign to
+C<PL_e?[ug]id> if you change the UID/GID/EUID/EGID. There is no longer
+any need to do so since perl will always retrieve the up-to-date
+version of those values from the OS.
+
+=head2 Which Non-ASCII characters get quoted by C<quotemeta> and C<\Q> has changed
+
+This is unlikely to result in a real problem, as Perl does not attach
+special meaning to any non-ASCII character, so it is currently
+irrelevant which are quoted or not. This change fixes bug [perl #77654] and
+bring Perl's behavior more into line with Unicode's recommendations.
+See L<perlfunc/quotemeta>.
+
 =head1 Performance Enhancements
 
 =over
@@ -763,6 +961,13 @@ in v5.17.0.
 
 L<arybase> -- this new module implements the C<$[> variable.
 
+=item *
+
+C<PerlIO::mmap> 0.010 has been added to the Perl core.
+
+The C<mmap> PerlIO layer is no longer implemented by perl itself, but has
+been moved out into the new L<PerlIO::mmap> module.
+
 =back
 
 =head2 Updated Modules and Pragmata
@@ -813,6 +1018,12 @@ Perl. It is still a work in progress.
 This a new OO tutorial. It focuses on basic OO concepts, and then recommends
 that readers choose an OO framework from CPAN.
 
+=head3 L<perlxstypemap>
+
+The new manual describes the XS typemapping mechanism in unprecedented
+detail and combines new documentation with information extracted from
+L<perlxs> and the previously unofficial list of all core typemaps.
+
 =head2 Changes to Existing Documentation
 
 =head3 L<perlapi>
@@ -1455,6 +1666,21 @@ XXX
 
 =head2 Platform-Specific Notes
 
+=head3 Cygwin
+
+=over 4
+
+=item *
+
+Since version 1.7, Cygwin supports native UTF-8 paths. If Perl is built
+under that environment, directory and filenames will be UTF-8 encoded.
+
+Cygwin does not initialize all original Win32 environment variables. See
+F<README.cygwin> for a discussion of C<Cygwin::sync_winenv()> and
+further links.
+
+=back
+
 =head3 VMS
 
 =over 4
@@ -1479,6 +1705,13 @@ otherwise-identical filename containing a dot in the same position (e.g.,
 t/test_pl as a directory and t/test.pl as a file). This problem has been
 corrected.
 
+=item *
+
+The build on VMS now allows names of the resulting symbols in C code for
+Perl longer than 31 characters. Symbols like
+C<Perl__it_was_the_best_of_times_it_was_the_worst_of_times> can now be
+created freely without causing the VMS linker to seize up.
+
 =back
 
 =head3 GNU/Hurd
@@ -1554,6 +1787,31 @@ All the C files that make up the Perl core have been converted to UTF-8.
 
 =item *
 
+C</[[:ascii:]]/> and C</[[:blank:]]/> now use locale rules under
+C<use locale> when the platform supports that. Previously, they used
+the platform's native character set.
+
+=item *
+
+C</.*/g> would sometimes refuse to match at the end of a string that ends
+with "\n". This has been fixed [perl #109206].
+
+=item *
+
+C<m/[[:ascii:]]/i> and C</\p{ASCII}/i> now match identically (when not
+under a differing locale). This fixes a regression introduced in 5.14
+in which the first expression could match characters outside of ASCII,
+such as the KELVIN SIGN.
+
+=item *
+
+Starting with 5.12.0, Perl used to get its internal bookkeeping muddled up
+after assigning C<${ qr// }> to a hash element and locking it with
+L<Hash::Util>. This could result in double frees, crashes or erratic
+behaviour.
+
+=item *
+
 The new (in 5.14.0) regular expression modifier C</a> when repeated like
 C</aa> forbids the characters outside the ASCII range that match
 characters inside that range from matching under C</i>. This did not
@@ -1604,6 +1862,13 @@ LATIN SMALL LIGATURE ST.
 Fixed memory leak regression in regular expression compilation
 under threading
 
+=item *
+
+A regression introduced in 5.13.6 was fixed. This involved an inverted
+bracketed character class in a regular expression that consisted solely
+of a Unicode property, that property wasn't getting inverted outside the
+Latin1 range.
+
 =back
 
 =head2 Formats
@@ -1741,6 +2006,13 @@ Applying the C<:lvalue> attribute to an XSUB or to an aliased subroutine
 stub with C<< sub foo :lvalue; >> syntax stopped working in Perl 5.12.
 This has been fixed.
 
+=item *
+
+Method calls whose arguments were all surrounded with C<my()> or C<our()>
+(as in C<< $object->method(my($a,$b)) >>) used to force lvalue context on
+the subroutine. This would prevent lvalue methods from returning certain
+values.
+
 =back
 
 =head2 Fixes related to hashes
@@ -1849,6 +2121,28 @@ happened.
 
 =item *
 
+C<stat _> no longer warns about unopened filehandles [perl #71002].
+
+=item *
+
+C<stat> on an unopened filehandle now warns consistently, instead of
+skipping the warning at times.
+
+=item *
+
+C<-t> now works when stacked with other filetest operators [perl #77388].
+
+=item *
+
+Stacked filetest operators now only call FETCH once on a tied argument.
+
+=item *
+
+C<~~> now correctly handles the precedence of Any~~Object, and is not tricked
+by an overloaded object on the left-hand side.
+
+=item *
+
 Tying C<%^H>
 
 Tying C<%^H> no longer causes perl to crash or ignore the contents of
@@ -1856,6 +2150,16 @@ C<%^H> when entering a compilation scope [perl #106282].
 
 =item *
 
+C<quotemeta> now quotes consistently the same non-ASCII characters under
+C<use feature 'unicode_strings'>, regardless of whether the string is
+encoded in UTF-8 or not, hence fixing the last vestiges (we hope) of the
+infamous L<perlunicode/The "Unicode Bug">. [perl #77654].
+
+Which of these code points is quoted has changed, based on Unicode's
+recommendations. See L<perlfunc/quotemeta> for details.
+
+=item *
+
 C<~> on vstrings
 
 The bitwise complement operator (and possibly other operators, too) when
@@ -3175,6 +3479,19 @@ C<splice()> doesn't warn when truncating
 You can now limit the size of an array using C<splice(@a,MAX_LEN)> without
 worrying about warnings.
 
+=item *
+
+The C<SvPVutf8> C function no longer tries to modify its argument,
+resulting in errors [perl #108994].
+
+=item *
+
+C<SvPVutf8> now works properly with magical variables.
+
+=item *
+
+C<SvPVbyte> now works properly non-PVs.
+
 =back
 
 =head1 Known Problems
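
Note on the C<fc> / C<\F> entry added by this commit: the difference between lowercasing and foldcasing can be seen in a minimal sketch like the one below. It assumes perl 5.16 or later. The sample strings and variable names are illustrative only and do not come from the commit.

```perl
use v5.16;   # loads the :5.16 feature bundle, which includes 'fc' and 'say'

# Illustrative sample strings (an assumption, not taken from the commit).
# \N{...} works here without an explicit "use charnames", as the delta notes.
my $lower = "stra\N{LATIN SMALL LETTER SHARP S}e";
my $upper = "STRASSE";

# lc() leaves the sharp s untouched, so the lowercased strings differ.
say "lc: ", (lc($lower) eq lc($upper) ? "equal" : "different");

# fc() folds the sharp s to "ss", so the folded strings compare equal.
say "fc: ", (fc($lower) eq fc($upper) ? "equal" : "different");

# \F ... \E is the double-quotish spelling of fc().
say "\\F: ", ("\F$lower\E" eq "\F$upper\E" ? "equal" : "different");
```

Comparing C<fc($x) eq fc($y)> rather than C<lc($x) eq lc($y)> is the usual reason to reach for the new function; outside a C<use v5.16> or C<use feature 'fc'> scope it is also callable as C<CORE::fc>, as the entry states.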