From 2796c109dc2c56e2241410992d78bd8e0cccd71f Mon Sep 17 00:00:00 2001
From: Jarkko Hietaniemi
Date: Wed, 4 Jul 2001 01:32:11 +0000
Subject: Support preferentially the Unicode 'scripts' definition in the
 \p{In...} notation since according to Unicode the scripts concept is more
 natural for matching than using the somewhat artificial block names. The
 block names are still available, though, and if there's a name conflict,
 the scripts one wins and the blocks one has to do with 'Block' appended to
 its name.

For more information see http://www.unicode.org/unicode/reports/tr24/

p4raw-id: //depot/perl@11132
---
 pod/perl572delta.pod | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/pod/perl572delta.pod b/pod/perl572delta.pod
index 2800cf85dc..1ff8436508 100644
--- a/pod/perl572delta.pod
+++ b/pod/perl572delta.pod
@@ -49,6 +49,34 @@ statically built in. This may or may not be a problem with ancient
 TCP/IP stacks of VMS: we do not know since we weren't able to test Perl
 in such configurations.
 
+=head2 Different Definition of the Unicode Character Classes \p{In...}
+
+As suggested by the Unicode consortium, the Unicode character classes
+now prefer I<scripts> as opposed to I<blocks> (as defined by Unicode);
+in Perl, when the C<\p{In....}> and the C<\P{In....}> regular expression
+constructs are used. This has changed the definition of some of those
+character classes.
+
+The difference between scripts and blocks is that scripts are the
+glyphs used by a language or a group of languages, while the blocks
+are more artificial groupings of 256 characters based on the Unicode
+numbering.
+
+In general this change results in more inclusive Unicode character
+classes, but changes to the other direction also do take place:
+for example while the script C<Latin> includes all the Latin
+characters and their various diacritic-adorned versions, it
+does not include the various punctuation or digits (since they
+are not solely C<Latin>).
+
+Changes in the character class semantics may have happened if a script
+and a block happen to have the same name, for example C<Hebrew>.
+In such cases the script wins and C<\p{InHebrew}> now means the script
+definition of Hebrew. The block definition is still available,
+though, by appending C<Block> to the name: C<\p{InHebrewBlock}> means
+what C<\p{InHebrew}> meant in perl 5.6.0. For the full list
+of affected character classes, see L<perlunicode>.
+
 =head2 Deprecations
 
 The current user-visible implementation of pseudo-hashes (the weird
-- 
cgit v1.2.1