author Karl Williamson <public@khwilliamson.com> 2012-05-15 21:45:56 -0600
committer Karl Williamson <public@khwilliamson.com> 2012-06-02 08:29:22 -0600
commit 0ff33a848afab2dffc78f54d45e8bc65c39859b9 (patch)
tree 46a324b8f97ab6111a24fcd23181ab018fd04884 /lib
parent fc862497326763ca3aa354e81498c1aed28775c1 (diff)
mktables: Remove early Unicode defective \p{Alpha=Y}
The \p{Alphabetic=y} property was not defined in all Unicode releases; however in some of those early ones, there was a data file that contained a definition for it, and prior to this patch, mktables used that definition to construct a \p{Alphabetic=y} table. However, it turns out that the definition is quite defective in many of the releases it occurred in. So rather than mislead code into thinking there is a good definition of that property for the early releases, this just doesn't generate a table for it. But, prior commits have created a good definition for the Perl single-form extensions \p{Alpha} and \p{Alphabetic}, and most code uses those anyway.
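The difference between the compound form and the single-form extensions can be checked from ordinary Perl code. The following is a minimal sketch, not part of this patch: the helper names has_property() and classify() are hypothetical, and it assumes only that the running perl raises an exception for a property it has no table for. On a perl built against a recent Unicode release all three spellings should be available; the compound form is expected to be missing only on a perl built with one of the early Unicode releases described above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if this perl can compile and run a match
# against \p{$prop}.  The pattern is built at run time so that an
# unknown property raises a catchable exception instead of a
# compile-time error for the whole script.
sub has_property {
    my ($prop) = @_;
    my $pat = "\\p{$prop}";
    return eval { my $matched = ('a' =~ /$pat/); 1 } ? 1 : 0;
}

# Hypothetical helper: test one character against the two single-form
# Perl extensions named in the commit message.
sub classify {
    my ($ch) = @_;
    return {
        alpha      => ($ch =~ /\A\p{Alpha}\z/      ? 1 : 0),
        alphabetic => ($ch =~ /\A\p{Alphabetic}\z/ ? 1 : 0),
    };
}

# Report which spellings this perl knows about.
for my $prop ('Alpha', 'Alphabetic', 'Alphabetic=Y') {
    printf "%-14s %s\n", $prop,
        has_property($prop) ? 'available' : 'not available';
}

# Compare the single-form extensions on a few sample characters:
# two ASCII letters, GREEK SMALL LETTER ALPHA, a digit, an underscore.
for my $ch ('a', 'Z', "\x{3B1}", '5', '_') {
    my $r = classify($ch);
    printf "U+%04X alpha=%d alphabetic=%d\n",
        ord($ch), $r->{alpha}, $r->{alphabetic};
}
```

Code that has to run on a perl built with an old Unicode version is better off using the single forms, since those are the ones the commit message says have a good definition everywhere.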
Diffstat (limited to 'lib')
-rw-r--r-- lib/unicore/mktables 43
1 files changed, 22 insertions, 21 deletions
diff --git a/lib/unicore/mktables b/lib/unicore/mktables
index 5e0fc2592e..dd95457d2f 100644
--- a/lib/unicore/mktables
+++ b/lib/unicore/mktables
@@ -9113,7 +9113,6 @@ END
# This first set is in the original old-style proplist.
push @return, split /\n/, <<'END';
-Alpha ; Alphabetic
Bidi_C ; Bidi_Control
Dash ; Dash
Dia ; Diacritic
@@ -9184,6 +9183,7 @@ END
}
if (-e 'DCoreProperties.txt') {
push @return, split /\n/, <<'END';
+Alpha ; Alphabetic
IDS ; ID_Start
XIDC ; XID_Continue
XIDS ; XID_Start
@@ -11717,26 +11717,26 @@ sub filter_blocks_lines {
# PropList.txt has been in Unicode since version 2.0. Until 3.1, it
# was in a completely different syntax. Ken Whistler of Unicode says
# that it was something he used as an aid for his own purposes, but
- # was never an official part of the standard. However, comments in
- # DAge.txt indicate that non-character code points were available in
- # the UCD as of 3.1. It is unclear to me (khw) how they could be
- # there except through this file (but on the other hand, they first
- # appeared there in 3.0.1), so maybe it was part of the UCD, and maybe
- # not. But the claim is that it was published as an aid to others who
- # might want some more information than was given in the official UCD
- # of the time. Many of the properties in it were incorporated into
- # the later PropList.txt, but some were not. This program uses this
- # early file to generate property tables that are otherwise not
- # accessible in the early UCD's, and most were probably not really
- # official at that time, so one could argue that it should be ignored,
- # and you can easily modify things to skip this. And there are bugs
- # in this file in various versions. (For example, the 2.1.9 version
- # removes from Alphabetic the CJK range starting at 4E00, and they
- # weren't added back in until 3.1.0.) Many of this file's properties
- # were later sanctioned, so this code generates tables for those
- # properties that aren't otherwise in the UCD of the time but
- # eventually did become official, and throws away the rest. Here is a
- # list of all the ones that are thrown away:
+ # was never an official part of the standard. Many of the properties
+ # in it were incorporated into the later PropList.txt, but some were
+ # not. This program uses this early file to generate property tables
+ # that are otherwise not accessible in the early UCD's. It does this
+ # for the ones that eventually became official, and don't appear to be
+ # too different in their contents from the later official version, and
+ # throws away the rest. It could be argued that the ones it generates
+ # were probably not really official at that time, so should be
+ # ignored. You can easily modify things to skip all of them by
+ # changing this function to just set $_ to "", and return; and to skip
+ # certain of them by simply removing their declarations from
+ # get_old_property_aliases().
+ #
+ # Here is a list of all the ones that are thrown away:
+ # Alphabetic The definitions for this are very
+ # defective, so better to not mislead
+ # people into thinking it works.
+ # Instead the Perl extension of the
+ # same name is constructed from first
+ # principles.
# Bidi=* duplicates UnicodeData.txt
# Combining never made into official property;
# is \P{ccc=0}
@@ -12878,6 +12878,7 @@ sub compile_perl() {
}
}
$Alpha->add_description('Alphabetic');
+ $Alpha->add_alias('Alphabetic');
}
$Alpha->add_alias('XPosixAlpha');
my $Posix_Alpha = $perl->add_match_table("PosixAlpha",