field | value | date
---|---|---
author | Karl Williamson <khw@cpan.org> | 2015-12-03 13:27:21 -0700
committer | Karl Williamson <khw@cpan.org> | 2015-12-09 23:43:22 -0700
commit | 3bfc1e7044659f9ec4cc4f1bc9eea7a8b00061fb |
tree | ec472afda925dc0c5b5f0f45b48aadb3e241061c /unicode_constants.h |
parent | 36eaa8111efe6b0ebe974f6b26ed667c1206dc9f |
Skip casing for high code points
As discussed in the previous commit, most code points in Unicode
don't change when they are uppercased, lowercased, titlecased, or
case-folded. In fact, as of Unicode v8.0, 93% of the available code
points are above the highest one that does change.
This commit skips trying to case these 93%. A regen/ script records
the highest case-changing code point in the current Unicode release,
and the casing code skips any code point above it. Thus, currently,
casing of emoji is skipped.
Together with the previous commits that dealt with casing, this
severely limits the potential for huge memory requirements in the
swash hashes used for casing.
If the following command is run, with both perls compiled with -O2
and without DEBUGGING:
blead Porting/bench.pl --raw --perlargs="-Ilib -X" --benchfile=plane1_case_perf /path_to_prior_perl=before_this_commit /path_to_new_perl=after
and the file 'plane1_case_perf' contains
```perl
[
    'string::casing::emoji' => {
        desc  => 'yes swash vs no swash',
        setup => 'my $a = "\x{1F570}"', # MANTELPIECE CLOCK
        code  => 'uc($a)'
    },
];
```
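the following results are obtained.

The numbers represent raw counts per loop iteration. Ir counts
instructions executed; Dr and Dw, data reads and writes; COND and IND,
conditional and indirect branches. A suffix of _m marks branch
mispredictions, _m1 level-1 cache misses, and _mm last-level cache
misses.

```
string::casing::emoji
yes swash vs no swash

       before_this_commit    after
       ------------------ --------
Ir                  981.0    306.0
Dr                  228.0     94.0
Dw                  100.0     45.0
COND                137.0     49.0
IND                   7.0      4.0
COND_m                5.5      0.0
IND_m                 4.0      2.0
Ir_m1                 0.1     -0.1
Dr_m1                 0.0      0.0
Dw_m1                 0.0      0.0
Ir_mm                 0.0      0.0
Dr_mm                 0.0      0.0
Dw_mm                 0.0      0.0
```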
Diffstat (limited to 'unicode_constants.h')
-rw-r--r-- | unicode_constants.h | 3 |
1 files changed, 3 insertions, 0 deletions
```diff
diff --git a/unicode_constants.h b/unicode_constants.h
index 71755de7f6..1384873f19 100644
--- a/unicode_constants.h
+++ b/unicode_constants.h
@@ -182,6 +182,9 @@
 /* The number of code points not matching \pC */
 #define NON_OTHER_COUNT_FOR_USE_ONLY_BY_REGCOMP_DOT_C 120522
 
+/* The highest code point that has any type of case change */
+#define HIGHEST_CASE_CHANGING_CP_FOR_USE_ONLY_BY_UTF8_DOT_C 0x118DF
+
 #endif /* H_UNICODE_CONSTANTS */
 
 /* ex: set ro: */
```