author | Karl Williamson <public@khwilliamson.com> | 2012-07-19 21:53:06 -0600 |
---|---|---|
committer | Karl Williamson <public@khwilliamson.com> | 2012-07-24 21:13:49 -0600 |
commit | f4cdb42cbeec255c36ec708cc36f0243138d9645 (patch) | |
tree | 1360a88dc85ec0762a204a210aafb4b15eb04001 /regen | |
parent | 430b7c7009472449aece84a5288fb71719d8418f (diff) | |
handy.h: Free up bits in PL_charclass[]
This array is a bit map containing the Posix and similar character
classes for the first 256 code points. Prior to this commit many
character classes were represented by two bits, one for characters that
are in it over the full Latin-1 range, and one for just the ASCII
characters that are in it. The number of bits in use was approaching
the 32-bit limit available without playing games.
This commit takes advantage of a recent commit that adds a bit to the
table for all the ASCII characters, and the fact that the ASCII
characters in a character class are a subset of the characters in that
class over the full Latin-1 range. So a character is an ASCII-range
character with the given character class iff both the full-range
character class bit and the ASCII bit are set.
A new internal macro is created to generate code to determine if a
character is an ASCII range character with the given class. It's not
clear if the generated code is faster or slower than the full range
version.
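
As a rough illustration of the test described above, here is a minimal sketch in C. The names (`demo_charclass`, `DEMO_CC_*`, `DEMO_IS_CC`, `DEMO_IS_CC_A`) and the bit positions are hypothetical and are not taken from handy.h. The table covers only two classes and assumes an ASCII platform.

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. The real values
 * live in handy.h and are not shown in this diff. */
#define DEMO_CC_ALPHA (1u << 0)
#define DEMO_CC_DIGIT (1u << 1)
#define DEMO_CC_ASCII (1u << 2)

/* One 32-bit word of class bits per code point 0..255. */
static uint32_t demo_charclass[256];

/* Fill the table: one bit per class over the full Latin-1 range, plus a
 * single ASCII bit shared by all classes. Assumes an ASCII platform. */
static void demo_init_charclass(void)
{
    for (unsigned c = 0; c < 256; c++) {
        uint32_t bits = 0;

        if (c < 128)
            bits |= DEMO_CC_ASCII;
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
            bits |= DEMO_CC_ALPHA;
        /* Latin-1 letters above the ASCII range; 0xD7 and 0xF7 are the
         * multiplication and division signs, not letters. */
        if (c == 0xAA || c == 0xB5 || c == 0xBA
            || (c >= 0xC0 && c != 0xD7 && c != 0xF7))
            bits |= DEMO_CC_ALPHA;
        if (c >= '0' && c <= '9')
            bits |= DEMO_CC_DIGIT;

        demo_charclass[c] = bits;
    }
}

/* Full-range test: the class bit alone decides. */
#define DEMO_IS_CC(c, bit) \
    ((demo_charclass[(uint8_t)(c)] & (bit)) != 0)

/* ASCII-range test: the class bit and the ASCII bit must both be set,
 * so no separate per-class "_A" bit is needed. */
#define DEMO_IS_CC_A(c, bit) \
    ((demo_charclass[(uint8_t)(c)] & ((bit) | DEMO_CC_ASCII)) \
        == ((bit) | DEMO_CC_ASCII))
```

After calling `demo_init_charclass()`, `DEMO_IS_CC(0xE9, DEMO_CC_ALPHA)` is true while `DEMO_IS_CC_A(0xE9, DEMO_CC_ALPHA)` is false, because 0xE9 is a Latin-1 letter outside the ASCII range. The ASCII-range form costs a mask and a compare against a constant rather than a single mask-and-test, which may be why the relative speed is left open.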
The result is that nearly half the bits are freed up, as the ones for
the ASCII-range are now redundant.
Diffstat (limited to 'regen')
-rw-r--r-- | regen/mk_PL_charclass.pl | 43 |
1 files changed, 15 insertions, 28 deletions
diff --git a/regen/mk_PL_charclass.pl b/regen/mk_PL_charclass.pl
index eccb0e85a4..77691901ee 100644
--- a/regen/mk_PL_charclass.pl
+++ b/regen/mk_PL_charclass.pl
@@ -22,37 +22,24 @@ require 'regen/regen_lib.pl';
 # new Unicode release, to make sure things haven't been changed by it.
 
 my @properties = qw(
-    ALNUMC_A
-    ALNUMC_L1
-    ALPHA_A
-    ALPHA_L1
+    ALNUMC
+    ALPHA
     ASCII
-    BLANK_A
-    BLANK_L1
+    BLANK
     CHARNAME_CONT
-    CNTRL_A
-    CNTRL_L1
-    DIGIT_A
-    GRAPH_A
-    GRAPH_L1
-    IDFIRST_A
-    IDFIRST_L1
-    LOWER_A
-    LOWER_L1
-    PRINT_A
-    PRINT_L1
-    PSXSPC_A
-    PSXSPC_L1
-    PUNCT_A
-    PUNCT_L1
-    SPACE_A
-    SPACE_L1
-    UPPER_A
-    UPPER_L1
-    WORDCHAR_A
-    WORDCHAR_L1
-    XDIGIT_A
+    CNTRL
+    DIGIT
+    GRAPH
+    IDFIRST
+    LOWER
+    PRINT
+    PSXSPC
+    PUNCT
     QUOTEMETA
+    SPACE
+    UPPER
+    WORDCHAR
+    XDIGIT
 );
 
 # Read in the case fold mappings.