author Karl Williamson <public@khwilliamson.com> 2012-07-19 21:53:06 -0600
committer Karl Williamson <public@khwilliamson.com> 2012-07-24 21:13:49 -0600
commit f4cdb42cbeec255c36ec708cc36f0243138d9645
tree 1360a88dc85ec0762a204a210aafb4b15eb04001 /regen
parent 430b7c7009472449aece84a5288fb71719d8418f
handy.h: Free up bits in PL_charclass[]
This array is a bit map containing the POSIX and similar character classes for the first 256 code points. Prior to this commit, many character classes were represented by two bits: one for the characters in the class over the full Latin-1 range, and one for just the ASCII characters in it. The number of bits in use was approaching the 32-bit limit available without playing games.

This commit takes advantage of a recent commit that added a bit to the table for all the ASCII characters, and of the fact that the ASCII characters in a character class are a subset of those in the class over the full Latin-1 range. So a character is an ASCII-range character with the given character class iff both the full-range character class bit and the ASCII bit are set.

A new internal macro is created to generate code that determines whether a character is an ASCII-range character with the given class. It's not clear whether the generated code is faster or slower than the full-range version.

The result is that nearly half the bits are freed up, as the ones for the ASCII range are now redundant.
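The two-bit test described above can be sketched in C roughly as follows. This is a minimal illustration, not the code in handy.h: the names `charclass`, `CC_MASK`, `IS_CC_L1`, `IS_CC_A`, the bit positions, and the toy `init_charclass()` initializer are all hypothetical, and the table covers only a simplified notion of "alpha" and "digit" on an ASCII-based platform.

```c
#include <stdint.h>

/* Hypothetical bit positions; the real table assigns its own. */
enum { CC_ALPHA = 0, CC_DIGIT = 1, CC_ASCII = 2 };

#define CC_MASK(bit) (UINT32_C(1) << (bit))

/* One 32-bit word of class bits per code point 0..255 (stand-in table). */
static uint32_t charclass[256];

/* Full Latin-1 range test: only the class bit has to be set. */
#define IS_CC_L1(c, bit) \
    ((charclass[(uint8_t)(c)] & CC_MASK(bit)) != 0)

/* ASCII-range test: the class bit AND the ASCII bit must both be set,
 * so no separate per-class "_A" bit is needed. */
#define IS_CC_A(c, bit) \
    ((charclass[(uint8_t)(c)] & (CC_MASK(bit) | CC_MASK(CC_ASCII))) \
        == (CC_MASK(bit) | CC_MASK(CC_ASCII)))

/* Toy initializer (an assumption for this sketch): marks ASCII letters
 * and digits, and treats 0xC0..0xFF except 0xD7 and 0xF7 as Latin-1
 * letters. */
static void init_charclass(void)
{
    for (int c = 0; c < 256; c++) {
        uint32_t bits = 0;
        if (c < 128) {
            int lower = c | 0x20;   /* fold ASCII upper case onto lower case */
            bits |= CC_MASK(CC_ASCII);
            if (lower >= 'a' && lower <= 'z')
                bits |= CC_MASK(CC_ALPHA);
            if (c >= '0' && c <= '9')
                bits |= CC_MASK(CC_DIGIT);
        } else if (c >= 0xC0 && c != 0xD7 && c != 0xF7) {
            bits |= CC_MASK(CC_ALPHA);
        }
        charclass[c] = bits;
    }
}
```

After calling `init_charclass()`, `IS_CC_L1(0xE9, CC_ALPHA)` holds while `IS_CC_A(0xE9, CC_ALPHA)` does not, because code point 0xE9 has the class bit but lacks the ASCII bit. The ASCII-range test costs one mask-and-compare instead of a single-bit test, which is consistent with the message's remark that the speed difference is unclear.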
Diffstat (limited to 'regen')
-rw-r--r-- regen/mk_PL_charclass.pl 43
1 files changed, 15 insertions, 28 deletions
diff --git a/regen/mk_PL_charclass.pl b/regen/mk_PL_charclass.pl
index eccb0e85a4..77691901ee 100644
--- a/regen/mk_PL_charclass.pl
+++ b/regen/mk_PL_charclass.pl
@@ -22,37 +22,24 @@ require 'regen/regen_lib.pl';
# new Unicode release, to make sure things haven't been changed by it.
my @properties = qw(
- ALNUMC_A
- ALNUMC_L1
- ALPHA_A
- ALPHA_L1
+ ALNUMC
+ ALPHA
ASCII
- BLANK_A
- BLANK_L1
+ BLANK
CHARNAME_CONT
- CNTRL_A
- CNTRL_L1
- DIGIT_A
- GRAPH_A
- GRAPH_L1
- IDFIRST_A
- IDFIRST_L1
- LOWER_A
- LOWER_L1
- PRINT_A
- PRINT_L1
- PSXSPC_A
- PSXSPC_L1
- PUNCT_A
- PUNCT_L1
- SPACE_A
- SPACE_L1
- UPPER_A
- UPPER_L1
- WORDCHAR_A
- WORDCHAR_L1
- XDIGIT_A
+ CNTRL
+ DIGIT
+ GRAPH
+ IDFIRST
+ LOWER
+ PRINT
+ PSXSPC
+ PUNCT
QUOTEMETA
+ SPACE
+ UPPER
+ WORDCHAR
+ XDIGIT
);
# Read in the case fold mappings.