author | Karl Williamson <public@khwilliamson.com> | 2012-09-05 20:56:09 -0600 |
---|---|---|
committer | Karl Williamson <public@khwilliamson.com> | 2012-09-13 21:14:04 -0600 |
commit | 4d6461409e812aecb1fa745debb6132ce8e5612d (patch) | |
tree | 233a2c093d46c73bc151240415219e0e7ed41b11 /regen | |
parent | ae1d4929d23a3d6949518058aa41cd90a700a4af (diff) | |
utf8.h: Use machine generated IS_UTF8_CHAR()
This takes the output of regen/regcharclass.pl for all the 1-4 byte
UTF8-representations of Unicode code points, and replaces the current
hand-rolled definition there. It does this only for ASCII platforms,
leaving EBCDIC to be machine generated when run on such a platform.
I would rather have both versions regenerated each time they are
needed, to avoid an EBCDIC dependency, but it takes more than 10 minutes
on my computer to process the 2 million code points (0x0 - 0x1FFFFF)
that have to be checked on ASCII platforms, and currently
t/porting/regen.t runs this program every time; that slowdown would be
unacceptable. If this is ever run under EBCDIC, the macro should be
machine computed (very slowly). So, even though there is an EBCDIC
dependency, it has essentially been solved.
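
For readers unfamiliar with what a 1-4 byte check has to cover on ASCII platforms, here is a minimal hand-written sketch in C. It is not the generated IS_UTF8_CHAR() macro and not the old hand-rolled one; the function name utf8_char_len() and its signature are hypothetical. It assumes that the generated macro accepts only the shortest encoding of each code point in 0x0 - 0x1FFFFF (so overlong forms are rejected), and, like that range, it makes no special case for surrogates or for code points above 0x10FFFF.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only; not part of utf8.h.
 * Returns the length in bytes (1 to 4) of the character starting at s
 * if it is the shortest-form encoding of a code point in
 * 0x0..0x1FFFFF, or 0 if it is malformed or runs past end. */
static size_t utf8_char_len(const unsigned char *s, const unsigned char *end)
{
    size_t avail = (size_t)(end - s);
    size_t len, i;

    if (avail == 0)
        return 0;

    if (s[0] < 0x80)
        return 1;                          /* 0x00..0x7F */
    else if (s[0] >= 0xC2 && s[0] <= 0xDF)
        len = 2;                           /* 0x80..0x7FF (C0, C1 are overlong) */
    else if (s[0] >= 0xE0 && s[0] <= 0xEF)
        len = 3;                           /* 0x800..0xFFFF */
    else if (s[0] >= 0xF0 && s[0] <= 0xF7)
        len = 4;                           /* 0x10000..0x1FFFFF */
    else
        return 0;                          /* continuation byte, C0, C1, or F8 and up */

    if (avail < len)
        return 0;                          /* truncated character */

    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 0;                      /* not a continuation byte */

    /* Reject overlong 3- and 4-byte forms. */
    if (s[0] == 0xE0 && s[1] < 0xA0)
        return 0;
    if (s[0] == 0xF0 && s[1] < 0x90)
        return 0;

    return len;
}
```

The highest accepted sequence is F7 BF BF BF, which encodes 0x1FFFFF, the upper bound of the range given to regen/regcharclass.pl. A generated macro can test the same conditions with a different arrangement of byte comparisons, so this sketch only shows what is being tested, not how the output is laid out.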
Diffstat (limited to 'regen')
-rwxr-xr-x | regen/regcharclass.pl | 16 |
1 files changed, 16 insertions, 0 deletions
diff --git a/regen/regcharclass.pl b/regen/regcharclass.pl
index c4f5951a3c..7d126428ef 100755
--- a/regen/regcharclass.pl
+++ b/regen/regcharclass.pl
@@ -1161,6 +1161,22 @@ GCB_V: Grapheme_Cluster_Break=V
 => UTF8 :fast
 \p{_X_GCB_V}
 
+# This program was run with this enabled, and the results copied to utf8.h;
+# then this was commented out because it takes so long to figure out these 2
+# million code points. The results would not change unless utf8.h decides it
+# wants a maximum other than 4 bytes, or this program creates better
+# optimizations
+#UTF8_CHAR: Matches utf8 from 1 to 4 bytes
+#=> UTF8 :safe only_ascii_platform
+#0x0 - 0x1FFFFF
+
+# This hasn't been commented out, because we haven't an EBCDIC platform to run
+# it on, and the 3 types of EBCDIC allegedly supported by Perl would have
+# different results
+UTF8_CHAR: Matches utf8 from 1 to 5 bytes
+=> UTF8 :safe only_ebcdic_platform
+0x0 - 0x3FFFFF:
+
 QUOTEMETA: Meta-characters that \Q should quote
 => high :fast
 \p{_Perl_Quotemeta}