author | Karl Williamson <khw@cpan.org> | 2018-07-01 16:00:41 -0600 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2018-07-05 14:47:19 -0600 |
commit | c5bfbb64f98c2c7e8055565dd018e0a2a8565c10 (patch) | |
tree | 480e811b72524783daf394b67ec17dc426538540 /regen/regcharclass.pl | |
parent | 67049a5ffa8b7757041edb8f972a0a74fbe5d63d (diff) | |
download | perl-c5bfbb64f98c2c7e8055565dd018e0a2a8565c10.tar.gz |
Make isC9_STRICT_UTF8_CHAR() an inline dfa
This replaces a complicated trie with a DFA. This should cut down the
number of conditionals encountered when parsing many code points.
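
To illustrate the general idea, here is a minimal sketch of a table-driven DFA that recognizes one C9-strict UTF-8 character (well-formed UTF-8 up to 0x10FFFF, no surrogates, non-characters allowed). It is not the table from utf8.h: the state names, the `byte_class` helper, and `c9_strict_char_len` are hypothetical, and it assumes an ASCII platform only. Unlike the removed regcharclass.pl definition, which expected callers to test for invariants first, this sketch also accepts ASCII bytes.

```c
#include <stddef.h>

/* Hypothetical DFA states; these are NOT the identifiers used in utf8.h.
 * ACC doubles as the start state; C1..C3 mean "1..3 plain continuation
 * bytes still needed"; E0/ED/F0/F4 restrict the second byte's range. */
enum { ACC, C1, C2, C3, E0, ED, F0, F4, REJ, N_STATES };
enum { N_CLASSES = 12 };

/* Map a byte to one of 12 classes.  A production version would more
 * likely use a 256-entry lookup table; the if-chain keeps this short. */
static int byte_class(unsigned char b)
{
    if (b <= 0x7F) return 0;   /* ASCII                     */
    if (b <= 0x8F) return 1;   /* continuation 80..8F       */
    if (b <= 0x9F) return 2;   /* continuation 90..9F       */
    if (b <= 0xBF) return 3;   /* continuation A0..BF       */
    if (b <= 0xC1) return 4;   /* C0, C1: always overlong   */
    if (b <= 0xDF) return 5;   /* 2-byte start C2..DF       */
    if (b == 0xE0) return 6;   /* 3-byte start, overlong risk  */
    if (b == 0xED) return 8;   /* 3-byte start, surrogate risk */
    if (b <= 0xEF) return 7;   /* other 3-byte starts       */
    if (b == 0xF0) return 9;   /* 4-byte start, overlong risk  */
    if (b <= 0xF3) return 10;  /* 4-byte starts F1..F3      */
    if (b == 0xF4) return 11;  /* 4-byte start, max 0x10FFFF */
    return 4;                  /* F5..FF: never legal       */
}

/* dfa[state][class] gives the next state. */
static const unsigned char dfa[N_STATES][N_CLASSES] = {
    /*          asc  8x   9x   A-B  bad  C2+  E0   E1+  ED   F0   F1+  F4  */
    /* ACC */ { ACC, REJ, REJ, REJ, REJ, C1,  E0,  C2,  ED,  F0,  C3,  F4  },
    /* C1  */ { REJ, ACC, ACC, ACC, REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ },
    /* C2  */ { REJ, C1,  C1,  C1,  REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ },
    /* C3  */ { REJ, C2,  C2,  C2,  REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ },
    /* E0  */ { REJ, REJ, REJ, C1,  REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ },
    /* ED  */ { REJ, C1,  C1,  REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ },
    /* F0  */ { REJ, REJ, C2,  C2,  REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ },
    /* F4  */ { REJ, C2,  REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ },
    /* REJ */ { REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ, REJ },
};

/* Return the byte length of the character starting at s (not reading at
 * or past e), or 0 if the bytes are malformed, truncated, or empty. */
static size_t c9_strict_char_len(const unsigned char *s,
                                 const unsigned char *e)
{
    const unsigned char *p = s;
    int state = ACC;

    while (p < e) {
        state = dfa[state][byte_class(*p++)];
        if (state == ACC) return (size_t)(p - s);
        if (state == REJ) return 0;
    }
    return 0; /* ran out of input before reaching an accepting state */
}
```

Each input byte costs one classification and one table lookup, so the number of branches per character stays small and roughly constant regardless of which range the code point falls in. For example, calling `c9_strict_char_len` on the bytes `E2 82 AC` returns 3, while the surrogate encoding `ED A0 80` returns 0.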
Diffstat (limited to 'regen/regcharclass.pl')
-rwxr-xr-x | regen/regcharclass.pl | 47 |
1 files changed, 0 insertions, 47 deletions
diff --git a/regen/regcharclass.pl b/regen/regcharclass.pl
index 3dee00060b..84936360d9 100755
--- a/regen/regcharclass.pl
+++ b/regen/regcharclass.pl
@@ -1637,53 +1637,6 @@ SURROGATE: Surrogate code points
 => UTF8 :safe
 \p{_Perl_Surrogate}
 
-# This program was run with this enabled, and the results copied to utf8.h and
-# utfebcdic.h; then this was commented out because it takes so long to figure
-# out these 2 million code points. The results would not change unless utf8.h
-# decides it wants a different maximum, or this program creates better
-# optimizations. Trying with 5 bytes used too much memory to calculate.
-#
-# We don't generate code for invariants here because the EBCDIC form is too
-# complicated and would slow things down; instead the user should test for
-# invariants first.
-#
-# 0x1FFFFF was chosen because for both UTF-8 and UTF-EBCDIC, its start byte
-# is the same as 0x10FFFF, and it includes all the above-Unicode code points
-# that have that start byte. In other words, it is the natural stopping place
-# that includes all Unicode code points.
-#
-#STRICT_UTF8_CHAR: Matches legal Unicode UTF-8 variant code points, no surrrogates nor non-character code points
-#=> UTF8 :no_length_checks only_ebcdic_platform
-#0x00A0 - 0xD7FF
-#0xE000 - 0xFDCF
-#0xFDF0 - 0xFFFD
-#0x10000 - 0x1FFFD
-#0x20000 - 0x2FFFD
-#0x30000 - 0x3FFFD
-#0x40000 - 0x4FFFD
-#0x50000 - 0x5FFFD
-#0x60000 - 0x6FFFD
-#0x70000 - 0x7FFFD
-#0x80000 - 0x8FFFD
-#0x90000 - 0x9FFFD
-#0xA0000 - 0xAFFFD
-#0xB0000 - 0xBFFFD
-#0xC0000 - 0xCFFFD
-#0xD0000 - 0xDFFFD
-#0xE0000 - 0xEFFFD
-#0xF0000 - 0xFFFFD
-#0x100000 - 0x10FFFD
-
-#C9_STRICT_UTF8_CHAR: Matches legal Unicode UTF-8 variant code points, no surrogates
-#=> UTF8 :no_length_checks only_ascii_platform
-#0x0080 - 0xD7FF
-#0xE000 - 0x10FFFF
-#
-#C9_STRICT_UTF8_CHAR: Matches legal Unicode UTF-8 variant code points including non-character code points, no surrogates
-#=> UTF8 :no_length_checks only_ebcdic_platform
-#0x00A0 - 0xD7FF
-#0xE000 - 0x10FFFF
-
 QUOTEMETA: Meta-characters that \Q should quote
 => high :fast
 \p{_Perl_Quotemeta}