utf8.h: Use machine generated IS_UTF8_CHAR()

This takes the output of regen/regcharclass.pl for all the 1-4 byte UTF-8 representations of Unicode code points, and uses it to replace the current hand-rolled definition in utf8.h. It does this only for ASCII platforms, leaving the EBCDIC version to be machine generated when the program is run on such a platform. I would rather have both versions regenerated each time they are needed, to avoid an EBCDIC dependency, but it takes more than 10 minutes on my computer to process the 2 million code points that have to be checked on ASCII platforms, and t/porting/regen.t currently runs this program every time; that slowdown would be unacceptable. If this is ever run under EBCDIC, the macro should be machine computed (very slowly). So, even though there is an EBCDIC dependency, it has essentially been solved.
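For orientation, here is a minimal hand-written sketch in C of the byte patterns such a check has to accept on an ASCII platform. It is not the generated macro and not the code in utf8.h. The function name utf8_char_len and the helper is_cont are hypothetical. It assumes, from the 0x0 - 0x1FFFFF range in the diff below, that only shortest-form encodings match and that surrogates and code points above 0x10FFFF are accepted.

```c
#include <stddef.h>

/* Hypothetical helper: true if b is a UTF-8 continuation byte (10xxxxxx). */
static int is_cont(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical stand-in for the generated check, not the real macro.
 * Returns the length (1 to 4) of the shortest-form UTF-8 character that
 * starts at s and ends before e, or 0 if there is none.
 * Assumption: every code point in 0x0 - 0x1FFFFF is accepted, including
 * surrogates, because the range in the diff does not exclude them. */
size_t utf8_char_len(const unsigned char *s, const unsigned char *e)
{
    size_t avail;
    unsigned char b0;

    if (e <= s)
        return 0;                       /* nothing to examine */
    avail = (size_t)(e - s);
    b0 = s[0];

    if (b0 <= 0x7F)                     /* 1 byte: 0x00 - 0x7F */
        return 1;

    if (b0 >= 0xC2 && b0 <= 0xDF)       /* 2 bytes; C0 and C1 are overlong */
        return (avail >= 2 && is_cont(s[1])) ? 2 : 0;

    if (b0 >= 0xE0 && b0 <= 0xEF) {     /* 3 bytes */
        if (avail < 3 || !is_cont(s[1]) || !is_cont(s[2]))
            return 0;
        if (b0 == 0xE0 && s[1] < 0xA0)  /* overlong form of a 2-byte value */
            return 0;
        return 3;
    }

    if (b0 >= 0xF0 && b0 <= 0xF7) {     /* 4 bytes, up to 0x1FFFFF */
        if (avail < 4 || !is_cont(s[1]) || !is_cont(s[2]) || !is_cont(s[3]))
            return 0;
        if (b0 == 0xF0 && s[1] < 0x90)  /* overlong form of a 3-byte value */
            return 0;
        return 4;
    }

    return 0;                           /* stray continuation byte, or 5+ bytes */
}
```

The sketch tests ranges directly, whereas the generated code is produced by enumerating every code point in the range, which is why regenerating it is so slow.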
author: Karl Williamson <public@khwilliamson.com> 2012-09-05 20:56:09 -0600
committer: Karl Williamson <public@khwilliamson.com> 2012-09-13 21:14:04 -0600
commit: 4d6461409e812aecb1fa745debb6132ce8e5612d (patch)
tree: 233a2c093d46c73bc151240415219e0e7ed41b11 /regen
parent: ae1d4929d23a3d6949518058aa41cd90a700a4af (diff)
download: perl-4d6461409e812aecb1fa745debb6132ce8e5612d.tar.gz
1 files changed, 16 insertions, 0 deletions
diff --git a/regen/regcharclass.pl b/regen/regcharclass.pl
index c4f5951a3c..7d126428ef 100755
--- a/regen/regcharclass.pl
+++ b/regen/regcharclass.pl
@@ -1161,6 +1161,22 @@ GCB_V: Grapheme_Cluster_Break=V
 => UTF8 :fast
 \p{_X_GCB_V}
 
+# This program was run with this enabled, and the results copied to utf8.h;
+# then this was commented out because it takes so long to figure out these 2
+# million code points.  The results would not change unless utf8.h decides it
+# wants a maximum other than 4 bytes, or this program creates better
+# optimizations
+#UTF8_CHAR: Matches utf8 from 1 to 4 bytes
+#=> UTF8 :safe only_ascii_platform
+#0x0 - 0x1FFFFF
+
+# This hasn't been commented out, because we haven't an EBCDIC platform to run
+# it on, and the 3 types of EBCDIC allegedly supported by Perl would have
+# different results
+UTF8_CHAR: Matches utf8 from 1 to 5 bytes
+=> UTF8 :safe only_ebcdic_platform
+0x0 - 0x3FFFFF:
+
 QUOTEMETA: Meta-characters that \Q should quote
 => high :fast
 \p{_Perl_Quotemeta}