author: Karl Williamson <public@khwilliamson.com> 2012-09-05 20:32:29 -0600
committer: Karl Williamson <public@khwilliamson.com> 2012-09-13 21:14:04 -0600
commit: b96a92fb2dbf3acb43641479fc731469e1de9f6c
tree: b4fe776a41b8764ea8b916796d1c70bd3a1c8fa5 /regen
parent: 6e130234c25b195bf5141bd859d947ec051416ec
utf8.h: Remove some EBCDIC dependencies
regen/regcharclass.pl has been enhanced in previous commits so that it generates code as good as the hand-written macro definitions for various UTF-8 constructs. It should also be able to generate EBCDIC versions. By using its definitions, we can remove the EBCDIC dependencies for these constructs. It is quite possible that the hand-written EBCDIC versions were wrong, since they have never been tested. Even if regcharclass.pl has bugs under EBCDIC, it is easier to find and fix them in one place than in all the sundry definitions.
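As a rough illustration of the kind of byte-level test such definitions boil down to on an ASCII platform, here is a minimal sketch. The function names are hypothetical and this is not the code regcharclass.pl emits. It assumes that `:safe` means the check takes an end pointer and verifies the length first, and that `:fast` means the input is assumed to be well-formed with enough bytes available.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the generated macro.
 * "Safe"-style check: confirm that at least 3 bytes are available
 * before comparing against the UTF-8 encoding of U+FFFD (EF BF BD). */
static int is_replacement_utf8_safe(const uint8_t *s, const uint8_t *e)
{
    return (e - s) >= 3
        && s[0] == 0xEF && s[1] == 0xBF && s[2] == 0xBD;
}

/* Hypothetical helper, not the generated macro.
 * "Fast"-style check: the caller guarantees 3 readable bytes.
 * Surrogates U+D800..U+DFFF encode as ED A0..BF 80..BF. */
static int is_surrogate_utf8(const uint8_t *s)
{
    return s[0] == 0xED
        && s[1] >= 0xA0 && s[1] <= 0xBF
        && s[2] >= 0x80 && s[2] <= 0xBF;
}
```

The byte values above hold only for ASCII-based UTF-8. An EBCDIC build would need different constants, which is the argument for generating them in one place.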
Diffstat (limited to 'regen')
-rwxr-xr-x regen/regcharclass.pl 12
1 file changed, 12 insertions, 0 deletions
diff --git a/regen/regcharclass.pl b/regen/regcharclass.pl
index 70f46b03b4..81ac13ce45 100755
--- a/regen/regcharclass.pl
+++ b/regen/regcharclass.pl
@@ -1112,6 +1112,18 @@ VERTWS: Vertical Whitespace: \v \V
=> generic UTF8 LATIN1 cp :fast safe
\p{VertSpace}
+REPLACEMENT: Unicode REPLACEMENT CHARACTER
+=> UTF8 :safe
+0xFFFD
+
+NONCHAR: Non character code points
+=> UTF8 :fast
+\p{Nchar}
+
+SURROGATE: Surrogate characters
+=> UTF8 :fast
+\p{Gc=Cs}
+
GCB_L: Grapheme_Cluster_Break=L
=> UTF8 :fast
\p{_X_GCB_L}