author Karl Williamson <khw@cpan.org> 2016-08-27 20:08:52 -0600
committer Karl Williamson <khw@cpan.org> 2016-08-31 20:32:37 -0600
commit 89d986df51f55257b8cc3f6e4f54eba60f607e48
tree 1bcfc07660079fb264bd43052493ad5d1243cd0c
parent 52be253637900db8bbf44d387a06af529972b855
Make 3 UTF-8 macros API
These may be useful to various module writers. They certainly are useful for Encode. This makes public API macros to determine if the input UTF-8 represents (one macro for each category)
    a) a surrogate code point
    b) a non-character code point
    c) a code point that is above Unicode's legal maximum.

The macros are machine generated. In making them public, I am now using the string end location parameter to guard against running off the end of the input. Previously this parameter was ignored, as their use in the core could be tightly controlled so that we already knew that the string was long enough when calling these macros. But this can't be guaranteed in the public API. An optimizing compiler should be able to remove redundant length checks.
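
The difference between ignoring and honoring the string end parameter can be sketched as follows. This is only an illustration, not the generated macro text: the function names are hypothetical, the boolean return value is an assumption, and the byte patterns assume an ASCII platform, where the surrogates U+D800..U+DFFF are encoded as ED A0..BF 80..BF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper in the ":fast" style: it trusts the caller that at
 * least three bytes are readable at s and never looks at an end pointer. */
static int is_surrogate_utf8_fast(const unsigned char *s)
{
    return s[0] == 0xED                /* lead byte shared by U+D000..U+DFFF */
        && (s[1] & 0xE0) == 0xA0       /* second byte in A0..BF */
        && (s[2] & 0xC0) == 0x80;      /* third byte is a continuation byte */
}

/* Hypothetical helper in the ":safe" style: e points one past the last
 * readable byte, and the length is checked before any byte is read. */
static int is_surrogate_utf8_safe(const unsigned char *s,
                                  const unsigned char *e)
{
    assert(s <= e);                    /* precondition assumed by this sketch */
    if ((size_t)(e - s) < 3) {
        return 0;                      /* too short to hold a surrogate */
    }
    return is_surrogate_utf8_fast(s);
}
```

When the caller has already established that three bytes are available, the length test in the safe variant is redundant, which is the case the message expects an optimizing compiler to remove.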
Diffstat (limited to 'regen/regcharclass.pl')
-rwxr-xr-x regen/regcharclass.pl | 4
1 files changed, 2 insertions, 2 deletions
diff --git a/regen/regcharclass.pl b/regen/regcharclass.pl
index 9115eafeb6..e22720b508 100755
--- a/regen/regcharclass.pl
+++ b/regen/regcharclass.pl
@@ -1630,11 +1630,11 @@ REPLACEMENT: Unicode REPLACEMENT CHARACTER
 0xFFFD
 
 NONCHAR: Non character code points
-=> UTF8 :fast
+=> UTF8 :safe
 \p{_Perl_Nchar}
 
 SURROGATE: Surrogate characters
-=> UTF8 :fast
+=> UTF8 :safe
 \p{_Perl_Surrogate}
 
 # This program was run with this enabled, and the results copied to utf8.h;