author Karl Williamson <public@khwilliamson.com> 2014-01-01 20:08:02 -0700
committer Karl Williamson <public@khwilliamson.com> 2014-01-01 20:29:37 -0700
commit ea5ced44af8b967bfce3763b11ba4714d4fcd154
tree f6dbae186171fc57cfb675765a07ca8b2bc2eb9f /utf8.h
parent 23dfa30831199ff0adfa3c42488e59e3df455e2c
download perl-ea5ced44af8b967bfce3763b11ba4714d4fcd154.tar.gz
Change some warnings in utf8n_to_uvchr()
This bottom-level function decodes the first character of a UTF-8 string into a code point. Calling it directly is discouraged. This commit cleans up some of the warnings it can raise. Tests for malformations are now done before any tests for other potential issues.

One of those issues involves code points so large that they have never appeared in any official standard (the current standard has scaled back the highest acceptable code point from earlier versions). It is possible (though nothing on CPAN does so) to warn about and/or forbid these code points while still accepting smaller code points that are above the legal Unicode maximum. The warning message for this now includes the code point if it is representable on the machine. Previously it always displayed the raw bytes, which it still does for non-representable code points.
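The ordering described above (malformation tests first, range tests second) can be illustrated with a minimal sketch. This is not the code in utf8n_to_uvchr(): the names sketch_result, expected_len and classify are hypothetical, overlong and surrogate checks are omitted, and the 13-byte length for an FF start byte is an assumption about Perl's extended UTF-8 on ASCII platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch; not part of Perl's API. */
enum sketch_result {
    SKETCH_OK,            /* well formed, within Unicode */
    SKETCH_MALFORMED,     /* truncated or bad continuation byte */
    SKETCH_ABOVE_UNICODE, /* above 0x10FFFF, start byte below 0xFE */
    SKETCH_FE_FF          /* start byte is 0xFE or 0xFF */
};

/* Length implied by the start byte, or 0 if it cannot start a character. */
static size_t expected_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b < 0xC0) return 0;   /* continuation byte */
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    if (b < 0xF8) return 4;
    if (b < 0xFC) return 5;
    if (b < 0xFE) return 6;
    if (b == 0xFE) return 7;
    return 13;                /* 0xFF: assumed length, see lead-in */
}

static enum sketch_result classify(const unsigned char *s, size_t avail)
{
    size_t len, i;
    uint64_t cp;

    if (avail == 0) return SKETCH_MALFORMED;
    len = expected_len(s[0]);

    /* Step 1: malformation tests come before anything else. */
    if (len == 0 || len > avail) return SKETCH_MALFORMED;
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80) return SKETCH_MALFORMED;

    /* Step 2: only well-formed input reaches the range tests. */
    if (s[0] >= 0xFE) return SKETCH_FE_FF;

    /* Start byte payload mask: 0x7F >> len for multi-byte sequences. */
    cp = (len == 1) ? s[0] : (uint64_t)(s[0] & (0x7F >> len));
    for (i = 1; i < len; i++)
        cp = (cp << 6) | (s[i] & 0x3F);

    return cp > 0x10FFFF ? SKETCH_ABOVE_UNICODE : SKETCH_OK;
}
```

With this ordering, a truncated sequence that begins with FE is reported as malformed rather than as an FE/FF code point, which matches the priority the commit message describes.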
Diffstat (limited to 'utf8.h')
-rw-r--r-- utf8.h 4
1 files changed, 3 insertions, 1 deletions
diff --git a/utf8.h b/utf8.h
index 2d4877552b..f72a2433cc 100644
--- a/utf8.h
+++ b/utf8.h
@@ -447,7 +447,9 @@ Perl's extended UTF-8 means we can have start bytes up to FF.
#define UTF8_WARN_SUPER 0x0400 /* points above the legal max */
/* Code points which never were part of the original UTF-8 standard, the first
- * byte of which is a FE or FF on ASCII platforms. */
+ * byte of which is a FE or FF on ASCII platforms. If the first byte is FF, it
+ * will overflow a 32-bit word. If the first byte is FE, it will overflow a
+ * signed 32-bit word. */
#define UTF8_DISALLOW_FE_FF 0x0800
#define UTF8_WARN_FE_FF 0x1000
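The overflow claims in the new comment follow from counting payload bits. The sketch below is a rough check, not code from utf8.h: the helper names are hypothetical, and it assumes the classic UTF-8 layout (a start byte carries 7 - n payload bits in an n-byte sequence, each continuation byte carries 6) and that FF sequences begin where the FE range ends.

```c
#include <stdint.h>

/* Largest code point an n-byte sequence can hold, for n from 2 to 7,
 * under the layout assumed above. */
static uint64_t max_cp_for_len(unsigned n)
{
    unsigned bits = (7 - n) + 6 * (n - 1);
    return (UINT64_C(1) << bits) - 1;
}

/* Smallest code point that needs an FE start byte: one more than the
 * largest value a 6-byte sequence can hold (0x7FFFFFFF). */
static uint64_t min_cp_for_fe(void)
{
    return max_cp_for_len(6) + 1;
}

/* Smallest code point that needs an FF start byte, assuming FF takes
 * over where the 7-byte FE sequences run out (36 payload bits). */
static uint64_t min_cp_for_ff(void)
{
    return max_cp_for_len(7) + 1;
}
```

Under these assumptions, every FE code point is at least 0x80000000, which exceeds INT32_MAX but can still fit an unsigned 32-bit word, and every FF code point is at least 2^36, which exceeds UINT32_MAX.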