author Karl Williamson <public@khwilliamson.com> 2013-02-26 11:02:33 -0700
committer Karl Williamson <public@khwilliamson.com> 2013-08-29 09:55:57 -0600
commit 4f83cdcd5c1f4154a1ecc18f39f9e5c3f21bc4b3
tree 2cfb559f611fb57257b30fde275e0e11f4fb0fd9 /utf8.c
parent 5495102afe3b4589647ff274c9692632113ce6f4
Deprecate utf8_to_uni_buf()
Now that the tables are stored in native order, there is almost no need for code to be dealing in Unicode order. According to grep.cpan.me, there are no uses of this function in CPAN.
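The "_buf" functions touched by this commit share one calling convention: s points at the first byte, send points one past the end of the buffer, and retlen receives the character's length in bytes. The block below is a minimal standalone sketch of that convention, not Perl's implementation. The name sketch_utf8_decode_buf is hypothetical, size_t stands in for STRLEN, and the sketch assumes an ASCII platform, where native and Unicode code points coincide. It only handles sequences of up to four bytes and always reports malformations by returning 0 with *retlen set to (size_t)-1, whereas the real functions make that depend on whether UTF8 warnings are enabled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration of the (s, send, retlen) contract.
 * Decodes the first UTF-8 character in [s, send).
 * On success: returns the code point and stores its byte length in *retlen.
 * On malformed or truncated input: returns 0 and stores (size_t)-1. */
static uint32_t
sketch_utf8_decode_buf(const unsigned char *s, const unsigned char *send,
                       size_t *retlen)
{
    /* Smallest code point that legitimately needs 1..4 bytes. */
    static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t cp;
    size_t len, i;

    if (retlen)
        *retlen = (size_t)-1;       /* assume failure until proven otherwise */
    if (s >= send)
        return 0;                   /* empty buffer */

    /* The lead byte determines the sequence length. */
    if (s[0] < 0x80)                { cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; }
    else
        return 0;                   /* stray continuation or invalid lead byte */

    /* Never read past send: this is the point of the _buf variants. */
    if ((size_t)(send - s) < len)
        return 0;

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (cp < min_cp[len] || cp > 0x10FFFF)
        return 0;                   /* overlong or out of range for this sketch */

    if (retlen)
        *retlen = len;
    return cp;
}
```

Passing send lets the decoder refuse a sequence whose lead byte promises more bytes than the buffer holds, which is the over-read that the deprecated non-"_buf" form cannot rule out.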
Diffstat (limited to 'utf8.c')
-rw-r--r-- utf8.c 16
1 files changed, 8 insertions, 8 deletions
diff --git a/utf8.c b/utf8.c
index 4cc12d62f0..b1dc30b9dd 100644
--- a/utf8.c
+++ b/utf8.c
@@ -996,13 +996,14 @@ Perl_utf8_to_uvchr(pTHX_ const U8 *s, STRLEN *retlen)
/*
=for apidoc utf8_to_uvuni_buf
-Returns the Unicode code point of the first character in the string C<s> which
+Only in very rare circumstances should code need to be dealing in the Unicode
+code point. Use L</utf8_to_uvchr_buf> instead.
+
+Returns the Unicode (not-native) code point of the first character in the
+string C<s> which
is assumed to be in UTF-8 encoding; C<send> points to 1 beyond the end of C<s>.
C<retlen> will be set to the length, in bytes, of that character.
-This function should only be used when the returned UV is considered
-an index into the Unicode semantic tables (e.g. swashes).
-
If C<s> does not point to a well-formed UTF-8 character and UTF8 warnings are
enabled, zero is returned and C<*retlen> is set (if C<retlen> isn't
NULL) to -1. If those warnings are off, the computed value if well-defined (or
@@ -1046,12 +1047,11 @@ Returns the Unicode code point of the first character in the string C<s>
which is assumed to be in UTF-8 encoding; C<retlen> will be set to the
length, in bytes, of that character.
-This function should only be used when the returned UV is considered
-an index into the Unicode semantic tables (e.g. swashes).
-
Some, but not all, UTF-8 malformations are detected, and in fact, some
malformed input could cause reading beyond the end of the input buffer, which
-is why this function is deprecated. Use L</utf8_to_uvuni_buf> instead.
+is one reason why this function is deprecated. The other is that only in
+extremely limited circumstances should the Unicode versus native code point be
+of any interest to you. Use L</utf8_to_uvchr_buf> instead.
If C<s> points to one of the detected malformations, and UTF8 warnings are
enabled, zero is returned and C<*retlen> is set (if C<retlen> doesn't point to