author | Karl Williamson <public@khwilliamson.com> | 2012-03-19 15:38:06 -0600 |
---|---|---|
committer | Karl Williamson <public@khwilliamson.com> | 2012-03-19 18:23:44 -0600 |
commit | 4b88fb76efce8c436e63b907c9842345d4fa77c7 (patch) | |
tree | 67d8be3146bf0c32e93bd8209c141ed72c5a0ae2 /pod/perlguts.pod | |
parent | 27d6c58a7e12243bef66c58b38e7d1415d9ca07e (diff) | |
download | perl-4b88fb76efce8c436e63b907c9842345d4fa77c7.tar.gz |
Use the new utf8 to code point functions
These functions should be used in preference to the old ones which can
read beyond the end of the input string.
Diffstat (limited to 'pod/perlguts.pod')
-rw-r--r-- | pod/perlguts.pod | 7 |
1 files changed, 4 insertions, 3 deletions
diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index ee938ea137..908fa1f0bd 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -2670,17 +2670,18 @@ character like this (the UTF8_IS_INVARIANT() is a macro that tests
 whether the byte can be encoded as a single byte even in UTF-8):
 
     U8 *utf;
+    U8 *utf_end; /* 1 beyond buffer pointed to by utf */
     UV uv; /* Note: a UV, not a U8, not a char */
     STRLEN len; /* length of character in bytes */
 
     if (!UTF8_IS_INVARIANT(*utf))
         /* Must treat this as UTF-8 */
-        uv = utf8_to_uvchr(utf, &len);
+        uv = utf8_to_uvchr_buf(utf, utf_end, &len);
     else
         /* OK to treat this character as a byte */
         uv = *utf;
 
-You can also see in that example that we use C<utf8_to_uvchr> to get the
+You can also see in that example that we use C<utf8_to_uvchr_buf> to get the
 value of the character; the inverse function C<uvchr_to_utf8> is available
 for putting a UV into UTF-8:
 
@@ -2792,7 +2793,7 @@ it's not - if you pass on the PV to somewhere, pass on the flag too.
 
 =item *
 
-If a string is UTF-8, B<always> use C<utf8_to_uvchr> to get at the value,
+If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value,
 unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.
 
 =item *
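
To illustrate why the commit message prefers functions that take an end pointer, here is a minimal standalone sketch of a bounds-checked decoder. It is not Perl's implementation: `decode_utf8_buf` is a hypothetical helper written only for this example, it uses plain C types instead of `U8`, `UV` and `STRLEN`, and it handles only sequences of up to four bytes without rejecting overlong or surrogate encodings. The point it shows is that the start byte announces how many bytes follow, so a decoder that is not told where the buffer ends has no way to avoid reading past a truncated character.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of the Perl API: decode one UTF-8 character
 * starting at s without ever reading at or beyond end.
 * On success, stores the number of bytes consumed in *len and returns the
 * code point. On malformed or truncated input, stores 0 in *len and
 * returns 0. */
static uint32_t decode_utf8_buf(const uint8_t *s, const uint8_t *end,
                                size_t *len)
{
    uint32_t cp;
    size_t need, i;

    *len = 0;
    if (s >= end)
        return 0;                   /* empty buffer: nothing to read */

    if (s[0] < 0x80) {              /* single byte, value is the byte itself */
        *len = 1;
        return s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {
        need = 2; cp = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        need = 3; cp = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        need = 4; cp = s[0] & 0x07;
    } else {
        return 0;                   /* stray continuation or invalid start */
    }

    /* The bounds check a decoder without an end pointer cannot perform. */
    if ((size_t)(end - s) < need)
        return 0;                   /* character is cut off by the buffer end */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    *len = need;
    return cp;
}
```

Given the three bytes `0xE2 0x82 0xAC` (U+20AC), the sketch returns `0x20AC` with `*len == 3` when `end` points one past the last byte. If `end` is moved back by one, it returns 0 with `*len == 0` instead of touching the byte outside the buffer.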