-rw-r--r-- | pod/perlunicode.pod | 30 |
1 files changed, 19 insertions, 11 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 484f356dc0..1accda426b 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -825,7 +825,7 @@ for more discussion of the issues.
 =head2 Using Unicode in XS
 
 If you want to handle Perl Unicode in XS extensions, you may find
-the following C APIs useful:
+the following C APIs useful (see perlapi for details):
 
 =over 4
 
@@ -856,8 +856,8 @@ the UTF-8 byte sequence).
 
 =item *
 
-utf8_length(s, len) returns the length of the UTF-8 encoded buffer in
-characters. sv_len_utf8(sv) returns the length of the UTF-8 encoded
+utf8_length(start, end) returns the length of the UTF-8 encoded buffer
+in characters. sv_len_utf8(sv) returns the length of the UTF-8 encoded
 scalar.
 
 =item *
@@ -869,7 +869,8 @@ get turned on. sv_utf8_decode() does the opposite of sv_utf8_encode().
 
 =item *
 
-is_utf8_char(buf) returns true if the buffer points to valid UTF-8.
+is_utf8_char(s) returns true if the pointer points to a valid UTF-8
+character.
 
 =item *
 
@@ -880,7 +881,10 @@ are valid UTF-8.
 
 UTF8SKIP(buf) will return the number of bytes in the UTF-8 encoded
 character in the buffer. UNISKIP(chr) will return the number of bytes
-required to UTF-8-encode the Unicode character code point.
+required to UTF-8-encode the Unicode character code point. UTF8SKIP()
+is useful for example for iterating over the characters of a UTF-8
+encoded buffer; UNISKIP() is useful for example in computing
+the size required for a UTF-8 encoded buffer.
 
 =item *
 
@@ -891,20 +895,24 @@ two pointers pointing to the same UTF-8 encoded buffer.
 
 utf8_hop(s, off) will return a pointer to an UTF-8 encoded buffer
 that is C<off> (positive or negative) Unicode characters displaced from the
-UTF-8 buffer C<s>.
+UTF-8 buffer C<s>. Be careful not to overstep the buffer: utf8_hop()
+will merrily run off the end or the beginning if told to do so.
 
 =item *
 
 pv_uni_display(dsv, spv, len, pvlim, flags) and sv_uni_display(dsv,
 ssv, pvlim, flags) are useful for debug output of Unicode strings and
-scalars (only for debug: they display B<all> characters as hexadecimal
-code points).
+scalars. By default they are useful only for debug: they display
+B<all> characters as hexadecimal code points, but with the flags
+UNI_DISPLAY_ISPRINT and UNI_DISPLAY_BACKSLASH you can make the output
+more readable.
 
 =item *
 
-ibcmp_utf8(s1, u1, len1, s2, u2, len2) can be used to compare two
-strings case-insensitively in Unicode. (For case-sensitive
-comparisons you can just use memEQ() and memNE() as usual.)
+ibcmp_utf8(s1, pe1, u1, l1, u1, s2, pe2, l2, u2) can be used to
+compare two strings case-insensitively in Unicode.
+(For case-sensitive comparisons you can just use memEQ() and memNE()
+as usual.)
 
 =back
 
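
The patch text says UTF8SKIP() is useful for iterating over a UTF-8 encoded buffer and UNISKIP() for computing the size a UTF-8 encoded buffer needs. The following is a minimal standalone C sketch of those two uses, not Perl's implementation: every `sketch_*` function is a hypothetical stand-in written for this example, and the code assumes well-formed standard UTF-8 of at most four bytes per character. The real macros come from the Perl headers (see perlapi) and may behave differently. The loop takes an explicit end pointer so that it cannot run off the end of the buffer, which is the hazard the patch warns about for utf8_hop().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for UTF8SKIP(): number of bytes in the
 * character whose first byte is *s. Assumes standard UTF-8. */
size_t sketch_utf8_skip(const unsigned char *s)
{
    if (*s < 0x80)           return 1;  /* 0xxxxxxx: ASCII          */
    if ((*s & 0xE0) == 0xC0) return 2;  /* 110xxxxx: 2-byte lead    */
    if ((*s & 0xF0) == 0xE0) return 3;  /* 1110xxxx: 3-byte lead    */
    if ((*s & 0xF8) == 0xF0) return 4;  /* 11110xxx: 4-byte lead    */
    return 1;  /* continuation or invalid lead byte: step one byte */
}

/* Hypothetical stand-in for UNISKIP(): number of bytes needed to
 * encode code point cp. Assumes cp <= 0x10FFFF. */
size_t sketch_uni_skip(unsigned long cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Count the characters in [start, end), in the spirit of
 * utf8_length(start, end). Stops at a truncated final character
 * instead of reading past end. */
size_t sketch_utf8_length(const unsigned char *start,
                          const unsigned char *end)
{
    size_t n = 0;
    assert(start <= end);
    while (start < end) {
        size_t skip = sketch_utf8_skip(start);
        if (skip > (size_t)(end - start))
            break;              /* character would overrun the buffer */
        start += skip;
        n++;
    }
    return n;
}

/* Sum the per-code-point sizes to get the number of bytes a
 * UTF-8 encoded buffer would need for these code points. */
size_t sketch_encoded_size(const unsigned long *cps, size_t count)
{
    size_t bytes = 0;
    for (size_t i = 0; i < count; i++)
        bytes += sketch_uni_skip(cps[i]);
    return bytes;
}
```

Passing both ends of the buffer, rather than a start pointer and a character count, keeps the bounds check inside the loop. That choice is specific to this sketch and says nothing about how the Perl functions are written.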