author: Jarkko Hietaniemi <jhi@iki.fi> 2002-02-19 15:01:25 +0000
committer: Jarkko Hietaniemi <jhi@iki.fi> 2002-02-19 15:01:25 +0000
commit: 90f968e0b6755d9afac55901e4fbdc41d13dc5a4
tree: 6dd802cf480da94b3a8d5852ca3a1ccc4a360c42 /pod/perlunicode.pod
parent: 95cc3e0cadf76343f00139ab2ca28b282e01d6cf
UTF-8 C API doc tweaks.
p4raw-id: //depot/perl@14772
Diffstat (limited to 'pod/perlunicode.pod')
-rw-r--r-- pod/perlunicode.pod 30
1 files changed, 19 insertions, 11 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 484f356dc0..1accda426b 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -825,7 +825,7 @@ for more discussion of the issues.
=head2 Using Unicode in XS
If you want to handle Perl Unicode in XS extensions, you may find
-the following C APIs useful:
+the following C APIs useful (see perlapi for details):
=over 4
@@ -856,8 +856,8 @@ the UTF-8 byte sequence).
=item *
-utf8_length(s, len) returns the length of the UTF-8 encoded buffer in
-characters. sv_len_utf8(sv) returns the length of the UTF-8 encoded
+utf8_length(start, end) returns the length of the UTF-8 encoded buffer
+in characters. sv_len_utf8(sv) returns the length of the UTF-8 encoded
scalar.
=item *
@@ -869,7 +869,8 @@ get turned on. sv_utf8_decode() does the opposite of sv_utf8_encode().
=item *
-is_utf8_char(buf) returns true if the buffer points to valid UTF-8.
+is_utf8_char(s) returns true if the pointer points to a valid UTF-8
+character.
=item *
@@ -880,7 +881,10 @@ are valid UTF-8.
UTF8SKIP(buf) will return the number of bytes in the UTF-8 encoded
character in the buffer. UNISKIP(chr) will return the number of bytes
-required to UTF-8-encode the Unicode character code point.
+required to UTF-8-encode the Unicode character code point. UTF8SKIP()
+is useful for example for iterating over the characters of a UTF-8
+encoded buffer; UNISKIP() is useful for example in computing
+the size required for a UTF-8 encoded buffer.
=item *
@@ -891,20 +895,24 @@ two pointers pointing to the same UTF-8 encoded buffer.
utf8_hop(s, off) will return a pointer to an UTF-8 encoded buffer that
is C<off> (positive or negative) Unicode characters displaced from the
-UTF-8 buffer C<s>.
+UTF-8 buffer C<s>. Be careful not to overstep the buffer: utf8_hop()
+will merrily run off the end or the beginning if told to do so.
=item *
pv_uni_display(dsv, spv, len, pvlim, flags) and sv_uni_display(dsv,
ssv, pvlim, flags) are useful for debug output of Unicode strings and
-scalars (only for debug: they display B<all> characters as hexadecimal
-code points).
+scalars. By default they are useful only for debug: they display
+B<all> characters as hexadecimal code points, but with the flags
+UNI_DISPLAY_ISPRINT and UNI_DISPLAY_BACKSLASH you can make the output
+more readable.
=item *
-ibcmp_utf8(s1, u1, len1, s2, u2, len2) can be used to compare two
-strings case-insensitively in Unicode. (For case-sensitive
-comparisons you can just use memEQ() and memNE() as usual.)
+ibcmp_utf8(s1, pe1, u1, l1, u1, s2, pe2, l2, u2) can be used to
+compare two strings case-insensitively in Unicode.
+(For case-sensitive comparisons you can just use memEQ() and memNE()
+as usual.)
=back
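
Note on the UTF8SKIP()/UNISKIP() hunk: the patched text says UTF8SKIP() is useful for iterating over the characters of a UTF-8 encoded buffer and UNISKIP() for sizing one. The idea can be sketched in standalone C as below. This is only an illustration under stated assumptions: the `sketch_*` functions are hypothetical stand-ins written for this note, not Perl's macros or functions, and they cover only standard UTF-8 sequences of one to four bytes, whereas Perl's own encoding allows longer ones. Real XS code would include perl.h and use the APIs documented in perlapi.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for UTF8SKIP(): number of bytes in the
 * sequence introduced by the given lead byte (standard UTF-8 only). */
static size_t sketch_utf8_skip(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 1;  /* continuation or invalid lead byte: step one byte */
}

/* Hypothetical stand-in for UNISKIP(): number of bytes needed to
 * UTF-8-encode a code point (standard UTF-8 only). */
static size_t sketch_uni_skip(unsigned long cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Count the characters in [start, end) by hopping from lead byte to
 * lead byte. Unlike the utf8_hop() described above, this loop checks
 * the end pointer and stops at a truncated trailing sequence. */
static size_t sketch_utf8_length(const unsigned char *start,
                                 const unsigned char *end)
{
    const unsigned char *p = start;
    size_t count = 0;

    assert(start <= end);
    while (p < end) {
        size_t skip = sketch_utf8_skip(*p);
        if (skip > (size_t)(end - p))
            break;  /* sequence would run past the end of the buffer */
        p += skip;
        count++;
    }
    return count;
}
```

The explicit end pointer mirrors the utf8_length(start, end) signature introduced by this patch, and the bounds check is the guard that the utf8_hop() warning asks callers to supply themselves.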