author | Jarkko Hietaniemi <jhi@iki.fi> | 2000-10-25 20:00:48 +0000 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2000-10-25 20:00:48 +0000 |
commit | dcad28805702d580064bc39a267d63c58bbb3b3f (patch) | |
tree | c59311ffbadd7bc18b3b7bcc1b2158652051f134 /pod | |
parent | fcc8fcf67e5ea5f08178c9ac86509bc972ef38ff (diff) | |
Continue the internal UTF-8 API tweaking.
Rename utf8_to_uv_chk() back to utf8_to_uv() because it's
used much more than the simpler API, now called utf8_to_uv_simple().
Still not quite happy with API, too much partial duplication
of functionality.
p4raw-id: //depot/perl@7439
Diffstat (limited to 'pod')
-rw-r--r-- | pod/perlapi.pod | 44 |
-rw-r--r-- | pod/perlunicode.pod | 5 |
2 files changed, 22 insertions, 27 deletions
diff --git a/pod/perlapi.pod b/pod/perlapi.pod
index 730d89f896..634180f7ef 100644
--- a/pod/perlapi.pod
+++ b/pod/perlapi.pod
@@ -2368,19 +2368,19 @@ false, defined or undefined. Does not handle 'get' magic.
 =for hackers
 Found in file sv.h
 
-=item svtype
+=item SvTYPE
 
-An enum of flags for Perl types. These are found in the file B<sv.h>
-in the C<svtype> enum. Test these flags with the C<SvTYPE> macro.
+Returns the type of the SV. See C<svtype>.
+
+	svtype	SvTYPE(SV* sv)
 
 =for hackers
 Found in file sv.h
 
-=item SvTYPE
-
-Returns the type of the SV. See C<svtype>.
+=item svtype
 
-	svtype	SvTYPE(SV* sv)
+An enum of flags for Perl types. These are found in the file B<sv.h>
+in the C<svtype> enum. Test these flags with the C<SvTYPE> macro.
 
 =for hackers
 Found in file sv.h
@@ -3218,32 +3218,32 @@ Found in file utf8.c
 =item utf8_to_uv
 
 Returns the character value of the first character in the string C<s>
-which is assumed to be in UTF8 encoding; C<retlen> will be set to the
-length, in bytes, of that character, and the pointer C<s> will be
-advanced to the end of the character.
+which is assumed to be in UTF8 encoding and no longer than C<curlen>;
+C<retlen> will be set to the length, in bytes, of that character,
+and the pointer C<s> will be advanced to the end of the character.
 
-If C<s> does not point to a well-formed UTF8 character, an optional UTF8
+If C<s> does not point to a well-formed UTF8 character, the behaviour
+is dependent on the value of C<checking>: if this is true, it is
+assumed that the caller will raise a warning, and this function will
+set C<retlen> to C<-1> and return. If C<checking> is not true, an optional UTF8
 warning is produced.
 
-	U8* s	utf8_to_uv(STRLEN *retlen)
+	U8* s	utf8_to_uv(STRLEN curlen, I32 *retlen, U32 flags)
 
 =for hackers
 Found in file utf8.c
 
-=item utf8_to_uv_chk
+=item utf8_to_uv_simple
 
 Returns the character value of the first character in the string C<s>
-which is assumed to be in UTF8 encoding and no longer than C<curlen>;
-C<retlen> will be set to the length, in bytes, of that character,
-and the pointer C<s> will be advanced to the end of the character.
+which is assumed to be in UTF8 encoding; C<retlen> will be set to the
+length, in bytes, of that character, and the pointer C<s> will be
+advanced to the end of the character.
 
-If C<s> does not point to a well-formed UTF8 character, the behaviour
-is dependent on the value of C<checking>: if this is true, it is
-assumed that the caller will raise a warning, and this function will
-set C<retlen> to C<-1> and return. If C<checking> is not true, an optional UTF8
-warning is produced.
+If C<s> does not point to a well-formed UTF8 character, zero is
+returned and retlen is set, if possible, to -1.
 
-	U8* s	utf8_to_uv_chk(STRLEN curlen, I32 *retlen, I32 checking)
+	U8* s	utf8_to_uv_simple(STRLEN *retlen)
 
 =for hackers
 Found in file utf8.c
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 145c953099..c9954d8e96 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -71,11 +71,6 @@ on Windows.
 Regardless of the above, the C<bytes> pragma can always be used to force
 byte semantics in a particular lexical scope. See L<bytes>.
 
-One effect of the C<utf8> pragma is that the internal UTF-8 decoding
-becomes stricter so that the character 0xFFFF (UTF-8 bytes 0xEF 0xBF
-0xBF), and the bytes 0xFE and 0xFF, start to cause warnings if they
-appear in the data.
-
 The C<utf8> pragma is primarily a compatibility device that enables
 recognition of UTF-8 in literals encountered by the parser. It may also
 be used for enabling some of the more experimental Unicode support features.
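
The calling convention documented in the perlapi.pod hunk (a length bound, a byte-length out-parameter, and -1 in that out-parameter on malformed input) can be illustrated with a small standalone C sketch. This is not Perl's implementation and does not use Perl's headers or types: `decode_utf8_char` and `decode_utf8_char_simple` are hypothetical names, the sketch handles only 1- to 4-byte sequences, emits no warnings, has no flags argument, and does not advance the caller's pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Perl's utf8_to_uv): decode one UTF-8 character
 * from at most curlen bytes starting at s.
 * Success: *retlen = number of bytes consumed, returns the code point.
 * Malformed or truncated input: *retlen = -1, returns 0.
 * Assumption: only 1..4 byte sequences are accepted; surrogates and
 * noncharacters such as 0xFFFF are NOT rejected by this sketch. */
uint32_t decode_utf8_char(const uint8_t *s, size_t curlen, int *retlen)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const uint32_t min_for_len[] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t uv;
    int len, i;

    if (curlen == 0)
        goto malformed;

    if (s[0] < 0x80)                { uv = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { uv = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { uv = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { uv = s[0] & 0x07; len = 4; }
    else
        goto malformed;   /* stray continuation byte, or 0xF8..0xFF */

    if ((size_t)len > curlen)
        goto malformed;   /* sequence would run past the length bound */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            goto malformed;   /* not a continuation byte */
        uv = (uv << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (len > 1 && uv < min_for_len[len])
        goto malformed;   /* overlong encoding */

    if (retlen)
        *retlen = len;
    return uv;

malformed:
    if (retlen)
        *retlen = -1;
    return 0;
}

/* Hypothetical "simple" variant without a length argument.
 * Assumption: s is NUL-terminated, so the continuation-byte check stops
 * at the terminator before reading past the end of the buffer. */
uint32_t decode_utf8_char_simple(const uint8_t *s, int *retlen)
{
    return decode_utf8_char(s, 4, retlen);
}
```

A caller would check the out-parameter before trusting the return value, since 0 is both the failure result and a valid code point: `uv = decode_utf8_char(p, end - p, &n); if (n < 0) { /* malformed */ } else { p += n; }`.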