author | Jarkko Hietaniemi <jhi@iki.fi> | 2000-12-08 16:33:39 +0000 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2000-12-08 16:33:39 +0000 |
commit | 28d3d195b04c40a4811db0a74e089d561a53d924 (patch) | |
tree | bdd0f4c521d3fd304c4ceca9c8acedf52302b4b9 /utf8.c | |
parent | 659f4fc511096e10da01a8fc865bdea31f309580 (diff) | |
download | perl-28d3d195b04c40a4811db0a74e089d561a53d924.tar.gz |
Do not return the Unicode replacement character if UTF-8
decoding goes awry; it should be up to the caller to decide.
p4raw-id: //depot/perl@8042
Diffstat (limited to 'utf8.c')
-rw-r--r-- | utf8.c | 16 |
1 files changed, 9 insertions, 7 deletions
@@ -189,11 +189,13 @@ and the pointer C<s> will be advanced to the end of the character.
 If C<s> does not point to a well-formed UTF8 character, the behaviour
 is dependent on the value of C<flags>: if it contains UTF8_CHECK_ONLY,
 it is assumed that the caller will raise a warning, and this function
-will set C<retlen> to C<-1> and return zero. If the C<flags> does not
-contain UTF8_CHECK_ONLY, the UNICODE_REPLACEMENT (0xFFFD) will be
-returned, and C<retlen> will be set to the expected length of the
-UTF-8 character in bytes. The C<flags> can also contain various flags
-to allow deviations from the strict UTF-8 encoding (see F<utf8.h>).
+will silently just set C<retlen> to C<-1> and return zero. If the
+C<flags> does not contain UTF8_CHECK_ONLY, warnings about
+malformations will be given, C<retlen> will be set to the expected
+length of the UTF-8 character in bytes, and zero will be returned.
+
+The C<flags> can also contain various flags to allow deviations from
+the strict UTF-8 encoding (see F<utf8.h>).

 =cut */

@@ -339,9 +341,9 @@ malformed:
     }

     if (retlen)
-        *retlen = expectlen;
+        *retlen = expectlen ? expectlen : len;

-    return UNICODE_REPLACEMENT;
+    return 0;
 }

 /*
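The contract described in the documentation hunk can be illustrated with a small standalone sketch. This is not the code in utf8.c: `decode_sketch`, `decode_or_replace`, `SKETCH_CHECK_ONLY` and `SKETCH_REPLACEMENT` are hypothetical names invented for illustration, the decoder is deliberately simplified (it does not reject overlong three- and four-byte forms or surrogates), and falling back to a length of 1 when no expected length is known is an assumption of the sketch. It only shows the division of labour the commit message asks for: the decoder reports failure with zero, and the caller decides whether to substitute U+FFFD.

```c
#include <stdio.h>
#include <stddef.h>

#define SKETCH_CHECK_ONLY  0x01u    /* hypothetical stand-in for UTF8_CHECK_ONLY */
#define SKETCH_REPLACEMENT 0xFFFDul /* U+FFFD, applied by the caller only */

/* Sequence length implied by a lead byte, or 0 if it cannot start one. */
int expected_len(unsigned char lead)
{
    if (lead < 0x80)                  return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Simplified decoder following the documented contract: on malformed
 * input it returns 0; *retlen is -1 in check-only mode, otherwise the
 * expected length (or 1 when none is known, an assumption of this sketch). */
unsigned long decode_sketch(const unsigned char *s, size_t curlen,
                            long *retlen, unsigned flags)
{
    int expect = curlen ? expected_len(s[0]) : 0;
    unsigned long cp = 0;
    int i;

    if (expect == 0 || (size_t)expect > curlen)
        goto malformed;

    if (expect == 1) {
        cp = s[0];
    } else {
        cp = s[0] & (0xFFu >> (expect + 1));   /* payload bits of the lead byte */
        for (i = 1; i < expect; i++) {
            if ((s[i] & 0xC0) != 0x80)         /* not a continuation byte */
                goto malformed;
            cp = (cp << 6) | (s[i] & 0x3Fu);
        }
    }
    if (retlen)
        *retlen = expect;
    return cp;

malformed:
    if (flags & SKETCH_CHECK_ONLY) {
        if (retlen)
            *retlen = -1;                      /* silent: the caller will warn */
        return 0;
    }
    fprintf(stderr, "malformed UTF-8 sequence\n");
    if (retlen)
        *retlen = expect ? expect : 1;
    return 0;
}

/* Caller-side policy: substitute U+FFFD and skip one byte on failure. */
unsigned long decode_or_replace(const unsigned char *s, size_t curlen,
                                long *consumed)
{
    long len;
    unsigned long cp = decode_sketch(s, curlen, &len, SKETCH_CHECK_ONLY);

    if (len < 0) {
        *consumed = 1;
        return SKETCH_REPLACEMENT;
    }
    *consumed = len;
    return cp;
}
```

In check-only mode the `-1` length separates a decoding failure from a legitimately decoded NUL, which also returns zero. A caller that prefers to abort or to skip the whole expected sequence can apply that policy in place of `decode_or_replace` without touching the decoder.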