author: Karl Williamson <khw@cpan.org> 2016-08-26 16:29:54 -0600
committer: Karl Williamson <khw@cpan.org> 2016-08-31 20:32:37 -0600
commit: 35f8c9bd0ff4f298f8bc09ae9848a14a9667a95a
tree: 2d3a60c10c7a5b9e8f379eb4389f44e8c64a1195 /inline.h
parent: eda91ad75c71796ed6c5d3da7850b2fd7566c2a2
Move isUTF8_CHAR helper function, and reimplement it
The macro isUTF8_CHAR calls a helper function for code points higher than it can handle. That function had been an inlined wrapper around utf8n_to_uvchr().

The function has been rewritten to not call utf8n_to_uvchr(), so it is now too big to be effectively inlined. Instead, it implements a faster method of checking the validity of the UTF-8 without having to decode it. It just checks for valid syntax, knows where the few discontinuities are in UTF-8 where overlongs can occur, and uses a string compare to verify that overflow won't occur.

As a result this is now a pure function.

This also means that a previously generated deprecation warning is no longer raised, because printing UTF-8 no longer requires converting it to internal form. I could add a check for that, but I think it's best not to. If you manipulated what is getting printed in any way, the deprecation message will already have been raised.

This commit also fleshes out the documentation of isUTF8_CHAR.
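To illustrate the general idea of validating without decoding, here is a minimal sketch. It is not the code this commit adds: the name sketch_utf8_char_len is hypothetical, it assumes an ASCII platform, and it covers only standard 1-4 byte UTF-8 with a maximum of U+10FFFF, whereas Perl's real helper deals with Perl's own rules. It does not reject surrogates. The byte-string comparison against the largest allowed sequence stands in for the "string compare" overflow check described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not Perl's implementation.  Returns the number of
 * bytes in the well-formed UTF-8 character starting at s, or 0 if the
 * bytes in [s, e) do not begin with a complete, well-formed character.
 * No code point is ever computed. */
static size_t
sketch_utf8_char_len(const unsigned char *s, const unsigned char *e)
{
    /* Largest sequence this sketch accepts (U+10FFFF); an assumption. */
    static const unsigned char max_seq[4] = { 0xF4, 0x8F, 0xBF, 0xBF };
    size_t len, i;

    if (s >= e)
        return 0;

    /* Syntax: the start byte alone determines the expected length. */
    if (s[0] < 0x80)
        return 1;               /* ASCII */
    else if (s[0] < 0xC2)
        return 0;               /* stray continuation, or C0/C1 overlong */
    else if (s[0] < 0xE0)
        len = 2;
    else if (s[0] < 0xF0)
        len = 3;
    else if (s[0] < 0xF8)
        len = 4;
    else
        return 0;               /* not a start byte in standard UTF-8 */

    if ((size_t)(e - s) < len)
        return 0;               /* character is incomplete */

    /* Every following byte must be a continuation byte: 10xxxxxx. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
    }

    /* Overlongs can only start at a few discontinuities: E0 must be
     * followed by at least A0, and F0 by at least 90. */
    if (s[0] == 0xE0 && s[1] < 0xA0)
        return 0;
    if (s[0] == 0xF0 && s[1] < 0x90)
        return 0;

    /* UTF-8 byte order matches code point order, so a plain byte-string
     * comparison detects anything above the maximum. */
    if (len == 4 && memcmp(s, max_seq, 4) > 0)
        return 0;

    return len;
}
```

Because the function reads only its arguments and touches no global state, it is pure in the sense the commit message mentions: the same input bytes always give the same result.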
Diffstat (limited to 'inline.h')
-rw-r--r-- inline.h 30
1 files changed, 0 insertions, 30 deletions
diff --git a/inline.h b/inline.h
index f709572ff6..0dcc733851 100644
--- a/inline.h
+++ b/inline.h
@@ -277,36 +277,6 @@ S_append_utf8_from_native_byte(const U8 byte, U8** dest)
}
/*
-
-A helper function for the macro isUTF8_CHAR(), which should be used instead of
-this function. The macro will handle smaller code points directly saving time,
-using this function as a fall-back for higher code points.
-
-Tests if the first bytes of string C<s> form a valid UTF-8 character. 0 is
-returned if the bytes starting at C<s> up to but not including C<e> do not form a
-complete well-formed UTF-8 character; otherwise the number of bytes in the
-character is returned.
-
-Note that an INVARIANT (i.e. ASCII on non-EBCDIC) character is a valid UTF-8
-character.
-
-=cut */
-PERL_STATIC_INLINE STRLEN
-S__is_utf8_char_slow(const U8 *s, const U8 *e)
-{
-    dTHX;   /* The function called below requires thread context */
-
-    STRLEN actual_len;
-
-    PERL_ARGS_ASSERT__IS_UTF8_CHAR_SLOW;
-
-    assert(e >= s);
-    utf8n_to_uvchr(s, e - s, &actual_len, UTF8_CHECK_ONLY);
-
-    return (actual_len == (STRLEN) -1) ? 0 : actual_len;
-}
-
-/*
=for apidoc valid_utf8_to_uvchr
Like L</utf8_to_uvchr_buf>(), but should only be called when it is known that
the next character in the input UTF-8 string C<s> is well-formed (I<e.g.>,