summaryrefslogtreecommitdiff
path: root/ext/POSIX/lib
diff options
context:
space:
mode:
authorKarl Williamson <khw@cpan.org>2020-01-03 22:18:02 -0700
committerKarl Williamson <khw@cpan.org>2020-02-19 22:09:48 -0700
commit5a6637f01ac3edd837722b6d156ddb2250ea049f (patch)
treef6138a6fbd869dcd0a560fffcd31a0aca1321556 /ext/POSIX/lib
parent63bebc1439c3a3eefd310e674f2593ff60cceca4 (diff)
downloadperl-5a6637f01ac3edd837722b6d156ddb2250ea049f.tar.gz
Fixup POSIX::mbtowc, wctomb
This commit enhances these functions so that on threaded perls, they use mbrtowc and wcrtomb when available, making them thread safe. The substitution isn't completely transparent, as no effort is made to hide any differences in errno setting upon error. And there may be slight differences in edge case behavior on some platforms. This commit also changes the behaviors so that they take a scalar parameter instead of a char *, and this might be 'undef' or not be forceable into a valid PV. If not a PV, the functions initialize the shift state. Previously the shift state was always reinitialized with every call, which meant these could not work on locales with shift states. In addition, there were several issues in mbtowc and wctomb that this commit fixes. mbtowc and wctomb, when used, are now run with a semaphore. This avoids races if called at the same time in another thread. The returned wide character from mbtowc() could well have been garbage. The final parameter to mbtowc is now optional, as passing an SV allows us to determine the length without the need for an extra parameter. It is now used only to restrict the parsing of the string to shorter than the actual length. wctomb would segfault if the string parameter was shared or hadn't been pre-allocated with a string of sufficient length to hold the result.
Diffstat (limited to 'ext/POSIX/lib')
-rw-r--r--ext/POSIX/lib/POSIX.pod55
1 files changed, 51 insertions, 4 deletions
diff --git a/ext/POSIX/lib/POSIX.pod b/ext/POSIX/lib/POSIX.pod
index 9919f4c67f..2686457fa7 100644
--- a/ext/POSIX/lib/POSIX.pod
+++ b/ext/POSIX/lib/POSIX.pod
@@ -1098,9 +1098,35 @@ actual length of the first parameter string.
=item C<mbtowc>
-This is identical to the C function C<mbtowc()>.
+This is the same as the C function C<mbtowc()> on unthreaded perls. On
+threaded perls, it transparently (almost) substitutes the more
+thread-safe L<C<mbrtowc>(3)>, if available, instead of C<mbtowc>.
-See L</mblen>.
+Core Perl does not have any support for wide and multibyte locales,
+except Unicode UTF-8 locales. This function, in conjunction with
+L</mblen> and L</wctomb> may be used to roll your own decoding/encoding
+of other types of multi-byte locales.
+
+The first parameter is a scalar into which, upon success, the wide
+character represented by the multi-byte string contained in the second
+parameter is stored. The optional third parameter is ignored if it is
+larger than the actual length of the second parameter string.
+
+Use C<undef> as the second parameter to this function to get the effect
+of passing NULL as the second parameter to C<mbtowc>. This resets any
+shift state to its initial value. The return value is undefined if
+C<mbrtowc> was substituted, so you should never rely on it.
+
+When the second parameter is a scalar containing a value that either is
+a PV string or can be forced into one, the return value is the number of
+bytes occupied by the first character of that string; or 0 if that first
+character is the wide NUL character; or negative if there is an error.
+This is based on the locale that currently underlies the program,
+regardless of whether or not the function is called from Perl code that
+is within the scope of S<C<use locale>>. Perl makes no attempt at
+hiding from your code any differences in the C<errno> setting between
+C<mbtowc> and C<mbrtowc>. It does set C<errno> to 0 before calling
+them.
=item C<memchr>
@@ -2131,9 +2157,30 @@ See L</mblen>.
=item C<wctomb>
-This is identical to the C function C<wctomb()>.
+This is the same as the C function C<wctomb()> on unthreaded perls. On
+threaded perls, it transparently (almost) substitutes the more
+thread-safe L<C<wcrtomb>(3)>, if available, instead of C<wctomb>.
-See L</mblen>.
+Core Perl does not have any support for wide and multibyte locales,
+except Unicode UTF-8 locales. This function, in conjunction with
+L</mblen> and L</mbtowc> may be used to roll your own decoding/encoding
+of other types of multi-byte locales.
+
+Use C<undef> as the first parameter to this function to get the effect
+of passing NULL as the first parameter to C<wctomb>. This resets any
+shift state to its initial value. The return value is undefined if
+C<wcrtomb> was substituted, so you should never rely on it.
+
+When the first parameter is a scalar, the code point contained in the
+scalar second parameter is converted into a multi-byte string and stored
+into the first parameter scalar. This is based on the locale that
+currently underlies the program, regardless of whether or not the
+function is called from Perl code that is within the scope of S<C<use
+locale>>. The return value is the number of bytes stored; or negative
+if the code point isn't representable in the current locale. Perl makes
+no attempt at hiding from your code any differences in the C<errno>
+setting between C<wctomb> and C<wcrtomb>. It does set C<errno> to 0
+before calling them.
=item C<write>