author | Tony Cook <tony@develop-help.com> | 2018-09-25 11:18:40 +1000 |
---|---|---|
committer | Tony Cook <tony@develop-help.com> | 2018-10-10 11:12:13 +1100 |
commit | 1ed4b7762a858fb9c71bc209fe868060f3774cb5 (patch) | |
tree | d7fd59a4d3f823d2d46530e79be9da4dff4f2b64 /pod | |
parent | 03b94aa47e981af3c7b0118bfb11facda2b95251 (diff) | |
download | perl-1ed4b7762a858fb9c71bc209fe868060f3774cb5.tar.gz |
(perl #125760) fatalize sysread/syswrite/recv/send on :utf8 handles

This includes removing the :utf8 logic from pp_syswrite. pp_sysread
retains it, since it's also used for read().

Tests that are specifically testing the behaviour against :utf8
handles have been removed (e.g. in lib/open.t); several other tests
that incidentally used those functions on :utf8 handles have been
adapted to use :raw handles instead (e.g. op/readline.t).

Test lib/sigtrap.t fails if STDERR is :utf8, in code from the
original 5.000 commit, which is intended to run in a signal handler.
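
The adaptation described above (switching to :raw handles) can be sketched as follows. This is an illustrative example, not code from the commit: the scratch file and the sample string are assumptions, and the encoding and decoding are done explicitly with the core Encode module so that syswrite() and sysread() only ever see bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempfile);

# Scratch file for the demonstration; the path is generated by File::Temp.
my ($fh, $path) = tempfile(UNLINK => 1);

# Make sure no :utf8 or :encoding layer is active on the handle.
binmode($fh, ':raw') or die "binmode: $!";

# Hypothetical sample text containing non-ASCII characters.
my $text  = "caf\x{e9} \x{263A}\n";

# Encode to bytes ourselves, then hand the bytes to syswrite().
my $bytes   = encode('UTF-8', $text);
my $written = syswrite($fh, $bytes);
defined $written or die "syswrite: $!";
close $fh or die "close: $!";

# Read the bytes back with sysread() on a :raw handle.
open(my $in, '<:raw', $path) or die "open: $!";
my $raw = '';
while (1) {
    my $n = sysread($in, $raw, 4096, length $raw);
    die "sysread: $!" unless defined $n;
    last if $n == 0;    # 0 means end of file
}
close $in;

# Decode explicitly; FB_CROAK makes malformed input an error
# instead of silently producing an invalid string.
my $decoded = decode('UTF-8', $raw, Encode::FB_CROAK);

printf "wrote %d bytes, read back %d characters\n",
    $written, length $decoded;
```

Doing the decoding by hand also restores the validation step that sysread() on a :utf8 handle never performed.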
Diffstat (limited to 'pod')
-rw-r--r-- | pod/perldiag.pod | 17 |
-rw-r--r-- | pod/perlfunc.pod | 33 |
2 files changed, 16 insertions, 34 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 59c5e79ad1..4a50e5d9d8 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3206,27 +3206,24 @@ neither as a system call nor an ioctl call (SIOCATMARK).
 Perl. The current valid ones are given in
 L<perlrebackslash/\b{}, \b, \B{}, \B>.
 
-=item %s() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30
+=item %s() isn't allowed on :utf8 handles
 
-(D deprecated) The sysread(), recv(), syswrite() and send() operators are
-deprecated on handles that have the C<:utf8> layer, either explicitly, or
+(F) The sysread(), recv(), syswrite() and send() operators are
+not allowed on handles that have the C<:utf8> layer, either explicitly, or
 implicitly, eg., with the C<:encoding(UTF-16LE)> layer.
 
-Both sysread() and recv() currently use only the C<:utf8> flag for the stream,
-ignoring the actual layers. Since sysread() and recv() do no UTF-8
+Previously sysread() and recv() currently use only the C<:utf8> flag for the stream,
+ignoring the actual layers. Since sysread() and recv() did no UTF-8
 validation they can end up creating invalidly encoded scalars.
 
-Similarly, syswrite() and send() use only the C<:utf8> flag, otherwise ignoring
-any layers. If the flag is set, both write the value UTF-8 encoded, even if
+Similarly, syswrite() and send() used only the C<:utf8> flag, otherwise ignoring
+any layers. If the flag is set, both wrote the value UTF-8 encoded, even if
 the layer is some different encoding, such as the example above.
 
 Ideally, all of these operators would completely ignore the C<:utf8> state,
 working only with bytes, but this would result in silently breaking existing
 code.
 
-In Perl 5.30, it will no longer be possible to use sysread(), recv(),
-syswrite() or send() to read or send bytes from/to :utf8 handles.
-
 =item "%s" is more clearly written simply as "%s" in regex; marked by S<<-- HERE> in m/%s/
 
 (W regexp) (only under C<S<use re 'strict'>> or within C<(?[...])>)
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index a2fad3b8fc..316daff1cf 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -6284,14 +6284,9 @@ string otherwise. If there's an error, returns the undefined value.
 This call is actually implemented in terms of the L<recvfrom(2)> system call.
 See L<perlipc/"UDP: Message Passing"> for examples.
 
-Note the I<characters>: depending on the status of the socket, either
-(8-bit) bytes or characters are received. By default all sockets
-operate on bytes, but for example if the socket has been changed using
-L<C<binmode>|/binmode FILEHANDLE, LAYER> to operate with the
-C<:encoding(UTF-8)> I/O layer (see the L<open> pragma), the I/O will
-operate on UTF8-encoded Unicode
-characters, not bytes. Similarly for the C<:encoding> layer: in that
-case pretty much any characters can be read.
+Note that if the socket has been marked as C<:utf8>, C<recv> will
+throw an exception. The C<:encoding(...)> layer implicitly introduces
+the C<:utf8> layer. See L<C<binmode>|/binmode FILEHANDLE, LAYER>.
 
 =item redo LABEL
 X<redo>
@@ -7083,14 +7078,9 @@ case it does a L<sendto(2)> syscall. Returns the number of characters sent,
 or the undefined value on error. The L<sendmsg(2)> syscall is currently
 unimplemented. See L<perlipc/"UDP: Message Passing"> for examples.
 
-Note the I<characters>: depending on the status of the socket, either
-(8-bit) bytes or characters are sent. By default all sockets operate
-on bytes, but for example if the socket has been changed using
-L<C<binmode>|/binmode FILEHANDLE, LAYER> to operate with the
-C<:encoding(UTF-8)> I/O layer (see L<C<open>|/open FILEHANDLE,EXPR>, or
-the L<open> pragma), the I/O will operate on UTF-8
-encoded Unicode characters, not bytes. Similarly for the C<:encoding>
-layer: in that case pretty much any characters can be sent.
+Note that if the socket has been marked as C<:utf8>, C<send> will
+throw an exception. The C<:encoding(...)> layer implicitly introduces
+the C<:utf8> layer. See L<C<binmode>|/binmode FILEHANDLE, LAYER>.
 
 =item setpgrp PID,PGRP
 X<setpgrp> X<group>
@@ -8723,10 +8713,8 @@ L<C<eof>|/eof FILEHANDLE> doesn't work well on device files (like ttys)
 anyway. Use L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET> and
 check for a return value of 0 to decide whether you're done.
 
-Note that if the filehandle has been marked as C<:utf8>, Unicode
-characters are read instead of bytes (the LENGTH, OFFSET, and the
-return value of L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET>
-are in Unicode characters). The C<:encoding(...)> layer implicitly
+Note that if the filehandle has been marked as C<:utf8>, C<sysread> will
+throw an exception. The C<:encoding(...)> layer implicitly
 introduces the C<:utf8> layer. See
 L<C<binmode>|/binmode FILEHANDLE, LAYER>,
 L<C<open>|/open FILEHANDLE,EXPR>, and the L<open> pragma.
@@ -8887,10 +8875,7 @@ string other than the beginning. A negative OFFSET specifies writing
 that many characters counting backwards from the end of the string.
 If SCALAR is of length zero, you can only use an OFFSET of 0.
 
-B<WARNING>: If the filehandle is marked C<:utf8>, Unicode characters
-encoded in UTF-8 are written instead of bytes, and the LENGTH, OFFSET, and
-return value of L<C<syswrite>|/syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET>
-are in (UTF8-encoded Unicode) characters.
+B<WARNING>: If the filehandle is marked C<:utf8>, C<syswrite> will raise an exception.
 The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer.
 Alternately, if the handle is not marked with an encoding but you attempt
 to write characters with code points over 255, raises an exception.
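
The documented behaviour can be observed with a short script. This is a minimal sketch, not part of the patch: the scratch file and variable names are assumptions made for illustration, and the version check assumes the fatal error first shipped in Perl 5.30, as the removed deprecation text announced. On older perls the same calls only emit the deprecation warning.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file so there is something to open; the name comes from File::Temp.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "hello\n";
close $tmp_fh or die "close: $!";

# Explicit :utf8 layer, then sysread(): expected to throw on 5.30 and later.
open(my $in, '<:utf8', $path) or die "open: $!";
my $buf = '';
my $read_ok    = eval { sysread($in, $buf, 5); 1 };
my $read_error = $read_ok ? '' : $@;
close $in;

# :encoding(...) sets the :utf8 flag implicitly, so syswrite() is affected too.
open(my $out, '>:encoding(UTF-8)', $path) or die "open: $!";
my $write_ok    = eval { syswrite($out, "hello"); 1 };
my $write_error = $write_ok ? '' : $@;
close $out;

if ($] >= 5.030) {
    # Each message should match the "%s() isn't allowed on :utf8 handles"
    # entry in perldiag.
    print "sysread:  $read_error";
    print "syswrite: $write_error";
}
else {
    print "perl $] predates the fatal error; expect a deprecation warning\n";
}
```

Wrapping the call in eval is only needed to inspect the message; real code would normally avoid the error by using a :raw handle.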