author Tony Cook <tony@develop-help.com> 2018-09-25 11:18:40 +1000
committer Tony Cook <tony@develop-help.com> 2018-10-10 11:12:13 +1100
commit 1ed4b7762a858fb9c71bc209fe868060f3774cb5
tree d7fd59a4d3f823d2d46530e79be9da4dff4f2b64 /pod
parent 03b94aa47e981af3c7b0118bfb11facda2b95251
(perl #125760) fatalize sysread/syswrite/recv/send on :utf8 handles
This includes removing the :utf8 logic from pp_syswrite. pp_sysread retains it, since it's also used for read().

Tests that specifically tested the behaviour against :utf8 handles have been removed (e.g. in lib/open.t). Several other tests that incidentally used those functions on :utf8 handles have been adapted to use :raw handles instead (e.g. op/readline.t).

The test lib/sigtrap.t fails if STDERR is :utf8; the failing code dates from the original 5.000 commit and is intended to run in a signal handler.
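As an illustration of the behaviour change described above, here is a minimal sketch, not taken from the commit or its tests. It assumes a perl that includes this change; an older perl is expected to warn instead of dying in step 1. The scratch file and all variable names are hypothetical. Step 2 shows one possible byte-oriented alternative: encode and decode explicitly with the Encode module and use :raw handles.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempfile);

# Scratch file for the demonstration (hypothetical; removed at exit).
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

my $text = "caf\x{e9}\n";    # contains U+00E9, two bytes in UTF-8

# 1. syswrite on a :utf8 handle. With this change applied the call
#    throws; on older perls it only emits a deprecation warning.
open(my $utf8_fh, '>:utf8', $path) or die "open: $!";
my $survived = eval { syswrite($utf8_fh, $text); 1 };
my $error    = $survived ? '' : $@;
close $utf8_fh;
print $survived
    ? "syswrite succeeded (perl without this change)\n"
    : "syswrite died: $error";

# 2. Byte-oriented alternative: encode by hand, write to a :raw handle.
open(my $out, '>:raw', $path) or die "open: $!";
my $bytes   = encode('UTF-8', $text);
my $written = syswrite($out, $bytes);
defined $written or die "syswrite: $!";
close $out;

# Read the bytes back with sysread on a :raw handle and decode by hand.
open(my $in, '<:raw', $path) or die "open: $!";
my $got = sysread($in, my $buf, 1024);
defined $got or die "sysread: $!";
close $in;
my $decoded = decode('UTF-8', $buf);
```

With :raw handles, the LENGTH, OFFSET, and return values of sysread and syswrite are always byte counts, so the caller decides where character encoding happens.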
Diffstat (limited to 'pod')
-rw-r--r-- pod/perldiag.pod 17
-rw-r--r-- pod/perlfunc.pod 33
2 files changed, 16 insertions, 34 deletions
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 59c5e79ad1..4a50e5d9d8 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3206,27 +3206,24 @@ neither as a system call nor an ioctl call (SIOCATMARK).
Perl. The current valid ones are given in
L<perlrebackslash/\b{}, \b, \B{}, \B>.
-=item %s() is deprecated on :utf8 handles. This will be a fatal error in Perl 5.30
+=item %s() isn't allowed on :utf8 handles
-(D deprecated) The sysread(), recv(), syswrite() and send() operators are
-deprecated on handles that have the C<:utf8> layer, either explicitly, or
+(F) The sysread(), recv(), syswrite() and send() operators are
+not allowed on handles that have the C<:utf8> layer, either explicitly, or
implicitly, eg., with the C<:encoding(UTF-16LE)> layer.
-Both sysread() and recv() currently use only the C<:utf8> flag for the stream,
-ignoring the actual layers. Since sysread() and recv() do no UTF-8
+Previously sysread() and recv() currently use only the C<:utf8> flag for the stream,
+ignoring the actual layers. Since sysread() and recv() did no UTF-8
validation they can end up creating invalidly encoded scalars.
-Similarly, syswrite() and send() use only the C<:utf8> flag, otherwise ignoring
-any layers. If the flag is set, both write the value UTF-8 encoded, even if
+Similarly, syswrite() and send() used only the C<:utf8> flag, otherwise ignoring
+any layers. If the flag is set, both wrote the value UTF-8 encoded, even if
the layer is some different encoding, such as the example above.
Ideally, all of these operators would completely ignore the C<:utf8> state,
working only with bytes, but this would result in silently breaking existing
code.
-In Perl 5.30, it will no longer be possible to use sysread(), recv(),
-syswrite() or send() to read or send bytes from/to :utf8 handles.
-
=item "%s" is more clearly written simply as "%s" in regex; marked by S<<-- HERE> in m/%s/
(W regexp) (only under C<S<use re 'strict'>> or within C<(?[...])>)
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index a2fad3b8fc..316daff1cf 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -6284,14 +6284,9 @@ string otherwise. If there's an error, returns the undefined value.
This call is actually implemented in terms of the L<recvfrom(2)> system call.
See L<perlipc/"UDP: Message Passing"> for examples.
-Note the I<characters>: depending on the status of the socket, either
-(8-bit) bytes or characters are received. By default all sockets
-operate on bytes, but for example if the socket has been changed using
-L<C<binmode>|/binmode FILEHANDLE, LAYER> to operate with the
-C<:encoding(UTF-8)> I/O layer (see the L<open> pragma), the I/O will
-operate on UTF8-encoded Unicode
-characters, not bytes. Similarly for the C<:encoding> layer: in that
-case pretty much any characters can be read.
+Note that if the socket has been marked as C<:utf8>, C<recv> will
+throw an exception. The C<:encoding(...)> layer implicitly introduces
+the C<:utf8> layer. See L<C<binmode>|/binmode FILEHANDLE, LAYER>.
=item redo LABEL
X<redo>
@@ -7083,14 +7078,9 @@ case it does a L<sendto(2)> syscall. Returns the number of characters sent,
or the undefined value on error. The L<sendmsg(2)> syscall is currently
unimplemented. See L<perlipc/"UDP: Message Passing"> for examples.
-Note the I<characters>: depending on the status of the socket, either
-(8-bit) bytes or characters are sent. By default all sockets operate
-on bytes, but for example if the socket has been changed using
-L<C<binmode>|/binmode FILEHANDLE, LAYER> to operate with the
-C<:encoding(UTF-8)> I/O layer (see L<C<open>|/open FILEHANDLE,EXPR>, or
-the L<open> pragma), the I/O will operate on UTF-8
-encoded Unicode characters, not bytes. Similarly for the C<:encoding>
-layer: in that case pretty much any characters can be sent.
+Note that if the socket has been marked as C<:utf8>, C<send> will
+throw an exception. The C<:encoding(...)> layer implicitly introduces
+the C<:utf8> layer. See L<C<binmode>|/binmode FILEHANDLE, LAYER>.
=item setpgrp PID,PGRP
X<setpgrp> X<group>
@@ -8723,10 +8713,8 @@ L<C<eof>|/eof FILEHANDLE> doesn't work well on device files (like ttys)
anyway. Use L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET> and
check for a return value of 0 to decide whether you're done.
-Note that if the filehandle has been marked as C<:utf8>, Unicode
-characters are read instead of bytes (the LENGTH, OFFSET, and the
-return value of L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET>
-are in Unicode characters). The C<:encoding(...)> layer implicitly
+Note that if the filehandle has been marked as C<:utf8>, C<sysread> will
+throw an exception. The C<:encoding(...)> layer implicitly
introduces the C<:utf8> layer. See
L<C<binmode>|/binmode FILEHANDLE, LAYER>,
L<C<open>|/open FILEHANDLE,EXPR>, and the L<open> pragma.
@@ -8887,10 +8875,7 @@ string other than the beginning. A negative OFFSET specifies writing
that many characters counting backwards from the end of the string.
If SCALAR is of length zero, you can only use an OFFSET of 0.
-B<WARNING>: If the filehandle is marked C<:utf8>, Unicode characters
-encoded in UTF-8 are written instead of bytes, and the LENGTH, OFFSET, and
-return value of L<C<syswrite>|/syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET>
-are in (UTF8-encoded Unicode) characters.
+B<WARNING>: If the filehandle is marked C<:utf8>, C<syswrite> will raise an exception.
The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer.
Alternately, if the handle is not marked with an encoding but you
attempt to write characters with code points over 255, raises an exception.