author | Jarkko Hietaniemi <jhi@iki.fi> | 2001-11-12 13:11:55 +0000 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2001-11-12 13:11:55 +0000 |
commit | bf0fa0b28861f64af680a3c19765ac8a24e4f2bd (patch) | |
tree | a358b8d15bc72e9c377f13a10f8359a377eab6db /pod | |
parent | 31f17f41bb8d60c477667b416652af44045ba3ed (diff) | |
download | perl-bf0fa0b28861f64af680a3c19765ac8a24e4f2bd.tar.gz |
Add a note about the dangers of bad UTF-8.
p4raw-id: //depot/perl@12953
Diffstat (limited to 'pod')
-rw-r--r-- | pod/perlunicode.pod | 12 |
1 files changed, 12 insertions, 0 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 2c9b078029..277238e452 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -742,6 +742,18 @@ is not extensible beyond 0xFFFF, because it does not use surrogates.
 A seven-bit safe (non-eight-bit) encoding, useful if the
 transport/storage is not eight-bit safe. Defined by RFC 2152.
 
+=head2 Security Implications of Malformed UTF-8
+
+Unfortunately, the specification of UTF-8 leaves some room for
+interpretation of how many bytes of encoded output one should generate
+from one input Unicode character. Strictly speaking, one is supposed
+to always generate the shortest possible sequence of UTF-8 bytes,
+because otherwise there is potential for input buffer overflow at the
+receiving end of a UTF-8 connection. Perl always generates the shortest
+length UTF-8, and with warnings on (C<-w> or C<use warnings;>) Perl will
+warn about non-shortest length UTF-8 (and other malformations, too,
+such as the surrogates, which are not real character code points.)
+
 =head2 Unicode in Perl on EBCDIC
 
 The way Unicode is handled on EBCDIC platforms is still rather
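
The note added by this commit turns on the difference between shortest-form and non-shortest-form UTF-8. As a rough illustration only, the sketch below uses Python rather than Perl so that it runs standalone; Python's strict decoder stands in for "the receiving end" and is not Perl's implementation. The helper name `decode_strict` and the sample byte strings are made up for this example.

```python
def decode_strict(data: bytes):
    """Return the decoded text, or None if the bytes are not well-formed UTF-8."""
    try:
        return data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return None


# U+002F ("/") has exactly one shortest-form encoding: the single byte 0x2F.
shortest = "/".encode("utf-8")

# The same code point padded out to two bytes (non-shortest, "overlong" form).
overlong = b"\xc0\xaf"

# Three bytes that would map to the surrogate U+D800, which is not a character.
surrogate = b"\xed\xa0\x80"

results = {
    "shortest": decode_strict(shortest),
    "overlong": decode_strict(overlong),
    "surrogate": decode_strict(surrogate),
}

for name, text in results.items():
    print(name, "accepted" if text is not None else "rejected")
```

A decoder that accepted the overlong form would let `/` reach later code in a shape that a byte-level check for `0x2F` never saw, which is the kind of hazard the added paragraph warns about.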