author: Jarkko Hietaniemi <jhi@iki.fi> 2001-12-16 02:45:06 +0000
committer: Jarkko Hietaniemi <jhi@iki.fi> 2001-12-16 02:45:06 +0000
commit: 9466bab696bbe701541ead3a883c2387f5110da2
tree: 070a0fde8d3f0dcf9fbcc1b789f7e4ee2c1ce265 /pod/perlunicode.pod
parent: 76ccdbe266e63a2a3ac21a782e44a6b13093ac7f
Make creating UTF-8 surrogates a punishable act.
p4raw-id: //depot/perl@13707
Diffstat (limited to 'pod/perlunicode.pod')
-rw-r--r-- pod/perlunicode.pod 6
1 files changed, 6 insertions, 0 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 4102fc42a6..103b33b69a 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -740,6 +740,12 @@ and the decoding is
$uni = 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
+If you try to generate surrogates (for example by using chr()), you
+will get an error, firstly because a surrogate on its own is
+meaningless, and secondly because Perl encodes its Unicode characters
+in UTF-8 (not 16-bit numbers), which makes the encoded character doubly
+illegal.
+
Because of the 16-bitness, UTF-16 is byteorder dependent. UTF-16
itself can be used for in-memory computations, but if storage or
transfer is required, either UTF-16BE (Big Endian) or UTF-16LE
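
The surrogate arithmetic referenced in the hunk above can be checked with plain integers, without ever creating a surrogate character via chr(). The following is a minimal sketch, not part of the commit: the helper names `encode_surrogates` and `decode_surrogates` are hypothetical, and the encoding step is assumed from the standard UTF-16 definition (high surrogates 0xD800..0xDBFF, low surrogates 0xDC00..0xDFFF).

```perl
use strict;
use warnings;

# Hypothetical helper: split a supplementary code point
# (0x10000..0x10FFFF) into a UTF-16 surrogate pair. Only integers are
# handled, so no surrogate *character* is ever generated.
sub encode_surrogates {
    my ($uni) = @_;
    die sprintf("U+%04X is outside 0x10000..0x10FFFF\n", $uni)
        if $uni < 0x10000 || $uni > 0x10FFFF;
    my $hi = int(($uni - 0x10000) / 0x400) + 0xD800;
    my $lo = ($uni - 0x10000) % 0x400 + 0xDC00;
    return ($hi, $lo);
}

# Hypothetical helper: combine a high and a low surrogate back into
# the code point, using the decoding formula from the documentation.
sub decode_surrogates {
    my ($hi, $lo) = @_;
    die "not a surrogate pair\n"
        unless $hi >= 0xD800 && $hi <= 0xDBFF
            && $lo >= 0xDC00 && $lo <= 0xDFFF;
    return 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
}

my ($hi, $lo) = encode_surrogates(0x1D11E);    # MUSICAL SYMBOL G CLEF
printf "U+1D11E -> 0x%04X 0x%04X\n", $hi, $lo;
printf "round trip -> U+%04X\n", decode_surrogates($hi, $lo);
```

Run as a script, this should print the pair 0xD834 0xDD1E for U+1D11E and then recover U+1D11E from it. Working on integers sidesteps the error described in the added paragraph, since the surrogate values are never turned into Perl characters.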