author | Robin Barker <RMBarker@cpan.org> | 2002-02-27 12:25:30 +0000 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2002-02-27 13:41:14 +0000 |
commit | 5cb3728cfe288ad05e8d10c8176f72378da2238f (patch) | |
tree | 9fbfd12df7a9a44badbe6e8ad3a4775b3d160bb3 /pod/perlunicode.pod | |
parent | c9436a12b1ee8d5e32d19b5870c63a8435afed9d (diff) | |
Re: [PATCH @14870] long C<=item>s and other pod->man->troff problems
Message-Id: <200202271225.MAA24806@tempest.npl.co.uk>
p4raw-id: //depot/perl@14892
Diffstat (limited to 'pod/perlunicode.pod')
-rw-r--r-- | pod/perlunicode.pod | 24 |
1 files changed, 18 insertions, 6 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 1accda426b..7fb473ebe5 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -692,7 +692,9 @@ numbers. To use these numbers various encodings are needed.
 
 =over 4
 
-=item UTF-8
+=item
+
+UTF-8
 
 UTF-8 is a variable-length (1 to 6 bytes, current character allocations
 require 4 bytes), byteorder independent encoding. For ASCII, UTF-8 is
@@ -723,11 +725,15 @@ As you can see, the continuation bytes all begin with C<10>, and the
 leading bits of the start byte tells how many bytes the are in the
 encoded character.
 
-=item UTF-EBCDIC
+=item
+
+UTF-EBCDIC
 
 Like UTF-8, but EBCDIC-safe, as UTF-8 is ASCII-safe.
 
-=item UTF-16, UTF-16BE, UTF16-LE, Surrogates, and BOMs (Byte Order Marks)
+=item
+
+UTF-16, UTF-16BE, UTF16-LE, Surrogates, and BOMs (Byte Order Marks)
 
 (The followings items are mostly for reference, Perl doesn't use
 them internally.)
@@ -778,20 +784,26 @@ sequence of bytes 0xFF 0xFE is unambiguously "BOM, represented in
 little-endian format" and cannot be "0xFFFE, represented in big-endian
 format".
 
-=item UTF-32, UTF-32BE, UTF32-LE
+=item
+
+UTF-32, UTF-32BE, UTF32-LE
 
 The UTF-32 family is pretty much like the UTF-16 family, expect that
 the units are 32-bit, and therefore the surrogate scheme is not
 needed. The BOM signatures will be 0x00 0x00 0xFE 0xFF for BE and
 0xFF 0xFE 0x00 0x00 for LE.
 
-=item UCS-2, UCS-4
+=item
+
+UCS-2, UCS-4
 
 Encodings defined by the ISO 10646 standard. UCS-2 is a 16-bit
 encoding, UCS-4 is a 32-bit encoding. Unlike UTF-16, UCS-2
 is not extensible beyond 0xFFFF, because it does not use surrogates.
 
-=item UTF-7
+=item
+
+UTF-7
 
 A seven-bit safe (non-eight-bit) encoding, useful if the
 transport/storage is not eight-bit safe. Defined by RFC 2152.
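
Note on the context lines: the patch only changes the `=item` markup, but the surrounding text states two facts that are easy to check, namely the UTF-32 BOM signatures and that UTF-8 continuation bytes begin with the bits `10`. The following is a minimal sketch in Python (not the language of this commit); the helper names `utf32_bom` and `is_utf8_continuation` are hypothetical and appear nowhere in the Perl sources.

```python
import codecs


def utf32_bom(data: bytes):
    """Return 'BE', 'LE', or None depending on a leading UTF-32 BOM.

    Hypothetical helper, for illustration only.
    """
    if data.startswith(codecs.BOM_UTF32_BE):  # 0x00 0x00 0xFE 0xFF
        return "BE"
    if data.startswith(codecs.BOM_UTF32_LE):  # 0xFF 0xFE 0x00 0x00
        return "LE"
    return None


def is_utf8_continuation(byte: int) -> bool:
    """True if the two high bits of the byte are 10 (UTF-8 continuation byte).

    Hypothetical helper, for illustration only.
    """
    return (byte & 0b1100_0000) == 0b1000_0000


if __name__ == "__main__":
    # Python's "utf-32-be" codec writes no BOM, so one is prepended by hand here.
    sample = codecs.BOM_UTF32_BE + "A".encode("utf-32-be")
    print(utf32_bom(sample))
    # Classify each byte of a two-byte UTF-8 sequence (start byte, then continuation).
    print([is_utf8_continuation(b) for b in "é".encode("utf-8")])
```

The UTF-32 check must run on the first four bytes, because the LE signature begins with 0xFF 0xFE, which is also the UTF-16 little-endian BOM.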