author | Karl Williamson <khw@khw-desktop.(none)> | 2009-12-24 22:54:58 -0700 |
---|---|---|
committer | Abigail <abigail@abigail.be> | 2009-12-25 10:07:41 +0100 |
commit | e1b711dac329baf9cf4ea3e4628e6c713e24b342 (patch) | |
tree | b12ce1b41c2d6c0582296ddad541efd2ae3f71e2 /utf8.h | |
parent | 27bca3226281a592aed848b7e68ea50f27381dac (diff) | |
Update .pods
Signed-off-by: Abigail <abigail@abigail.be>
Diffstat (limited to 'utf8.h')
-rw-r--r-- | utf8.h | 8 |
1 files changed, 4 insertions, 4 deletions
@@ -72,17 +72,17 @@ END_EXTERN_C
  Code Points            1st Byte  2nd Byte  3rd Byte  4th Byte

    U+0000..U+007F       00..7F
-   U+0080..U+07FF       C2..DF    80..BF
+   U+0080..U+07FF     * C2..DF    80..BF
    U+0800..U+0FFF       E0      * A0..BF    80..BF
    U+1000..U+CFFF       E1..EC    80..BF    80..BF
-   U+D000..U+D7FF       ED      * 80..9F    80..BF
+   U+D000..U+D7FF       ED        80..9F    80..BF
    U+D800..U+DFFF       +++++++ utf16 surrogates, not legal utf8 +++++++
    U+E000..U+FFFF       EE..EF    80..BF    80..BF
   U+10000..U+3FFFF      F0      * 90..BF    80..BF    80..BF
   U+40000..U+FFFFF      F1..F3    80..BF    80..BF    80..BF
  U+100000..U+10FFFF     F4        80..8F    80..BF    80..BF

-Note the gaps before the 2nd Byte entries above marked by '*'. These are
+Note the gaps before several of the byte entries above marked by '*'. These are
 caused by legal UTF-8 avoiding non-shortest encodings: it is technically
 possible to UTF-8-encode a single code point in different ways, but that is
 explicitly forbidden, and the shortest possible encoding should always be used
@@ -101,7 +101,7 @@ explicitly forbidden, and the shortest possible encoding should always be used
     00000dddccccccbbbbbbaaaaaa  11110ddd 10cccccc 10bbbbbb 10aaaaaa

 As you can see, the continuation bytes all begin with C<10>, and the
-leading bits of the start byte tell how many bytes the are in the
+leading bits of the start byte tell how many bytes there are in the
 encoded character.

 */
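The shortest-encoding rule described in the comment can be illustrated with a small encoder. The following is only a sketch and is not code from utf8.h: the function name sketch_utf8_encode and its interface are hypothetical, chosen for illustration. It always picks the shortest form for a code point, which is why the lower bounds marked by '*' in the table (C2 as the first two-byte start byte, A0 after E0, 90 after F0) come out of the bit layout on their own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of utf8.h): encode one code point in the
 * shortest legal UTF-8 form. Returns the number of bytes written to out
 * (1..4), or 0 for UTF-16 surrogates and for values above U+10FFFF. */
static size_t sketch_utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp <= 0x7F) {                       /* 0aaaaaaa */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 110bbbbb 10aaaaaa */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogates: not legal UTF-8 */
        return 0;
    if (cp <= 0xFFFF) {                     /* 1110cccc 10bbbbbb 10aaaaaa */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110ddd 10cccccc 10bbbbbb 10aaaaaa */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}
```

Encoding the first code point of each multi-byte range gives the smallest byte values in the table: U+0080 becomes C2 80, U+0800 becomes E0 A0 80, and U+10000 becomes F0 90 80 80. Anything below those bounds, such as a start byte of C0 or C1, could only be produced by a longer-than-necessary encoding.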