author | Jarkko Hietaniemi <jhi@iki.fi> | 2002-03-19 04:58:22 +0000 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2002-03-19 04:58:22 +0000 |
commit | 64c66fb6d001b6ad9c6dcec93084b647d4c6eb13 (patch) | |
tree | a4f708a0a0e71b216bf952783da5418c7b9b9495 /pod/perluniintro.pod | |
parent | c3939953e4b292b4e1d7b8bbc60cede4dc14fcaf (diff) | |
Update the Unicode vs EBCDIC situation.
p4raw-id: //depot/perl@15313
Diffstat (limited to 'pod/perluniintro.pod')
-rw-r--r-- | pod/perluniintro.pod | 26 |
1 files changed, 17 insertions, 9 deletions
diff --git a/pod/perluniintro.pod b/pod/perluniintro.pod
index 8a7a055935..e36bb07dd7 100644
--- a/pod/perluniintro.pod
+++ b/pod/perluniintro.pod
@@ -169,15 +169,23 @@ To output UTF-8 always, use the ":utf8" output discipline. Prepending
 to this sample program ensures the output is completely UTF-8, and of
 course, removes the warning.
 
-Perl 5.8.0 also supports Unicode on EBCDIC platforms. There, the
-support is somewhat harder to implement since additional conversions
-are needed at every step. Because of these difficulties, the Unicode
-support isn't quite as full as in other, mainly ASCII-based, platforms
-(the Unicode support is better than in the 5.6 series, which didn't
-work much at all for EBCDIC platform). On EBCDIC platforms, the
-internal Unicode encoding form is UTF-EBCDIC instead of UTF-8 (the
-difference is that as UTF-8 is "ASCII-safe" in that ASCII characters
-encode to UTF-8 as-is, UTF-EBCDIC is "EBCDIC-safe").
+=head2 Unicode and EBCDIC
+
+Perl 5.8.0 also supports Unicode on EBCDIC platforms. There,
+the Unicode support is somewhat more complex to implement since
+additional conversions are needed at every step. Some problems
+remain, but they all seem to be related to the combination of
+the extra mapping just described and case-insensitive matching:
+for example, "\x{131}" (LATIN SMALL LETTER DOTLESS I) does not
+match "I" case-insensitively, as it should under Unicode.
+(The match succeeds in ASCII-derived platforms.)
+
+In any case, the Unicode support on EBCDIC platforms is better than
+in the 5.6 series, which didn't work much at all for EBCDIC platform.
+On EBCDIC platforms, the internal Unicode encoding form is UTF-EBCDIC
+instead of UTF-8 (the difference is that as UTF-8 is "ASCII-safe" in
+that ASCII characters encode to UTF-8 as-is, UTF-EBCDIC is
+"EBCDIC-safe").
 
 =head2 Creating Unicode
 
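Reviewer note: the "ASCII-safe" property described in the patched text can be checked with a short Perl sketch. It is not part of this commit. It assumes an ASCII-based platform and the core Encode module, and the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: this runs on an ASCII-based platform, not on EBCDIC.
# "ASCII-safe" means ASCII characters encode to UTF-8 as the same bytes.
my $ascii = "Perl 5.8.0";
my $bytes = encode("UTF-8", $ascii);
print "ASCII text is unchanged by UTF-8 encoding\n" if $bytes eq $ascii;

# A non-ASCII character such as U+0131 (LATIN SMALL LETTER DOTLESS I)
# needs more than one byte in UTF-8.
my $dotless = encode("UTF-8", "\x{131}");
my $hex     = join " ", map { sprintf "%02X", ord } split //, $dotless;
printf "U+0131 encodes to %d bytes: %s\n", length($dotless), $hex;
```

The sketch does not exercise the case-insensitive match against "I" that the patch mentions, since that behaviour is described there as platform-dependent.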