author Jarkko Hietaniemi <jhi@iki.fi> 2002-03-19 04:58:22 +0000
committer Jarkko Hietaniemi <jhi@iki.fi> 2002-03-19 04:58:22 +0000
commit 64c66fb6d001b6ad9c6dcec93084b647d4c6eb13
tree a4f708a0a0e71b216bf952783da5418c7b9b9495 /pod/perluniintro.pod
parent c3939953e4b292b4e1d7b8bbc60cede4dc14fcaf
Update the Unicode vs EBCDIC situation.
p4raw-id: //depot/perl@15313
Diffstat (limited to 'pod/perluniintro.pod')
-rw-r--r-- pod/perluniintro.pod 26
1 files changed, 17 insertions, 9 deletions
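The patch's distinction between UTF-8 being "ASCII-safe" and UTF-EBCDIC being "EBCDIC-safe" can be made concrete with a short Perl sketch using the core Encode module. This is an illustration, not part of the commit. It assumes no UTF-EBCDIC codec is at hand on an ASCII-based build, so it uses the cp37 EBCDIC code page as a stand-in for "EBCDIC byte values"; the choice of cp37 is an assumption made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# UTF-8 is "ASCII-safe": an ASCII character encodes to the same single byte.
my $utf8_A = encode("UTF-8", "A");
printf "UTF-8 'A':    %s\n", unpack("H*", $utf8_A);

# Non-ASCII characters become multi-byte sequences whose bytes are all
# >= 0x80, so they never collide with ASCII bytes.
my $utf8_dotless_i = encode("UTF-8", "\x{131}");   # LATIN SMALL LETTER DOTLESS I
printf "UTF-8 U+0131: %s\n", unpack("H*", $utf8_dotless_i);

# In EBCDIC (here the cp37 code page, chosen only as an example), 'A' is
# 0xC1, not 0x41, so UTF-8 bytes are not meaningful to EBCDIC-based code.
my $cp37_A = encode("cp37", "A");
printf "cp37 'A':     %s\n", unpack("H*", $cp37_A);

# Read as cp37, the UTF-8 byte for 'A' (0x41) is a different character,
# U+00A0 NO-BREAK SPACE. UTF-EBCDIC is designed so that such invariant
# characters keep their EBCDIC byte values instead.
printf "0x41 as cp37: U+%04X\n", ord(decode("cp37", $utf8_A));
```

Running it should print the hex bytes 41, c4b1 and c1, and report U+00A0 for the reinterpreted byte.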
diff --git a/pod/perluniintro.pod b/pod/perluniintro.pod
index 8a7a055935..e36bb07dd7 100644
--- a/pod/perluniintro.pod
+++ b/pod/perluniintro.pod
@@ -169,15 +169,23 @@ To output UTF-8 always, use the ":utf8" output discipline. Prepending
to this sample program ensures the output is completely UTF-8, and
of course, removes the warning.
-Perl 5.8.0 also supports Unicode on EBCDIC platforms. There, the
-support is somewhat harder to implement since additional conversions
-are needed at every step. Because of these difficulties, the Unicode
-support isn't quite as full as in other, mainly ASCII-based, platforms
-(the Unicode support is better than in the 5.6 series, which didn't
-work much at all for EBCDIC platform). On EBCDIC platforms, the
-internal Unicode encoding form is UTF-EBCDIC instead of UTF-8 (the
-difference is that as UTF-8 is "ASCII-safe" in that ASCII characters
-encode to UTF-8 as-is, UTF-EBCDIC is "EBCDIC-safe").
+=head2 Unicode and EBCDIC
+
+Perl 5.8.0 also supports Unicode on EBCDIC platforms. There,
+the Unicode support is somewhat more complex to implement since
+additional conversions are needed at every step. Some problems
+remain, but they all seem to be related to the combination of
+the extra mapping just described and case-insensitive matching:
+for example, "\x{131}" (LATIN SMALL LETTER DOTLESS I) does not
+match "I" case-insensitively, as it should under Unicode.
+(The match succeeds on ASCII-derived platforms.)
+
+In any case, the Unicode support on EBCDIC platforms is better than
+in the 5.6 series, which didn't work much at all for EBCDIC platforms.
+On EBCDIC platforms, the internal Unicode encoding form is UTF-EBCDIC
+instead of UTF-8. The difference is that while UTF-8 is "ASCII-safe",
+in that ASCII characters encode to UTF-8 as-is, UTF-EBCDIC is
+"EBCDIC-safe".
=head2 Creating Unicode