author Karl Williamson <public@khwilliamson.com> 2013-03-11 21:13:38 -0600
committer Karl Williamson <public@khwilliamson.com> 2013-03-11 21:21:03 -0600
commit 4b9734bf16232aac75ed56df6352c09d1caad7b3
tree b1fe580a02d6ae63b54c758d033c9722519e86b5
parent 020c4f9110283940e8755ca2f70f6e943b42efe3
EBCDIC has the Unicode bug too
We have not had a working modern Perl on EBCDIC for some years. When I started out, comments and code led me to conclude, erroneously, that it natively supported semantics for all 256 characters 0-255. I was wrong: natively (at least on some platforms) it has the same rules (essentially none) for the characters that don't correspond to ASCII ones as ASCII platforms have for those characters. This commit is documentation only, mostly just removing the special mentions of EBCDIC.
Diffstat (limited to 'pod/perlunicode.pod')
-rw-r--r-- pod/perlunicode.pod 9
1 files changed, 2 insertions, 7 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 7a0b91593e..7a98285acc 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -98,13 +98,8 @@ while C<use locale ':not_characters'> effectively also selects
C<use feature 'unicode_strings'> in its scope; see L<perllocale>.)
Otherwise, Perl uses the platform's native
byte semantics for characters whose code points are less than 256, and
-Unicode semantics for those greater than 255. On EBCDIC platforms, this
-is almost seamless, as the EBCDIC code pages that Perl handles are
-equivalent to Unicode's first 256 code points. (The exception is that
-EBCDIC regular expression case-insensitive matching rules are not as
-as robust as Unicode's.) But on ASCII platforms, Perl uses US-ASCII
-(or Basic Latin in Unicode terminology) byte semantics, meaning that characters
-whose ordinal numbers are in the range 128 - 255 are undefined except for their
+Unicode semantics for those greater than 255. That means that non-ASCII
+characters are undefined except for their
ordinal numbers. This means that none have case (upper and lower), nor are any
a member of character classes, like C<[:alpha:]> or C<\w>. (But all do belong
to the C<\W> class or the Perl regular expression extension C<[:^alpha:]>.)
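
The behaviour the revised paragraph describes can be observed with a short script. This is only a sketch, not part of the commit: it assumes an ASCII platform (so that chr(0xE9) is U+00E9, LATIN SMALL LETTER E WITH ACUTE) and a Perl of 5.14 or later, where C<use feature 'unicode_strings'> also affects regular expression matching. The helper names probe_native and probe_unicode are hypothetical.

```perl
use strict;
use warnings;

# Assumption: ASCII platform, so chr(0xE9) is the one-byte native string
# for U+00E9 (LATIN SMALL LETTER E WITH ACUTE), a non-ASCII code point < 256.
my $e_acute = chr(0xE9);

# Hypothetical helper: runs under native byte semantics, the default when
# neither 'unicode_strings' nor a locale is in effect.
sub probe_native {
    my ($char) = @_;
    return ( ( $char =~ /\w/ ? 1 : 0 ), ord( uc $char ) );
}

# Hypothetical helper: the same two checks with Unicode semantics enabled
# lexically for this sub only.
sub probe_unicode {
    use feature 'unicode_strings';
    my ($char) = @_;
    return ( ( $char =~ /\w/ ? 1 : 0 ), ord( uc $char ) );
}

for my $case ( [ native => \&probe_native ], [ unicode_strings => \&probe_unicode ] ) {
    my ( $label, $probe ) = @$case;
    my ( $is_word, $upper ) = $probe->($e_acute);
    # Report whether \w matched and the ordinal of the uppercased character.
    printf "%-16s \\w=%d uc=0x%02X\n", $label, $is_word, $upper;
}
```

Under native semantics the character should fail to match C<\w> and come back from uc unchanged (0xE9); with the feature enabled it should match C<\w> and uppercase to 0xC9. Keeping the feature inside one sub shows that the difference comes from lexical scope alone, not from the string's contents.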