author | Karl Williamson <public@khwilliamson.com> | 2013-03-11 21:13:38 -0600 |
---|---|---|
committer | Karl Williamson <public@khwilliamson.com> | 2013-03-11 21:21:03 -0600 |
commit | 4b9734bf16232aac75ed56df6352c09d1caad7b3 (patch) | |
tree | b1fe580a02d6ae63b54c758d033c9722519e86b5 /pod/perlunicode.pod | |
parent | 020c4f9110283940e8755ca2f70f6e943b42efe3 (diff) | |
EBCDIC has the Unicode bug too
We have not had a working modern Perl on EBCDIC for some years. When I
started out, comments and code led me to conclude erroneously that
natively it supported semantics for all 256 characters 0-255. It turns
out that I was wrong; natively (at least on some platforms) it has the
same rules (essentially none) for the characters that don't correspond
to ASCII ones as ASCII platforms have for those characters.
This commit is documentation only, mostly just removing the special
mentions of EBCDIC.
Diffstat (limited to 'pod/perlunicode.pod')
-rw-r--r-- | pod/perlunicode.pod | 9 |
1 files changed, 2 insertions, 7 deletions
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 7a0b91593e..7a98285acc 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -98,13 +98,8 @@ while C<use locale ':not_characters'> effectively also selects
 C<use feature 'unicode_strings'> in its scope; see L<perllocale>.)
 Otherwise, Perl uses the platform's native
 byte semantics for characters whose code points are less than 256, and
-Unicode semantics for those greater than 255. On EBCDIC platforms, this
-is almost seamless, as the EBCDIC code pages that Perl handles are
-equivalent to Unicode's first 256 code points. (The exception is that
-EBCDIC regular expression case-insensitive matching rules are not as
-as robust as Unicode's.) But on ASCII platforms, Perl uses US-ASCII
-(or Basic Latin in Unicode terminology) byte semantics, meaning that characters
-whose ordinal numbers are in the range 128 - 255 are undefined except for their
+Unicode semantics for those greater than 255. That means that non-ASCII
+characters are undefined except for their
 ordinal numbers. This means that none have case (upper and lower), nor are any
 a member of character classes, like C<[:alpha:]> or C<\w>. (But all do belong
 to the C<\W> class or the Perl regular expression extension C<[:^alpha:]>.)
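The behavior the patched paragraph describes can be sketched with a short script. This is an illustration only, not part of the commit. It assumes an ASCII platform (on EBCDIC, 0xE9 is a different character) and Perl 5.14 or later, where C<use feature 'unicode_strings'> also affects regular expression matching. The variable names are made up for the example.

```perl
use strict;
use warnings;

# chr(0xE9) is LATIN SMALL LETTER E WITH ACUTE in Latin-1 / Unicode.
# Assumption: an ASCII platform; on EBCDIC, 0xE9 is a different character.
my $char = chr(0xE9);

# Outside the scope of 'unicode_strings' (and of 'use locale'), a
# non-UTF-8 string gets native byte semantics: the character has an
# ordinal number but no case and no character-class membership.
my $native_word  = ($char =~ /\w/) ? 1 : 0;
my $native_upper = ord(uc $char);

# Inside the scope of 'unicode_strings', the same string gets Unicode
# semantics for code points 128-255 as well.
my ($unicode_word, $unicode_upper);
{
    use feature 'unicode_strings';
    $unicode_word  = ($char =~ /\w/) ? 1 : 0;
    $unicode_upper = ord(uc $char);
}

printf "native:  \\w=%d uc=0x%02X\n", $native_word,  $native_upper;
printf "unicode: \\w=%d uc=0x%02X\n", $unicode_word, $unicode_upper;
```

Under those assumptions, the first line should report that the character does not match C<\w> and that C<uc> leaves it as 0xE9, while the second should report a C<\w> match and an uppercase ordinal of 0xC9.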