author: Karl Williamson <public@khwilliamson.com> 2013-03-11 21:13:38 -0600
committer: Karl Williamson <public@khwilliamson.com> 2013-03-11 21:21:03 -0600
commit: 4b9734bf16232aac75ed56df6352c09d1caad7b3
tree: b1fe580a02d6ae63b54c758d033c9722519e86b5 /handy.h
parent: 020c4f9110283940e8755ca2f70f6e943b42efe3
EBCDIC has the Unicode bug too
We have not had a working modern Perl on EBCDIC for some years. When I started out, comments and code led me to conclude, erroneously, that EBCDIC natively supported semantics for all 256 characters 0-255. I was wrong: natively (at least on some platforms) it has the same rules (essentially none) for the characters that don't correspond to ASCII ones as ASCII platforms have for their non-ASCII characters. This commit is documentation only, mostly just removing the special mentions of EBCDIC.
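To make the distinction concrete, here is a minimal sketch of the two rule sets the documentation describes. It is not Perl's implementation: the function names `sketch_is_wordchar_a` and `sketch_is_wordchar_l1` are hypothetical stand-ins for `isWORDCHAR_A()` and `isWORDCHAR_L1()`, and the code assumes an ASCII platform where code points 128-255 are interpreted as Latin-1.

```c
#include <stdbool.h>

/* Hypothetical stand-in for isWORDCHAR_A(): only ASCII word characters
 * match. Assumes an ASCII platform, so the character literals below have
 * their ASCII values. */
static bool sketch_is_wordchar_a(unsigned int c)
{
    return (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9')
        || c == '_';
}

/* Hypothetical stand-in for isWORDCHAR_L1(): ASCII code points keep their
 * ASCII meaning, and 128-255 are treated as Latin-1 characters. Anything
 * that does not fit in an octet is rejected. */
static bool sketch_is_wordchar_l1(unsigned int c)
{
    if (c > 0xFF)
        return false;
    if (c < 0x80)
        return sketch_is_wordchar_a(c);

    /* Latin-1 word characters above ASCII: FEMININE ORDINAL INDICATOR,
     * MICRO SIGN, MASCULINE ORDINAL INDICATOR, and the letters 0xC0-0xFF
     * except MULTIPLICATION SIGN (0xD7) and DIVISION SIGN (0xF7). */
    return c == 0xAA || c == 0xB5 || c == 0xBA
        || (c >= 0xC0 && c != 0xD7 && c != 0xF7);
}
```

Under these assumptions, code point 0xDF (LATIN SMALL LETTER SHARP S in Latin-1) is accepted by the `_l1` sketch and rejected by the `_a` sketch, while an ASCII letter such as `'q'` is accepted by both.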
Diffstat (limited to 'handy.h')
-rw-r--r-- handy.h | 22
1 files changed, 9 insertions, 13 deletions
diff --git a/handy.h b/handy.h
index a65523e492..a969d1a2ec 100644
--- a/handy.h
+++ b/handy.h
@@ -489,19 +489,15 @@ Perl rules. If the input is a number that doesn't fit in an octet, FALSE is
always returned.
Variant C<isFOO_A> (e.g., C<isALPHA_A()>) will return TRUE only if the input is
-also in the ASCII character set. For ASCII platforms, the base function with
-no suffix and the one with the C<_A> suffix are identical. On EBCDIC
-platforms, the C<_A> suffix function will not return true unless the specified
-character also has an ASCII equivalent.
-
-Variant C<isFOO_L1> operates on the full Latin1 character set. For EBCDIC
-platforms, the base function with no suffix and the one with the C<_L1> suffix
-are identical. For ASCII platforms, the C<_L1> suffix imposes the Latin-1
-character set onto the platform. That is, the code points that are ASCII are
-unaffected, since ASCII is a subset of Latin-1. But the non-ASCII code points
-are treated as if they are Latin-1 characters. For example, C<isSPACE_L1()>
-will return true when called with the code point 0xA0, which is the Latin-1
-NO-BREAK SPACE.
+also in the ASCII character set. The base function with no suffix and the one
+with the C<_A> suffix are identical.
+
+Variant C<isFOO_L1> imposes the Latin-1 (or EBCDIC equivalent) character set
+onto the platform. That is, the code points that are ASCII are unaffected,
+since ASCII is a subset of Latin-1. But the non-ASCII code points are treated
+as if they are Latin-1 characters. For example, C<isWORDCHAR_L1()> will return
+true when called with the code point 0xDF, which is a word character in both
+ASCII and EBCDIC (though it represents different characters in each).
Variant C<isFOO_uni> is like the C<isFOO_L1> variant, but accepts any UV code
point as input. If the code point is larger than 255, Unicode rules are used