Update comments and documentation dealing with utf

author: Karl <khw@karl.(none)> 2008-12-26 10:18:34 -0700
committer: Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2008-12-26 23:23:55 +0100
commit: fe749c9aa803ce74d997ff797103481a55741837 (patch)
tree: a8009cd572392a5b7a06cc5988ebc5661dd65f91 /pod/perlebcdic.pod
parent: eccdc4d715215b93b6b598d8cf3ac12e323f67e0 (diff)
download: perl-fe749c9aa803ce74d997ff797103481a55741837.tar.gz
1 files changed, 12 insertions, 11 deletions
diff --git a/pod/perlebcdic.pod b/pod/perlebcdic.pod
index ca4ef84408..26e6b3494f 100644
--- a/pod/perlebcdic.pod
+++ b/pod/perlebcdic.pod
@@ -153,20 +153,21 @@ depends on the ordinal number of that code point,
 with larger numbers requiring more bytes.
 UTF-EBCDIC is like UTF-8, but based on EBCDIC.
 
-In UTF-8, the code points corresponding to the lowest 128
-ordinal numbers (0 - 127) are the same (or C<invariant>)
-in UTF-8 or not.  They occupy one byte each.  All other Unicode code points
-require more than one byte to be represented in UTF-8.
-With UTF-EBCDIC, the term C<invariant> has a somewhat different meaning.
-(First, note that this is very different from the L</13 variant characters>
+You may see the term C<invariant> character or code point.
+This simply means that the character has the same numeric
+value when encoded as when not.
+(Note that this is a very different concept from L<The /13 variant characters>
 mentioned above.)
-In UTF-EBCDIC, an C<invariant> character or code point
-is one which takes up exactly one byte encoded, regardless
-of whether or not the encoding changes its value
-(which it most likely will).
+For example, the ordinal value of 'A' is 193 in most EBCDIC code pages,
+and also is 193 when encoded in UTF-EBCDIC.
+All other code points occupy at least two bytes when encoded.
+In UTF-8, the code points corresponding to the lowest 128
+ordinal numbers (0 - 127: the ASCII characters) are invariant.
+In UTF-EBCDIC, there are 160 invariant characters.
 (If you care, the EBCDIC invariants are those characters
-which correspond to the the ASCII characters, plus those that correspond to
+which have ASCII equivalents, plus those that correspond to
 the C1 controls (80..9f on ASCII platforms).)
+
 A string encoded in UTF-EBCDIC may be longer (but never shorter) than
 one encoded in UTF-8.
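The UTF-8 half of the "invariant" definition in the new text (same numeric value whether encoded or not) can be checked directly. What follows is a minimal sketch in Python, which is an assumption: the commit itself only changes POD. As far as we know, Python ships no UTF-EBCDIC codec, so the sketch shows only the UTF-8 invariants and the plain EBCDIC code page `cp037` value for 'A'. The helper name `utf8_invariant` is hypothetical and chosen for this illustration.

```python
def utf8_invariant(cp: int) -> bool:
    """Hypothetical helper: True if code point `cp` encodes in UTF-8
    as exactly one byte whose value equals `cp`."""
    encoded = chr(cp).encode("utf-8")
    return len(encoded) == 1 and encoded[0] == cp


# Scan the first 256 code points (no surrogates here, so encoding never fails).
invariants = [cp for cp in range(256) if utf8_invariant(cp)]
print(len(invariants), invariants[0], invariants[-1])  # 128 0 127

# In the EBCDIC code page cp037, 'A' has ordinal 193 rather than ASCII's 65.
print("A".encode("cp037")[0])  # 193
```

Only code points 0 through 127 survive the check; 128 through 255 each take two bytes in UTF-8, matching the statement that all other code points occupy at least two bytes when encoded.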