author | Karl <khw@karl.(none)> | 2008-12-26 10:18:34 -0700 |
---|---|---|
committer | Rafael Garcia-Suarez <rgarciasuarez@gmail.com> | 2008-12-26 23:23:55 +0100 |
commit | fe749c9aa803ce74d997ff797103481a55741837 (patch) | |
tree | a8009cd572392a5b7a06cc5988ebc5661dd65f91 /pod/perlebcdic.pod | |
parent | eccdc4d715215b93b6b598d8cf3ac12e323f67e0 (diff) | |
download | perl-fe749c9aa803ce74d997ff797103481a55741837.tar.gz |
Update comments and documentation dealing with utf
Diffstat (limited to 'pod/perlebcdic.pod')
-rw-r--r-- | pod/perlebcdic.pod | 23 |
1 files changed, 12 insertions, 11 deletions
diff --git a/pod/perlebcdic.pod b/pod/perlebcdic.pod
index ca4ef84408..26e6b3494f 100644
--- a/pod/perlebcdic.pod
+++ b/pod/perlebcdic.pod
@@ -153,20 +153,21 @@
 depends on the ordinal number of that code point, with larger numbers
 requiring more bytes. UTF-EBCDIC is like UTF-8, but based on EBCDIC.
 
-In UTF-8, the code points corresponding to the lowest 128
-ordinal numbers (0 - 127) are the same (or C<invariant>)
-in UTF-8 or not. They occupy one byte each. All other Unicode code points
-require more than one byte to be represented in UTF-8.
-With UTF-EBCDIC, the term C<invariant> has a somewhat different meaning.
-(First, note that this is very different from the L</13 variant characters>
+You may see the term C<invariant> character or code point.
+This simply means that the character has the same numeric
+value when encoded as when not.
+(Note that this is a very different concept from L<The /13 variant characters>
 mentioned above.)
-In UTF-EBCDIC, an C<invariant> character or code point
-is one which takes up exactly one byte encoded, regardless
-of whether or not the encoding changes its value
-(which it most likely will).
+For example, the ordinal value of 'A' is 193 in most EBCDIC code pages,
+and also is 193 when encoded in UTF-EBCDIC.
+All other code points occupy at least two bytes when encoded.
+In UTF-8, the code points corresponding to the lowest 128
+ordinal numbers (0 - 127: the ASCII characters) are invariant.
+In UTF-EBCDIC, there are 160 invariant characters.
 (If you care, the EBCDIC invariants are those characters
-which correspond to the the ASCII characters, plus those that correspond to
+which have ASCII equivalents, plus those that correspond to
 the C1 controls (80..9f on ASCII platforms).)
+
 A string encoded in UTF-EBCDIC may be longer (but never shorter) than
 one encoded in UTF-8.
 
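
The revised wording defines an invariant as a code point whose numeric value is the same encoded as unencoded. The sketch below checks the UTF-8 half of that claim, along with the ordinal of 'A' in one EBCDIC code page. It is written in Python rather than Perl purely for illustration and is not part of the commit. The helper name `is_utf8_invariant` is hypothetical, and `cp037` stands in for "most EBCDIC code pages". Python ships no UTF-EBCDIC codec, so the figure of 160 UTF-EBCDIC invariants is not verified here.

```python
# Illustration only: checks the UTF-8 side of the "invariant" definition
# and the EBCDIC ordinal of 'A'. UTF-EBCDIC itself is not exercised,
# because the Python standard library has no codec for it.

def is_utf8_invariant(code_point: int) -> bool:
    """True if the code point encodes to a single byte with the same value."""
    encoded = chr(code_point).encode("utf-8")
    return len(encoded) == 1 and encoded[0] == code_point

# Scan the first 2048 code points (well below the surrogate range,
# which cannot be encoded) and collect the invariant ones.
utf8_invariants = [cp for cp in range(0x800) if is_utf8_invariant(cp)]

# Ordinal of 'A' in code page 037, one common EBCDIC code page.
ebcdic_a = "A".encode("cp037")[0]

print(len(utf8_invariants), ebcdic_a)
```

Under these assumptions, the scan finds exactly the code points 0 through 127, matching the patch's statement that the ASCII characters are the UTF-8 invariants.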