author | Steve Peters <steve@fisharerojo.org> | 2008-12-19 11:38:31 -0600 |
---|---|---|
committer | Steve Peters <steve@fisharerojo.org> | 2008-12-19 11:38:31 -0600 |
commit | 2bbc8d558d247c6ef91207a12a4650c0bc292dd6 (patch) | |
tree | f56c82008dc643d8e799b8e21fb9a3c36b64b3b4 /pod/perlhack.pod | |
parent | 7df2e4bc09d8ad053532c5f9232b2d713856c938 (diff) | |
Subject: PATCH 5.10 documentation
From: karl williamson <public@khwilliamson.com>
Date: Tue, 16 Dec 2008 16:00:34 -0700
Message-ID: <49483312.80804@khwilliamson.com>
Diffstat (limited to 'pod/perlhack.pod')
-rw-r--r-- | pod/perlhack.pod | 61 |
1 files changed, 58 insertions, 3 deletions
diff --git a/pod/perlhack.pod b/pod/perlhack.pod
index b2192d2752..ef648e7776 100644
--- a/pod/perlhack.pod
+++ b/pod/perlhack.pod
@@ -518,7 +518,7 @@ you should see something like this:
 (Then creating the symlinks...)
 
 The specifics may vary based on your operating system, of course.
-After you see this, you can abort the F<Configure> script, and you
+After it's all done, you
 will see that the directory you are in has a tree of symlinks to the
 F<perl-rsync> directories and files.
 
@@ -2646,6 +2646,61 @@ sizeof() of the field
 
 =item *
 
+Assuming the character set is ASCIIish
+
+Perl can compile and run under EBCDIC platforms. See L<perlebcdic>.
+This is transparent for the most part, but because the character sets
+differ, you shouldn't use numeric (decimal, octal, nor hex) constants
+to refer to characters. You can safely say 'A', but not 0x41.
+You can safely say '\n', but not \012.
+If a character doesn't have a trivial input form, you can
+create a #define for it in both C<utfebcdic.h> and C<utf8.h>, so that
+it resolves to different values depending on the character set being used.
+(There are three different EBCDIC character sets defined in C<utfebcdic.h>,
+so it might be best to insert the #define three times in that file.)
+
+Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper case
+alphabetic characters. That is not true in EBCDIC. Nor for 'a' to 'z'.
+But '0' - '9' is an unbroken range in both systems. Don't assume anything
+about other ranges.
+
+Many of the comments in the existing code ignore the possibility of EBCDIC,
+and may be wrong therefore, even if the code works.
+This is actually a tribute to the successful transparent insertion of being
+able to handle EBCDIC. without having to change pre-existing code.
+
+UTF-8 and UTF-EBCDIC are two different encodings used to represent Unicode
+code points as sequences of bytes. Macros
+with the same names (but different definitions)
+in C<utf8.h> and C<utfebcdic.h>
+are used to allow the calling code think that there is only one such encoding.
+This is almost always referred to as C<utf8>, but it means the EBCDIC
+version as well. Comments in the code may well be wrong even if the code
+itself is right.
+For example, the concept of C<invariant characters> differs between ASCII and
+EBCDIC.
+On ASCII platforms, only characters that do not have the high-order
+bit set (i.e. whose ordinals are strict ASCII, 0 - 127)
+are invariant, and the documentation and comments in the code
+may assume that,
+often referring to something like, say, C<hibit>.
+The situation differs and is not so simple on EBCDIC machines, but as long as
+the code itself uses the C<NATIVE_IS_INVARIANT()> macro appropriately, it
+works, even if the comments are wrong.
+
+=item *
+
+Assuming the character set is just ASCII
+
+ASCII is a 7 bit encoding, but bytes have 8 bits in them. The 128 extra
+characters have different meanings depending on the locale. Absent a locale,
+currently these extra characters are generally considered to be unassigned,
+and this has presented some problems.
+This is scheduled to be changed in 5.12 so that these characters will
+be considered to be Latin-1 (ISO-8859-1).
+
+=item *
+
 Mixing #define and #ifdef
 
 #define BURGLE(x) ... \
@@ -2660,7 +2715,7 @@ you need two separate BURGLE() #defines, one for each #ifdef branch.
 
 =item *
 
-Adding stuff after #endif or #else
+Adding non-comment stuff after #endif or #else
 
 #ifdef SNOSH
 ...
@@ -2836,7 +2891,7 @@ admittedly use them if available to gain some extra speed
 
 =item *
 
-Binding together several statements
+Binding together several statements in a macro
 
 Use the macros STMT_START and STMT_END.
 
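
The character-set advice this patch adds to perlhack.pod can be illustrated with a minimal C sketch. The helper names below (`is_upper_portable`, `digit_value`, `is_newline`) are hypothetical and are not taken from the perl source; they only show the pattern of using character literals instead of numeric codes, and of relying on a contiguous range only for '0' - '9'.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not from the perl source: true if c is an upper-case
 * letter.  Searching a list of literals avoids assuming that 'A' - 'Z' is an
 * unbroken range, which holds in ASCII but not in EBCDIC. */
static int is_upper_portable(char c)
{
    /* strchr() also matches the terminating NUL, so exclude it explicitly. */
    return c != '\0' && strchr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", c) != NULL;
}

/* Hypothetical helper: numeric value of a decimal digit.  '0' - '9' is an
 * unbroken range in both character sets, so the subtraction is safe. */
static int digit_value(char c)
{
    assert(c >= '0' && c <= '9');
    return c - '0';
}

/* Hypothetical helper: compare against the literal '\n', never against a
 * numeric code such as 012, whose meaning depends on the character set. */
static int is_newline(char c)
{
    return c == '\n';
}
```

With these definitions, `is_upper_portable('A')` is true and `digit_value('7')` yields 7 on either kind of platform, because no numeric character code appears anywhere in the source.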