perlunicode.pod refers to a future which is now the past.

Get perlhack from bleadperl (but note that the description of the test suite is not in line with what's in perl 5.6.2). p4raw-id: //depot/maint-5.6/perl-5.6.2@21281
author: Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2003-09-18 20:48:50 +0000
committer: Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2003-09-18 20:48:50 +0000
commit: a1d64b115abe65e79c49e9ce69b44def625912c7 (patch)
tree: ad2de6abe73cc9f5e60c7c16ea57d66c719242ec
parent: d3e9bfc518091918d9eca26f6caa523aa1b07356 (diff)
download: perl-a1d64b115abe65e79c49e9ce69b44def625912c7.tar.gz
2 files changed, 823 insertions, 104 deletions
diff --git a/pod/perlhack.pod b/pod/perlhack.pod
index d524fe55f5..b85914f40f 100644
--- a/pod/perlhack.pod
+++ b/pod/perlhack.pod
@@ -14,14 +14,13 @@ messages a day, depending on the heatedness of the debate.  Most days
 there are two or three patches, extensions, features, or bugs being
 discussed at a time.
 
-A searchable archive of the list is at:
+A searchable archive of the list is at either:
 
     http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/
 
-The list is also archived under the usenet group name
-C<perl.porters-gw> at:
+or
 
-    http://www.deja.com/
+    http://archive.develooper.com/perl5-porters@perl.org/
 
 List subscribers (the porters themselves) come in several flavours.
 Some are quiet curious lurkers, who rarely pitch in and instead watch
@@ -38,12 +37,13 @@ in what does and does not change in the Perl language.  Various
 releases of Perl are shepherded by a ``pumpking'', a porter
 responsible for gathering patches, deciding on a patch-by-patch
 feature-by-feature basis what will and will not go into the release.
-For instance, Gurusamy Sarathy is the pumpking for the 5.6 release of
-Perl.
+For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of
+Perl, and Jarkko Hietaniemi is the pumpking for the 5.8 release, and
+Hugo van der Sanden will be the pumpking for the 5.10 release.
 
 In addition, various people are pumpkings for different things.  For
 instance, Andy Dougherty and Jarkko Hietaniemi share the I<Configure>
-pumpkin, and Tom Christiansen is the documentation pumpking.
+pumpkin.
 
 Larry sees Perl development along the lines of the US government:
 there's the Legislature (the porters), the Executive branch (the
@@ -158,13 +158,22 @@ The worst patches make use of a system-specific features.  It's highly
 unlikely that nonportable additions to the Perl language will be
 accepted.
 
+=item Is the implementation tested?
+
+Patches which change behaviour (fixing bugs or introducing new features)
+must include regression tests to verify that everything works as expected.
+Without tests provided by the original author, how can anyone else changing
+perl in the future be sure that they haven't unwittingly broken the behaviour
+the patch implements? And without tests, how can the patch's author be
+confident that his/her hard work put into the patch won't be accidentally
+thrown away by someone in the future?
+
 =item Is there enough documentation?
 
 Patches without documentation are probably ill-thought out or
 incomplete.  Nothing can be added without documentation, so submitting
 a patch for the appropriate manpages as well as the source code is
-always a good idea.  If appropriate, patches should add to the test
-suite as well.
+always a good idea.
 
 =item Is there another way to do it?
 
@@ -197,8 +206,8 @@ interpreter.  ``A core module'' is one that ships with Perl.
 =head2 Keeping in sync
 
 The source code to the Perl interpreter, in its different versions, is
-kept in a repository managed by a revision control system (which is
-currently the Perforce program, see http://perforce.com/).  The
+kept in a repository managed by a revision control system ( which is
+currently the Perforce program, see http://perforce.com/ ).  The
 pumpkings and a few others have access to the repository to check in
 changes.  Periodically the pumpking for the development version of Perl
 will release a new version, so the rest of the porters can see what's
@@ -206,8 +215,20 @@ changed.  The current state of the main trunk of repository, and patches
 that describe the individual changes that have happened since the last
 public release are available at this location:
 
+    http://public.activestate.com/gsar/APC/
     ftp://ftp.linux.activestate.com/pub/staff/gsar/APC/
 
+If you're looking for a particular change, or a change that affected
+a particular set of files, you may find the B<Perl Repository Browser>
+useful:
+
+    http://public.activestate.com/cgi-bin/perlbrowse
+
+You may also want to subscribe to the perl5-changes mailing list to
+receive a copy of each patch that gets submitted to the maintenance
+and development "branches" of the perl repository.  See
+http://lists.perl.org/ for subscription information.
+
 If you are a member of the perl5-porters mailing list, it is a good
 thing to keep in touch with the most recent changes. If not only to
 verify if what you would have posted as a bug report isn't already
@@ -241,6 +262,17 @@ the latest applied patch level, creating files that are new (to your
 distribution) and setting date/time stamps of existing files to
 reflect the bleadperl status.
 
+Note that this will not delete any files that were in '.' before
+the rsync. Once you are sure that the rsync is running correctly,
+run it with the --delete and the --dry-run options like this:
+
+ # rsync -avz --delete --dry-run rsync://ftp.linux.activestate.com/perl-current/ .
+
+This will I<simulate> an rsync run that also deletes files not
+present in the bleadperl master copy. Observe the results from
+this run closely. If you are sure that the actual run would delete
+no files precious to you, you could remove the '--dry-run' option.
+
 You can than check what patch was the latest that was applied by
 looking in the file B<.patch>, which will show the number of the
 latest patch.
@@ -258,11 +290,11 @@ Set up a local rsync server which makes the rsynced source tree
 available to the LAN and sync the other machines against this
 directory.
 
-From http://rsync.samba.org/README.html:
+From http://rsync.samba.org/README.html :
 
    "Rsync uses rsh or ssh for communication. It does not need to be
     setuid and requires no special privileges for installation.  It
-    does not require a inetd entry or a deamon.  You must, however,
+    does not require an inetd entry or a daemon.  You must, however,
     have a working rsh or ssh system.  Using ssh is recommended for
     its security features."
 
@@ -328,7 +360,7 @@ patch directory.
 
 It's then up to you to apply these patches, using something like
 
- # last=`ls -rt1 *.gz | tail -1`
+ # last=`ls -t *.gz | sed q`
  # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ .
  # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch
  # cd ../perl-current
@@ -343,27 +375,18 @@ from Andreas K�nig to have better control over the patching process.
 
 =over 4
 
-=item It's easier
+=item It's easier to rsync the source tree
 
 Since you don't have to apply the patches yourself, you are sure all
 files in the source tree are in the right state.
 
-=item It's more recent
-
-According to Gurusamy Sarathy:
-
-   "... The rsync mirror is automatic and syncs with the repository
-    every five minutes.
-
-   "Updating the patch  area  still  requires  manual  intervention
-    (with all the goofiness that implies,  which you've noted)  and
-    is typically on a daily cycle.   Making this process  automatic
-    is on my tuit list, but don't ask me when."
-
 =item It's more reliable
 
-Well, since the patches are updated by hand, I don't have to say any
-more ... (see Sarathy's remark).
+While both the rsync-able source and patch areas are automatically
+updated every few minutes, keep in mind that applying patches may
+sometimes mean careful hand-holding, especially if your version of
+the C<patch> program does not understand how to deal with new files,
+files with 8-bit characters, or files without trailing newlines.
 
 =back
 
@@ -371,7 +394,7 @@ more ... (see Sarathy's remark).
 
 =over 4
 
-=item It's easier
+=item It's easier to rsync the patches
 
 If you have more than one machine that you want to keep in track with
 bleadperl, it's easier to rsync the patches only once and then apply
@@ -423,7 +446,7 @@ look how others apply the fix.
 =item Finding the source of misbehaviour
 
 When you keep in sync with bleadperl, the pumpking would love to
-I<see> that the community efforts realy work. So after each of his
+I<see> that the community efforts really work. So after each of his
 sync points, you are to 'make test' to check if everything is still
 in working order. If it is, you do 'make ok', which will send an OK
 report to perlbug@perl.org. (If you do not have access to a mailer
@@ -431,7 +454,7 @@ from the system you just finished successfully 'make test', you can
 do 'make okfile', which creates the file C<perl.ok>, which you can
 than take to your favourite mailer and mail yourself).
 
-But of course, as always, things will not allways lead to a success
+But of course, as always, things will not always lead to a success
 path, and one or more test do not pass the 'make test'. Before
 sending in a bug report (using 'make nok' or 'make nokfile'), check
 the mailing list if someone else has reported the bug already and if
@@ -450,44 +473,66 @@ If searching the patches is too bothersome, you might consider using
 perl's bugtron to find more information about discussions and
 ramblings on posted bugs.
 
-=back
-
 If you want to get the best of both worlds, rsync both the source
 tree for convenience, reliability and ease and rsync the patches
 for reference.
 
+=back
+
+
+=head2 Perlbug administration
+
+There is a single remote administrative interface for modifying bug status, 
+category, open issues etc. using the B<RT> I<bugtracker> system, maintained
+by I<Robert Spier>.  Become an administrator, and close any bugs you can get 
+your sticky mitts on:
+
+	http://rt.perl.org
+
+The bugtracker mechanism for B<perl5> bugs in particular is at:
+
+	http://bugs6.perl.org/perlbug
+
+To email the bug system administrators:
+
+	"perlbug-admin" <perlbug-admin@perl.org>
+
+
 =head2 Submitting patches
 
-Always submit patches to I<perl5-porters@perl.org>.  This lets other
-porters review your patch, which catches a surprising number of errors
-in patches.  Either use the diff program (available in source code
-form from I<ftp://ftp.gnu.org/pub/gnu/>), or use Johan Vromans'
-I<makepatch> (available from I<CPAN/authors/id/JV/>).  Unified diffs
-are preferred, but context diffs are accepted.  Do not send RCS-style
-diffs or diffs without context lines.  More information is given in
-the I<Porting/patching.pod> file in the Perl source distribution.
-Please patch against the latest B<development> version (e.g., if
-you're fixing a bug in the 5.005 track, patch against the latest
-5.005_5x version).  Only patches that survive the heat of the
-development branch get applied to maintenance versions.
-
-Your patch should update the documentation and test suite.
+Always submit patches to I<perl5-porters@perl.org>.  If you're
+patching a core module and there's an author listed, send the author a
+copy (see L<Patching a core module>).  This lets other porters review
+your patch, which catches a surprising number of errors in patches.
+Either use the diff program (available in source code form from
+ftp://ftp.gnu.org/pub/gnu/ , or use Johan Vromans' I<makepatch>
+(available from I<CPAN/authors/id/JV/>).  Unified diffs are preferred,
+but context diffs are accepted.  Do not send RCS-style diffs or diffs
+without context lines.  More information is given in the
+I<Porting/patching.pod> file in the Perl source distribution.  Please
+patch against the latest B<development> version (e.g., if you're
+fixing a bug in the 5.005 track, patch against the latest 5.005_5x
+version).  Only patches that survive the heat of the development
+branch get applied to maintenance versions.
+
+Your patch should update the documentation and test suite.  See
+L<Writing a test>.
 
 To report a bug in Perl, use the program I<perlbug> which comes with
 Perl (if you can't get Perl to work, send mail to the address
 I<perlbug@perl.org> or I<perlbug@perl.com>).  Reporting bugs through
 I<perlbug> feeds into the automated bug-tracking system, access to
-which is provided through the web at I<http://bugs.perl.org/>.  It
+which is provided through the web at http://bugs.perl.org/ .  It
 often pays to check the archives of the perl5-porters mailing list to
 see whether the bug you're reporting has been reported before, and if
 so whether it was considered a bug.  See above for the location of
 the searchable archives.
 
-The CPAN testers (I<http://testers.cpan.org/>) are a group of
-volunteers who test CPAN modules on a variety of platforms.  Perl Labs
-(I<http://labs.perl.org/>) automatically tests Perl source releases on
-platforms and gives feedback to the CPAN testers mailing list.  Both
-efforts welcome volunteers.
+The CPAN testers ( http://testers.cpan.org/ ) are a group of
+volunteers who test CPAN modules on a variety of platforms.  Perl
+Smokers ( http://archives.develooper.com/daily-build@perl.org/ )
+automatically tests Perl source releases on platforms with various
+configurations.  Both efforts welcome volunteers.
 
 It's a good idea to read and lurk for a while before chipping in.
 That way you'll get to see the dynamic of the conversations, learn the
@@ -513,7 +558,7 @@ source, and we'll do that later on.
 You might also want to look at Gisle Aas's illustrated perlguts -
 there's no guarantee that this will be absolutely up-to-date with the
 latest documentation in the Perl core, but the fundamentals will be
-right. (http://gisle.aas.no/perl/illguts/)
+right. ( http://gisle.aas.no/perl/illguts/ )
 
 =item L<perlxstut> and L<perlxs>
 
@@ -536,12 +581,11 @@ wanting to go about Perl development.
 
 =item The perl5-porters FAQ
 
-This is posted to perl5-porters at the beginning on every month, and
-should be available from http://perlhacker.org/p5p-faq; alternatively,
-you can get the FAQ emailed to you by sending mail to
-C<perl5-porters-faq@perl.org>. It contains hints on reading
-perl5-porters, information on how perl5-porters works and how Perl
-development in general works.
+This should be available from http://simon-cozens.org/writings/p5p-faq ;
+alternatively, you can get the FAQ emailed to you by sending mail to
+C<perl5-porters-faq@perl.org>. It contains hints on reading perl5-porters,
+information on how perl5-porters works and how Perl development in general
+works.
 
 =back
 
@@ -559,6 +603,13 @@ Modules shipped as part of the Perl core live in the F<lib/> and F<ext/>
 subdirectories: F<lib/> is for the pure-Perl modules, and F<ext/>
 contains the core XS modules.
 
+=item Tests
+
+There are tests for nearly all the modules, built-ins and major bits
+of functionality.  Test files all have a .t suffix.  Module tests live
+in the F<lib/> and F<ext/> directories next to the module being
+tested.  Others live in F<t/>.  See L<Writing a test>
+
 =item Documentation
 
 Documentation maintenance includes looking after everything in the
@@ -824,7 +875,7 @@ C<"\0">.
 
 Line 13 manipulates the flags; since we've changed the PV, any IV or NV
 values will no longer be valid: if we have C<$a=10; $a.="6";> we don't
-want to use the old IV of 10. C<SvPOK_only_utf8> is a special UTF8-aware
+want to use the old IV of 10. C<SvPOK_only_utf8> is a special UTF-8-aware
 version of C<SvPOK_only>, a macro which turns off the IOK and NOK flags
 and turns on POK. The final C<SvTAINT> is a macro which launders tainted
 data if taint mode is turned on.
@@ -854,7 +905,8 @@ operations in.
 
 The easiest way to examine the op tree is to stop Perl after it has
 finished parsing, and get it to dump out the tree. This is exactly what
-the compiler backends L<B::Terse|B::Terse> and L<B::Debug|B::Debug> do.
+the compiler backends L<B::Terse|B::Terse>, L<B::Concise|B::Concise>
+and L<B::Debug|B::Debug> do.
 
 Let's have a look at how Perl sees C<$a = $b + $c>:
 
@@ -1164,6 +1216,13 @@ important ones are explained in L<perlxs> as well. Pay special attention
 to L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for information on
 the C<[pad]THX_?> macros.
 
+=head2 The .i Targets
+
+You can expand the macros in a F<foo.c> file by saying
+
+    make foo.i
+
+which will expand the macros using cpp.  Don't be scared by the results.
 
 =head2 Poking at Perl
 
@@ -1258,8 +1317,11 @@ blessing when stepping through miles of source code.
 =item print
 
 Execute the given C code and print its results. B<WARNING>: Perl makes
-heavy use of macros, and F<gdb> is not aware of macros. You'll have to
-substitute them yourself. So, for instance, you can't say
+heavy use of macros, and F<gdb> does not necessarily support macros
+(see later L</"gdb macro support">).  You'll have to substitute them
+yourself, or to invoke cpp on the source code files
+(see L</"The .i Targets">)
+So, for instance, you can't say
 
     print SvPV_nolen(sv)
 
@@ -1267,11 +1329,19 @@ but you have to say
 
     print Perl_sv_2pv_nolen(sv)
 
+=back
+
 You may find it helpful to have a "macro dictionary", which you can
 produce by saying C<cpp -dM perl.c | sort>. Even then, F<cpp> won't
-recursively apply the macros for you. 
+recursively apply those macros for you. 
 
-=back
+=head2 gdb macro support
+
+Recent versions of F<gdb> have fairly good macro support, but
+in order to use it you'll need to compile perl with macro definitions
+included in the debugging information.  Using F<gcc> version 3.1, this
+means configuring with C<-Doptimize=-g3>.  Other compilers might use a
+different switch (if they support debugging macros at all).
 
 =head2 Dumping Perl Data Structures
 
@@ -1360,7 +1430,7 @@ similar output to L<B::Debug|B::Debug>.
             }
         }
 
-< finish this later >
+# finish this later #
 
 =head2 Patching
 
@@ -1369,13 +1439,16 @@ some things you'll need to know when fiddling with them. Let's now get
 on and create a simple patch. Here's something Larry suggested: if a
 C<U> is the first active format during a C<pack>, (for example, 
 C<pack "U3C8", @stuff>) then the resulting string should be treated as
-UTF8 encoded.
+UTF-8 encoded.
 
 How do we prepare to fix this up? First we locate the code in question -
 the C<pack> happens at runtime, so it's going to be in one of the F<pp>
 files. Sure enough, C<pp_pack> is in F<pp.c>. Since we're going to be
 altering this file, let's copy it to F<pp.c~>.
 
+[Well, it was in F<pp.c> when this tutorial was written. It has now been
+split off with C<pp_unpack> to its own file, F<pp_pack.c>]
+
 Now let's look over C<pp_pack>: we take a pattern into C<pat>, and then
 loop over the pattern, taking each format character in turn into
 C<datum_type>. Then for each possible format character, we swallow up
@@ -1415,7 +1488,7 @@ of C<pat>:
     while (pat < patend) {
 
 Now if we see a C<U> which was at the start of the string, we turn on
-the UTF8 flag for the output SV, C<cat>:
+the C<UTF8> flag for the output SV, C<cat>:
 
  +  if (datumtype == 'U' && pat==patcopy+1)
  +      SvUTF8_on(cat);
@@ -1447,32 +1520,48 @@ we must document that change. We must also provide some more regression
 tests to make sure our patch works and doesn't create a bug somewhere
 else along the line.
 
-The regression tests for each operator live in F<t/op/>, and so we make
-a copy of F<t/op/pack.t> to F<t/op/pack.t~>. Now we can add our tests
-to the end. First, we'll test that the C<U> does indeed create Unicode
-strings:
+The regression tests for each operator live in F<t/op/>, and so we
+make a copy of F<t/op/pack.t> to F<t/op/pack.t~>. Now we can add our
+tests to the end. First, we'll test that the C<U> does indeed create
+Unicode strings.  
+
+t/op/pack.t has a sensible ok() function, but if it didn't we could
+use the one from t/test.pl.
+
+ require './test.pl';
+ plan( tests => 159 );
+
+so instead of this:
 
  print 'not ' unless "1.20.300.4000" eq sprintf "%vd", pack("U*",1,20,300,4000);
  print "ok $test\n"; $test++;
 
+we can write the more sensible (see L<Test::More> for a full
+explanation of is() and other testing functions).
+
+ is( "1.20.300.4000", sprintf "%vd", pack("U*",1,20,300,4000), 
+                                       "U* produces unicode" );
+
 Now we'll test that we got that space-at-the-beginning business right:
 
- print 'not ' unless "1.20.300.4000" eq
-                     sprintf "%vd", pack("  U*",1,20,300,4000);
- print "ok $test\n"; $test++;
+ is( "1.20.300.4000", sprintf "%vd", pack("  U*",1,20,300,4000),
+                                       "  with spaces at the beginning" );
 
 And finally we'll test that we don't make Unicode strings if C<U> is B<not>
 the first active format:
 
- print 'not ' unless v1.20.300.4000 ne
-                     sprintf "%vd", pack("C0U*",1,20,300,4000);
- print "ok $test\n"; $test++;
+ isnt( v1.20.300.4000, sprintf "%vd", pack("C0U*",1,20,300,4000),
+                                       "U* not first isn't unicode" );
 
-Mustn't forget to change the number of tests which appears at the top, or
-else the automated tester will get confused:
+Mustn't forget to change the number of tests which appears at the top,
+or else the automated tester will get confused.  This will either look
+like this:
 
- -print "1..156\n";
- +print "1..159\n";
+ print "1..156\n";
+
+or this:
+
+ plan( tests => 156 );
 
 We now compile up Perl, and run it through the test suite. Our new
 tests pass, hooray!
@@ -1485,10 +1574,10 @@ this text in the description of C<pack>:
  =item *
 
  If the pattern begins with a C<U>, the resulting string will be treated
- as Unicode-encoded. You can force UTF8 encoding on in a string with an
- initial C<U0>, and the bytes that follow will be interpreted as Unicode
- characters. If you don't want this to happen, you can begin your pattern
- with C<C0> (or anything else) to force Perl not to UTF8 encode your
+ as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a string
+ with an initial C<U0>, and the bytes that follow will be interpreted as
+ Unicode characters. If you don't want this to happen, you can begin your
+ pattern with C<C0> (or anything else) to force Perl not to UTF-8 encode your
  string, and then follow this with a C<U*> somewhere in your pattern.
 
 All done. Now let's create the patch. F<Porting/patching.pod> tells us
@@ -1521,6 +1610,295 @@ We end up with a patch looking a little like this:
 And finally, we submit it, with our rationale, to perl5-porters. Job
 done!
 
+=head2 Patching a core module
+
+This works just like patching anything else, with an extra
+consideration.  Many core modules also live on CPAN.  If this is so,
+patch the CPAN version instead of the core and send the patch off to
+the module maintainer (with a copy to p5p).  This will help the module
+maintainer keep the CPAN version in sync with the core version without
+constantly scanning p5p.
+
+=head2 Adding a new function to the core
+
+If, as part of a patch to fix a bug, or just because you have an
+especially good idea, you decide to add a new function to the core,
+discuss your ideas on p5p well before you start work.  It may be that
+someone else has already attempted to do what you are considering and
+can give lots of good advice or even provide you with bits of code
+that they already started (but never finished).
+
+You have to follow all of the advice given above for patching.  It is
+extremely important to test any addition thoroughly and add new tests
+to explore all boundary conditions that your new function is expected
+to handle.  If your new function is used only by one module (e.g. toke),
+then it should probably be named S_your_function (for static); on the
+other hand, if you expect it to accessible from other functions in
+Perl, you should name it Perl_your_function.  See L<perlguts/Internal Functions>
+for more details.
+
+The location of any new code is also an important consideration.  Don't
+just create a new top level .c file and put your code there; you would
+have to make changes to Configure (so the Makefile is created properly),
+as well as possibly lots of include files.  This is strictly pumpking
+business.
+
+It is better to add your function to one of the existing top level
+source code files, but your choice is complicated by the nature of
+the Perl distribution.  Only the files that are marked as compiled
+static are located in the perl executable.  Everything else is located
+in the shared library (or DLL if you are running under WIN32).  So,
+for example, if a function was only used by functions located in
+toke.c, then your code can go in toke.c.  If, however, you want to call
+the function from universal.c, then you should put your code in another
+location, for example util.c.
+
+In addition to writing your c-code, you will need to create an
+appropriate entry in embed.pl describing your function, then run
+'make regen_headers' to create the entries in the numerous header
+files that perl needs to compile correctly.  See L<perlguts/Internal Functions>
+for information on the various options that you can set in embed.pl.
+You will forget to do this a few (or many) times and you will get
+warnings during the compilation phase.  Make sure that you mention
+this when you post your patch to P5P; the pumpking needs to know this.
+
+When you write your new code, please be conscious of existing code
+conventions used in the perl source files.  See L<perlstyle> for
+details.  Although most of the guidelines discussed seem to focus on
+Perl code, rather than c, they all apply (except when they don't ;).
+See also I<Porting/patching.pod> file in the Perl source distribution
+for lots of details about both formatting and submitting patches of
+your changes.
+
+Lastly, TEST TEST TEST TEST TEST any code before posting to p5p.
+Test on as many platforms as you can find.  Test as many perl
+Configure options as you can (e.g. MULTIPLICITY).  If you have
+profiling or memory tools, see L<EXTERNAL TOOLS FOR DEBUGGING PERL>
+below for how to use them to further test your code.  Remember that
+most of the people on P5P are doing this on their own time and
+don't have the time to debug your code.
+
+=head2 Writing a test
+
+Every module and built-in function has an associated test file (or
+should...).  If you add or change functionality, you have to write a
+test.  If you fix a bug, you have to write a test so that bug never
+comes back.  If you alter the docs, it would be nice to test what the
+new documentation says.
+
+In short, if you submit a patch you probably also have to patch the
+tests.
+
+For modules, the test file is right next to the module itself.
+F<lib/strict.t> tests F<lib/strict.pm>.  This is a recent innovation,
+so there are some snags (and it would be wonderful for you to brush
+them out), but it basically works that way.  Everything else lives in
+F<t/>.
+
+(The following list and instructions describe the situation as of perl 5.8.1.
+The test suite is much less sophisticated in perl 5.6.)
+
+=over 3
+
+=item F<t/base/>
+
+Testing of the absolute basic functionality of Perl.  Things like
+C<if>, basic file reads and writes, simple regexes, etc.  These are
+run first in the test suite and if any of them fail, something is
+I<really> broken.
+
+=item F<t/cmd/>
+
+These test the basic control structures, C<if/else>, C<while>,
+subroutines, etc.
+
+=item F<t/comp/>
+
+Tests basic issues of how Perl parses and compiles itself.
+
+=item F<t/io/>
+
+Tests for built-in IO functions, including command line arguments.
+
+=item F<t/lib/>
+
+The old home for the module tests, you shouldn't put anything new in
+here.  There are still some bits and pieces hanging around in here
+that need to be moved.  Perhaps you could move them?  Thanks!
+
+=item F<t/op/>
+
+Tests for perl's built in functions that don't fit into any of the
+other directories.
+
+=item F<t/pod/>
+
+Tests for POD directives.  There are still some tests for the Pod
+modules hanging around in here that need to be moved out into F<lib/>.
+
+=item F<t/run/>
+
+Testing features of how perl actually runs, including exit codes and
+handling of PERL* environment variables.
+
+=item F<t/uni/>
+
+Tests for the core support of Unicode.
+
+=item F<t/win32/>
+
+Windows-specific tests.
+
+=item F<t/x2p>
+
+A test suite for the s2p converter.
+
+=back
+
+The core uses the same testing style as the rest of Perl, a simple
+"ok/not ok" run through Test::Harness, but there are a few special
+considerations.
+
+There are three ways to write a test in the core.  Test::More,
+t/test.pl and ad hoc C<print $test ? "ok 42\n" : "not ok 42\n">.  The
+decision of which to use depends on what part of the test suite you're
+working on.  This is a measure to prevent a high-level failure (such
+as Config.pm breaking) from causing basic functionality tests to fail.
+
+=over 4 
+
+=item t/base t/comp
+
+Since we don't know if require works, or even subroutines, use ad hoc
+tests for these two.  Step carefully to avoid using the feature being
+tested.
+
+=item t/cmd t/run t/io t/op
+
+Now that basic require() and subroutines are tested, you can use the
+t/test.pl library which emulates the important features of Test::More
+while using a minimum of core features.
+
+You can also conditionally use certain libraries like Config, but be
+sure to skip the test gracefully if it's not there.
+
+=item t/lib ext lib
+
+Now that the core of Perl is tested, Test::More can be used.  You can
+also use the full suite of core modules in the tests.
+
+=back
+
+When you say "make test" Perl uses the F<t/TEST> program to run the
+test suite.  All tests are run from the F<t/> directory, B<not> the
+directory which contains the test.  This causes some problems with the
+tests in F<lib/>, so here's some opportunity for some patching.
+
+You must be triply conscious of cross-platform concerns.  This usually
+boils down to using File::Spec and avoiding things like C<fork()> and
+C<system()> unless absolutely necessary.
+
+=head2 Special Make Test Targets
+
+There are various special make targets that can be used to test Perl
+slightly differently than the standard "test" target.  Not all them
+are expected to give a 100% success rate.  Many of them have several
+aliases.
+
+=over 4
+
+=item coretest
+
+Run F<perl> on all core tests (F<t/*> and F<lib/[a-z]*> pragma tests).
+
+=item test.deparse
+
+Run all the tests through the B::Deparse.  Not all tests will succeed.
+
+=item minitest
+
+Run F<miniperl> on F<t/base>, F<t/comp>, F<t/cmd>, F<t/run>, F<t/io>,
+F<t/op>, and F<t/uni> tests.
+
+=item test.valgrind check.valgrind utest.valgrind ucheck.valgrind
+
+(Only in Linux) Run all the tests using the memory leak + naughty
+memory access tool "valgrind".  The log files will be named
+F<testname.valgrind>.
+
+=item test.third check.third utest.third ucheck.third
+
+(Only in Tru64)  Run all the tests using the memory leak + naughty
+memory access tool "Third Degree".  The log files will be named
+F<perl3.log.testname>.
+
+=item test.torture torturetest
+
+Run all the usual tests and some extra tests.  As of Perl 5.8.0 the
+only extra tests are Abigail's JAPHs, F<t/japh/abigail.t>.
+
+You can also run the torture test with F<t/harness> by giving
+C<-torture> argument to F<t/harness>.
+
+=item utest ucheck test.utf8 check.utf8
+
+Run all the tests with -Mutf8.  Not all tests will succeed.
+
+=item test_harness
+
+Run the test suite with the F<t/harness> controlling program, instead of
+F<t/TEST>. F<t/harness> is more sophisticated, and uses the
+L<Test::Harness> module, thus using this test target supposes that perl
+mostly works. The main advantage for our purposes is that it prints a
+detailed summary of failed tests at the end. Also, unlike F<t/TEST>, it
+doesn't redirect stderr to stdout.
+
+=back
+
+=head2 Running tests by hand
+
+You can run part of the test suite by hand by using one the following
+commands from the F<t/> directory :
+
+    ./perl -I../lib TEST list-of-.t-files
+
+or
+
+    ./perl -I../lib harness list-of-.t-files
+
+(if you don't specify test scripts, the whole test suite will be run.)
+
+You can run an individual test by a command similar to
+
+    ./perl -I../lib patho/to/foo.t
+
+except that the harnesses set up some environment variables that may
+affect the execution of the test :
+
+=over 4 
+
+=item PERL_CORE=1
+
+indicates that we're running this test part of the perl core test suite.
+This is useful for modules that have a dual life on CPAN.
+
+=item PERL_DESTRUCT_LEVEL=2
+
+is set to 2 if it isn't set already (see L</PERL_DESTRUCT_LEVEL>)
+
+=item PERL
+
+(used only by F<t/TEST>) if set, overrides the path to the perl executable
+that should be used to run the tests (the default being F<./perl>).
+
+=item PERL_SKIP_TTY_TEST
+
+if set, tells to skip the tests that need a terminal. It's actually set
+automatically by the Makefile, but can also be forced artificially by
+running 'make test_notty'.
+
+=back
+
 =head1 EXTERNAL TOOLS FOR DEBUGGING PERL
 
 Sometimes it helps to use external tools while debugging and
@@ -1529,6 +1907,38 @@ some common testing and debugging tools with Perl.  This is
 meant as a guide to interfacing these tools with Perl, not
 as any kind of guide to the use of the tools themselves.
 
+B<NOTE 1>: Running under memory debuggers such as Purify, valgrind, or
+Third Degree greatly slows down the execution: seconds become minutes,
+minutes become hours.  For example as of Perl 5.8.1, the
+ext/Encode/t/Unicode.t takes extraordinarily long to complete under
+e.g. Purify, Third Degree, and valgrind.  Under valgrind it takes more
+than six hours, even on a snappy computer-- the said test must be
+doing something that is quite unfriendly for memory debuggers.  If you
+don't feel like waiting, that you can simply kill away the perl
+process.
+
+B<NOTE 2>: To minimize the number of memory leak false alarms (see
+L</PERL_DESTRUCT_LEVEL> for more information), you have to have
+environment variable PERL_DESTRUCT_LEVEL set to 2.  The F<TEST>
+and harness scripts do that automatically.  But if you are running
+some of the tests manually-- for csh-like shells:
+
+    setenv PERL_DESTRUCT_LEVEL 2
+
+and for Bourne-type shells:
+
+    PERL_DESTRUCT_LEVEL=2
+    export PERL_DESTRUCT_LEVEL
+
+or in UNIXy environments you can also use the C<env> command:
+
+    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib ...
+
+B<NOTE 3>: There are known memory leaks when there are compile-time
+errors within eval or require, seeing C<S_doeval> in the call stack
+is a good sign of these.  Fixing these leaks is non-trivial,
+unfortunately, but they must be fixed eventually.
+
 =head2 Rational Software's Purify
 
 Purify is a commercial tool that is helpful in identifying
@@ -1537,11 +1947,6 @@ badness.  Perl must be compiled in a specific way for
 optimal testing with Purify.  Purify is available under
 Windows NT, Solaris, HP-UX, SGI, and Siemens Unix.
 
-The only currently known leaks happen when there are
-compile-time errors within eval or require.  (Fixing these
-is non-trivial, unfortunately, but they must be fixed
-eventually.)
-
 =head2 Purify on Unix
 
 On Unix, Purify creates a new Perl binary.  To get the most
@@ -1613,6 +2018,15 @@ If you plan to use the "Viewer" windows, then you only need this option:
 
     setenv PURIFYOPTIONS "-chain-length=25"
 
+In Bourne-type shells:
+
+    PURIFYOPTIONS="..."
+    export PURIFYOPTIONS
+
+or if you have the "env" utility:
+
+    env PURIFYOPTIONS="..." ../pureperl ...
+
 =head2 Purify on NT
 
 Purify on Windows NT instruments the Perl binary 'perl.exe'
@@ -1663,6 +2077,313 @@ standard Perl testset you would create and run Purify as:
 which would instrument Perl in memory, run Perl on test.pl,
 then finally report any memory problems.
 
+=head2 valgrind
+
+The excellent valgrind tool can be used to find out both memory leaks
+and illegal memory accesses.  As of August 2003 it unfortunately works
+only on x86 (ELF) Linux.  The special "test.valgrind" target can be used
+to run the tests under valgrind.  Found errors and memory leaks are
+logged in files named F<test.valgrind>.
+
+As system libraries (most notably glibc) are also triggering errors,
+valgrind allows to suppress such errors using suppression files. The
+default suppression file that comes with valgrind already catches a lot
+of them. Some additional suppressions are defined in F<t/perl.supp>.
+
+To get valgrind and for more information see
+
+    http://developer.kde.org/~sewardj/
+
+=head2 Compaq's/Digital's/HP's Third Degree
+
+Third Degree is a tool for memory leak detection and memory access checks.
+It is one of the many tools in the ATOM toolkit.  The toolkit is only
+available on Tru64 (formerly known as Digital UNIX formerly known as
+DEC OSF/1).
+
+When building Perl, you must first run Configure with -Doptimize=-g
+and -Uusemymalloc flags, after that you can use the make targets
+"perl.third" and "test.third".  (What is required is that Perl must be
+compiled using the C<-g> flag, you may need to re-Configure.)
+
+The short story is that with "atom" you can instrument the Perl
+executable to create a new executable called F<perl.third>.  When the
+instrumented executable is run, it creates a log of dubious memory
+traffic in file called F<perl.3log>.  See the manual pages of atom and
+third for more information.  The most extensive Third Degree
+documentation is available in the Compaq "Tru64 UNIX Programmer's
+Guide", chapter "Debugging Programs with Third Degree".
+
+The "test.third" leaves a lot of files named F<foo_bar.3log> in the t/
+subdirectory.  There is a problem with these files: Third Degree is so
+effective that it finds problems also in the system libraries.
+Therefore you should used the Porting/thirdclean script to cleanup
+the F<*.3log> files.
+
+There are also leaks that for given certain definition of a leak,
+aren't.  See L</PERL_DESTRUCT_LEVEL> for more information.
+
+=head2 PERL_DESTRUCT_LEVEL
+
+If you want to run any of the tests yourself manually using e.g.
+valgrind, or the pureperl or perl.third executables, please note that
+by default perl B<does not> explicitly cleanup all the memory it has
+allocated (such as global memory arenas) but instead lets the exit()
+of the whole program "take care" of such allocations, also known as
+"global destruction of objects".
+
+There is a way to tell perl to do complete cleanup: set the
+environment variable PERL_DESTRUCT_LEVEL to a non-zero value.
+The t/TEST wrapper does set this to 2, and this is what you
+need to do too, if you don't want to see the "global leaks":
+For example, for "third-degreed" Perl:
+
+	env PERL_DESTRUCT_LEVEL=2 ./perl.third -Ilib t/foo/bar.t
+
+(Note: the mod_perl apache module uses also this environment variable
+for its own purposes and extended its semantics. Refer to the mod_perl
+documentation for more information. Also, spawned threads do the
+equivalent of setting this variable to the value 1.)
+
+If, at the end of a run you get the message I<N scalars leaked>, you can
+recompile with C<-DDEBUG_LEAKING_SCALARS>, which will cause
+the addresses of all those leaked SVs to be dumped; it also converts
+C<new_SV()> from a macro into a real function, so you can use your
+favourite debugger to discover where those pesky SVs were allocated.
+
+=head2 Profiling
+
+Depending on your platform there are various of profiling Perl.
+
+There are two commonly used techniques of profiling executables:
+I<statistical time-sampling> and I<basic-block counting>.
+
+The first method takes periodically samples of the CPU program
+counter, and since the program counter can be correlated with the code
+generated for functions, we get a statistical view of in which
+functions the program is spending its time.  The caveats are that very
+small/fast functions have lower probability of showing up in the
+profile, and that periodically interrupting the program (this is
+usually done rather frequently, in the scale of milliseconds) imposes
+an additional overhead that may skew the results.  The first problem
+can be alleviated by running the code for longer (in general this is a
+good idea for profiling), the second problem is usually kept in guard
+by the profiling tools themselves.
+
+The second method divides up the generated code into I<basic blocks>.
+Basic blocks are sections of code that are entered only in the
+beginning and exited only at the end.  For example, a conditional jump
+starts a basic block.  Basic block profiling usually works by
+I<instrumenting> the code by adding I<enter basic block #nnnn>
+book-keeping code to the generated code.  During the execution of the
+code the basic block counters are then updated appropriately.  The
+caveat is that the added extra code can skew the results: again, the
+profiling tools usually try to factor their own effects out of the
+results.
+
+=head2 Gprof Profiling
+
+gprof is a profiling tool available in many UNIX platforms,
+it uses F<statistical time-sampling>.
+
+You can build a profiled version of perl called "perl.gprof" by
+invoking the make target "perl.gprof"  (What is required is that Perl
+must be compiled using the C<-pg> flag, you may need to re-Configure).
+Running the profiled version of Perl will create an output file called
+F<gmon.out> is created which contains the profiling data collected
+during the execution.
+
+The gprof tool can then display the collected data in various ways.
+Usually gprof understands the following options:
+
+=over 4
+
+=item -a
+
+Suppress statically defined functions from the profile.
+
+=item -b
+
+Suppress the verbose descriptions in the profile.
+
+=item -e routine
+
+Exclude the given routine and its descendants from the profile.
+
+=item -f routine
+
+Display only the given routine and its descendants in the profile.
+
+=item -s
+
+Generate a summary file called F<gmon.sum> which then may be given
+to subsequent gprof runs to accumulate data over several runs.
+
+=item -z
+
+Display routines that have zero usage.
+
+=back
+
+For more detailed explanation of the available commands and output
+formats, see your own local documentation of gprof.
+
+=head2 GCC gcov Profiling
+
+Starting from GCC 3.0 I<basic block profiling> is officially available
+for the GNU CC.
+
+You can build a profiled version of perl called F<perl.gcov> by
+invoking the make target "perl.gcov" (what is required that Perl must
+be compiled using gcc with the flags C<-fprofile-arcs
+-ftest-coverage>, you may need to re-Configure).
+
+Running the profiled version of Perl will cause profile output to be
+generated.  For each source file an accompanying ".da" file will be
+created.
+
+To display the results you use the "gcov" utility (which should
+be installed if you have gcc 3.0 or newer installed).  F<gcov> is
+run on source code files, like this
+
+    gcov sv.c
+
+which will cause F<sv.c.gcov> to be created.  The F<.gcov> files
+contain the source code annotated with relative frequencies of
+execution indicated by "#" markers.
+
+Useful options of F<gcov> include C<-b> which will summarise the
+basic block, branch, and function call coverage, and C<-c> which
+instead of relative frequencies will use the actual counts.  For
+more information on the use of F<gcov> and basic block profiling
+with gcc, see the latest GNU CC manual, as of GCC 3.0 see
+
+    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc.html
+
+and its section titled "8. gcov: a Test Coverage Program"
+
+    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html#SEC132
+
+=head2 Pixie Profiling
+
+Pixie is a profiling tool available on IRIX and Tru64 (aka Digital
+UNIX aka DEC OSF/1) platforms.  Pixie does its profiling using
+I<basic-block counting>.
+
+You can build a profiled version of perl called F<perl.pixie> by
+invoking the make target "perl.pixie" (what is required is that Perl
+must be compiled using the C<-g> flag, you may need to re-Configure).
+
+In Tru64 a file called F<perl.Addrs> will also be silently created,
+this file contains the addresses of the basic blocks.  Running the
+profiled version of Perl will create a new file called "perl.Counts"
+which contains the counts for the basic block for that particular
+program execution.
+
+To display the results you use the F<prof> utility.  The exact
+incantation depends on your operating system, "prof perl.Counts" in
+IRIX, and "prof -pixie -all -L. perl" in Tru64.
+
+In IRIX the following prof options are available:
+
+=over 4
+
+=item -h
+
+Reports the most heavily used lines in descending order of use.
+Useful for finding the hotspot lines.
+
+=item -l
+
+Groups lines by procedure, with procedures sorted in descending order of use.
+Within a procedure, lines are listed in source order.
+Useful for finding the hotspots of procedures.
+
+=back
+
+In Tru64 the following options are available:
+
+=over 4
+
+=item -p[rocedures]
+
+Procedures sorted in descending order by the number of cycles executed
+in each procedure.  Useful for finding the hotspot procedures.
+(This is the default option.)
+
+=item -h[eavy]
+
+Lines sorted in descending order by the number of cycles executed in
+each line.  Useful for finding the hotspot lines.
+
+=item -i[nvocations]
+
+The called procedures are sorted in descending order by number of calls
+made to the procedures.  Useful for finding the most used procedures.
+
+=item -l[ines]
+
+Grouped by procedure, sorted by cycles executed per procedure.
+Useful for finding the hotspots of procedures.
+
+=item -testcoverage
+
+The compiler emitted code for these lines, but the code was unexecuted.
+
+=item -z[ero]
+
+Unexecuted procedures.
+
+=back
+
+For further information, see your system's manual pages for pixie and prof.
+
+=head2 Miscellaneous tricks
+
+=over 4
+
+=item *
+
+Those debugging perl with the DDD frontend over gdb may find the
+following useful:
+
+You can extend the data conversion shortcuts menu, so for example you
+can display an SV's IV value with one click, without doing any typing.
+To do that simply edit ~/.ddd/init file and add after:
+
+  ! Display shortcuts.
+  Ddd*gdbDisplayShortcuts: \
+  /t ()   // Convert to Bin\n\
+  /d ()   // Convert to Dec\n\
+  /x ()   // Convert to Hex\n\
+  /o ()   // Convert to Oct(\n\
+
+the following two lines:
+
+  ((XPV*) (())->sv_any )->xpv_pv  // 2pvx\n\
+  ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx
+
+so now you can do ivx and pvx lookups or you can plug there the
+sv_peek "conversion":
+
+  Perl_sv_peek(my_perl, (SV*)()) // sv_peek
+
+(The my_perl is for threaded builds.)
+Just remember that every line, but the last one, should end with \n\
+
+Alternatively edit the init file interactively via:
+3rd mouse button -> New Display -> Edit Menu
+
+Note: you can define up to 20 conversion shortcuts in the gdb
+section.
+
+=item *
+
+If you see in a debugger a memory area mysteriously full of 0xabababab,
+you may be seeing the effect of the Poison() macro, see L<perlclib>.
+
+=back
+
 =head2 CONCLUSION
 
 We've had a brief look around the Perl source, an overview of the stages
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 5b0fe2faaf..6de9598bf4 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -7,18 +7,17 @@ perlunicode - Unicode support in Perl (EXPERIMENTAL, subject to change)
 =head2 Important Caveat
 
     WARNING:  As of the 5.6.1 release, the implementation of Unicode
-    support in Perl is incomplete, and continues to be highly experimental.
+    support in Perl is incomplete and highly experimental.
 
-The following areas need further work.  They are being rapidly addressed
-in the 5.7.x development branch.
+If you want a far better unicode support, you should use perl 5.8.1 or
+later instead.
 
 =over 4
 
 =item Input and Output Disciplines
 
-There is currently no easy way to mark data read from a file or other
-external source as being utf8.  This will be one of the major areas of
-focus in the near future.
+As of 5.6.x there is no easy way to mark data read from a file or other
+external source as being utf8.
 
 =item Regular Expressions
 
@@ -122,7 +121,7 @@ a Unicode smiley face is C<\x{263A}>.
 
 Identifiers within the Perl script may contain Unicode alphanumeric
 characters, including ideographs.  (You are currently on your own when
-it comes to using the canonical forms of characters--Perl doesn't (yet)
+it comes to using the canonical forms of characters--Perl doesn't
 attempt to canonicalize variable names for you.)
 
 =item *
@@ -218,13 +217,12 @@ And finally, C<scalar reverse()> reverses by character rather than by byte.
 
 =head2 Character encodings for input and output
 
-[XXX: This feature is not yet implemented.]
+This feature is not implemented.
 
 =head1 CAVEATS
 
-As of yet, there is no method for automatically coercing input and
-output to some encoding other than UTF-8.  This is planned in the near
-future, however.
+As of perl 5.6.2, there is no method for automatically coercing input and
+output to some encoding other than UTF-8.
 
 Whether an arbitrary piece of data will be treated as "characters" or
 "bytes" by internal operations cannot be divined at the current time.
author	Rafael Garcia-Suarez <rgarciasuarez@gmail.com>	2003-09-18 20:48:50 +0000
committer	Rafael Garcia-Suarez <rgarciasuarez@gmail.com>	2003-09-18 20:48:50 +0000
commit	a1d64b115abe65e79c49e9ce69b44def625912c7 (patch)
tree	ad2de6abe73cc9f5e60c7c16ea57d66c719242ec
parent	d3e9bfc518091918d9eca26f6caa523aa1b07356 (diff)
download	perl-a1d64b115abe65e79c49e9ce69b44def625912c7.tar.gz