field | value | date |
---|---|---|
author | Jarkko Hietaniemi <jhi@iki.fi> | 1999-05-25 20:13:47 +0000 |
committer | Jarkko Hietaniemi <jhi@iki.fi> | 1999-05-25 20:13:47 +0000 |
commit | 4b557a8e777a9b184a60af6094430156c0cd3cd0 | |
tree | cf0bc3f65fcdc9bb76a0bf48fd65858952bf0f01 /pod | |
parent | 5c44b94ea302876a61650d4e2424ae187b8bd3d8 | |
parent | 3239fffd94e3f194c659a33f1fc2cf3c767bc537 | |
download | perl-4b557a8e777a9b184a60af6094430156c0cd3cd0.tar.gz | |
Integrate from mainperl.
p4raw-id: //depot/cfgperl@3478
Diffstat (limited to 'pod')
46 files changed, 7822 insertions, 4022 deletions
diff --git a/pod/Makefile b/pod/Makefile index 7db379ca90..f28b9d43a1 100644 --- a/pod/Makefile +++ b/pod/Makefile @@ -38,6 +38,7 @@ POD = \ perldsc.pod \ perllol.pod \ perltoot.pod \ + perltootc.pod \ perlobj.pod \ perltie.pod \ perlbot.pod \ @@ -96,6 +97,7 @@ MAN = \ perldsc.man \ perllol.man \ perltoot.man \ + perltootc.man \ perlobj.man \ perltie.man \ perlbot.man \ @@ -154,6 +156,7 @@ HTML = \ perldsc.html \ perllol.html \ perltoot.html \ + perltootc.html \ perlobj.html \ perltie.html \ perlbot.html \ @@ -212,6 +215,7 @@ TEX = \ perldsc.tex \ perllol.tex \ perltoot.tex \ + perltootc.tex \ perlobj.tex \ perltie.tex \ perlbot.tex \ diff --git a/pod/buildtoc b/pod/buildtoc index 62df02baba..2574b1096f 100644 --- a/pod/buildtoc +++ b/pod/buildtoc @@ -10,7 +10,8 @@ sub output ($); perlsyn perlop perlre perlrun perlfunc perlvar perlsub perlmod perlmodlib perlmodinstall perlform perllocale perlref perlreftut perldsc - perllol perltoot perlobj perltie perlbot perlipc perldbmfilter perldebug + perllol perltoot perltootc perlobj perltie perlbot perlipc + perldbmfilter perldebug perldiag perlsec perltrap perlport perlstyle perlpod perlbook perlembed perlapio perlxs perlxstut perlguts perlcall perlhist diff --git a/pod/perl.pod b/pod/perl.pod index 8f688c72c4..87696fe55d 100644 --- a/pod/perl.pod +++ b/pod/perl.pod @@ -4,19 +4,16 @@ perl - Practical Extraction and Report Language =head1 SYNOPSIS -B<perl> S<[ B<-sTuU> ]> - S<[ B<-hv> ] [ B<-V>[:I<configvar>] ]> - S<[ B<-cw> ] [ B<-d>[:I<debugger>] ] [ B<-D>[I<number/list>] ]> - S<[ B<-pna> ] [ B<-F>I<pattern> ] [ B<-l>[I<octal>] ] [ B<-0>[I<octal>] ]> - S<[ B<-I>I<dir> ] [ B<-m>[B<->]I<module> ] [ B<-M>[B<->]I<'module...'> ]> - S<[ B<-P> ]> - S<[ B<-S> ]> - S<[ B<-x>[I<dir>] ]> - S<[ B<-i>[I<extension>] ]> - S<[ B<-e> I<'command'> ] [ B<--> ] [ I<programfile> ] [ I<argument> ]...> - -For ease of access, the Perl manual has been split up into a number -of sections: +B<perl> S<[ B<-sTuU> ]> S<[ B<-hv> ] [ 
B<-V>[:I<configvar>] ]> + S<[ B<-cw> ] [ B<-d>[:I<debugger>] ] [ B<-D>[I<number/list>] ]> + S<[ B<-pna> ] [ B<-F>I<pattern> ] [ B<-l>[I<octal>] ] [ B<-0>[I<octal>] ]> + S<[ B<-I>I<dir> ] [ B<-m>[B<->]I<module> ] [ B<-M>[B<->]I<'module...'> ]> + S<[ B<-P> ]> S<[ B<-S> ]> S<[ B<-x>[I<dir>] ]> + S<[ B<-i>[I<extension>] ]> S<[ B<-e> I<'command'> ] + [ B<--> ] [ I<programfile> ] [ I<argument> ]...> + +For ease of access, the Perl manual has been split up into several +sections: perl Perl overview (this section) perldelta Perl changes since previous version @@ -40,11 +37,12 @@ of sections: perlform Perl formats perllocale Perl locale support - perlref Perl references perlreftut Perl references short introduction + perlref Perl references, the rest of the story perldsc Perl data structures intro - perllol Perl data structures: lists of lists - perltoot Perl OO tutorial + perllol Perl data structures: arrays of arrays + perltoot Perl OO tutorial, part 1 + perltootc Perl OO tutorial, part 2 perlobj Perl objects perltie Perl objects hidden behind simple variables perlbot Perl OO tricks and examples @@ -75,7 +73,7 @@ of sections: (If you're intending to read these straight through for the first time, the suggested order will tend to reduce the number of forward references.) -By default, all of the above manpages are installed in the +By default, the manpages listed above are installed in the F</usr/local/man/> directory. Extensive additional documentation for Perl modules is available. The @@ -119,17 +117,17 @@ Perl combines (in the author's opinion, anyway) some of the best features of C, B<sed>, B<awk>, and B<sh>, so people familiar with those languages should have little difficulty with it. (Language historians will also note some vestiges of B<csh>, Pascal, and even -BASIC-PLUS.) Expression syntax corresponds quite closely to C +BASIC-PLUS.) Expression syntax corresponds closely to C expression syntax. 
Unlike most Unix utilities, Perl does not arbitrarily limit the size of your data--if you've got the memory, Perl can slurp in your whole file as a single string. Recursion is of unlimited depth. And the tables used by hashes (sometimes called "associative arrays") grow as necessary to prevent degraded performance. Perl can use sophisticated pattern matching techniques to -scan large amounts of data very quickly. Although optimized for +scan large amounts of data quickly. Although optimized for scanning text, Perl can also deal with binary data, and can make dbm files look like hashes. Setuid Perl scripts are safer than C programs -through a dataflow tracing mechanism which prevents many stupid +through a dataflow tracing mechanism that prevents many stupid security holes. If you have a problem that would ordinarily use B<sed> or B<awk> or @@ -140,107 +138,63 @@ scripts into Perl scripts. But wait, there's more... -Perl version 5 is nearly a complete rewrite, and provides -the following additional benefits: - -=over 5 - -=item * Many usability enhancements +Begun in 1993 (see L<perlhist>), Perl version 5 is nearly a complete +rewrite that provides the following additional benefits: -It is now possible to write much more readable Perl code (even within -regular expressions). Formerly cryptic variable names can be replaced -by mnemonic identifiers. Error messages are more informative, and the -optional warnings will catch many of the mistakes a novice might make. -This cannot be stressed enough. Whenever you get mysterious behavior, -try the B<-w> switch!!! Whenever you don't get mysterious behavior, -try using B<-w> anyway. +=over -=item * Simplified grammar +=item * modularity and reusability using innumerable modules -The new yacc grammar is one half the size of the old one. Many of the -arbitrary grammar rules have been regularized. The number of reserved -words has been cut by 2/3. Despite this, nearly all old Perl scripts -will continue to work unchanged. 
+Described in L<perlmod>, L<perlmodlib>, and L<perlmodinstall>. -=item * Lexical scoping +=item * embeddable and extensible -Perl variables may now be declared within a lexical scope, like "auto" -variables in C. Not only is this more efficient, but it contributes -to better privacy for "programming in the large". Anonymous -subroutines exhibit deep binding of lexical variables (closures). +Described in L<perlembed>, L<perlxstut>, L<perlxs>, L<perlcall>, +L<perlguts>, and L<xsubpp>. -=item * Arbitrarily nested data structures +=item * roll-your-own magic variables (including multiple simultaneous DBM implementations) -Any scalar value, including any array element, may now contain a -reference to any other variable or subroutine. You can easily create -anonymous variables and subroutines. Perl manages your reference -counts for you. +Described in L<perltie> and L<AnyDBM_File>. -=item * Modularity and reusability +=item * subroutines can now be overridden, autoloaded, and prototyped -The Perl library is now defined in terms of modules which can be easily -shared among various packages. A package may choose to import all or a -portion of a module's published interface. Pragmas (that is, compiler -directives) are defined and used by the same mechanism. +Described in L<perlsub>. -=item * Object-oriented programming +=item * arbitrarily nested data structures and anonymous functions -A package can function as a class. Dynamic multiple inheritance and -virtual methods are supported in a straightforward manner and with very -little new syntax. Filehandles may now be treated as objects. +Described in L<perlreftut>, L<perlref>, L<perldsc>, and L<perllol>. -=item * Embeddable and Extensible +=item * object-oriented programming -Perl may now be embedded easily in your C or C++ application, and can -either call or be called by your routines through a documented -interface. The XS preprocessor is provided to make it easy to glue -your C or C++ routines into Perl. 
Dynamic loading of modules is -supported, and Perl itself can be made into a dynamic library. +Described in L<perlobj>, L<perltoot>, and L<perlbot>. -=item * POSIX compliant +=item * compilability into C code or Perl bytecode -A major new module is the POSIX module, which provides access to all -available POSIX routines and definitions, via object classes where -appropriate. +Described in L<B> and L<B::Bytecode>. -=item * Package constructors and destructors +=item * support for light-weight processes (threads) -The new BEGIN and END blocks provide means to capture control as -a package is being compiled, and after the program exits. As a -degenerate case they work just like awk's BEGIN and END when you -use the B<-p> or B<-n> switches. +Described in L<perlthrtut> and L<Thread>. -=item * Multiple simultaneous DBM implementations +=item * support for internationalization, localization, and Unicode -A Perl program may now access DBM, NDBM, SDBM, GDBM, and Berkeley DB -files from the same script simultaneously. In fact, the old dbmopen -interface has been generalized to allow any variable to be tied -to an object class which defines its access methods. +Described in L<perllocale> and L<utf8>. -=item * Subroutine definitions may now be autoloaded +=item * lexical scoping -In fact, the AUTOLOAD mechanism also allows you to define any arbitrary -semantics for undefined subroutine calls. It's not for just autoloading. +Described in L<perlsub>. -=item * Regular expression enhancements +=item * regular expression enhancements -You can now specify nongreedy quantifiers. You can now do grouping -without creating a backreference. You can now write regular expressions -with embedded whitespace and comments for readability. A consistent -extensibility mechanism has been added that is upwardly compatible with -all old regular expressions. +Described in L<perlre>, with additional examples in L<perlop>. 
-=item * Innumerable Unbundled Modules +=item * enhanced debugger and interactive Perl environment, with integrated editor support -The Comprehensive Perl Archive Network described in L<perlmodlib> -contains hundreds of plug-and-play modules full of reusable code. -See F<http://www.perl.com/CPAN> for a site near you. +Described in L<perldebug>. -=item * Compilability +=item * POSIX 1003.1 compliant library -While not yet in full production mode, a working perl-to-C compiler -does exist. It can generate portable byte code, simple C, or -optimized C code. +Described in L<POSIX>. =back @@ -248,13 +202,12 @@ Okay, that's I<definitely> enough hype. =head1 AVAILABILITY -Perl is available for the vast majority of operating system platforms, -including most Unix-like platforms. The following situation is as of -February 1999 and Perl 5.005_03. +Perl is available for most operating systems, including virtually +all Unix-like platforms. -The following platforms are able to build Perl from the standard -source code distribution available at -F<http://www.perl.com/CPAN/src/index.html> +As of May 1999, the following platforms are able to build Perl +from the standard source code distribution available at +http://www.perl.com/CPAN/src/index.html AIX Linux SCO ODT/OSR A/UX MachTen Solaris @@ -275,10 +228,10 @@ F<http://www.perl.com/CPAN/src/index.html> 3) formerly known as Digital UNIX and before that DEC OSF/1 4) compilers: Borland, Cygwin32, Mingw32 EGCS/GCC, VC++ -The following platforms have been known to build Perl from the source -but for the Perl release 5.005_03 we haven't been able to verify them, -either because the hardware/software platforms are rather rare or -because we don't have an active champion on these platforms, or both. 
+The following platforms have been known to build Perl from source, +but we haven't been able to verify their status for the current release, +either because the hardware/software platforms are rare or +because we don't have an active champion on these platforms--or both. 3b1 FPS Plan 9 AmigaOS GENIX PowerUX @@ -291,9 +244,8 @@ because we don't have an active champion on these platforms, or both. EP/IX Opus Unisys Dynix ESIX Unixware -The following platforms are planned to be supported in the standard -source code distribution of the Perl release 5.006 but are not -supported in the Perl release 5.005_03: +Support for the following platforms is planned for the next major +Perl release. BS2000 Netware @@ -301,7 +253,7 @@ supported in the Perl release 5.005_03: VM/ESA The following platforms have their own source code distributions and -binaries available via F<http://www.perl.com/CPAN/ports/index.html>. +binaries available via http://www.perl.com/CPAN/ports/index.html. Perl release @@ -311,7 +263,7 @@ binaries available via F<http://www.perl.com/CPAN/ports/index.html>. Tandem Guardian 5.004 The following platforms have only binaries available via -F<http://www.perl.com/CPAN/ports/index.html>. +http://www.perl.com/CPAN/ports/index.html. Perl release @@ -325,12 +277,12 @@ See L<perlrun>. =head1 AUTHOR -Larry Wall <F<larry@wall.org>>, with the help of oodles of other folks. +Larry Wall <larry@wall.org>, with the help of oodles of other folks. If your Perl success stories and testimonials may be of help to others who wish to advocate the use of Perl in their applications, or if you wish to simply express your gratitude to Larry and the -Perl developers, please write to <F<perl-thanks@perl.org>>. +Perl developers, please write to perl-thanks@perl.org . =head1 FILES @@ -339,9 +291,11 @@ Perl developers, please write to <F<perl-thanks@perl.org>>. 
=head1 SEE ALSO a2p awk to perl translator - s2p sed to perl translator + http://www.perl.com/ the Perl Home Page + http://www.perl.com/CPAN the Comphrehensive Perl Archive + =head1 DIAGNOSTICS The B<-w> switch produces some lovely diagnostics. @@ -352,7 +306,7 @@ and errors into these longer forms. Compilation errors will tell you the line number of the error, with an indication of the next token or token type that was to be examined. -(In the case of a script passed to Perl via B<-e> switches, each +(In a script passed to Perl via B<-e> switches, each B<-e> is counted as one line.) Setuid scripts have additional constraints that can produce error @@ -381,10 +335,10 @@ so they are limited to a maximum of 65535 (higher numbers usually being affected by wraparound). You may mail your bug reports (be sure to include full configuration -information as output by the myconfig program in the perl source tree, -or by C<perl -V>) to <F<perlbug@perl.com>>. -If you've succeeded in compiling perl, the perlbug script in the utils/ -subdirectory can be used to help mail in a bug report. +information as output by the myconfig program in the perl source +tree, or by C<perl -V>) to perlbug@perl.com . If you've succeeded +in compiling perl, the perlbug script in the utils/ subdirectory +can be used to help mail in a bug report. Perl actually stands for Pathologically Eclectic Rubbish Lister, but don't tell anyone I said that. diff --git a/pod/perl5004delta.pod b/pod/perl5004delta.pod index 323830b465..43bfb51c66 100644 --- a/pod/perl5004delta.pod +++ b/pod/perl5004delta.pod @@ -268,7 +268,7 @@ referenced subroutine, with the given parameters (if any). This new syntax follows the pattern of S<C<$hashref-E<gt>{FOO}>> and S<C<$aryref-E<gt>[$foo]>>: You may now write S<C<&$subref($foo)>> as -S<C<$subref-E<gt>($foo)>>. All of these arrow terms may be chained; +S<C<$subref-E<gt>($foo)>>. 
All these arrow terms may be chained; thus, S<C<&{$table-E<gt>{FOO}}($bar)>> may now be written S<C<$table-E<gt>{FOO}-E<gt>($bar)>>. @@ -758,7 +758,7 @@ details on how to get started with building this port. There is also support for building perl under the Cygwin32 environment. Cygwin32 is a set of GNU tools that make it possible to compile and run -many UNIX programs under Windows NT by providing a mostly UNIX-like +many Unix programs under Windows NT by providing a mostly Unix-like interface for compilation and execution. See F<README.cygwin32> in the perl distribution for more details on this port and how to obtain the Cygwin32 toolkit. @@ -936,7 +936,7 @@ requested with the ":flock" tag (e.g. C<use Fcntl ':flock'>). =head2 IO -The IO module provides a simple mechanism to load all of the IO modules at one +The IO module provides a simple mechanism to load all the IO modules at one go. Currently this includes: IO::Handle diff --git a/pod/perlcall.pod b/pod/perlcall.pod index 2b837808a1..35c0f051d5 100644 --- a/pod/perlcall.pod +++ b/pod/perlcall.pod @@ -116,7 +116,7 @@ subroutine are stored on the Perl stack. As a general rule you should I<always> check the return value from these functions. Even if you are expecting only a particular number of values to be returned from the Perl subroutine, there is nothing to -stop someone from doing something unexpected - don't say you haven't +stop someone from doing something unexpected--don't say you haven't been warned. =head1 FLAG VALUES @@ -505,9 +505,9 @@ returned from I<perl_call_pv>. It will always be 0. =head2 Passing Parameters Now let's make a slightly more complex example. This time we want to -call a Perl subroutine, C<LeftString>, which will take 2 parameters - a -string (C<$s>) and an integer (C<$n>). The subroutine will simply -print the first C<$n> characters of the string. +call a Perl subroutine, C<LeftString>, which will take 2 parameters--a +string ($s) and an integer ($n). 
The subroutine will simply +print the first $n characters of the string. So the Perl subroutine would look like this @@ -555,7 +555,7 @@ as C<SP>. =item 2. If you are going to put something onto the Perl stack, you need to know -where to put it. This is the purpose of the macro C<dSP> - it declares +where to put it. This is the purpose of the macro C<dSP>--it declares and initializes a I<local> copy of the Perl stack pointer. All the other macros which will be used in this example require you to @@ -563,7 +563,7 @@ have used this macro. The exception to this rule is if you are calling a Perl subroutine directly from an XSUB function. In this case it is not necessary to -use the C<dSP> macro explicitly - it will be declared for you +use the C<dSP> macro explicitly--it will be declared for you automatically. =item 3. @@ -578,12 +578,12 @@ The C<PUSHMARK> macro tells Perl to make a mental note of the current stack pointer. Even if you aren't passing any parameters (like the example shown in the section I<No Parameters, Nothing returned>) you must still call the C<PUSHMARK> macro before you can call any of the -I<perl_call_*> functions - Perl still needs to know that there are no +I<perl_call_*> functions--Perl still needs to know that there are no parameters. The C<PUTBACK> macro sets the global copy of the stack pointer to be the same as our local copy. If we didn't do this I<perl_call_pv> -wouldn't know where the two parameters we pushed were - remember that +wouldn't know where the two parameters we pushed were--remember that up to now all the stack pointer manipulation we have done is with our local copy, I<not> the global copy. @@ -922,7 +922,7 @@ and here is a C function to call it. To be able to access the two parameters that were pushed onto the stack after they return from I<perl_call_pv> it is necessary to make a note -of their addresses - thus the two variables C<sva> and C<svb>. +of their addresses--thus the two variables C<sva> and C<svb>. 
The reason this is necessary is that the area of the Perl stack which held them will very likely have been overwritten by something else by @@ -1175,11 +1175,11 @@ the version of Perl you are using) Not a CODE reference at ... Undefined subroutine &main::47 called ... -The variable C<$ref> may have referred to the subroutine C<fred> +The variable $ref may have referred to the subroutine C<fred> whenever the call to C<SaveSub1> was made but by the time C<CallSavedSub1> gets called it now holds the number C<47>. Because we saved only a pointer to the original SV in C<SaveSub1>, any changes to -C<$ref> will be tracked by the pointer C<rememberSub>. This means that +$ref will be tracked by the pointer C<rememberSub>. This means that whenever C<CallSavedSub1> gets called, it will attempt to execute the code which is referenced by the SV* C<rememberSub>. In this case though, it now refers to the integer C<47>, so expect Perl to complain @@ -1351,7 +1351,7 @@ So the methods C<PrintID> and C<Display> can be invoked like this call_PrintID('Mine', 'PrintID') ; The only thing to note is that in both the static and virtual methods, -the method name is not passed via the stack - it is used as the first +the method name is not passed via the stack--it is used as the first parameter to I<perl_call_method>. =head2 Using GIMME_V @@ -1485,9 +1485,9 @@ enclosing scope at some stage. In the event driven scenario that may never happen. This means that as time goes on, your program will create more and more temporaries, none of which will ever be freed. As each of these temporaries consumes some memory your program will -eventually consume all the available memory in your system - kapow! +eventually consume all the available memory in your system--kapow! 
-So here is the bottom line - if you are sure that control will revert +So here is the bottom line--if you are sure that control will revert back to the enclosing Perl scope fairly quickly after the end of your callback, then it isn't absolutely necessary to dispose explicitly of any temporaries you may have created. Mind you, if you are at all @@ -1579,7 +1579,7 @@ require is a means of storing the mapping between the opened file and the Perl subroutine we want to be called for that file. Say the i/o library has a function C<asynch_read> which associates a C -function C<ProcessRead> with a file handle C<fh> - this assumes that it +function C<ProcessRead> with a file handle C<fh>--this assumes that it has also provided some routine to open the file and so obtain the file handle. diff --git a/pod/perldata.pod b/pod/perldata.pod index ad27db163b..f4c660d622 100644 --- a/pod/perldata.pod +++ b/pod/perldata.pod @@ -8,9 +8,9 @@ perldata - Perl data types Perl has three built-in data types: scalars, arrays of scalars, and associative arrays of scalars, known as "hashes". Normal arrays -are ordered lists indexed by number, starting with 0 and with +are ordered lists of scalars indexed by number, starting with 0 and with negative subscripts counting from the end. Hashes are unordered -collections of values indexed by their associated string key. +collections of scalar values indexed by their associated string key. Values are usually referred to by name, or through a named reference. The first character of the name tells you to what sort of data @@ -165,7 +165,7 @@ references are strongly-typed, uncastable pointers with builtin reference-counting and destructor invocation. A scalar value is interpreted as TRUE in the Boolean sense if it is not -the empty string or the number 0 (or its string equivalent, "0"). The +the null string or the number 0 (or its string equivalent, "0"). 
The Boolean context is just a special kind of scalar context where no conversion to a string or a number is ever performed. @@ -220,7 +220,7 @@ had to break this to make sure destructors were called when expected.) You can also gain some miniscule measure of efficiency by pre-extending an array that is going to get big. You can also extend an array by assigning to an element that is off the end of the array. You -can truncate an array down to nothing by assigning the empty list +can truncate an array down to nothing by assigning the null list () to it. The following are equivalent: @whatever = (); @@ -278,8 +278,8 @@ integer formats: String literals are usually delimited by either single or double quotes. They work much like quotes in the standard Unix shells: double-quoted string literals are subject to backslash and variable -substitution; single-quoted strings are not (except for "C<\'>" and -"C<\\>"). The usual C-style backslash rules apply for making +substitution; single-quoted strings are not (except for C<\'> and +C<\\>). The usual C-style backslash rules apply for making characters such as newline, tab, etc., as well as some more exotic forms. See L<perlop/"Quote and Quotelike Operators"> for a list. @@ -490,7 +490,7 @@ followed by all the elements returned by the subroutine named SomeSub called in list context, followed by the key/value pairs of %glarch. To make a list reference that does I<NOT> interpolate, see L<perlref>. -The empty list is represented by (). Interpolating it in a list +The null list is represented by (). Interpolating it in a list has no effect. Thus ((),(),()) is equivalent to (). Similarly, interpolating an array with no elements is the same as if no array had been interpolated at that point. 
@@ -530,7 +530,7 @@ produced by the expression on the right side of the assignment: $x = (($foo,$bar) = f()); # set $x to f()'s return count This is handy when you want to do a list assignment in a Boolean -context, because most list functions return a empty list when finished, +context, because most list functions return a null list when finished, which when assigned produces a 0, which is interpreted as FALSE. The final element may be an array or a hash: @@ -639,9 +639,10 @@ You couldn't just loop through C<values %hash> to do this because that function produces a new list which is a copy of the values, so changing them doesn't change the original. -As a special rule, if a slice would produce a list consisting entirely -of undefined values, the empty list is produced instead. This makes -it easy to write loops that terminate when an empty list is returned: +As a special rule, if a list slice would produce a list consisting +entirely of undefined values, the null list is produced instead. +This makes it easy to write loops that terminate when a null list +is returned: while ( ($home, $user) = (getpwent)[7,0]) { printf "%-8s %s\n", $user, $home; @@ -649,7 +650,7 @@ it easy to write loops that terminate when an empty list is returned: As noted earlier in this document, the scalar sense of list assignment is the number of elements on the right-hand side of the assignment. -The empty list contains no elements, so when the password file is +The null list contains no elements, so when the password file is exhausted, the result is 0, not 2. If you're confused about why you use an '@' there on a hash slice diff --git a/pod/perldebug.pod b/pod/perldebug.pod index ed77fd35c8..56997322d6 100644 --- a/pod/perldebug.pod +++ b/pod/perldebug.pod @@ -557,7 +557,7 @@ Quit. ("quit" doesn't work for this.) This is the only supported way to exit the debugger, though typing C<exit> twice may do it too. 
Set an C<O>ption C<inhibit_exit> to 0 if you want to be able to I<step -off> the end the script. You may also need to set C<$finished> to 0 at +off> the end the script. You may also need to set $finished to 0 at some moment if you want to step through global destruction. =item R @@ -968,7 +968,7 @@ application. The array C<@{"_E<lt>$filename"}> is the line-by-line contents of $filename for all the compiled files. Same for C<eval>ed strings which -contain subroutines, or which are currently executed. The C<$filename> +contain subroutines, or which are currently executed. The $filename for C<eval>ed strings looks like C<(eval 34)>. =item * diff --git a/pod/perldelta.pod b/pod/perldelta.pod index 5114ce1731..7d8c0cc607 100644 --- a/pod/perldelta.pod +++ b/pod/perldelta.pod @@ -21,29 +21,29 @@ None known at this time. Release 5.005 grandfathered old global symbol names by providing preprocessor macros for extension source compatibility. As of release 5.006, these preprocessor definitions are not available by default. You need to explicitly -compile perl with C<-DPERL_POLLUTE> in order to get these definitions. For -extensions that are still using the old symbols, this option can be +compile perl with C<-DPERL_POLLUTE> to get these definitions. For +extensions still using the old symbols, this option can be specified via MakeMaker: - perl Makefile.PL POLLUTE=1 + perl Makefile.PL POLLUTE=1 =item C<PERL_POLLUTE_MALLOC> -Enabling the use of Perl's malloc in release 5.005 and earlier caused +Enabling Perl's malloc in release 5.005 and earlier caused the namespace of system versions of the malloc family of functions to -be usurped by the Perl versions of these functions, since they used the -same names by default. +be usurped by the Perl versions, since by default they used the +same names. 
Besides causing problems on platforms that do not allow these functions to be cleanly replaced, this also meant that the system versions could not be called in programs that used Perl's malloc. Previous versions of Perl -have allowed this behavior to be suppressed with the HIDEMYMALLOC and +have allowed this behaviour to be suppressed with the HIDEMYMALLOC and EMBEDMYMALLOC preprocessor definitions. As of release 5.006, Perl's malloc family of functions have default names distinct from the system versions. You need to explicitly compile perl with -C<-DPERL_POLLUTE_MALLOC> in order to get the older behavior. HIDEMYMALLOC -and EMBEDMYMALLOC have no effect, since the behavior they enabled is now +C<-DPERL_POLLUTE_MALLOC> to get the older behaviour. HIDEMYMALLOC +and EMBEDMYMALLOC have no effect, since the behaviour they enabled is now the default. Note that these functions do B<not> constitute Perl's memory allocation API. @@ -52,7 +52,7 @@ See L<perlguts/"Memory Allocation"> for further information about that. =item C<PL_na> and C<dTHR> Issues The C<PL_na> global is now thread local, so a C<dTHR> declaration is needed -in the scope in which it appears. XSUBs should handle this automatically, +in the scope in which the global appears. XSUBs should handle this automatically, but if you have used C<PL_na> in support functions, you either need to change the C<PL_na> to a local variable (which is recommended), or put in a C<dTHR>. @@ -65,23 +65,23 @@ a C<dTHR>. =item C<PATCHLEVEL> is now C<PERL_VERSION> -The cpp macros C<PERL_REVISION>, C<PERL_VERSION> and C<PERL_SUBVERSION> +The cpp macros C<PERL_REVISION>, C<PERL_VERSION>, and C<PERL_SUBVERSION> are now available by default from perl.h, and reflect the base revision, -patchlevel and subversion respectively. C<PERL_REVISION> had no +patchlevel, and subversion respectively. C<PERL_REVISION> had no prior equivalent, while C<PERL_VERSION> and C<PERL_SUBVERSION> were previously available as C<PATCHLEVEL> and C<SUBVERSION>. 
-The new names cause less pollution of the cpp namespace, and reflect what +The new names cause less pollution of the B<cpp> namespace and reflect what the numbers have come to stand for in common practice. For compatibility, -the old names are still supported when patchlevel.h is explicitly +the old names are still supported when F<patchlevel.h> is explicitly included (as required before), so there is no source incompatibility -due to the change. +from the change. =back =head2 Binary Incompatibilities -This release is not binary compatible with the 5.005 release and its +This release is not binary compatible with the 5.005 release or its maintenance versions. =head1 Core Changes @@ -102,8 +102,8 @@ level using the C<use warning> pragma. See L<warning> for details. Binary numbers are now supported as literals, in s?printf formats, and C<oct()>: - $answer = 0b101010; - printf "The answer is: %b\n", oct("0b101010"); + $answer = 0b101010; + printf "The answer is: %b\n", oct("0b101010"); =head2 syswrite() ease-of-use @@ -117,28 +117,28 @@ extent of 64-bit support. Depending on the platform (hints file) more or less 64-awareness becomes available. As of 5.005_54 at least somewhat 64-bit aware platforms are HP-UX 11 or better, Solaris 2.6 or better, IRIX 6.2 or better. Naturally 64-bit platforms like Digital -UNIX and UNICOS also have 64-bit support. +Unix and UNICOS also have 64-bit support. =head2 Better syntax checks on parenthesized unary operators Expressions such as: - print defined(&foo,&bar,&baz); - print uc("foo","bar","baz"); - undef($foo,&bar); + print defined(&foo,&bar,&baz); + print uc("foo","bar","baz"); + undef($foo,&bar); used to be accidentally allowed in earlier versions, and produced -unpredictable behavior. Some of them produced ancillary warnings -when used in this way, while others silently did the wrong thing. +unpredictable behaviour. Some produced ancillary warnings +when used in this way; others silently did the wrong thing. 
The parenthesized forms of most unary operators that expect a single -argument will now ensure that they are not called with more than one -argument, making the above cases syntax errors. Note that the usual -behavior of: +argument now ensure that they are not called with more than one +argument, making the cases shown above syntax errors. The usual +behaviour of: - print defined &foo, &bar, &baz; - print uc "foo", "bar", "baz"; - undef $foo, &bar; + print defined &foo, &bar, &baz; + print uc "foo", "bar", "baz"; + undef $foo, &bar; remains unchanged. See L<perlop>. @@ -146,8 +146,8 @@ remains unchanged. See L<perlop>. The C<qw//> operator is now evaluated at compile time into a true list instead of being replaced with a run time call to C<split()>. This -removes the confusing behavior of C<qw//> in scalar context stemming from -the older implementation, which inherited the behavior from split(). +removes the confusing misbehaviour of C<qw//> in scalar context, which +had inherited that behaviour from split(). Thus: @@ -162,7 +162,7 @@ strings. See L<perlfunc/"pack">. =head2 pack() format modifier '!' supported -The new format type modifer '!' is useful for packing and unpacking +The new format type modifier '!' is useful for packing and unpacking native shorts, ints, and longs. See L<perlfunc/"pack">. =head2 $^X variables may now have names longer than one character @@ -171,36 +171,36 @@ Formerly, $^X was synonymous with ${"\cX"}, but $^XY was a syntax error. Now variable names that begin with a control character may be arbitrarily long. However, for compatibility reasons, these variables I<must> be written with explicit braces, as C<${^XY}> for example. -C<${^XYZ}> is synonymous with ${"\cXYZ"}. Variable names with more +C<${^XYZ}> is synonymous with ${"\cXYZ"}. Variable names with more than one control character, such as C<${^XY^Z}>, are illegal. -The old syntax has not changed. 
As before, the `^X' may either be a -literal control-X character or the two character sequence `caret' plus -`X'. When the braces are omitted, the variable name stops after the +The old syntax has not changed. As before, `^X' may be either a +literal control-X character or the two-character sequence `caret' plus +`X'. When braces are omitted, the variable name stops after the control character. Thus C<"$^XYZ"> continues to be synonymous with C<$^X . "YZ"> as before. As before, lexical variables may not have names beginning with control characters. As before, variables whose names begin with a control -character are always forced to be in package `main'. These variables -are all reserved for future extensions, except the ones that begin -with C<^_>, which may be used by user programs and will not acquire a -special meaning in any future version of Perl. +character are always forced to be in package `main'. All such variables +are reserved for future extensions, except those that begin with +C<^_>, which may be used by user programs and is guaranteed not to +acquire special meaning in any future version of Perl. =head1 Significant bug fixes =head2 E<lt>HANDLEE<gt> on empty files With C<$/> set to C<undef>, slurping an empty file returns a string of -zero length (instead of C<undef>, as it used to) for the first time the -HANDLE is read. Subsequent reads yield C<undef>. +zero length (instead of C<undef>, as it used to) the first time the +HANDLE is read. Further reads yield C<undef>. This means that the following will append "foo" to an empty file (it used -to not do anything before): +to do nothing): perl -0777 -pi -e 's/^/foo/' empty_file -Note that the behavior of: +The behaviour of: perl -pi -e 's/^/foo/' empty_file @@ -214,8 +214,8 @@ This has been corrected. Lexical lookups for variables appearing in C<eval '...'> within functions that were themselves called within an C<eval '...'> were -searching the wrong place for lexicals. 
They now correctly terminate -the lexical search at the subroutine call boundary. +searching the wrong place for lexicals. The lexical search now +correctly ends at the subroutine's block boundary. Parsing of here documents used to be flawed when they appeared as the replacement expression in C<eval 's/.../.../e'>. This has @@ -223,11 +223,11 @@ been fixed. =head2 Automatic flushing of output buffers -fork(), exec(), system(), qx// and pipe open()s now flush the buffers -of all files that were opened for output at the time the operation -was attempted. This mostly eliminates the often confusing effects of +fork(), exec(), system(), qx//, and pipe open()s now flush buffers +of all files opened for output when the operation +was attempted. This mostly eliminates confusing buffering mishaps suffered by users unaware of how Perl internally -handled I/O. +handles I/O. =head1 Supported Platforms @@ -263,7 +263,7 @@ Rhapsody is now supported. =item op/io_const IO constants (SEEK_*, _IO*). - + =item op/io_dir Directory-related IO methods (new, read, close, rewind, tied delete). @@ -303,10 +303,10 @@ Added Dumpvalue module provides screen dumps of Perl data. =item Benchmark You can now run tests for I<n> seconds instead of guessing the right -number of tests to run: e.g. timethese(-5, ...) will run each of the -codes for at least 5 CPU seconds. Zero as the "number of repetitions" +number of tests to run: e.g. timethese(-5, ...) will run each +code for at least 5 CPU seconds. Zero as the "number of repetitions" means "for at least 3 CPU seconds". The output format has also -changed. For example: +changed. For example: use Benchmark;$x=3;timethese(-5,{a=>sub{$x*$x},b=>sub{$x**2}}) @@ -322,12 +322,12 @@ and the "@ operations/CPU second (n=operations)". =item Devel::Peek The Devel::Peek module provides access to the internal representation -of Perl variables. It is a data debugging tool for the XS programmer. +of Perl variables and data. 
It is a data debugging tool for the XS programmer. =item Fcntl More Fcntl constants added: F_SETLK64, F_SETLKW64, O_LARGEFILE for -large (more than 4G) file access (the 64-bit support is not yet +large (more than 4G) file access (64-bit support is not yet working, though, so no need to get overly excited), Free/Net/OpenBSD locking behaviour flags F_FLOCK, F_POSIX, Linux F_SHLCK, and O_ACCMODE: the mask of O_RDONLY, O_WRONLY, and O_RDWR. @@ -335,62 +335,62 @@ O_ACCMODE: the mask of O_RDONLY, O_WRONLY, and O_RDWR. =item File::Spec New methods have been added to the File::Spec module: devnull() returns -the name of the null device (/dev/null on UNIX) and tmpdir() the name of -the temp directory (normally /tmp on UNIX). There are now also methods +the name of the null device (/dev/null on Unix) and tmpdir() the name of +the temp directory (normally /tmp on Unix). There are now also methods to convert between absolute and relative filenames: abs2rel() and -rel2abs(). For compatibility with operating systems that specify volume -names in file paths, the splitpath(), splitdir() and catdir() methods +rel2abs(). For compatibility with operating systems that specify volume +names in file paths, the splitpath(), splitdir(), and catdir() methods have been added. =item File::Spec::Functions The new File::Spec::Functions modules provides a function interface -to the File::Spec module. Allows shorthand +to the File::Spec module. Allows shorthand - $fullname = catfile($dir1, $dir2, $file); + $fullname = catfile($dir1, $dir2, $file); instead of - $fullname = File::Spec->catfile($dir1, $dir2, $file); + $fullname = File::Spec->catfile($dir1, $dir2, $file); =item Math::BigInt -The logical operations C<E<lt>E<lt>>, C<E<gt>E<gt>>, C<&>, C<|> +The logical operations C<E<lt>E<lt>>, C<E<gt>E<gt>>, C<&>, C<|>, and C<~> are now supported on bigints. 
=item Math::Complex -The accessor methods Re, Im, arg, abs, rho, and theta, can now also +The accessor methods Re, Im, arg, abs, rho, and theta can now also act as mutators (accessor $z->Re(), mutator $z->Re(3)). =item Math::Trig -A little bit of radial trigonometry (cylindrical and spherical) added, -radial coordinate conversions and the great circle distance. +A little bit of radial trigonometry (cylindrical and spherical), +radial coordinate conversions, and the great circle distance were added. =item SDBM_File An EXISTS method has been added to this module (and sdbm_exists() has been added to the underlying sdbm library), so one can now call exists -on an SDBM_File tied hash and get the correct result rather than a +on an SDBM_File tied hash and get the correct result, rather than a runtime error. =item Time::Local The timelocal() and timegm() functions used to silently return bogus results when the date exceeded the machine's integer range. They -consistently croak() if the date falls in an unsupported range. +now consistently croak() if the date falls in an unsupported range. =item Win32 The error return value in list context has been changed for all functions -that return a list of values. Previously these functions returned a list -with a single element C<undef> in case an error occurred. Now these functions -return the empty list in these situations. This applies to the following +that return a list of values. Previously these functions returned a list +with a single element C<undef> if an error occurred. Now these functions +return the empty list in these situations. This applies to the following functions: - Win32::FsType - Win32::GetOSVersion + Win32::FsType + Win32::GetOSVersion The remaining functions are unchanged and continue to return C<undef> on error even in list context. @@ -399,22 +399,22 @@ The Win32::SetLastError(ERROR) function has been added as a complement to the Win32::GetLastError() function. 
The new Win32::GetFullPathName(FILENAME) returns the full absolute -pathname for FILENAME in scalar context. In list context it returns -a two element list containing the fully qualified directory name and +pathname for FILENAME in scalar context. In list context it returns +a two-element list containing the fully qualified directory name and the filename. =item DBM Filters A new feature called "DBM Filters" has been added to all the -DBM modules -- DB_File, GDBM_File, NDBM_File, ODBM_File and SDBM_File. -DBM Filters add four new methods to each of the DBM modules +DBM modules--DB_File, GDBM_File, NDBM_File, ODBM_File, and SDBM_File. +DBM Filters add four new methods to each DBM module: filter_store_key filter_store_value filter_fetch_key filter_fetch_value -These can be used to filter the contents of keys/values before they are +These can be used to filter key-value pairs before the pairs are written to the database or just after they are read from the database. See L<perldbmfilter> for further information. @@ -422,16 +422,16 @@ See L<perldbmfilter> for further information. =head2 Pragmata -C<use utf8;>, to enable UTF-8 and Unicode support. +C<use utf8> to enable UTF-8 and Unicode support. Lexical warnings pragma, C<use warning;>, to control optional warnings. -C<use filetest;>, to control the behaviour of filetests (C<-r> C<-w> ...). +C<use filetest> to control the behaviour of filetests (C<-r> C<-w> ...). Currently only one subpragma implemented, "use filetest 'access';", -that enables the use of access(2) or equivalent to check the +that enables the use of access(2) or equivalent to check permissions instead of using stat(2) as usual. This matters -in filesystems where there are ACLs (access control lists), the -stat(2) might lie, while access(2) knows better. +in filesystems where there are ACLs (access control lists): the +stat(2) might lie, but access(2) knows better. =head1 Utility Changes @@ -449,6 +449,10 @@ A tutorial on using open() effectively. 
A tutorial that introduces the essentials of references. +=item perltootc.pod + +A tutorial on managing class data for object modules. + =back =head1 New Diagnostics @@ -490,14 +494,14 @@ because many scripts assume to find Perl in /usr/bin/perl. =head1 BUGS If you find what you think is a bug, you might check the headers of -recently posted articles in the comp.lang.perl.misc newsgroup. +articles recently posted to the comp.lang.perl.misc newsgroup. There may also be information at http://www.perl.com/perl/, the Perl Home Page. If you believe you have an unreported bug, please run the B<perlbug> -program included with your release. Make sure you trim your bug down +program included with your release. Make sure to trim your bug down to a tiny but sufficient test case. Your bug report, along with the -output of C<perl -V>, will be sent off to <F<perlbug@perl.com>> to be +output of C<perl -V>, will be sent off to perlbug@perl.com to be analysed by the Perl porting team. =head1 SEE ALSO diff --git a/pod/perldiag.pod b/pod/perldiag.pod index 72b419294e..688e847085 100644 --- a/pod/perldiag.pod +++ b/pod/perldiag.pod @@ -756,13 +756,15 @@ but there is no function to autoload. Most probable causes are a misprint in a function/method name or a failure to C<AutoSplit> the file, say, by doing C<make install>. -=item Can't locate %s in @INC - -(F) You said to do (or require, or use) a file that couldn't be found -in any of the libraries mentioned in @INC. Perhaps you need to set the -PERL5LIB or PERL5OPT environment variable to say where the extra library -is, or maybe the script needs to add the library name to @INC. Or maybe -you just misspelled the name of the file. See L<perlfunc/require>. +=item Can't locate %s + +(F) You said to C<do> (or C<require>, or C<use>) a file that couldn't be +found. Perl looks for the file in all the locations mentioned in @INC, +unless the file name included the full path to the file. 
Perhaps you need +to set the PERL5LIB or PERL5OPT environment variable to say where the extra +library is, or maybe the script needs to add the library name to @INC. Or +maybe you just misspelled the name of the file. See L<perlfunc/require> +and L<lib>. =item Can't locate object method "%s" via package "%s" diff --git a/pod/perldsc.pod b/pod/perldsc.pod index ef3ae750a5..5ab97e1795 100644 --- a/pod/perldsc.pod +++ b/pod/perldsc.pod @@ -8,8 +8,8 @@ The single feature most sorely lacking in the Perl programming language prior to its 5.0 release was complex data structures. Even without direct language support, some valiant programmers did manage to emulate them, but it was hard work and not for the faint of heart. You could occasionally -get away with the C<$m{$LoL,$b}> notation borrowed from I<awk> in which the -keys are actually more like a single concatenated string C<"$LoL$b">, but +get away with the C<$m{$AoA,$b}> notation borrowed from B<awk> in which the +keys are actually more like a single concatenated string C<"$AoA$b">, but traversal and sorting were difficult. More desperate programmers even hacked Perl's internal symbol table directly, a strategy that proved hard to develop and maintain--to put it mildly. @@ -21,7 +21,7 @@ with three dimensions! for $x (1 .. 10) { for $y (1 .. 10) { for $z (1 .. 10) { - $LoL[$x][$y][$z] = + $AoA[$x][$y][$z] = $x ** $y + $z; } } @@ -30,7 +30,7 @@ with three dimensions! Alas, however simple this may appear, underneath it's a much more elaborate construct than meets the eye! -How do you print it out? Why can't you say just C<print @LoL>? How do +How do you print it out? Why can't you say just C<print @AoA>? How do you sort it? How can you pass it to a function or get one of these back from a function? Is is an object? Can you save it to disk to read back later? How do you access whole rows or columns of that matrix? Do @@ -93,8 +93,8 @@ level. It's just that you can I<use> it as though it were a two-dimensional one. 
This is actually the way almost all C multidimensional arrays work as well. - $list[7][12] # array of arrays - $list[7]{string} # array of hashes + $array[7][12] # array of arrays + $array[7]{string} # array of hashes $hash{string}[7] # hash of arrays $hash{string}{'another string'} # hash of hashes @@ -102,10 +102,10 @@ Now, because the top level contains only references, if you try to print out your array in with a simple print() function, you'll get something that doesn't look very nice, like this: - @LoL = ( [2, 3], [4, 5, 7], [0] ); - print $LoL[1][2]; + @AoA = ( [2, 3], [4, 5, 7], [0] ); + print $AoA[1][2]; 7 - print @LoL; + print @AoA; ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0) @@ -124,25 +124,25 @@ repeatedly. Here's the case where you just get the count instead of a nested array: for $i (1..10) { - @list = somefunc($i); - $LoL[$i] = @list; # WRONG! + @array = somefunc($i); + $AoA[$i] = @array; # WRONG! } -That's just the simple case of assigning a list to a scalar and getting +That's just the simple case of assigning an array to a scalar and getting its element count. If that's what you really and truly want, then you might do well to consider being a tad more explicit about it, like this: for $i (1..10) { - @list = somefunc($i); - $counts[$i] = scalar @list; + @array = somefunc($i); + $counts[$i] = scalar @array; } Here's the case of taking a reference to the same memory location again and again: for $i (1..10) { - @list = somefunc($i); - $LoL[$i] = \@list; # WRONG! + @array = somefunc($i); + $AoA[$i] = \@array; # WRONG! } So, what's the big problem with that? It looks right, doesn't it? @@ -150,8 +150,8 @@ After all, I just told you that you need an array of references, so by golly, you've made me one! Unfortunately, while this is true, it's still broken. All the references -in @LoL refer to the I<very same place>, and they will therefore all hold -whatever was last in @list! 
It's similar to the problem demonstrated in +in @AoA refer to the I<very same place>, and they will therefore all hold +whatever was last in @array! It's similar to the problem demonstrated in the following C program: #include <pwd.h> @@ -176,40 +176,40 @@ hash constructor C<{}> instead. Here's the right way to do the preceding broken code fragments: for $i (1..10) { - @list = somefunc($i); - $LoL[$i] = [ @list ]; + @array = somefunc($i); + $AoA[$i] = [ @array ]; } The square brackets make a reference to a new array with a I<copy> -of what's in @list at the time of the assignment. This is what +of what's in @array at the time of the assignment. This is what you want. Note that this will produce something similar, but it's much harder to read: for $i (1..10) { - @list = 0 .. $i; - @{$LoL[$i]} = @list; + @array = 0 .. $i; + @{$AoA[$i]} = @array; } Is it the same? Well, maybe so--and maybe not. The subtle difference is that when you assign something in square brackets, you know for sure it's always a brand new reference with a new I<copy> of the data. -Something else could be going on in this new case with the C<@{$LoL[$i]}}> +Something else could be going on in this new case with the C<@{$AoA[$i]}}> dereference on the left-hand-side of the assignment. It all depends on -whether C<$LoL[$i]> had been undefined to start with, or whether it -already contained a reference. If you had already populated @LoL with +whether C<$AoA[$i]> had been undefined to start with, or whether it +already contained a reference. If you had already populated @AoA with references, as in - $LoL[3] = \@another_list; + $AoA[3] = \@another_array; Then the assignment with the indirection on the left-hand-side would use the existing reference that was already there: - @{$LoL[3]} = @list; + @{$AoA[3]} = @array; Of course, this I<would> have the "interesting" effect of clobbering -@another_list. (Have you ever noticed how when a programmer says +@another_array. 
(Have you ever noticed how when a programmer says something is "interesting", that rather than meaning "intriguing", they're disturbingly more apt to mean that it's "annoying", "difficult", or both? :-) @@ -222,8 +222,8 @@ Surprisingly, the following dangerous-looking construct will actually work out fine: for $i (1..10) { - my @list = somefunc($i); - $LoL[$i] = \@list; + my @array = somefunc($i); + $AoA[$i] = \@array; } That's because my() is more of a run-time statement than it is a @@ -242,18 +242,18 @@ do the right thing behind the scenes. In summary: - $LoL[$i] = [ @list ]; # usually best - $LoL[$i] = \@list; # perilous; just how my() was that list? - @{ $LoL[$i] } = @list; # way too tricky for most programmers + $AoA[$i] = [ @array ]; # usually best + $AoA[$i] = \@array; # perilous; just how my() was that array? + @{ $AoA[$i] } = @array; # way too tricky for most programmers =head1 CAVEAT ON PRECEDENCE -Speaking of things like C<@{$LoL[$i]}>, the following are actually the +Speaking of things like C<@{$AoA[$i]}>, the following are actually the same thing: - $listref->[2][2] # clear - $$listref[2][2] # confusing + $aref->[2][2] # clear + $$aref[2][2] # confusing That's because Perl's precedence rules on its five prefix dereferencers (which look like someone swearing: C<$ @ * % &>) make them bind more @@ -263,11 +263,11 @@ accustomed to using C<*a[i]> to mean what's pointed to by the I<i'th> element of C<a>. That is, they first take the subscript, and only then dereference the thing at that subscript. That's fine in C, but this isn't C. -The seemingly equivalent construct in Perl, C<$$listref[$i]> first does -the deref of C<$listref>, making it take $listref as a reference to an +The seemingly equivalent construct in Perl, C<$$aref[$i]> first does +the deref of $aref, making it take $aref as a reference to an array, and then dereference that, and finally tell you the I<i'th> value -of the array pointed to by $LoL. 
If you wanted the C notion, you'd have to -write C<${$LoL[$i]}> to force the C<$LoL[$i]> to get evaluated first +of the array pointed to by $AoA. If you wanted the C notion, you'd have to +write C<${$AoA[$i]}> to force the C<$AoA[$i]> to get evaluated first before the leading C<$> dereferencer. =head1 WHY YOU SHOULD ALWAYS C<use strict> @@ -283,19 +283,19 @@ This way, you'll be forced to declare all your variables with my() and also disallow accidental "symbolic dereferencing". Therefore if you'd done this: - my $listref = [ + my $aref = [ [ "fred", "barney", "pebbles", "bambam", "dino", ], [ "homer", "bart", "marge", "maggie", ], [ "george", "jane", "elroy", "judy", ], ]; - print $listref[2][2]; + print $aref[2][2]; The compiler would immediately flag that as an error I<at compile time>, -because you were accidentally accessing C<@listref>, an undeclared +because you were accidentally accessing C<@aref>, an undeclared variable, and it would thereby remind you to write instead: - print $listref->[2][2] + print $aref->[2][2] =head1 DEBUGGING @@ -303,10 +303,10 @@ Before version 5.002, the standard Perl debugger didn't do a very nice job of printing out complex data structures. With 5.002 or above, the debugger includes several new features, including command line editing as well as the C<x> command to dump out complex data structures. For -example, given the assignment to $LoL above, here's the debugger output: +example, given the assignment to $AoA above, here's the debugger output: - DB<1> x $LoL - $LoL = ARRAY(0x13b5a0) + DB<1> x $AoA + $AoA = ARRAY(0x13b5a0) 0 ARRAY(0x1f0a24) 0 'fred' 1 'barney' @@ -330,79 +330,79 @@ Presented with little comment (these will get their own manpages someday) here are short code examples illustrating access of various types of data structures. 
-=head1 LISTS OF LISTS +=head1 ARRAYS OF ARRAYS -=head2 Declaration of a LIST OF LISTS +=head2 Declaration of a ARRAY OF ARRAYS - @LoL = ( + @AoA = ( [ "fred", "barney" ], [ "george", "jane", "elroy" ], [ "homer", "marge", "bart" ], ); -=head2 Generation of a LIST OF LISTS +=head2 Generation of a ARRAY OF ARRAYS # reading from file while ( <> ) { - push @LoL, [ split ]; + push @AoA, [ split ]; } # calling a function for $i ( 1 .. 10 ) { - $LoL[$i] = [ somefunc($i) ]; + $AoA[$i] = [ somefunc($i) ]; } # using temp vars for $i ( 1 .. 10 ) { @tmp = somefunc($i); - $LoL[$i] = [ @tmp ]; + $AoA[$i] = [ @tmp ]; } # add to an existing row - push @{ $LoL[0] }, "wilma", "betty"; + push @{ $AoA[0] }, "wilma", "betty"; -=head2 Access and Printing of a LIST OF LISTS +=head2 Access and Printing of a ARRAY OF ARRAYS # one element - $LoL[0][0] = "Fred"; + $AoA[0][0] = "Fred"; # another element - $LoL[1][1] =~ s/(\w)/\u$1/; + $AoA[1][1] =~ s/(\w)/\u$1/; # print the whole thing with refs - for $aref ( @LoL ) { + for $aref ( @AoA ) { print "\t [ @$aref ],\n"; } # print the whole thing with indices - for $i ( 0 .. $#LoL ) { - print "\t [ @{$LoL[$i]} ],\n"; + for $i ( 0 .. $#AoA ) { + print "\t [ @{$AoA[$i]} ],\n"; } # print the whole thing one at a time - for $i ( 0 .. $#LoL ) { - for $j ( 0 .. $#{ $LoL[$i] } ) { - print "elt $i $j is $LoL[$i][$j]\n"; + for $i ( 0 .. $#AoA ) { + for $j ( 0 .. 
$#{ $AoA[$i] } ) { + print "elt $i $j is $AoA[$i][$j]\n"; } } -=head1 HASHES OF LISTS +=head1 HASHES OF ARRAYS -=head2 Declaration of a HASH OF LISTS +=head2 Declaration of a HASH OF ARRAYS - %HoL = ( + %HoA = ( flintstones => [ "fred", "barney" ], jetsons => [ "george", "jane", "elroy" ], simpsons => [ "homer", "marge", "bart" ], ); -=head2 Generation of a HASH OF LISTS +=head2 Generation of a HASH OF ARRAYS # reading from file # flintstones: fred barney wilma dino while ( <> ) { next unless s/^(.*?):\s*//; - $HoL{$1} = [ split ]; + $HoA{$1} = [ split ]; } # reading from file; more temps @@ -410,65 +410,65 @@ types of data structures. while ( $line = <> ) { ($who, $rest) = split /:\s*/, $line, 2; @fields = split ' ', $rest; - $HoL{$who} = [ @fields ]; + $HoA{$who} = [ @fields ]; } # calling a function that returns a list for $group ( "simpsons", "jetsons", "flintstones" ) { - $HoL{$group} = [ get_family($group) ]; + $HoA{$group} = [ get_family($group) ]; } # likewise, but using temps for $group ( "simpsons", "jetsons", "flintstones" ) { @members = get_family($group); - $HoL{$group} = [ @members ]; + $HoA{$group} = [ @members ]; } # append new members to an existing family - push @{ $HoL{"flintstones"} }, "wilma", "betty"; + push @{ $HoA{"flintstones"} }, "wilma", "betty"; -=head2 Access and Printing of a HASH OF LISTS +=head2 Access and Printing of a HASH OF ARRAYS # one element - $HoL{flintstones}[0] = "Fred"; + $HoA{flintstones}[0] = "Fred"; # another element - $HoL{simpsons}[1] =~ s/(\w)/\u$1/; + $HoA{simpsons}[1] =~ s/(\w)/\u$1/; # print the whole thing - foreach $family ( keys %HoL ) { - print "$family: @{ $HoL{$family} }\n" + foreach $family ( keys %HoA ) { + print "$family: @{ $HoA{$family} }\n" } # print the whole thing with indices - foreach $family ( keys %HoL ) { + foreach $family ( keys %HoA ) { print "family: "; - foreach $i ( 0 .. $#{ $HoL{$family} } ) { - print " $i = $HoL{$family}[$i]"; + foreach $i ( 0 .. 
$#{ $HoA{$family} } ) { + print " $i = $HoA{$family}[$i]"; } print "\n"; } # print the whole thing sorted by number of members - foreach $family ( sort { @{$HoL{$b}} <=> @{$HoL{$a}} } keys %HoL ) { - print "$family: @{ $HoL{$family} }\n" + foreach $family ( sort { @{$HoA{$b}} <=> @{$HoA{$a}} } keys %HoA ) { + print "$family: @{ $HoA{$family} }\n" } # print the whole thing sorted by number of members and name foreach $family ( sort { - @{$HoL{$b}} <=> @{$HoL{$a}} + @{$HoA{$b}} <=> @{$HoA{$a}} || $a cmp $b - } keys %HoL ) + } keys %HoA ) { - print "$family: ", join(", ", sort @{ $HoL{$family} }), "\n"; + print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n"; } -=head1 LISTS OF HASHES +=head1 ARRAYS OF HASHES -=head2 Declaration of a LIST OF HASHES +=head2 Declaration of a ARRAY OF HASHES - @LoH = ( + @AoH = ( { Lead => "fred", Friend => "barney", @@ -485,7 +485,7 @@ types of data structures. } ); -=head2 Generation of a LIST OF HASHES +=head2 Generation of a ARRAY OF HASHES # reading from file # format: LEAD=fred FRIEND=barney @@ -495,7 +495,7 @@ types of data structures. ($key, $value) = split /=/, $field; $rec->{$key} = $value; } - push @LoH, $rec; + push @AoH, $rec; } @@ -503,34 +503,34 @@ types of data structures. 
# format: LEAD=fred FRIEND=barney # no temp while ( <> ) { - push @LoH, { split /[\s+=]/ }; + push @AoH, { split /[\s+=]/ }; } - # calling a function that returns a key,value list, like + # calling a function that returns a key/value pair list, like # "lead","fred","daughter","pebbles" while ( %fields = getnextpairset() ) { - push @LoH, { %fields }; + push @AoH, { %fields }; } # likewise, but using no temp vars while (<>) { - push @LoH, { parsepairs($_) }; + push @AoH, { parsepairs($_) }; } # add key/value to an element - $LoH[0]{pet} = "dino"; - $LoH[2]{pet} = "santa's little helper"; + $AoH[0]{pet} = "dino"; + $AoH[2]{pet} = "santa's little helper"; -=head2 Access and Printing of a LIST OF HASHES +=head2 Access and Printing of a ARRAY OF HASHES # one element - $LoH[0]{lead} = "fred"; + $AoH[0]{lead} = "fred"; # another element - $LoH[1]{lead} =~ s/(\w)/\u$1/; + $AoH[1]{lead} =~ s/(\w)/\u$1/; # print the whole thing with refs - for $href ( @LoH ) { + for $href ( @AoH ) { print "{ "; for $role ( keys %$href ) { print "$role=$href->{$role} "; @@ -539,18 +539,18 @@ types of data structures. } # print the whole thing with indices - for $i ( 0 .. $#LoH ) { + for $i ( 0 .. $#AoH ) { print "$i is { "; - for $role ( keys %{ $LoH[$i] } ) { - print "$role=$LoH[$i]{$role} "; + for $role ( keys %{ $AoH[$i] } ) { + print "$role=$AoH[$i]{$role} "; } print "}\n"; } # print the whole thing one at a time - for $i ( 0 .. $#LoH ) { - for $role ( keys %{ $LoH[$i] } ) { - print "elt $i $role is $LoH[$i]{$role}\n"; + for $i ( 0 .. 
$#AoH ) { + for $role ( keys %{ $AoH[$i] } ) { + print "elt $i $role is $AoH[$i]{$role}\n"; } } @@ -767,9 +767,9 @@ many different sorts: ########################################################### # now, you might want to make interesting extra fields that # include pointers back into the same data structure so if - # change one piece, it changes everywhere, like for examples - # if you wanted a {kids} field that was an array reference - # to a list of the kids' records without having duplicate + # change one piece, it changes everywhere, like for example + # if you wanted a {kids} field that was a reference + # to an array of the kids' records without having duplicate # records and thus update problems. ########################################################### foreach $family (keys %TV) { @@ -784,7 +784,7 @@ many different sorts: $rec->{kids} = [ @kids ]; } - # you copied the list, but the list itself contains pointers + # you copied the array, but the array itself contains pointers # to uncopied objects. this means that if you make bart get # older via diff --git a/pod/perlfaq.pod b/pod/perlfaq.pod index cb354931cc..56cf3d71be 100644 --- a/pod/perlfaq.pod +++ b/pod/perlfaq.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq - frequently asked questions about Perl ($Date: 1999/01/08 05:54:52 $) +perlfaq - frequently asked questions about Perl ($Date: 1999/05/23 20:38:02 $) =head1 DESCRIPTION @@ -199,6 +199,8 @@ miscellaneous data issues. =item * How do I find the week-of-the-year/day-of-the-year? +=item * How do I find the current century or millennium? + =item * How can I compare two dates and find the difference? =item * How can I take a string and turn it into epoch seconds? @@ -254,7 +256,7 @@ miscellaneous data issues. =item * What is the difference between $array[1] and @array[1]? -=item * How can I extract just the unique elements of an array? +=item * How can I remove duplicate elements from a list or array? 
=item * How can I tell whether a list or array contains a certain element? @@ -381,6 +383,8 @@ I/O and the "f" issues: filehandles, flushing, formats and footers. =item * How do I print to more than one file at once? +=item * How can I read in an entire file all at once? + =item * How can I read in a file by paragraphs? =item * How can I read a single character from a file? From the keyboard? @@ -426,7 +430,7 @@ Pattern matching and regular expressions. =item * How can I match a locale-smart version of C</[a-zA-Z]/>? -=item * How can I quote a variable to use in a regexp? +=item * How can I quote a variable to use in a regex? =item * What is C</o> really for? @@ -434,7 +438,7 @@ Pattern matching and regular expressions. =item * Can I use Perl regular expressions to match balanced text? -=item * What does it mean that regexps are greedy? How can I get around it? +=item * What does it mean that regexes are greedy? How can I get around it? =item * How do I process each word on each line? @@ -450,7 +454,7 @@ Pattern matching and regular expressions. =item * What good is C<\G> in a regular expression? -=item * Are Perl regexps DFAs or NFAs? Are they POSIX compliant? +=item * Are Perl regexes DFAs or NFAs? Are they POSIX compliant? =item * What's wrong with using grep or map in a void context? @@ -470,7 +474,7 @@ other sections. =item * Can I get a BNF/yacc/RE for the Perl language? -=item * What are all these $@%* punctuation signs, and how do I know when to use them? +=item * What are all these $@%&* punctuation signs, and how do I know when to use them? =item * Do I always/never have to quote my strings or use semicolons and commas? @@ -494,7 +498,7 @@ other sections. =item * What is variable suicide and how can I prevent it? -=item * How can I pass/return a {Function, FileHandle, Array, Hash, Method, Regexp}? +=item * How can I pass/return a {Function, FileHandle, Array, Hash, Method, Regex}? =item * How do I create a static variable? 
@@ -522,6 +526,8 @@ other sections. =item * How do I clear a package? +=item * How can I use a variable as a variable name? + =back @@ -620,7 +626,7 @@ Interprocess communication (IPC), control over the user-interface =item * How do I open a file without blocking? -=item * How do I install a CPAN module? +=item * How do I install a module from CPAN? =item * What's the difference between require and use? @@ -758,6 +764,15 @@ in respect of this information or its use. =over 4 +=item 23/May/99 + +Extensive updates from the net in preparation for 5.006 release. + +=item 13/April/99 + +More minor touch-ups. Added new question at the end +of perlfaq7 on variable names within variables. + =item 7/January/99 Small touchups here and there. Added all questions in this @@ -816,4 +831,3 @@ This is the initial release of version 3 of the FAQ; consequently there have been no changes since its initial release. =back - diff --git a/pod/perlfaq1.pod b/pod/perlfaq1.pod index d4cac42a9a..7566bf5cd0 100644 --- a/pod/perlfaq1.pod +++ b/pod/perlfaq1.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq1 - General Questions About Perl ($Revision: 1.20 $, $Date: 1999/01/08 04:22:09 $) +perlfaq1 - General Questions About Perl ($Revision: 1.23 $, $Date: 1999/05/23 16:08:30 $) =head1 DESCRIPTION @@ -56,13 +56,13 @@ You should definitely use version 5. Version 4 is old, limited, and no longer maintained; its last patch (4.036) was in 1992, long ago and far away. Sure, it's stable, but so is anything that's dead; in fact, perl4 had been called a dead, flea-bitten camel carcass. The most recent -production release is 5.005_02 (although 5.004_04 is still supported). -The most cutting-edge development release is 5.005_54. Further references +production release is 5.005_03 (although 5.004_05 is still supported). +The most cutting-edge development release is 5.005_57. Further references to the Perl language in this document refer to the production release -unless otherwise specified. 
There may be one or more official bug -fixes for 5.005_02 by the time you read this, and also perhaps some -experimental versions on the way to the next release. All releases -prior to 5.004 were subject to buffer overruns, a grave security issue. +unless otherwise specified. There may be one or more official bug fixes +by the time you read this, and also perhaps some experimental versions +on the way to the next release. All releases prior to 5.004 were subject +to buffer overruns, a grave security issue. =head2 What are perl4 and perl5? @@ -96,7 +96,7 @@ found in release 5. Written in nominally portable C++, Topaz hopes to maintain 100% source-compatibility with previous releases of Perl but to run significantly faster and smaller. The Topaz team hopes to provide an XS compatibility interface to allow most XS modules to work unchanged, -albeit perhaps without the efficiency that the new interface uowld allow. +albeit perhaps without the efficiency that the new interface would allow. New features in Topaz are as yet undetermined, and will be addressed once compatibility and performance goals are met. @@ -309,11 +309,11 @@ as soon as possible. =head1 AUTHOR AND COPYRIGHT -Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington. +Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. All rights reserved. When included as an integrated part of the Standard Distribution -of Perl or of its documentation (printed or otherwise), this work is +of Perl or of its documentation (printed or otherwise), this works is covered under Perl's Artistic Licence. For separate distributions of all or part of this FAQ outside of that, see L<perlfaq>. @@ -322,4 +322,3 @@ domain. You are permitted and encouraged to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit to the FAQ would be courteous but is not required. 
- diff --git a/pod/perlfaq2.pod b/pod/perlfaq2.pod index 32970af58a..26865c7a83 100644 --- a/pod/perlfaq2.pod +++ b/pod/perlfaq2.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq2 - Obtaining and Learning about Perl ($Revision: 1.30 $, $Date: 1998/12/29 19:43:32 $) +perlfaq2 - Obtaining and Learning about Perl ($Revision: 1.31 $, $Date: 1999/04/14 03:46:19 $) =head1 DESCRIPTION @@ -45,8 +45,10 @@ Some URLs that might help you are: http://www.perl.com/latest/ http://www.perl.com/CPAN/ports/ -If you want information on proprietary systems. A simple installation -guide for MS-DOS is available at http://www.cs.ruu.nl/~piet/perl5dos.html +Someone looking for a Perl for Win16 might look to LMOLNAR's djgpp +port in http://www.perl.com/CPAN/ports/msdos/ , which comes with clear +installation instructions. A simple installation guide for MS-DOS using +IlyaZ's OS/2 port is available at http://www.cs.ruu.nl/~piet/perl5dos.html and similarly for Windows 3.1 at http://www.cs.ruu.nl/~piet/perlwin3.html . =head2 I don't have a C compiler on my system. How can I compile perl? @@ -364,7 +366,7 @@ let perlfaq-suggestions@perl.com know. =head2 Where can I buy a commercial version of Perl? -In a real sense, Perl already I<is> commercial software: It has a licence +In a real sense, Perl already I<is> commercial software: It has a license that you can grab and carefully read to your manager. It is distributed in releases and comes in well-defined packages. There is a very large user community and an extensive literature. The comp.lang.perl.* @@ -379,7 +381,7 @@ purchase order from a company whom they can sue should anything go awry. Or maybe they need very serious hand-holding and contractual obligations. Shrink-wrapped CDs with perl on them are available from several sources if that will help. 
For example, many perl books carry a perl distribution -on them, as do the O'Reily Perl Resource Kits (in both the Unix flavor +on them, as do the O'Reilly Perl Resource Kits (in both the Unix flavor and in the proprietary Microsoft flavor); the free Unix distributions also all come with Perl. @@ -447,8 +449,8 @@ Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington. All rights reserved. When included as an integrated part of the Standard Distribution -of Perl or of its documentation (printed or otherwise), this work is -covered under Perl's Artistic Licence. For separate distributions of +of Perl or of its documentation (printed or otherwise), this works is +covered under Perl's Artistic License. For separate distributions of all or part of this FAQ outside of that, see L<perlfaq>. Irrespective of its distribution, all code examples here are public @@ -456,4 +458,3 @@ domain. You are permitted and encouraged to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit to the FAQ would be courteous but is not required. - diff --git a/pod/perlfaq3.pod b/pod/perlfaq3.pod index a811c3ce9b..4e56a54a5d 100644 --- a/pod/perlfaq3.pod +++ b/pod/perlfaq3.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq3 - Programming Tools ($Revision: 1.33 $, $Date: 1998/12/29 20:12:12 $) +perlfaq3 - Programming Tools ($Revision: 1.38 $, $Date: 1999/05/23 16:08:30 $) =head1 DESCRIPTION @@ -19,7 +19,7 @@ Have you read the appropriate man pages? Here's a brief index: Objects perlref, perlmod, perlobj, perltie Data Structures perlref, perllol, perldsc Modules perlmod, perlmodlib, perlsub - Regexps perlre, perlfunc, perlop, perllocale + Regexes perlre, perlfunc, perlop, perllocale Moving to perl5 perltrap, perl Linking w/C perlxstut, perlxs, perlcall, perlguts, perlembed Various http://www.perl.com/CPAN/doc/FMTEYEWTK/index.html @@ -130,7 +130,7 @@ can provide significant assistance. 
Tom swears by the following settings in vi and its clones: set ai sw=4 - map ^O {^M}^[O^T + map! ^O {^M}^[O^T Now put that in your F<.exrc> file (replacing the caret characters with control characters) and away you go. In insert mode, ^T is @@ -147,31 +147,42 @@ results are not particularly satisfying for sophisticated code. The a2ps at http://www.infres.enst.fr/~demaille/a2ps/ does lots of things related to generating nicely printed output of documents. -=head2 Is there a etags/ctags for perl? +=head2 Is there a ctags for Perl? -With respect to the source code for the Perl interpreter, yes. -There has been support for etags in the source for a long time. -Ctags was introduced in v5.005_54 (and probably 5.005_03). -After building perl, type 'make etags' or 'make ctags' and both -sets of tag files will be built. - -Now, if you're looking to build a tag file for perl code, then there's -a simple one at +There's a simple one at http://www.perl.com/CPAN/authors/id/TOMC/scripts/ptags.gz which may do the trick. And if not, it's easy to hack into what you want. =head2 Is there an IDE or Windows Perl Editor? -If you're on Unix, you already have an IDE -- Unix itself. -You just have to learn the toolbox. If you're not, then you -probably don't have a toolbox, so may need something else. - -PerlBuilder (XXX URL to follow) is an integrated development -environment for Windows that supports Perl development. Perl programs -are just plain text, though, so you could download emacs for Windows -(XXX) or vim for win32 (http://www.cs.vu.nl/~tmgil/vi.html). If -you're transferring Windows files to Unix, be sure to transfer in -ASCII mode so the ends of lines are appropriately converted. +If you're on Unix, you already have an IDE -- Unix itself. This powerful +IDE derives from its interoperability, flexibility, and configurability. +If you really want to get a feel for Unix-qua-IDE, the best thing to do +is to find some high-powered programmer whose native language is Unix. 
+Find someone who has been at this for many years, and just sit back +and watch them at work. They have created their own IDE, one that +suits their own tastes and aptitudes. Quietly observe them edit files, +move them around, compile them, debug them, test them, etc. The entire +development *is* integrated, like a top-of-the-line German sports car: +functional, powerful, and elegant. You will be absolutely astonished +at the speed and ease exhibited by the native speaker of Unix in his +home territory. The art and skill of a virtuoso can only be seen to be +believed. That is the path to mastery -- all these cobbled little IDEs +are expensive toys designed to sell a flashy demo using cheap tricks, +and being optimized for immediate but shallow understanding rather than +enduring use, are but a dim palimpsest of real tools. + +In short, you just have to learn the toolbox. However, if you're not +on Unix, then your vendor probably didn't bother to provide you with +a proper toolbox on the so-called complete system that you forked out +your hard-earned cash on. + +PerlBuilder (XXX URL to follow) is an integrated development environment +for Windows that supports Perl development. Perl programs are just plain +text, though, so you could download emacs for Windows (???) or a vi clone +(vim) which runs on win32 (http://www.cs.vu.nl/~tmgil/vi.html). +If you're transferring Windows files to Unix, be sure to transfer in +ASCII mode so the ends of lines are appropriately mangled. =head2 Where can I get Perl macros for vi? @@ -192,7 +203,7 @@ which contains a cperl-mode that color-codes keywords, provides context-sensitive help, and other nifty things. Note that the perl-mode of emacs will have fits with C<"main'foo"> -(single quote), and mess up the indentation and hilighting. You +(single quote), and mess up the indentation and highlighting. You are probably using C<"main::foo"> in new Perl code anyway, so this shouldn't be an issue. @@ -368,7 +379,7 @@ care.
See http://www.perl.com/CPAN/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/ . A non-free, commercial product, ``The Velocity Engine for Perl'', -(http://www.binevolve.com/ or +(http://www.binevolve.com/ or http://www.binevolve.com/bine/vep) might also be worth looking at. It will allow you to increase the performance of your perl scripts, upto 25 times faster than normal CGI perl by running in persistent perl mode, or 4 to 5 times faster without any @@ -404,12 +415,12 @@ your code, but none can definitively conceal it (this is true of every language, not just Perl). If you're concerned about people profiting from your code, then the -bottom line is that nothing but a restrictive licence will give you +bottom line is that nothing but a restrictive license will give you legal security. License your software and pepper it with threatening statements like ``This is unpublished proprietary software of XYZ Corp. Your access to it does not give you permission to use it blah blah blah.'' We are not lawyers, of course, so you should see a lawyer if -you want to be sure your licence's wording will stand up in court. +you want to be sure your license's wording will stand up in court. =head2 How can I compile my Perl program into byte code or C? @@ -435,7 +446,7 @@ because as currently written, all programs are prepared for a full eval() statement. You can tremendously reduce this cost by building a shared I<libperl.so> library and linking against that. See the F<INSTALL> podfile in the perl source distribution for details. If -you link your main perl binary with this, it will make it miniscule. +you link your main perl binary with this, it will make it minuscule. For example, on one author's system, F</usr/bin/perl> is only 11k in size! @@ -470,13 +481,12 @@ F<INSTALL> file in the source distribution for more information). 
The Win95/NT installation, when using the ActiveState port of Perl, will modify the Registry to associate the C<.pl> extension with the -perl interpreter. If you install another port (Gurusamy Sarathy's is -the recommended Win95/NT port), or (eventually) build your own -Win95/NT Perl using a Windows port of gcc (e.g., with cygwin32 or -mingw32), then you'll have to modify the Registry yourself. In -addition to associating C<.pl> with the interpreter, NT people can -use: C<SET PATHEXT=%PATHEXT%;.PL> to let them run the program -C<install-linux.pl> merely by typing C<install-linux>. +perl interpreter. If you install another port, perhaps even building +your own Win95/NT Perl from the standard sources by using a Windows port +of gcc (e.g., with cygwin32 or mingw32), then you'll have to modify +the Registry yourself. In addition to associating C<.pl> with the +interpreter, NT people can use: C<SET PATHEXT=%PATHEXT%;.PL> to let them +run the program C<install-linux.pl> merely by typing C<install-linux>. Macintosh perl scripts will have the appropriate Creator and Type, so that double-clicking them will invoke the perl application. @@ -570,7 +580,7 @@ when it runs fine on the command line'', see these sources: http://www.boutell.com/faq/ CGI FAQ - http://www.webthing.com/tutorials/cgifaq.html + http://www.webthing.com/page.cgi/cgifaq HTTP Spec http://www.w3.org/pub/WWW/Protocols/HTTP/ @@ -585,7 +595,6 @@ when it runs fine on the command line'', see these sources: CGI Security FAQ http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt -Also take a look at L<perlfaq9> =head2 Where can I learn about object-oriented Perl programming? @@ -641,8 +650,8 @@ Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington. All rights reserved. When included as an integrated part of the Standard Distribution -of Perl or of its documentation (printed or otherwise), this work is -covered under Perl's Artistic Licence. 
For separate distributions of +of Perl or of its documentation (printed or otherwise), this works is +covered under Perl's Artistic License. For separate distributions of all or part of this FAQ outside of that, see L<perlfaq>. Irrespective of its distribution, all code examples here are public @@ -650,4 +659,3 @@ domain. You are permitted and encouraged to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit to the FAQ would be courteous but is not required. - diff --git a/pod/perlfaq4.pod b/pod/perlfaq4.pod index 92aee2c7af..700c42abf8 100644 --- a/pod/perlfaq4.pod +++ b/pod/perlfaq4.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq4 - Data Manipulation ($Revision: 1.40 $, $Date: 1999/01/08 04:26:39 $) +perlfaq4 - Data Manipulation ($Revision: 1.49 $, $Date: 1999/05/23 20:37:49 $) =head1 DESCRIPTION @@ -104,14 +104,21 @@ are not guaranteed. =head2 How do I convert bits into ints? To turn a string of 1s and 0s like C<10110110> into a scalar containing -its binary value, use the pack() function (documented in -L<perlfunc/"pack">): +its binary value, use the pack() and unpack() functions (documented in +L<perlfunc/"pack"> and L<perlfunc/"unpack">): - $decimal = pack('B8', '10110110'); + $decimal = unpack('C', pack('B8', '10110110')); + +This packs the string C<10110110> into an eight bit binary structure. +This is then unpacked as an unsigned character (C<C>), which returns its ordinal value. + +This does the same thing: + + $decimal = ord(pack('B8', '10110110')); Here's an example of going the other way: - $binary_string = join('', unpack('B*', "\x29")); + $binary_string = unpack('B*', "\x29"); =head2 Why doesn't & work the way I want it to? @@ -228,12 +235,34 @@ American businesses often consider the first week with a Monday in it to be Work Week #1, despite ISO 8601, which considers WW1 to be the first week with a Thursday in it. +=head2 How do I find the current century or millennium?
+ +Use the following simple functions: + + sub get_century { + return int((((localtime(shift || time))[5] + 1999))/100); + } + sub get_millennium { + return 1+int((((localtime(shift || time))[5] + 1899))/1000); + } + +On some systems, you'll find that the POSIX module's strftime() function +has been extended in a non-standard way to use a C<%C> format, which they +sometimes claim is the "century". It isn't, because on most such systems, +this is only the first two digits of the four-digit year, and thus cannot +be used to reliably determine the current century or millennium. + =head2 How can I compare two dates and find the difference? If you're storing your dates as epoch seconds then simply subtract one from the other. If you've got a structured date (distinct year, day, -month, hour, minute, seconds values) then use one of the Date::Manip -and Date::Calc modules from CPAN. +month, hour, minute, seconds values), then for reasons of accessibility, +simplicity, and efficiency, merely use either timelocal or timegm (from +the Time::Local module in the standard distribution) to reduce structured +dates to epoch seconds. However, if you don't know the precise format of +your dates, then you should probably use either of the Date::Manip and +Date::Calc modules from CPAN before you go hacking up your own parsing +routine to handle arbitrary date formats. =head2 How can I take a string and turn it into epoch seconds? @@ -244,22 +273,83 @@ and Date::Manip modules from CPAN. =head2 How can I find the Julian Day? -Neither Date::Manip nor Date::Calc deal with Julian days. Instead, -there is an example of Julian date calculation that should help you in -Time::JulianDay (part of the Time-modules bundle) which can be found at -http://www.perl.com/CPAN/modules/by-module/Time/. +You could use Date::Calc's Delta_Days function and calculate the number +of days from there. Assuming that's what you really want, that is. 
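If all you need is the number of days between two structured dates, a core-only sketch built on Time::Local's timegm() may be enough before reaching for Date::Calc's Delta_Days. The days_between() helper below is hypothetical (it is not part of any module). It assumes each date is given as year, month (1-12) and day, and it works in UTC so that daylight-saving changes cannot leave a fractional day.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: whole days from date 1 to date 2.
# Each date is (year, month 1-12, day); both are taken as UTC
# midnight, so every day is exactly 86400 seconds long.
sub days_between {
    my ($y1, $m1, $d1, $y2, $m2, $d2) = @_;
    # timegm() takes (sec, min, hour, mday, mon, year) with a 0-based month
    my $t1 = timegm(0, 0, 0, $d1, $m1 - 1, $y1);
    my $t2 = timegm(0, 0, 0, $d2, $m2 - 1, $y2);
    return ($t2 - $t1) / (24 * 60 * 60);
}

print days_between(1999, 5, 23, 1999, 6, 1), "\n";    # prints 9
```

Using timelocal() here instead would reintroduce the 23- and 25-hour days that daylight-saving changes produce, which is why this sketch sticks to timegm().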
+ +Before you immerse yourself too deeply in this, be sure to verify that it +is the I<Julian> Day you really want. Are they really just interested in +a way of getting serial days so that they can do date arithmetic? If you +are interested in performing date arithmetic, this can be done using +either Date::Manip or Date::Calc, without converting to Julian Day first. + +There is too much confusion on this issue to cover in this FAQ, but the +term is applied (correctly) to a calendar now supplanted by the Gregorian +Calendar, with the Julian Calendar failing to adjust properly for leap +years on centennial years (among other annoyances). The term is also used +(incorrectly) to mean: [1] days in the Gregorian Calendar; and [2] days +since a particular starting time or `epoch', usually 1970 in the Unix +world and 1980 in the MS-DOS/Windows world. If you find that it is not +the first meaning that you really want, then check out the Date::Manip +and Date::Calc modules. (Thanks to David Cassell for most of this text.) +There is also an example of Julian date calculation that should help you in +http://www.perl.com/CPAN/authors/David_Muir_Sharnoff/modules/Time/JulianDay.pm.gz =head2 How do I find yesterday's date? The C<time()> function returns the current time in seconds since the -epoch. Take one day off that: +epoch. Take twenty-four hours off that: $yesterday = time() - ( 24 * 60 * 60 ); Then you can pass this to C<localtime()> and get the individual year, month, day, hour, minute, seconds values. +Note very carefully that the code above assumes that your days are +twenty-four hours each. For most people, there are two days a year +when they aren't: the switch to and from summer time throws this off. +A solution to this issue is offered by Russ Allbery. + + sub yesterday { + my $now = defined $_[0] ? 
$_[0] : time; + my $then = $now - 60 * 60 * 24; + my $ndst = (localtime $now)[8] > 0; + my $tdst = (localtime $then)[8] > 0; + $then - ($tdst - $ndst) * 60 * 60; + } + # Should give you "this time yesterday" in seconds since epoch relative to + # the first argument or the current time if no argument is given and + # suitable for passing to localtime or whatever else you need to do with + # it. $ndst is whether we're currently in daylight savings time; $tdst is + # whether the point 24 hours ago was in daylight savings time. If $tdst + # and $ndst are the same, a boundary wasn't crossed, and the correction + # will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more + # from yesterday's time since we gained an extra hour while going off + # daylight savings time. If $tdst is 0 and $ndst is 1, subtract a + # negative hour (add an hour) to yesterday's time since we lost an hour. + # + # All of this is because during those days when one switches off or onto + # DST, a "day" isn't 24 hours long; it's either 23 or 25. + # + # The explicit settings of $ndst and $tdst are necessary because localtime + # only says it returns the system tm struct, and the system tm struct at + # least on Solaris doesn't guarantee any particular positive value (like, + # say, 1) for isdst, just a positive value. And that value can + # potentially be negative, if DST information isn't available (this sub + # just treats those cases like no DST). + # + # Note that between 2am and 3am on the day after the time zone switches + # off daylight savings time, the exact hour of "yesterday" corresponding + # to the current hour is not clearly defined. Note also that if used + # between 2am and 3am the day after the change to daylight savings time, + # the result will be between 3am and 4am of the previous day; it's + # arguable whether this is correct. + # + # This sub does not attempt to deal with leap seconds (most things don't).
+ # + # Copyright relinquished 1999 by Russ Allbery <rra@stanford.edu> + # This code is in the public domain + =head2 Does Perl have a year 2000 problem? Is Perl Y2K compliant? Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is @@ -312,7 +402,11 @@ This won't expand C<"\n"> or C<"\t"> or any other special escapes. To turn C<"abbcccd"> into C<"abccd">: - s/(.)\1/$1/g; + s/(.)\1/$1/g; # add /s to include newlines + +Here's a solution that turns "abbcccd" to "abcd": + + y///cs; # y == tr, but shorter :-) =head2 How do I expand function calls in a string? @@ -353,7 +447,7 @@ Dominus's excellent I<py> tool at http://www.plover.com/~mjd/perl/py/ One simple destructive, inside-out approach that you might try is to pull out the smallest nesting parts one at a time: - while (s//BEGIN((?:(?!BEGIN)(?!END).)*)END/gs) { + while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) { # do something with $1 } @@ -422,24 +516,25 @@ likely prefer: You have to keep track of N yourself. For example, let's say you want to change the fifth occurrence of C<"whoever"> or C<"whomever"> into -C<"whosoever"> or C<"whomsoever">, case insensitively. +C<"whosoever"> or C<"whomsoever">, case insensitively. These +all assume that $_ contains the string to be altered. $count = 0; s{((whom?)ever)}{ ++$count == 5 # is it the 5th? ? "${2}soever" # yes, swap : $1 # renege and leave it there - }igex; + }ige; In the more general case, you can use the C</g> modifier in a C<while> loop, keeping count of matches. 
$WANT = 3; $count = 0; + $_ = "One fish two fish red fish blue fish"; while (/(\w+)\s+fish\b/gi) { if (++$count == $WANT) { print "The third fish is a $1 one.\n"; - # Warning: don't `last' out of this loop } } @@ -456,7 +551,7 @@ C<tr///> function like so: $string = "ThisXlineXhasXsomeXx'sXinXit"; $count = ($string =~ tr/X//); - print "There are $count X charcters in the string"; + print "There are $count X characters in the string"; This is fine if you are just looking for a single character. However, if you are trying to count multiple character substrings within a @@ -499,7 +594,7 @@ characters by placing a C<use locale> pragma in your program. See L<perllocale> for endless details on locales. This is sometimes referred to as putting something into "title -case", but that's not quite accurate. Consdier the proper +case", but that's not quite accurate. Consider the proper capitalization of the movie I<Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb>, for example. @@ -546,8 +641,8 @@ Although the simplest approach would seem to be: $string =~ s/^\s*(.*?)\s*$/$1/; -This is unnecessarily slow, destructive, and fails with embedded newlines. -It is much better faster to do this in two steps: +Not only is this unnecessarily slow and destructive, it also fails with +embedded newlines. It is much faster to do this operation in two steps: $string =~ s/^\s+//; $string =~ s/\s+$//; @@ -562,7 +657,7 @@ Or more nicely written as: This idiom takes advantage of the C<foreach> loop's aliasing behavior to factor out common code. You can do this on several strings at once, or arrays, or even the -values of a hash if you use a slide: +values of a hash if you use a slice: # trim whitespace in the scalar, the array, # and all the values in the hash @@ -573,41 +668,48 @@ values of a hash if you use a slide: =head2 How do I pad a string with blanks or pad a number with zeroes? 
-(This answer contributed by Uri Guttman) +(This answer contributed by Uri Guttman, with kibitzing from +Bart Lateur.) In the following examples, C<$pad_len> is the length to which you wish -to pad the string, C<$text> or C<$num> contains the string to be -padded, and C<$pad_char> contains the padding character. You can use a -single character string constant instead of the C<$pad_char> variable -if you know what it is in advance. +to pad the string, C<$text> or C<$num> contains the string to be padded, +and C<$pad_char> contains the padding character. You can use a single +character string constant instead of the C<$pad_char> variable if you +know what it is in advance. And in the same way you can use an integer in +place of C<$pad_len> if you know the pad length in advance. -The simplest method use the C<sprintf> function. It can pad on the -left or right with blanks and on the left with zeroes. +The simplest method uses the C<sprintf> function. It can pad on the left +or right with blanks and on the left with zeroes and it will not +truncate the result. The C<pack> function can only pad strings on the +right with blanks and it will truncate the result to a maximum length of +C<$pad_len>. - # Left padding with blank: - $padded = sprintf( "%${pad_len}s", $text ) ; + # Left padding a string with blanks (no truncation): + $padded = sprintf("%${pad_len}s", $text); - # Right padding with blank: - $padded = sprintf( "%${pad_len}s", $text ) ; + # Right padding a string with blanks (no truncation): + $padded = sprintf("%-${pad_len}s", $text); - # Left padding with 0: - $padded = sprintf( "%0${pad_len}d", $num ) ; + # Left padding a number with 0 (no truncation): + $padded = sprintf("%0${pad_len}d", $num); -If you need to pad with a character other than blank or zero you can use -one of the following methods. 
+ # Right padding a string with blanks using pack (will truncate): + $padded = pack("A$pad_len",$text); -These methods generate a pad string with the C<x> operator and -concatenate that with the original text. +If you need to pad with a character other than blank or zero you can use +one of the following methods. They all generate a pad string with the +C<x> operator and combine that with C<$text>. These methods do +not truncate C<$text>. -Left and right padding with any character: +Left and right padding with any character, creating a new string: - $padded = $pad_char x ( $pad_len - length( $text ) ) . $text ; - $padded = $text . $pad_char x ( $pad_len - length( $text ) ) ; + $padded = $pad_char x ( $pad_len - length( $text ) ) . $text; + $padded = $text . $pad_char x ( $pad_len - length( $text ) ); -Or you can left or right pad $text directly: +Left and right padding with any character, modifying C<$text> directly: - $text .= $pad_char x ( $pad_len - length( $text ) ) ; - substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) ) ; + substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) ); + $text .= $pad_char x ( $pad_len - length( $text ) ); =head2 How do I extract selected columns from a string? @@ -634,6 +736,13 @@ you can use this kind of thing: =head2 How do I find the soundex value of a string? Use the standard Text::Soundex module distributed with perl. +But before you do so, you may want to determine whether `soundex' is in +fact what you think it is. Knuth's soundex algorithm compresses words +into a small space, and so it does not necessarily distinguish between +two words which you might want to appear separately. For example, the +last names `Knuth' and `Kant' are both mapped to the soundex code K530. +If Text::Soundex does not do what you are looking for, you might want +to consider the String::Approx module available at CPAN. =head2 How can I expand variables in text strings? 
@@ -767,7 +876,7 @@ This works with leading special strings, dynamically determined: @@@ runops() { @@@ SAVEI32(runlevel); @@@ runlevel++; - @@@ while ( op = (*op->op_ppaddr)() ) ; + @@@ while ( op = (*op->op_ppaddr)() ); @@@ TAINT_NOT; @@@ return 0; @@@ } @@ -805,9 +914,9 @@ When you say $scalar = (2, 5, 7, 9); -you're using the comma operator in scalar context, so it evaluates the -left hand side, then evaluates and returns the left hand side. This -causes the last value to be returned: 9. +you're using the comma operator in scalar context, so it uses the scalar +comma operator. There never was a list there at all! This causes the +last value to be returned: 9. =head2 What is the difference between $array[1] and @array[1]? @@ -827,7 +936,7 @@ with The B<-w> flag will warn you about these matters. -=head2 How can I extract just the unique elements of an array? +=head2 How can I remove duplicate elements from a list or array? There are several possible ways, depending on whether the array is ordered and whether you wish to preserve the ordering. @@ -893,7 +1002,8 @@ array. This kind of an array will take up less space: @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31); undef @is_tiny_prime; - for (@primes) { $is_tiny_prime[$_] = 1; } + for (@primes) { $is_tiny_prime[$_] = 1 } + # or simply @is_tiny_prime[@primes] = (1) x @primes; Now you check whether $is_tiny_prime[$some_number]. @@ -916,7 +1026,7 @@ or worse yet These are slow (checks every element even if the first matches), inefficient (same reason), and potentially buggy (what if there are -regexp characters in $whatever?). If you're only testing once, then +regex characters in $whatever?). If you're only testing once, then use: $is_there = 0; @@ -941,6 +1051,9 @@ each element is unique in a given array: push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element; } +Note that this is the I<symmetric difference>, that is, all elements in +either A or in B, but not in both.
Think of it as an xor operation. + =head2 How do I test whether two arrays or hashes are equal? The following code works for single-level arrays. It uses a stringwise @@ -1078,7 +1191,7 @@ Use this: fisher_yates_shuffle( \@array ); # permutes @array in place -You've probably seen shuffling algorithms that works using splice, +You've probably seen shuffling algorithms that work using splice, randomly picking another element to swap the current element with: srand; @@ -1185,7 +1298,7 @@ that's come to be known as the Schwartzian Transform: @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } - map { [ $_, uc((/\d+\s*(\S+)/ )[0] ] } @data; + map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data; If you need to sort on several fields, the following paradigm is useful. @@ -1311,7 +1424,19 @@ sorting the keys as shown in an earlier question. =head2 What happens if I add or remove keys from a hash while iterating over it? -Don't do that. +Don't do that. :-) + +[lwall] In Perl 4, you were not allowed to modify a hash at all while +iterating over it. In Perl 5 you can delete from it, but you still +can't add to it, because that might cause a doubling of the hash table, +in which half the entries get copied up to the new top half of the +table, at which point you've totally bamboozled the iterator code. +Even if the table doesn't double, there's no telling whether your new +entry will be inserted before or after the current iterator position. + +Either treasure up your changes and make them after the iterator finishes, +or use keys to fetch all the old keys at once, and iterate over the list +of keys. =head2 How do I look up a hash element by value? @@ -1327,8 +1452,13 @@ to use: $by_value{$value} = $key; } -If your hash could have repeated values, the methods above will only -find one of the associated keys. This may or may not worry you. +If your hash could have repeated values, the methods above will only find +one of the associated keys. This may or may not worry you.
If it does +worry you, you can always reverse the hash into a hash of arrays instead: + + while (($key, $value) = each %by_key) { + push @{$key_list_by_value{$value}}, $key; + } =head2 How can I know how many entries are in a hash? @@ -1337,8 +1467,9 @@ take the scalar sense of the keys() function: $num_keys = scalar keys %hash; -In void context it just resets the iterator, which is faster -for tied hashes. +In void context, the keys() function just resets the iterator, which is +faster for tied hashes than would be iterating through the whole +hash, one key-value pair at a time. =head2 How do I sort a hash (optionally by value instead of key)? @@ -1467,8 +1598,8 @@ re-enter it, the hash iterator has been reset. =head2 How can I get the unique keys from two hashes? -First you extract the keys from the hashes into arrays, and then solve -the uniquifying the array problem described above. For example: +First you extract the keys from the hashes into lists, then solve +the "removing duplicates" problem described above. For example: %seen = (); for $element (keys(%foo), keys(%bar)) { @@ -1560,9 +1691,11 @@ this works fine (assuming the files are found): print "Your kernel is GNU-zip enabled!\n"; } -On some legacy systems, however, you have to play tedious games with -"text" versus "binary" files. See L<perlfunc/"binmode">, or the upcoming -L<perlopentut> manpage. +On less elegant (read: Byzantine) systems, however, you have +to play tedious games with "text" versus "binary" files. See +L<perlfunc/"binmode"> or L<perlopentut>. Most of these ancient-thinking +systems are curses out of Microsoft, who seem to be committed to putting +the backward into backward compatibility. If you're concerned about 8-bit ASCII data, then see L<perllocale>. @@ -1606,10 +1739,10 @@ if you just want to say, ``Is this a float?'' sub is_numeric { defined &getnum } -Or you could check out String::Scanf which can be found at -http://www.perl.com/CPAN/modules/by-module/String/. 
-The POSIX module (part of the standard Perl distribution) provides -the C<strtol> and C<strtod> for converting strings to double +Or you could check out +http://www.perl.com/CPAN/modules/by-module/String/String-Scanf-1.1.tar.gz +instead. The POSIX module (part of the standard Perl distribution) +provides the C<strtod> and C<strtol> for converting strings to double and longs, respectively. =head2 How do I keep persistent data across program calls? @@ -1663,7 +1796,7 @@ All rights reserved. When included as part of the Standard Version of Perl, or as part of its complete documentation whether printed or otherwise, this work -may be distributed only under the terms of Perl's Artistic Licence. +may be distributed only under the terms of Perl's Artistic License. Any distribution of this file or derivatives thereof I<outside> of that package require that special arrangements be made with copyright holder. @@ -1673,4 +1806,3 @@ are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required. - diff --git a/pod/perlfaq5.pod b/pod/perlfaq5.pod index 99c25b775b..1e8252bfa6 100644 --- a/pod/perlfaq5.pod +++ b/pod/perlfaq5.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq5 - Files and Formats ($Revision: 1.34 $, $Date: 1999/01/08 05:46:13 $) +perlfaq5 - Files and Formats ($Revision: 1.38 $, $Date: 1999/05/23 16:08:30 $) =head1 DESCRIPTION @@ -69,7 +69,7 @@ or even this: Note the bizarrely hardcoded carriage return and newline in their octal equivalents. This is the ONLY way (currently) to assure a proper flush -on all platforms, including Macintosh. That the way things work in +on all platforms, including Macintosh. That's the way things work in network programming: you really should specify the exact bit pattern on the network line terminator. In practice, C<"\n\n"> often works, but +this is not portable.
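The perlfaq4 hunk above points to POSIX's C<strtod> for checking whether a string is a number. The following is a minimal sketch of a C<getnum> helper, assuming the documented list-context behavior of C<POSIX::strtod>: it returns the parsed value and the count of trailing characters it could not parse. C<getnum> and C<is_numeric> are illustrative names. Surrounding whitespace is trimmed first because trailing blanks would otherwise count as unparsed.

```perl
use strict;
use warnings;
use POSIX qw(strtod);

# Return the numeric value of $str, or undef if $str is not
# entirely a valid floating-point number.
sub getnum {
    my $str = shift;
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    return undef if $str eq '';
    local $! = 0;
    # In list context strtod returns (number, count of unparsed chars).
    my ($num, $unparsed) = strtod($str);
    return undef if $unparsed != 0 || $!;
    return $num;
}

sub is_numeric { defined getnum($_[0]) }

for my $s ("3.14", " -2e3 ", "12abc", "") {
    printf "%-8s %s\n", "'$s'", is_numeric($s) ? "numeric" : "not numeric";
}
```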
@@ -491,8 +491,12 @@ I<then> gives you read-write access: open(FH, "+> /path/name"); # WRONG (almost always) Whoops. You should instead use this, which will fail if the file -doesn't exist. Using "E<gt>" always clobbers or creates. -Using "E<lt>" never does either. The "+" doesn't change this. +doesn't exist. + + open(FH, "+< /path/name"); # open for update + +Using "E<gt>" always clobbers or creates. Using "E<lt>" never does +either. The "+" doesn't change this. Here are examples of many kinds of file opens. Those using sysopen() all assume @@ -606,10 +610,14 @@ For more information, see also the new L<perlopentut> if you have it =head2 How can I reliably rename a file? -Well, usually you just use Perl's rename() function. But that may -not work everywhere, in particular, renaming files across file systems. -If your operating system supports a mv(1) program or its moral equivalent, -this works: +Well, usually you just use Perl's rename() function. But that may not +work everywhere, in particular, renaming files across file systems. +Some sub-Unix systems have broken ports that corrupt the semantics of +rename() -- for example, WinNT does this right, but Win95 and Win98 +are broken. (The last two parts are not surprising, but the first is. :-) + +If your operating system supports a proper mv(1) program or its moral +equivalent, this works: rename($old, $new) or system("mv", $old, $new); @@ -643,11 +651,25 @@ filehandle be open for writing (or appending, or read/writing). =item 3 -Some versions of flock() can't lock files over a network (e.g. on NFS -file systems), so you'd need to force the use of fcntl(2) when you -build Perl. See the flock entry of L<perlfunc>, and the F<INSTALL> -file in the source distribution for information on building Perl to do -this. +Some versions of flock() can't lock files over a network (e.g. on NFS file +systems), so you'd need to force the use of fcntl(2) when you build Perl. +But even this is dubious at best. 
See the flock entry of L<perlfunc>, +and the F<INSTALL> file in the source distribution for information on +building Perl to do this. + +Two potentially non-obvious but traditional flock semantics are that +it waits indefinitely until the lock is granted, and that its locks +are I<merely advisory>. Such discretionary locks are more flexible, but +offer fewer guarantees. This means that files locked with flock() may +be modified by programs that do not also use flock(). Cars that stop +for red lights get on well with each other, but not with cars that don't +stop for red lights. See the perlport manpage, your port's specific +documentation, or your system-specific local manpages for details. It's +best to assume traditional behavior if you're writing portable programs. +(But if you're not, you should as always feel perfectly free to write +for your own system's idiosyncrasies (sometimes called "features"). +Slavish adherence to portability concerns shouldn't get in the way of +your getting your job done.) For more information on file locking, see also L<perlopentut/"File Locking"> if you have it (new for 5.006). @@ -797,6 +819,59 @@ at http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz, which is written in Perl and offers much greater functionality than the stock version. +=head2 How can I read in an entire file all at once? + +The customary Perl approach for processing all the lines in a file is to +do so one line at a time: + + open (INPUT, $file) || die "can't open $file: $!"; + while (<INPUT>) { + chomp; + # do something with $_ + } + close(INPUT) || die "can't close $file: $!"; + +This is tremendously more efficient than reading the entire file into +memory as an array of lines and then processing it one element at a time, +which is often -- if not almost always -- the wrong approach. Whenever +you see someone do this: + + @lines = <INPUT>; + +You should think long and hard about why you need everything loaded +at once. It's just not a scalable solution.
You might also find it +more fun to use the standard DB_File module's $DB_RECNO bindings, +which allow you to tie an array to a file so that accessing an element +of the array actually accesses the corresponding line in the file. + +On very rare occasion, you may have an algorithm that demands that +the entire file be in memory at once as one scalar. The simplest solution +to that is: + + $var = `cat $file`; + +Being in scalar context, you get the whole thing. In list context, +you'd get a list of all the lines: + + @lines = `cat $file`; + +This tiny but expedient solution is neat, clean, and portable to all +systems that you've bothered to install decent tools on, even if you are +a Prisoner of Bill. For those die-hard PoBs who've paid their billtax +and refuse to use the toolbox, or who like writing complicated code for +job security, you can of course read the file manually. + + { + local(*INPUT, $/); + open (INPUT, $file) || die "can't open $file: $!"; + $var = <INPUT>; + } + +That temporarily undefs your record separator, and will automatically +close the file at block exit. If the file is already open, just use this: + + $var = do { local $/; <INPUT> }; + =head2 How can I read in a file by paragraphs? Use the C<$/> variable (see L<perlvar> for details). You can either @@ -1043,6 +1118,14 @@ to, you may be able to do this: $rc = syscall(&SYS_close, $fd + 0); # must force numeric die "can't sysclose $fd: $!" unless $rc == -1; +Or just use the fdopen(3S) feature of open(): + + { + local *F; + open F, "<&=$fd" or die "Cannot reopen fd=$fd: $!"; + close F; + } + =head2 Why can't I use "C:\temp\foo" in DOS paths? What doesn't `C:\temp\foo.exe` work? Whoops! You just put a tab and a formfeed into that filename! @@ -1121,8 +1204,8 @@ Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington. All rights reserved.
When included as an integrated part of the Standard Distribution -of Perl or of its documentation (printed or otherwise), this work is -covered under Perl's Artistic Licence. For separate distributions of +of Perl or of its documentation (printed or otherwise), this work is +covered under Perl's Artistic License. For separate distributions of all or part of this FAQ outside of that, see L<perlfaq>. Irrespective of its distribution, all code examples here are public @@ -1130,4 +1213,3 @@ domain. You are permitted and encouraged to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit to the FAQ would be courteous but is not required. - diff --git a/pod/perlfaq6.pod b/pod/perlfaq6.pod index 234570df47..de6093a5ba 100644 --- a/pod/perlfaq6.pod +++ b/pod/perlfaq6.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq6 - Regexps ($Revision: 1.25 $, $Date: 1999/01/08 04:50:47 $) +perlfaq6 - Regexes ($Revision: 1.27 $, $Date: 1999/05/23 16:08:30 $) =head1 DESCRIPTION @@ -18,7 +18,7 @@ understandable. =over 4 -=item Comments Outside the Regexp +=item Comments Outside the Regex Describe what you're doing and how you're doing it, using normal Perl comments. @@ -27,9 +27,9 @@ comments. # number of characters on the rest of the line s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg; -=item Comments Inside the Regexp +=item Comments Inside the Regex -The C</x> modifier causes whitespace to be ignored in a regexp pattern +The C</x> modifier causes whitespace to be ignored in a regex pattern (except in a character class), and also allows you to use normal comments there, too. As you can imagine, whitespace and comments help a lot. @@ -177,11 +177,46 @@ appear within a certain time. =head2 How do I substitute case insensitively on the LHS, but preserving case on the RHS? -It depends on what you mean by "preserving case".
The following -script makes the substitution have the same case, letter by letter, as -the original. If the substitution has more characters than the string -being substituted, the case of the last character is used for the rest -of the substitution. +Here's a lovely Perlish solution by Larry Rosler. It exploits +properties of bitwise xor on ASCII strings. + + $_= "this is a TEsT case"; + + $old = 'test'; + $new = 'success'; + + s{(\Q$old\E)} + { uc $new | (uc $1 ^ $1) . + (uc(substr $1, -1) ^ substr $1, -1) x + (length($new) - length $1) + }egi; + + print; + +And here it is as a subroutine, modelled after the above: + + sub preserve_case($$) { + my ($old, $new) = @_; + my $mask = uc $old ^ $old; + + uc $new | $mask . + substr($mask, -1) x (length($new) - length($old)) + } + + $a = "this is a TEsT case"; + $a =~ s/(test)/preserve_case($1, "success")/egi; + print "$a\n"; + +This prints: + + this is a SUcCESS case + +Just to show that C programmers can write C in any programming language, +if you prefer a more C-like solution, the following script makes the +substitution have the same case, letter by letter, as the original. +(It also happens to run about 240% slower than the Perlish solution runs.) +If the substitution has more characters than the string being substituted, +the case of the last character is used for the rest of the substitution. # Original by Nathan Torkington, massaged by Jeffrey Friedl # @@ -214,14 +249,6 @@ of the substitution. return $new; } - $a = "this is a TEsT case"; - $a =~ s/(test)/preserve_case($1, "success")/gie; - print "$a\n"; - -This prints: - - this is a SUcCESS case - =head2 How can I make C<\w> match national character sets? See L<perllocale>. @@ -232,41 +259,41 @@ One alphabetic character would be C</[^\W\d_]/>, no matter what locale you're in. Non-alphabetics would be C</[\W\d_]/> (assuming you don't consider an underscore a letter). -=head2 How can I quote a variable to use in a regexp?
+=head2 How can I quote a variable to use in a regex? The Perl parser will expand $variable and @variable references in regular expressions unless the delimiter is a single quote. Remember, too, that the right-hand side of a C<s///> substitution is considered a double-quoted string (see L<perlop> for more details). Remember -also that any regexp special characters will be acted on unless you +also that any regex special characters will be acted on unless you precede the substitution with \Q. Here's an example: $string = "to die?"; $lhs = "die?"; - $rhs = "sleep no more"; + $rhs = "sleep, no more"; $string =~ s/\Q$lhs/$rhs/; # $string is now "to sleep no more" -Without the \Q, the regexp would also spuriously match "di". +Without the \Q, the regex would also spuriously match "di". =head2 What is C</o> really for? Using a variable in a regular expression match forces a re-evaluation (and perhaps recompilation) each time through. The C</o> modifier -locks in the regexp the first time it's used. This always happens in a +locks in the regex the first time it's used. This always happens in a constant regular expression, and in fact, the pattern was compiled into the internal format at the same time your entire program was. Use of C</o> is irrelevant unless variable interpolation is used in -the pattern, and if so, the regexp engine will neither know nor care +the pattern, and if so, the regex engine will neither know nor care whether the variables change after the pattern is evaluated the I<very first> time. C</o> is often used to gain an extra measure of efficiency by not performing subsequent evaluations when you know it won't matter (because you know the variables won't change), or more rarely, when -you don't want the regexp to notice if they do. +you don't want the regex to notice if they do. For example, here's a "paragrep" program: @@ -286,23 +313,66 @@ For example, this one-liner will work in many but not all cases. 
You see, it's too simple-minded for certain kinds of C programs, in particular, those with what appear to be comments in quoted strings. For that, you'd need something like this, -created by Jeffrey Friedl: +created by Jeffrey Friedl and later modified by Fred Curtis. $/ = undef; $_ = <>; - s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|\n+|.[^/"'\\]*)#$2#g; + s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#$2#gs; print; This could, of course, be more legibly written with the C</x> modifier, adding -whitespace and comments. +whitespace and comments. Here it is expanded, courtesy of Fred Curtis. + + s{ + /\* ## Start of /* ... */ comment + [^*]*\*+ ## Non-* followed by 1-or-more *'s + ( + [^/*][^*]*\*+ + )* ## 0-or-more things which don't start with / + ## but do end with '*' + / ## End of /* ... */ comment + + | ## OR various things which aren't comments: + + ( + " ## Start of " ... " string + ( + \\. ## Escaped char + | ## OR + [^"\\] ## Non "\ + )* + " ## End of " ... " string + + | ## OR + + ' ## Start of ' ... ' string + ( + \\. ## Escaped char + | ## OR + [^'\\] ## Non '\ + )* + ' ## End of ' ... ' string + + | ## OR + + . ## Any other char + [^/"'\\]* ## Chars which don't start a comment, string or escape + ) + }{$2}gxs; + +A slight modification also removes C++ comments: + + s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#$2#gs; =head2 Can I use Perl regular expressions to match balanced text? Although Perl regular expressions are more powerful than "mathematical" regular expressions, because they feature conveniences like backreferences -(C<\1> and its ilk), they still aren't powerful enough. You still need -to use non-regexp techniques to parse balanced text, such as the text -enclosed between matching parentheses or braces, for example.
+(C<\1> and its ilk), they still aren't powerful enough -- with +the possible exception of bizarre and experimental features in the +development-track releases of Perl. You still need to use non-regex +techniques to parse balanced text, such as the text enclosed between +matching parentheses or braces, for example. An elaborate subroutine (for 7-bit ASCII only) to pull out balanced and possibly nested single chars, like C<`> and C<'>, C<{> and C<}>, @@ -312,9 +382,9 @@ http://www.perl.com/CPAN/authors/id/TOMC/scripts/pull_quotes.gz . The C::Scan module from CPAN contains such subs for internal usage, but they are undocumented. -=head2 What does it mean that regexps are greedy? How can I get around it? +=head2 What does it mean that regexes are greedy? How can I get around it? -Most people mean that greedy regexps match as much as they can. +Most people mean that greedy regexes match as much as they can. Technically speaking, it's actually the quantifiers (C<?>, C<*>, C<+>, C<{}>) that are greedy rather than the whole pattern; Perl prefers local greed and immediate gratification to overall greed. To get non-greedy @@ -422,7 +492,7 @@ characters. Neither is correct. C<\b> is the place between a C<\w> character and a C<\W> character (that is, C<\b> is the edge of a "word"). It's a zero-width assertion, just like C<^>, C<$>, and all the other anchors, so it doesn't consume any characters. L<perlre> -describes the behaviour of all the regexp metacharacters. +describes the behavior of all the regex metacharacters. Here are examples of the incorrect application of C<\b>, with fixes: @@ -446,8 +516,8 @@ not "this" or "island". Because once Perl sees that you need one of these variables anywhere in the program, it has to provide them on each and every pattern match. The same mechanism that handles these provides for the use of $1, $2, -etc., so you pay the same price for each regexp that contains capturing -parentheses. 
But if you never use $&, etc., in your script, then regexps +etc., so you pay the same price for each regex that contains capturing +parentheses. But if you never use $&, etc., in your script, then regexes I<without> capturing parentheses won't be penalized. So avoid $&, $', and $` if you can, but if you can't, once you've used them at all, use them at will because you've already paid the price. Remember that some @@ -515,7 +585,7 @@ Of course, that could have been written as But then you lose the vertical alignment of the regular expressions. -=head2 Are Perl regexps DFAs or NFAs? Are they POSIX compliant? +=head2 Are Perl regexes DFAs or NFAs? Are they POSIX compliant? While it's true that Perl's regular expressions resemble the DFAs (deterministic finite automata) of the egrep(1) program, they are in @@ -620,7 +690,7 @@ All rights reserved. When included as part of the Standard Version of Perl, or as part of its complete documentation whether printed or otherwise, this work -may be distributed only under the terms of Perl's Artistic Licence. +may be distributed only under the terms of Perl's Artistic License. Any distribution of this file or derivatives thereof I<outside> of that package require that special arrangements be made with copyright holder. @@ -630,4 +700,3 @@ are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required. - diff --git a/pod/perlfaq7.pod b/pod/perlfaq7.pod index a4ea872b85..070d9653d4 100644 --- a/pod/perlfaq7.pod +++ b/pod/perlfaq7.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq7 - Perl Language Issues ($Revision: 1.24 $, $Date: 1999/01/08 05:32:11 $) +perlfaq7 - Perl Language Issues ($Revision: 1.28 $, $Date: 1999/05/23 20:36:18 $) =head1 DESCRIPTION @@ -18,19 +18,17 @@ In the words of Chaim Frenkel: "Perl's grammar can not be reduced to BNF. 
The work of parsing perl is distributed between yacc, the lexer, smoke and mirrors." -=head2 What are all these $@%* punctuation signs, and how do I know when to use them? +=head2 What are all these $@%&* punctuation signs, and how do I know when to use them? They are type specifiers, as detailed in L<perldata>: $ for scalar values (number, string or reference) @ for arrays % for hashes (associative arrays) + & for subroutines (aka functions, procedures, methods) * for all types of that symbol name. In version 4 you used them like pointers, but in modern perls you can just use references. -While there are a few places where you don't actually need these type -specifiers, you should always use them. - A couple of others that you're likely to encounter that aren't really type specifiers are: @@ -180,7 +178,7 @@ own module. Make sure to change the names appropriately. # if using RCS/CVS, this next line may be preferred, # but beware two-digit versions. - $VERSION = do{my@r=q$Revision: 1.24 $=~/\d+/g;sprintf '%d.'.'%02d'x$#r,@r}; + $VERSION = do{my@r=q$Revision: 1.28 $=~/\d+/g;sprintf '%d.'.'%02d'x$#r,@r}; @ISA = qw(Exporter); @EXPORT = qw(&func1 &func2 &func3); @@ -330,12 +328,13 @@ harder. Take this code: print "Finally $f\n"; The $f that has "bar" added to it three times should be a new C<$f> -(C<my $f> should create a new local variable each time through the -loop). It isn't, however. This is a bug, and will be fixed. +(C<my $f> should create a new local variable each time through the loop). +It isn't, however. This was a bug, now fixed in the latest releases +(tested against 5.004_05, 5.005_03, and 5.005_56). -=head2 How can I pass/return a {Function, FileHandle, Array, Hash, Method, Regexp}? +=head2 How can I pass/return a {Function, FileHandle, Array, Hash, Method, Regex}? -With the exception of regexps, you need to pass references to these +With the exception of regexes, you need to pass references to these objects. 
See L<perlsub/"Pass by Reference"> for this particular question, and L<perlref> for information on references. @@ -391,28 +390,42 @@ If you're planning on generating new filehandles, you could do this: $fh = openit('< /etc/motd'); print <$fh>; -=item Passing Regexps +=item Passing Regexes + +To pass regexes around, you'll need to be using a release of Perl +sufficiently recent as to support the C<qr//> construct, pass around +strings and use an exception-trapping eval, or else be very, very clever. -To pass regexps around, you'll need to either use one of the highly -experimental regular expression modules from CPAN (Nick Ing-Simmons's -Regexp or Ilya Zakharevich's Devel::Regexp), pass around strings -and use an exception-trapping eval, or else be very, very clever. -Here's an example of how to pass in a string to be regexp compared: +Here's an example of how to pass in a string to be regex compared +using C<qr//>: sub compare($$) { - my ($val1, $regexp) = @_; - my $retval = eval { $val =~ /$regexp/ }; + my ($val1, $regex) = @_; + my $retval = $val1 =~ /$regex/; + return $retval; + } + $match = compare("old McDonald", qr/d.*D/i); + +Notice how C<qr//> allows flags at the end. That pattern was compiled +at compile time, although it was executed later. The nifty C<qr//> +notation wasn't introduced until the 5.005 release. Before that, you +had to approach this problem much less intuitively. 
For example, here +it is again if you don't have C<qr//>: + + sub compare($$) { + my ($val1, $regex) = @_; + my $retval = eval { $val1 =~ /$regex/ }; die if $@; return $retval; } - $match = compare("old McDonald", q/d.*D/); + $match = compare("old McDonald", q/(?i)d.*D/); Make sure you never say something like this: - return eval "\$val =~ /$regexp/"; # WRONG + return eval "\$val =~ /$regex/"; # WRONG -or someone can sneak shell escapes into the regexp due to the double +or someone can sneak shell escapes into the regex due to the double interpolation of the eval and the double-quoted string. For example: $pattern_of_evil = 'danger ${ system("rm -rf * &") } danger'; @@ -630,7 +643,7 @@ where they don't belong. This is explained in more depth in the L<perlsyn>. Briefly, there's no official case statement, because of the variety of tests possible in Perl (numeric comparison, string comparison, glob comparison, -regexp matching, overloaded comparisons, ...). Larry couldn't decide +regex matching, overloaded comparisons, ...). Larry couldn't decide how best to do this, so he left it out, even though it's been on the wish list since perl1. @@ -826,6 +839,106 @@ Use this code, provided by Mark-Jason Dominus: Or, if you're using a recent release of Perl, you can just use the Symbol::delete_package() function instead. +=head2 How can I use a variable as a variable name? + +Beginners often think they want to have a variable contain the name +of a variable. + + $fred = 23; + $varname = "fred"; + ++$$varname; # $fred now 24 + +This works I<sometimes>, but it is a very bad idea for two reasons. + +The first reason is that symbolic references I<only work on global variables>. +That means that if $fred above is a lexical variable created with my(), +the code won't work at all: you'll accidentally access the global +and skip right over the private lexical altogether.
Global variables +are bad because they can easily collide accidentally and in general make +for non-scalable and confusing code. + +Symbolic references are forbidden under the C<use strict> pragma. +They are not true references and consequently are not reference counted +or garbage collected. + +The other reason why using a variable to hold the name of another +variable is a bad idea is that the question often stems from a lack of +understanding of Perl data structures, particularly hashes. By using +symbolic references, you are just using the package's symbol-table hash +(like C<%main::>) instead of a user-defined hash. The solution is to +use your own hash or a real reference instead. + + $fred = 23; + $varname = "fred"; + $USER_VARS{$varname}++; # not $$varname++ + +There we're using the %USER_VARS hash instead of symbolic references. +Sometimes this comes up in reading strings from the user with variable +references and wanting to expand them to the values of your perl +program's variables. This is also a bad idea because it conflates the +program-addressable namespace and the user-addressable one. Avoid +reading a string and expanding it to the actual contents of your program's +own variables, like this: + + $str = 'this has a $fred and $barney in it'; + $str =~ s/(\$\w+)/$1/eeg; # need double eval + +Instead, it would be better to keep a hash around like %USER_VARS and have +variable references actually refer to entries in that hash: + + $str =~ s/\$(\w+)/$USER_VARS{$1}/g; # no /e here at all + +That's faster, cleaner, and safer than the previous approach. Of course, +you don't need to use a dollar sign. You could use your own scheme to +make it less confusing, like bracketed percent symbols, etc.
+ + $str = 'this has a %fred% and %barney% in it'; + $str =~ s/%(\w+)%/$USER_VARS{$1}/g; # no /e here at all + +Another reason that folks sometimes think they want a variable to contain +the name of a variable is because they don't know how to build proper +data structures using hashes. For example, let's say they wanted two +hashes in their program: %fred and %barney, and to use another scalar +variable to refer to those by name. + + $name = "fred"; + $$name{WIFE} = "wilma"; # set %fred + + $name = "barney"; + $$name{WIFE} = "betty"; # set %barney + +This is still a symbolic reference, and is still saddled with the +problems enumerated above. It would be far better to write: + + $folks{"fred"}{WIFE} = "wilma"; + $folks{"barney"}{WIFE} = "betty"; + +And just use a multilevel hash to start with. + +The only times that you absolutely I<must> use symbolic references are +when you really must refer to the symbol table. This may be because it's +something you can't take a real reference to, such as a format name. +Doing so may also be important for method calls, since these always go +through the symbol table for resolution. + +In those cases, you would turn off C<strict 'refs'> temporarily so you +can play around with the symbol table. For example: + + @colors = qw(red blue green yellow orange purple violet); + for my $name (@colors) { + no strict 'refs'; # renege for the block + *$name = sub { "<FONT COLOR='$name'>@_</FONT>" }; + } + +All those functions (red(), blue(), green(), etc.) appear to be separate, +but the real code in the closure actually was compiled only once. + +So, sometimes you might want to use symbolic references to directly +manipulate the symbol table. This doesn't matter for formats, handles, and +subroutines, because they are always global -- you can't use my() on them. +But for scalars, arrays, and hashes -- and usually for subroutines -- +you probably want to use hard references only.
+ =head1 AUTHOR AND COPYRIGHT Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington. @@ -833,7 +946,7 @@ All rights reserved. When included as part of the Standard Version of Perl, or as part of its complete documentation whether printed or otherwise, this work -may be distributed only under the terms of Perl's Artistic Licence. +may be distributed only under the terms of Perl's Artistic License. Any distribution of this file or derivatives thereof I<outside> of that package require that special arrangements be made with copyright holder. @@ -843,4 +956,3 @@ are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required. - diff --git a/pod/perlfaq8.pod b/pod/perlfaq8.pod index 9ef41af63a..26efa3fbb2 100644 --- a/pod/perlfaq8.pod +++ b/pod/perlfaq8.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq8 - System Interaction ($Revision: 1.36 $, $Date: 1999/01/08 05:36:34 $) +perlfaq8 - System Interaction ($Revision: 1.39 $, $Date: 1999/05/23 18:37:57 $) =head1 DESCRIPTION @@ -15,8 +15,9 @@ contain more detailed information on the vagaries of your perl. =head2 How do I find out which operating system I'm running under? -The $^O variable ($OSNAME if you use English) contains the operating -system that your perl binary was built for. +The $^O variable ($OSNAME if you use English) contains an indication of +the name of the operating system (not its release number) that your perl +binary was built for. =head2 How come exec() doesn't return? @@ -74,7 +75,7 @@ Or like this: =head2 How do I read just one key without waiting for a return key? Controlling input buffering is a remarkably system-dependent matter. 
-If most systems, you can just use the B<stty> command as shown in +On many systems, you can just use the B<stty> command as shown in L<perlfunc/getc>, but as you see, that's already getting you into portability snags. @@ -167,7 +168,7 @@ not to block: =head2 How do I clear the screen? -If you only have to so infrequently, use C<system>: +If you only have do so infrequently, use C<system>: system("clear"); @@ -421,7 +422,7 @@ properly, the getpw*() functions described in L<perlfunc> should in theory provide (read-only) access to entries in the shadow password file. To change the file, make a new shadow password file (the format varies from system to system - see L<passwd(5)> for specifics) and use -pwd_mkdb(8) to install it (see L<pwd_mkdb(5)> for more details). +pwd_mkdb(8) to install it (see L<pwd_mkdb(8)> for more details). =head2 How do I set the time and date? @@ -461,7 +462,7 @@ something like this: $done = $start = pack($TIMEVAL_T, ()); - syscall( &SYS_gettimeofday, $start, 0) != -1 + syscall(&SYS_gettimeofday, $start, 0) != -1 or die "gettimeofday: $!"; ########################## @@ -699,7 +700,7 @@ case the fork()/exec() description still applies. Strictly speaking, nothing. Stylistically speaking, it's not a good way to write maintainable code because backticks have a (potentially -humungous) return value, and you're ignoring it. It's may also not be very +humongous) return value, and you're ignoring it. It's may also not be very efficient, because you have to read in all the lines of output, allocate memory for them, and then throw it away. Too often people are lulled to writing: @@ -725,7 +726,7 @@ In most cases, this could and probably should be written as system("cat /etc/termcap") == 0 or die "cat program failed!"; -Which will get the output quickly (as its generated, instead of only +Which will get the output quickly (as it is generated, instead of only at the end) and also check the return value. 
system() also provides direct control over whether shell wildcard @@ -751,8 +752,14 @@ You have to do this: } Just as with system(), no shell escapes happen when you exec() a list. +Further examples of this can be found in L<perlipc/"Safe Pipe Opens">. -There are more examples of this L<perlipc/"Safe Pipe Opens">. +Note that if you're stuck on Microsoft, no solution to this vexing issue +is even possible. Even if Perl were to emulate fork(), you'd still +be hosed, because Microsoft gives no argc/argv-style API. Their API +always reparses from a single string, which is fundamentally wrong, +but you're not likely to get the Gods of Redmond to acknowledge this +and fix it for you. =head2 Why can't my script read from STDIN after I gave it EOF (^D on Unix, ^Z on MS-DOS)? @@ -970,12 +977,15 @@ sysopen(): sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT, 0644) or die "can't open /tmp/somefile: $!": -=head2 How do I install a CPAN module? + -The easiest way is to have the CPAN module do it for you. This module -comes with perl version 5.004 and later. To manually install the CPAN -module, or any well-behaved CPAN module for that matter, follow these -steps: + +=head2 How do I install a module from CPAN? + +The easiest way is to have a module also named CPAN do it for you. +This module comes with perl version 5.004 and later. To manually install +the CPAN module, or any well-behaved CPAN module for that matter, follow +these steps: =over 4 @@ -1085,7 +1095,7 @@ All rights reserved. When included as part of the Standard Version of Perl, or as part of its complete documentation whether printed or otherwise, this work -may be distributed only under the terms of Perl's Artistic Licence. +may be distributed only under the terms of Perl's Artistic License. Any distribution of this file or derivatives thereof I<outside> of that package require that special arrangements be made with copyright holder. @@ -1095,4 +1105,3 @@ are hereby placed into the public domain. 
You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required. - diff --git a/pod/perlfaq9.pod b/pod/perlfaq9.pod index 6536064360..91d432e443 100644 --- a/pod/perlfaq9.pod +++ b/pod/perlfaq9.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq9 - Networking ($Revision: 1.24 $, $Date: 1999/01/08 05:39:48 $) +perlfaq9 - Networking ($Revision: 1.26 $, $Date: 1999/05/23 16:08:30 $) =head1 DESCRIPTION @@ -20,7 +20,7 @@ may not be so well received. The useful FAQs and related documents are: CGI FAQ - http://www.webthing.com/tutorials/cgifaq.html + http://www.webthing.com/page.cgi/cgifaq Web FAQ http://www.boutell.com/faq/ @@ -100,7 +100,7 @@ a solution: <IMG SRC = "foo.gif" ALT = "A > B"> - <IMG SRC = "foo.gif" + <IMG SRC = "foo.gif" ALT = "A > B"> <!-- <A comment> --> @@ -131,12 +131,11 @@ A quick but imperfect approach is }gsix; This version does not adjust relative URLs, understand alternate -bases, deal with HTML comments, deal with HREF and NAME attributes in -the same tag, or accept URLs themselves as arguments. It also runs -about 100x faster than a more "complete" solution using the LWP suite -of modules, such as the -http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz -program. +bases, deal with HTML comments, deal with HREF and NAME attributes +in the same tag, understand extra qualifiers like TARGET, or accept +URLs themselves as arguments. It also runs about 100x faster than a +more "complete" solution using the LWP suite of modules, such as the +http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program. =head2 How do I download a file from the user's machine? How do I open a file on another machine? @@ -159,8 +158,9 @@ on your system, is this: $html_code = `lynx -source $url`; $text_data = `lynx -dump $url`; -The libwww-perl (LWP) modules from CPAN provide a more powerful way to -do this. 
They work through proxies, and don't require lynx: +The libwww-perl (LWP) modules from CPAN provide a more powerful way +to do this. They don't require lynx, but like lynx, can still work +through proxies: # simplest version use LWP::Simple; @@ -213,7 +213,7 @@ Here's an example of decoding: $string =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge; Encoding is a bit harder, because you can't just blindly change -all the non-alphanumeric characters (C<\W>) into their hex escapes. +all the non-alphanumunder character (C<\W>) into their hex escapes. It's important that characters with special meaning like C</> and C<?> I<not> be translated. Probably the easiest way to get this right is to avoid reinventing the wheel and just use the URI::Escape module, @@ -236,9 +236,21 @@ because of "optimizations" that servers do. print "Location: $url\n\n"; exit; -To be correct to the spec, each of those C<"\n"> -should really each be C<"\015\012">, but unless you're -stuck on MacOS, you probably won't notice. +To target a particular frame in a frameset, include the "Window-target:" +in the header. + + print <<EOF; + Location: http://www.domain.com/newpage + Window-target: <FrameName> + + EOF + +To be correct to the spec, each of those virtual newlines should really be +physical C<"\015\012"> sequences by the time you hit the client browser. +Except for NPH scripts, though, that local newline should get translated +by your server into standard form, so you shouldn't have a problem +here, even if you are stuck on MacOS. Everybody else probably won't +even notice. =head2 How do I put a password on my web pages? @@ -329,7 +341,7 @@ RFC-822 (the mail header standard) compliant, and addresses that aren't deliverable which are compliant. Many are tempted to try to eliminate many frequently-invalid -mail addresses with a simple regexp, such as +mail addresses with a simple regex, such as C</^[\w.-]+\@([\w.-]\.)+\w+$/>. It's a very bad idea. 
However, this also throws out many valid ones, and says nothing about potential deliverability, so is not suggested. Instead, see @@ -423,7 +435,12 @@ the message into the queue. This last option means your message won't be immediately delivered, so leave it out if you want immediate delivery. -Or use the CPAN module Mail::Mailer: +Alternate, less convenient approaches include calling mail (sometimes +called mailx) directly or simply opening up port 25 have having an +intimate conversation between just you and the remote SMTP daemon, +probably sendmail. + +Or you might be able use the CPAN module Mail::Mailer: use Mail::Mailer; @@ -438,34 +455,17 @@ Or use the CPAN module Mail::Mailer: The Mail::Internet module uses Net::SMTP which is less Unix-centric than Mail::Mailer, but less reliable. Avoid raw SMTP commands. There -are many reasons to use a mail transport agent like sendmail. These +are many reasons to use a mail transport agent like sendmail. These include queueing, MX records, and security. =head2 How do I read mail? -Use the Mail::Folder module from CPAN (part of the MailFolder package) or -the Mail::Internet module from CPAN (also part of the MailTools package). - - # sending mail - use Mail::Internet; - use Mail::Header; - # say which mail host to use - $ENV{SMTPHOSTS} = 'mail.frii.com'; - # create headers - $header = new Mail::Header; - $header->add('From', 'gnat@frii.com'); - $header->add('Subject', 'Testing'); - $header->add('To', 'gnat@frii.com'); - # create body - $body = 'This is a test, ignore'; - # create mail object - $mail = new Mail::Internet(undef, Header => $header, Body => \[$body]); - # send it - $mail->smtpsend or die; - -Often a module is overkill, though. Here's a mail sorter. - - #!/usr/bin/perl +While you could use the Mail::Folder module from CPAN (part of the +MailFolder package) or the Mail::Internet module from CPAN (also part +of the MailTools package), often a module is overkill, though. Here's a +mail sorter. 
+ + #!/usr/bin/perl # bysub1 - simple sort by subject my(@msgs, @sub); my $msgno = -1; @@ -476,12 +476,12 @@ Often a module is overkill, though. Here's a mail sorter. $sub[++$msgno] = lc($1) || ''; } $msgs[$msgno] .= $_; - } + } for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) { print $msgs[$i]; } -Or more succinctly, +Or more succinctly, #!/usr/bin/perl -n00 # bysub2 - awkish sort-by-subject @@ -541,7 +541,7 @@ All rights reserved. When included as part of the Standard Version of Perl, or as part of its complete documentation whether printed or otherwise, this work -may be distributed only under the terms of Perl's Artistic Licence. +may be distributed only under the terms of Perl's Artistic License. Any distribution of this file or derivatives thereof I<outside> of that package require that special arrangements be made with copyright holder. @@ -551,4 +551,3 @@ are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required. - diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod index d409319a09..ed3de62a23 100644 --- a/pod/perlfunc.pod +++ b/pod/perlfunc.pod @@ -30,7 +30,7 @@ Elements of the LIST should be separated by commas. Any function in the list below may be used either with or without parentheses around its arguments. (The syntax descriptions omit the parentheses.) If you use the parentheses, the simple (but occasionally -surprising) rule is this: It I<LOOKS> like a function, therefore it I<IS> a +surprising) rule is this: It I<looks> like a function, therefore it I<is> a function, and precedence doesn't matter. Otherwise it's a list operator or unary operator, and precedence does matter. 
And whitespace between the function and left parenthesis doesn't count--so you need to @@ -80,8 +80,8 @@ In general, functions in Perl that serve as wrappers for system calls of the same name (like chown(2), fork(2), closedir(2), etc.) all return true when they succeed and C<undef> otherwise, as is usually mentioned in the descriptions below. This is different from the C interfaces, -which return C<-1> on failure. Exceptions to this rule are C<wait()>, -C<waitpid()>, and C<syscall()>. System calls also set the special C<$!> +which return C<-1> on failure. Exceptions to this rule are C<wait>, +C<waitpid>, and C<syscall>. System calls also set the special C<$!> variable on failure. Other functions do not, except accidentally. =head2 Perl Functions by Category @@ -255,7 +255,7 @@ A file test, where X is one of the letters listed below. This unary operator takes one argument, either a filename or a filehandle, and tests the associated file to see if something is true about it. If the argument is omitted, tests C<$_>, except for C<-t>, which tests STDIN. -Unless otherwise documented, it returns C<1> for TRUE and C<''> for FALSE, or +Unless otherwise documented, it returns C<1> for true and C<''> for false, or the undefined value if the file doesn't exist. Despite the funny names, precedence is the same as any other named unary operator, and the argument may be parenthesized like any other unary operator. The @@ -339,12 +339,12 @@ characters with the high bit set. If too many strange characters (E<gt>30%) are found, it's a C<-B> file, otherwise it's a C<-T> file. Also, any file containing null in the first block is considered a binary file. If C<-T> or C<-B> is used on a filehandle, the current stdio buffer is examined -rather than the first block. Both C<-T> and C<-B> return TRUE on a null +rather than the first block. Both C<-T> and C<-B> return true on a null file, or a file at EOF when testing a filehandle. 
Because you have to read a file to do the C<-T> test, on most occasions you want to use a C<-f> against the file first, as in C<next unless -f $file && -T $file>. -If any of the file tests (or either the C<stat()> or C<lstat()> operators) are given +If any of the file tests (or either the C<stat> or C<lstat> operators) are given the special filehandle consisting of a solitary underline, then the stat structure of the previous file test (or stat operator) is used, saving a system call. (This doesn't work with C<-t>, and you need to remember @@ -373,7 +373,7 @@ If VALUE is omitted, uses C<$_>. =item accept NEWSOCKET,GENERICSOCKET Accepts an incoming socket connect, just as the accept(2) system call -does. Returns the packed address if it succeeded, FALSE otherwise. +does. Returns the packed address if it succeeded, false otherwise. See the example in L<perlipc/"Sockets: Client/Server Communication">. =item alarm SECONDS @@ -391,18 +391,18 @@ starting a new one. The returned value is the amount of time remaining on the previous timer. For delays of finer granularity than one second, you may use Perl's -four-arugment version of select() leaving the first three arguments -undefined, or you might be able to use the C<syscall()> interface to +four-argument version of select() leaving the first three arguments +undefined, or you might be able to use the C<syscall> interface to access setitimer(2) if your system supports it. The Time::HiRes module from CPAN may also prove useful. -It is usually a mistake to intermix C<alarm()> -and C<sleep()> calls. +It is usually a mistake to intermix C<alarm> +and C<sleep> calls. -If you want to use C<alarm()> to time out a system call you need to use an -C<eval()>/C<die()> pair. You can't rely on the alarm causing the system call to +If you want to use C<alarm> to time out a system call you need to use an +C<eval>/C<die> pair. 
You can't rely on the alarm causing the system call to fail with C<$!> set to C<EINTR> because Perl sets up signal handlers to -restart system calls on some systems. Using C<eval()>/C<die()> always works, +restart system calls on some systems. Using C<eval>/C<die> always works, modulo the caveats given in L<perlipc/"Signals">. eval { @@ -431,29 +431,51 @@ function, or use the familiar relation: =item bind SOCKET,NAME Binds a network address to a socket, just as the bind system call -does. Returns TRUE if it succeeded, FALSE otherwise. NAME should be a +does. Returns true if it succeeded, false otherwise. NAME should be a packed address of the appropriate type for the socket. See the examples in L<perlipc/"Sockets: Client/Server Communication">. =item binmode FILEHANDLE -Arranges for the file to be read or written in "binary" mode in operating -systems that distinguish between binary and text files. Files that -are not in binary mode have CR LF sequences translated to LF on input -and LF translated to CR LF on output. Binmode has no effect under -many sytems, but in MS-DOS and similarly archaic systems, it may be -imperative--otherwise your MS-DOS-damaged C library may mangle your file. -The key distinction between systems that need C<binmode()> and those -that don't is their text file formats. Systems like Unix, MacOS, and -Plan9 that delimit lines with a single character, and that encode that -character in C as C<"\n">, do not need C<binmode()>. The rest may need it. -If FILEHANDLE is an expression, the value is taken as the name of the -filehandle. - -If the system does care about it, using it when you shouldn't is just as -perilous as failing to use it when you should. Fortunately for most of -us, you can't go wrong using binmode() on systems that don't care about -it, though. +Arranges for FILEHANDLE to be read or written in "binary" mode on +systems whose run-time libraries force the programmer to guess +between binary and text files. 
If FILEHANDLE is an expression, the +value is taken as the name of the filehandle. binmode() should be +called after the C<open> but before any I/O is done on the filehandle. +The only way to reset binary mode on a filehandle is to reopen the +file. + +The operating system, device drivers, C libraries, and Perl run-time +system all conspire to let the programmer conveniently treat a +simple, one-byte C<\n> as the line terminator, irrespective of its +external representation. On Unix and its brethren, the native file +representation exactly matches the internal representation, making +everyone's lives unbelievably simpler. Consequently, L<binmode> +has no effect under Unix, Plan9, or Mac OS, all of which use C<\n> +to end each line. (Unix and Plan9 think C<\n> means C<\cJ> and +C<\r> means C<\cM>, whereas the Mac goes the other way--it uses +C<\cM> for c<\n> and C<\cJ> to mean C<\r>. But that's ok, because +it's only one byte, and the internal and external representations +match.) + +In legacy systems like MS-DOS and its embellishments, your program +sees a C<\n> as a simple C<\cJ> (just as in Unix), but oddly enough, +that's not what's physically stored on disk. What's worse, these +systems refuse to help you with this; it's up to you to remember +what to do. And you mustn't go applying binmode() with wild abandon, +either, because if your system does care about binmode(), then using +it when you shouldn't is just as perilous as failing to use it when +you should. + +That means that on any version of Microsoft WinXX that you might +care to name (or not), binmode() causes C<\cM\cJ> sequences on disk +to be converted to C<\n> when read into your program, and causes +any C<\n> in your program to be converted back to C<\cM\cJ> on +output to disk. This sad discrepancy leads to no end of +problems in not just the readline operator, but also when using +seek(), tell(), and read() calls. See L<perlport> for other painful +details. 
See the C<$/> and C<$\> variables in L<perlvar> for how +to manually set your input and output line-termination sequences. =item bless REF,CLASSNAME @@ -461,7 +483,7 @@ it, though. This function tells the thingy referenced by REF that it is now an object in the CLASSNAME package. If CLASSNAME is omitted, the current package -is used. Because a C<bless()> is often the last thing in a constructor. +is used. Because a C<bless> is often the last thing in a constructor, it returns the reference for convenience. Always use the two-argument version if the function doing the blessing might be inherited by a derived class. See L<perltoot> and L<perlobj> for more about the blessing @@ -481,7 +503,7 @@ See L<perlmod/"Perl Modules">. Returns the context of the current subroutine call. In scalar context, returns the caller's package name if there is a caller, that is, if -we're in a subroutine or C<eval()> or C<require()>, and the undefined value +we're in a subroutine or C<eval> or C<require>, and the undefined value otherwise. In list context, returns ($package, $filename, $line) = caller; @@ -493,12 +515,12 @@ to go back before the current one. ($package, $filename, $line, $subroutine, $hasargs, $wantarray, $evaltext, $is_require) = caller($i); -Here C<$subroutine> may be C<"(eval)"> if the frame is not a subroutine -call, but an C<eval()>. In such a case additional elements C<$evaltext> and +Here $subroutine may be C<"(eval)"> if the frame is not a subroutine +call, but an C<eval>. In such a case additional elements $evaltext and C<$is_require> are set: C<$is_require> is true if the frame is created by a -C<require> or C<use> statement, C<$evaltext> contains the text of the +C<require> or C<use> statement, $evaltext contains the text of the C<eval EXPR> statement. In particular, for a C<eval BLOCK> statement, -C<$filename> is C<"(eval)">, but C<$evaltext> is undefined. (Note also that +$filename is C<"(eval)">, but $evaltext is undefined. 
(Note also that each C<use> statement creates a C<require> frame inside an C<eval EXPR>) frame. @@ -507,16 +529,16 @@ detailed information: it sets the list variable C<@DB::args> to be the arguments with which the subroutine was invoked. Be aware that the optimizer might have optimized call frames away before -C<caller()> had a chance to get the information. That means that C<caller(N)> +C<caller> had a chance to get the information. That means that C<caller(N)> might not return information about the call frame you expect it do, for -C<N E<gt> 1>. In particular, C<@DB::args> might have information from the -previous time C<caller()> was called. +C<N E<gt> 1>. In particular, C<@DB::args> might have information from the +previous time C<caller> was called. =item chdir EXPR Changes the working directory to EXPR, if possible. If EXPR is omitted, -changes to the user's home directory. Returns TRUE upon success, -FALSE otherwise. See the example under C<die()>. +changes to the user's home directory. Returns true upon success, +false otherwise. See the example under C<die>. =item chmod LIST @@ -548,7 +570,8 @@ that the final record may be missing its newline. When in paragraph mode (C<$/ = "">), it removes all trailing newlines from the string. When in slurp mode (C<$/ = undef>) or fixed-length record mode (C<$/> is a reference to an integer or the like, see L<perlvar>) chomp() won't -remove anything. If VARIABLE is omitted, it chomps C<$_>. Example: +remove anything. +If VARIABLE is omitted, it chomps C<$_>. Example: while (<>) { chomp; # avoid \n on last field @@ -588,16 +611,18 @@ You can actually chop anything that's an lvalue, including an assignment: chop($answer = <STDIN>); If you chop a list, each element is chopped. Only the value of the -last C<chop()> is returned. +last C<chop> is returned. -Note that C<chop()> returns the last character. To return all but the last +Note that C<chop> returns the last character. 
To return all but the last character, use C<substr($string, 0, -1)>. =item chown LIST Changes the owner (and group) of a list of files. The first two -elements of the list must be the I<NUMERICAL> uid and gid, in that order. -Returns the number of files successfully changed. +elements of the list must be the I<numeric> uid and gid, in that +order. A value of -1 in either position is interpreted by most +systems to leave that value unchanged. Returns the number of files +successfully changed. $cnt = chown $uid, $gid, 'foo', 'bar'; chown $uid, $gid, @filenames; @@ -605,9 +630,9 @@ Returns the number of files successfully changed. Here's an example that looks up nonnumeric uids in the passwd file: print "User: "; - chop($user = <STDIN>); + chomp($user = <STDIN>); print "Files: "; - chop($pattern = <STDIN>); + chomp($pattern = <STDIN>); ($login,$pass,$uid,$gid) = getpwnam($user) or die "$user not in passwd file"; @@ -619,6 +644,10 @@ On most systems, you are not allowed to change the ownership of the file unless you're the superuser, although you should be able to change the group to any of your secondary groups. On insecure systems, these restrictions may be relaxed, but this is not a portable assumption. +On POSIX systems, you can detect this condition this way: + + use POSIX qw(sysconf _PC_CHOWN_RESTRICTED); + $can_chown_giveaway = not sysconf(_PC_CHOWN_RESTRICTED); =item chr NUMBER @@ -641,30 +670,35 @@ named directory the new root directory for all further pathnames that begin with a C<"/"> by your process and all its children. (It doesn't change your current working directory, which is unaffected.) For security reasons, this call is restricted to the superuser. If FILENAME is -omitted, does a C<chroot()> to C<$_>. +omitted, does a C<chroot> to C<$_>. 
=item close FILEHANDLE =item close -Closes the file or pipe associated with the file handle, returning TRUE +Closes the file or pipe associated with the file handle, returning true only if stdio successfully flushes buffers and closes the system file -descriptor. Closes the currently selected filehandle if the argument +descriptor. Closes the currently selected filehandle if the argument is omitted. You don't have to close FILEHANDLE if you are immediately going to do -another C<open()> on it, because C<open()> will close it for you. (See -C<open()>.) However, an explicit C<close()> on an input file resets the line -counter (C<$.>), while the implicit close done by C<open()> does not. +another C<open> on it, because C<open> will close it for you. (See +C<open>.) However, an explicit C<close> on an input file resets the line +counter (C<$.>), while the implicit close done by C<open> does not. -If the file handle came from a piped open C<close()> will additionally -return FALSE if one of the other system calls involved fails or if the +If the file handle came from a piped open C<close> will additionally +return false if one of the other system calls involved fails or if the program exits with non-zero status. (If the only problem was that the program exited non-zero C<$!> will be set to C<0>.) Closing a pipe also waits for the process executing on the pipe to complete, in case you want to look at the output of the pipe afterwards, and implicitly puts the exit status value of that command into C<$?>. +Prematurely closing the read end of a pipe (i.e. before the process +writing to it at the other end has closed it) will result in a +SIGPIPE being delivered to the writer. If the other end can't +handle that, be sure to read all the data before closing the pipe. + Example: open(OUTPUT, '|sort >foo') # pipe to sort @@ -681,7 +715,7 @@ filehandle, usually the real filehandle name. 
=item closedir DIRHANDLE -Closes a directory opened by C<opendir()> and returns the success of that +Closes a directory opened by C<opendir> and returns the success of that system call. DIRHANDLE may be an expression whose value can be used as an indirect @@ -690,7 +724,7 @@ dirhandle, usually the real dirhandle name. =item connect SOCKET,NAME Attempts to connect to a remote socket, just as the connect system call -does. Returns TRUE if it succeeded, FALSE otherwise. NAME should be a +does. Returns true if it succeeded, false otherwise. NAME should be a packed address of the appropriate type for the socket. See the examples in L<perlipc/"Sockets: Client/Server Communication">. @@ -705,8 +739,8 @@ continued via the C<next> statement (which is similar to the C C<continue> statement). C<last>, C<next>, or C<redo> may appear within a C<continue> -block. C<last> and C<redo> will behave as if they had been executed within -the main block. So will C<next>, but since it will execute a C<continue> +block. C<last> and C<redo> will behave as if they had been executed within +the main block. So will C<next>, but since it will execute a C<continue> block, it may be more entertaining. while (EXPR) { @@ -720,7 +754,7 @@ block, it may be more entertaining. ### last always comes here Omitting the C<continue> section is semantically equivalent to using an -empty one, logically enough. In that case, C<next> goes directly back +empty one, logically enough. In that case, C<next> goes directly back to check the condition at the top of the loop. =item cos EXPR @@ -741,14 +775,14 @@ extirpated as a potential munition). This can prove useful for checking the password file for lousy passwords, amongst other things. Only the guys wearing white hats should do this. -Note that C<crypt()> is intended to be a one-way function, much like breaking +Note that C<crypt> is intended to be a one-way function, much like breaking eggs to make an omelette. 
There is no (known) corresponding decrypt function. As a result, this function isn't all that useful for cryptography. (For that, see your nearby CPAN mirror.) When verifying an existing encrypted string you should use the encrypted text as the salt (like C<crypt($plain, $crypted) eq $crypted>). This -allows your code to work with the standard C<crypt()> and with more +allows your code to work with the standard C<crypt> and with more exotic implementations. When choosing a new salt create a random two character string whose characters come from the set C<[./0-9A-Za-z]> (like C<join '', ('.', '/', 0..9, 'A'..'Z', 'a'..'z')[rand 64, rand 64]>). @@ -773,34 +807,40 @@ their own password: Of course, typing in your own password to whoever asks you for it is unwise. +The L<crypt> function is unsuitable for encrypting large quantities +of data, not least of all because you can't get the information +back. Look at the F<by-module/Crypt> and F<by-module/PGP> directories +on your favorite CPAN mirror for a slew of potentially useful +modules. + =item dbmclose HASH -[This function has been largely superseded by the C<untie()> function.] +[This function has been largely superseded by the C<untie> function.] Breaks the binding between a DBM file and a hash. -=item dbmopen HASH,DBNAME,MODE +=item dbmopen HASH,DBNAME,MASK -[This function has been largely superseded by the C<tie()> function.] +[This function has been largely superseded by the C<tie> function.] This binds a dbm(3), ndbm(3), sdbm(3), gdbm(3), or Berkeley DB file to a -hash. HASH is the name of the hash. (Unlike normal C<open()>, the first -argument is I<NOT> a filehandle, even though it looks like one). DBNAME +hash. HASH is the name of the hash. (Unlike normal C<open>, the first +argument is I<not> a filehandle, even though it looks like one). DBNAME is the name of the database (without the F<.dir> or F<.pag> extension if any). 
If the database does not exist, it is created with protection -specified by MODE (as modified by the C<umask()>). If your system supports -only the older DBM functions, you may perform only one C<dbmopen()> in your +specified by MASK (as modified by the C<umask>). If your system supports +only the older DBM functions, you may perform only one C<dbmopen> in your program. In older versions of Perl, if your system had neither DBM nor -ndbm, calling C<dbmopen()> produced a fatal error; it now falls back to +ndbm, calling C<dbmopen> produced a fatal error; it now falls back to sdbm(3). If you don't have write access to the DBM file, you can only read hash variables, not set them. If you want to test whether you can write, -either use file tests or try setting a dummy hash entry inside an C<eval()>, +either use file tests or try setting a dummy hash entry inside an C<eval>, which will trap the error. -Note that functions such as C<keys()> and C<values()> may return huge lists -when used on large DBM files. You may prefer to use the C<each()> +Note that functions such as C<keys> and C<values> may return huge lists +when used on large DBM files. You may prefer to use the C<each> function to iterate over large DBM files. Example: # print out history file offsets @@ -835,13 +875,13 @@ conditions. This function allows you to distinguish C<undef> from other values. (A simple Boolean test will not distinguish among C<undef>, zero, the empty string, and C<"0">, which are all equally false.) Note that since C<undef> is a valid scalar, its presence -doesn't I<necessarily> indicate an exceptional condition: C<pop()> +doesn't I<necessarily> indicate an exceptional condition: C<pop> returns C<undef> when its argument is an empty array, I<or> when the element to return happens to be C<undef>. -You may also use C<defined()> to check whether a subroutine exists, by +You may also use C<defined> to check whether a subroutine exists, by saying C<defined &func> without parentheses. 
On the other hand, use -of C<defined()> upon aggregates (hashes and arrays) is not guaranteed to +of C<defined> upon aggregates (hashes and arrays) is not guaranteed to produce intuitive results, and should probably be avoided. When used on a hash element, it tells you whether the value is defined, @@ -857,7 +897,7 @@ Examples: sub foo { defined &$bar ? &$bar(@_) : die "No bar"; } $debugging = 0 unless defined $debugging; -Note: Many folks tend to overuse C<defined()>, and then are surprised to +Note: Many folks tend to overuse C<defined>, and then are surprised to discover that the number C<0> and C<""> (the zero-length string) are, in fact, defined values. For example, if you say @@ -868,11 +908,11 @@ matched "nothing". But it didn't really match nothing--rather, it matched something that happened to be zero characters long. This is all very above-board and honest. When a function returns an undefined value, it's an admission that it couldn't give you an honest answer. So you -should use C<defined()> only when you're questioning the integrity of what +should use C<defined> only when you're questioning the integrity of what you're trying to do. At other times, a simple comparison to C<0> or C<""> is what you want. -Currently, using C<defined()> on an entire array or hash reports whether +Currently, using C<defined> on an entire array or hash reports whether memory for that aggregate has ever been allocated. So an array you set to the empty list appears undefined initially, and one that once was full and that you then set to the empty list still appears defined. 
You @@ -881,13 +921,13 @@ should instead use a simple test for size: if (@an_array) { print "has array elements\n" } if (%a_hash) { print "has hash members\n" } -Using C<undef()> on these, however, does clear their memory and then report +Using C<undef> on these, however, does clear their memory and then report them as not defined anymore, but you shouldn't do that unless you don't plan to use them again, because it saves time when you load them up again to have memory already ready to be filled. The normal way to free up space used by an aggregate is to assign the empty list. -This counterintuitive behavior of C<defined()> on aggregates may be +This counterintuitive behavior of C<defined> on aggregates may be changed, fixed, or broken in a future release of Perl. See also L</undef>, L</exists>, L</ref>. @@ -898,7 +938,7 @@ Deletes the specified key(s) and their associated values from a hash. For each key, returns the deleted value associated with that key, or the undefined value if there was no such key. Deleting from C<$ENV{}> modifies the environment. Deleting from a hash tied to a DBM file -deletes the entry from the DBM file. (But deleting from a C<tie()>d hash +deletes the entry from the DBM file. (But deleting from a C<tie>d hash doesn't necessarily return anything.) The following deletes all the values of a hash: @@ -925,12 +965,13 @@ operation is a hash element lookup or hash slice: =item die LIST -Outside an C<eval()>, prints the value of LIST to C<STDERR> and exits with -the current value of C<$!> (errno). If C<$!> is C<0>, exits with the value of -C<($? E<gt>E<gt> 8)> (backtick `command` status). If C<($? E<gt>E<gt> 8)> -is C<0>, exits with C<255>. Inside an C<eval(),> the error message is stuffed into -C<$@> and the C<eval()> is terminated with the undefined value. This makes -C<die()> the way to raise an exception. +Outside an C<eval>, prints the value of LIST to C<STDERR> and +exits with the current value of C<$!> (errno). 
If C<$!> is C<0>, +exits with the value of C<($? E<gt>E<gt> 8)> (backtick `command` +status). If C<($? E<gt>E<gt> 8)> is C<0>, exits with C<255>. Inside +an C<eval>, the error message is stuffed into C<$@> and the +C<eval> is terminated with the undefined value. This makes +C<die> the way to raise an exception. Equivalent examples: @@ -984,25 +1025,26 @@ regular expressions. Here's an example: } } -Since perl will stringify uncaught exception messages before displaying +Because perl will stringify uncaught exception messages before displaying them, you may want to overload stringification operations on such custom exception objects. See L<overload> for details about that. -You can arrange for a callback to be run just before the C<die()> does -its deed, by setting the C<$SIG{__DIE__}> hook. The associated handler -will be called with the error text and can change the error message, if -it sees fit, by calling C<die()> again. See L<perlvar/$SIG{expr}> for details on -setting C<%SIG> entries, and L<"eval BLOCK"> for some examples. - -Note that the C<$SIG{__DIE__}> hook is currently called even inside -eval()ed blocks/strings! If one wants the hook to do nothing in such -situations, put +You can arrange for a callback to be run just before the C<die> +does its deed, by setting the C<$SIG{__DIE__}> hook. The associated +handler will be called with the error text and can change the error +message, if it sees fit, by calling C<die> again. See +L<perlvar/$SIG{expr}> for details on setting C<%SIG> entries, and +L<"eval BLOCK"> for some examples. Although this feature was meant +to be run only right before your program was to exit, this is not +currently the case--the C<$SIG{__DIE__}> hook is currently called +even inside eval()ed blocks/strings! If one wants the hook to do +nothing in such situations, put die @_ if $^S; -as the first line of the handler (see L<perlvar/$^S>).
Because this -promotes action at a distance, this counterintuitive behavior may be fixed -in a future release. +as the first line of the handler (see L<perlvar/$^S>). Because +this promotes strange action at a distance, this counterintuitive +behavior may be fixed in a future release. =item do BLOCK @@ -1046,7 +1088,7 @@ successfully compiled, C<do> returns the value of the last expression evaluated. Note that inclusion of library modules is better done with the -C<use()> and C<require()> operators, which also do automatic error checking +C<use> and C<require> operators, which also do automatic error checking and raise an exception if there's a problem. You might like to use C<do> to read in a program configuration @@ -1067,40 +1109,31 @@ file. Manual error checking can be done this way: =item dump -This causes an immediate core dump. Primarily this is so that you can -use the B<undump> program to turn your core dump into an executable binary -after having initialized all your variables at the beginning of the -program. When the new binary is executed it will begin by executing a -C<goto LABEL> (with all the restrictions that C<goto> suffers). Think of -it as a goto with an intervening core dump and reincarnation. If C<LABEL> -is omitted, restarts the program from the top. WARNING: Any files -opened at the time of the dump will NOT be open any more when the -program is reincarnated, with possible resulting confusion on the part -of Perl. See also B<-u> option in L<perlrun>. - -Example: - - #!/usr/bin/perl - require 'getopt.pl'; - require 'stat.pl'; - %days = ( - 'Sun' => 1, - 'Mon' => 2, - 'Tue' => 3, - 'Wed' => 4, - 'Thu' => 5, - 'Fri' => 6, - 'Sat' => 7, - ); - - dump QUICKSTART if $ARGV[0] eq '-d'; - - QUICKSTART: - Getopt('f'); - -This operator is largely obsolete, partly because it's very hard to -convert a core file into an executable, and because the real perl-to-C -compiler has superseded it. +This function causes an immediate core dump. 
See also the B<-u> +command-line switch in L<perlrun>, which does the same thing. +Primarily this is so that you can use the B<undump> program (not +supplied) to turn your core dump into an executable binary after +having initialized all your variables at the beginning of the +program. When the new binary is executed it will begin by executing +a C<goto LABEL> (with all the restrictions that C<goto> suffers). +Think of it as a goto with an intervening core dump and reincarnation. +If C<LABEL> is omitted, restarts the program from the top. + +B<WARNING>: Any files opened at the time of the dump will I<not> +be open any more when the program is reincarnated, with possible +resulting confusion on the part of Perl. + +This function is now largely obsolete, partly because it's very +hard to convert a core file into an executable, and because the +real compiler backends for generating portable bytecode and compilable +C code have superseded it. + +If you're looking to use C<dump> to speed up your program, consider +generating bytecode or native C code as described in L<perlcc>. If +you're just trying to accelerate a CGI script, consider using the +C<mod_perl> extension to B<Apache>, or the CPAN module, CGI::Fast. +You might also consider autoloading or selfloading, which at least +make your program I<appear> to run faster. =item each HASH @@ -1113,14 +1146,14 @@ for this reason.) Entries are returned in an apparently random order. The actual random order is subject to change in future versions of perl, but it is guaranteed -to be in the same order as either the C<keys()> or C<values()> function +to be in the same order as either the C<keys> or C<values> function would produce on the same (unmodified) hash. When the hash is entirely read, a null array is returned in list context -(which when assigned produces a FALSE (C<0>) value), and C<undef> in -scalar context. The next call to C<each()> after that will start iterating -again.
There is a single iterator for each hash, shared by all C<each()>, -C<keys()>, and C<values()> function calls in the program; it can be reset by +(which when assigned produces a false (C<0>) value), and C<undef> in +scalar context. The next call to C<each> after that will start iterating +again. There is a single iterator for each hash, shared by all C<each>, +C<keys>, and C<values> function calls in the program; it can be reset by reading all the elements from the hash, or by evaluating C<keys HASH> or C<values HASH>. If you add or delete elements of a hash while you're iterating over it, you may get entries skipped or duplicated, so don't. @@ -1132,7 +1165,7 @@ only in a different order: print "$key=$value\n"; } -See also C<keys()>, C<values()> and C<sort()>. +See also C<keys>, C<values> and C<sort>. =item eof FILEHANDLE @@ -1143,17 +1176,18 @@ See also C<keys()>, C<values()> and C<sort()>. Returns 1 if the next read on FILEHANDLE will return end of file, or if FILEHANDLE is not open. FILEHANDLE may be an expression whose value gives the real filehandle. (Note that this function actually -reads a character and then C<ungetc()>s it, so isn't very useful in an +reads a character and then C<ungetc>s it, so isn't very useful in an interactive context.) Do not read from a terminal file (or call -C<eof(FILEHANDLE)> on it) after end-of-file is reached. Filetypes such +C<eof(FILEHANDLE)> on it) after end-of-file is reached. File types such as terminals may lose the end-of-file condition if you do. An C<eof> without an argument uses the last file read as argument. Using C<eof()> with empty parentheses is very different. It indicates -the pseudo file formed of the files listed on the command line, i.e., -C<eof()> is reasonable to use inside a C<while (E<lt>E<gt>)> loop to -detect the end of only the last file. Use C<eof(ARGV)> or eof without the -parentheses to test I<EACH> file in a while (E<lt>E<gt>) loop. 
Examples: +the pseudo file formed of the files listed on the command line, +i.e., C<eof()> is reasonable to use inside a C<while (E<lt>E<gt>)> +loop to detect the end of only the last file. Use C<eof(ARGV)> or +C<eof> without the parentheses to test I<each> file in a while +(E<lt>E<gt>) loop. Examples: # reset line numbering on each input file while (<>) { @@ -1206,16 +1240,16 @@ as with subroutines. The expression providing the return value is evaluated in void, scalar, or list context, depending on the context of the eval itself. See L</wantarray> for more on how the evaluation context can be determined. -If there is a syntax error or runtime error, or a C<die()> statement is -executed, an undefined value is returned by C<eval()>, and C<$@> is set to the +If there is a syntax error or runtime error, or a C<die> statement is +executed, an undefined value is returned by C<eval>, and C<$@> is set to the error message. If there was no error, C<$@> is guaranteed to be a null -string. Beware that using C<eval()> neither silences perl from printing +string. Beware that using C<eval> neither silences perl from printing warnings to STDERR, nor does it stuff the text of warning messages into C<$@>. To do either of those, you have to use the C<$SIG{__WARN__}> facility. See L</warn> and L<perlvar>. -Note that, because C<eval()> traps otherwise-fatal errors, it is useful for -determining whether a particular feature (such as C<socket()> or C<symlink()>) +Note that, because C<eval> traps otherwise-fatal errors, it is useful for +determining whether a particular feature (such as C<socket> or C<symlink>) is implemented. It is also Perl's exception trapping mechanism, where the die operator is used to raise exceptions. 
@@ -1247,7 +1281,7 @@ as shown in this example: warn $@ if $@; This is especially significant, given that C<__DIE__> hooks can call -C<die()> again, which has the effect of changing their error messages: +C<die> again, which has the effect of changing their error messages: # __DIE__ hooks may modify error messages { @@ -1257,10 +1291,10 @@ C<die()> again, which has the effect of changing their error messages: print $@ if $@; # prints "bar lives here" } -Because this promotes action at a distance, this counterintuive behavior +Because this promotes action at a distance, this counterintuitive behavior may be fixed in a future release. -With an C<eval()>, you should be especially careful to remember what's +With an C<eval>, you should be especially careful to remember what's being looked at when: eval $x; # CASE 1 @@ -1273,13 +1307,13 @@ being looked at when: $$x++; # CASE 6 Cases 1 and 2 above behave identically: they run the code contained in -the variable C<$x>. (Although case 2 has misleading double quotes making +the variable $x. (Although case 2 has misleading double quotes making the reader wonder what else might be happening (nothing is).) Cases 3 and 4 likewise behave in the same way: they run the code C<'$x'>, which -does nothing but return the value of C<$x>. (Case 4 is preferred for +does nothing but return the value of $x. (Case 4 is preferred for purely visual reasons, but it also has the advantage of compiling at compile-time instead of at run-time.) Case 5 is a place where -normally you I<WOULD> like to use double quotes, except that in this +normally you I<would> like to use double quotes, except that in this particular situation, you can just use symbolic references instead, as in case 6. @@ -1290,15 +1324,15 @@ C<next>, C<last>, or C<redo> cannot be used to leave or restart the block. =item exec PROGRAM LIST -The C<exec()> function executes a system command I<AND NEVER RETURNS> - -use C<system()> instead of C<exec()> if you want it to return. 
It fails and -returns FALSE only if the command does not exist I<and> it is executed +The C<exec> function executes a system command I<and never returns>-- +use C<system> instead of C<exec> if you want it to return. It fails and +returns false only if the command does not exist I<and> it is executed directly instead of via your system's command shell (see below). -Since it's a common mistake to use C<exec()> instead of C<system()>, Perl -warns you if there is a following statement which isn't C<die()>, C<warn()>, -or C<exit()> (if C<-w> is set - but you always do that). If you -I<really> want to follow an C<exec()> with some other statement, you +Since it's a common mistake to use C<exec> instead of C<system>, Perl +warns you if there is a following statement which isn't C<die>, C<warn>, +or C<exit> (if C<-w> is set - but you always do that). If you +I<really> want to follow an C<exec> with some other statement, you can use one of these styles to avoid the warning: exec ('foo') or print STDERR "couldn't exec foo: $!"; @@ -1311,9 +1345,11 @@ the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing (this is C</bin/sh -c> on Unix platforms, but varies on other platforms). If there are no shell metacharacters in the argument, it is split into -words and passed directly to C<execvp()>, which is more efficient. +words and passed directly to C<execvp>, which is more efficient. +Examples: -All files opened for output are flushed before attempting the exec(). + exec '/bin/echo', 'Your arguments are: ', @ARGV; + exec "sort $outfile | uniq"; If you don't really want to execute the first argument, but want to lie to the program you are executing about its own name, you can specify @@ -1333,10 +1369,11 @@ When the arguments get executed via the system shell, results will be subject to its quirks and capabilities. See L<perlop/"`STRING`"> for details. 
-Using an indirect object with C<exec()> or C<system()> is also more secure. -This usage forces interpretation of the arguments as a multivalued list, -even if the list had just one argument. That way you're safe from the -shell expanding wildcards or splitting up words with whitespace in them. +Using an indirect object with C<exec> or C<system> is also more +secure. This usage (which also works fine with system()) forces +interpretation of the arguments as a multivalued list, even if the +list had just one argument. That way you're safe from the shell +expanding wildcards or splitting up words with whitespace in them. @args = ( "echo surprise" ); @@ -1349,19 +1386,19 @@ program, passing it C<"surprise"> an argument. The second version didn't--it tried to run a program literally called I<"echo surprise">, didn't find it, and set C<$?> to a non-zero value indicating failure. -Note that C<exec()> will not call your C<END> blocks, nor will it call +Note that C<exec> will not call your C<END> blocks, nor will it call any C<DESTROY> methods in your objects. =item exists EXPR -Returns TRUE if the specified hash key exists in its hash array, even +Returns true if the specified hash key exists in its hash, even if the corresponding value is undefined. print "Exists\n" if exists $array{$key}; print "Defined\n" if defined $array{$key}; print "True\n" if $array{$key}; -A hash element can be TRUE only if it's defined, and defined if +A hash element can be true only if it's defined, and defined if it exists, but the reverse doesn't necessarily hold true. Note that the EXPR can be arbitrarily complicated as long as the final @@ -1391,20 +1428,20 @@ Evaluates EXPR and exits immediately with that value. Example: $ans = <STDIN>; exit 0 if $ans =~ /^[Xx]/; -See also C<die()>. If EXPR is omitted, exits with C<0> status. The only +See also C<die>. If EXPR is omitted, exits with C<0> status. 
The only universally recognized values for EXPR are C<0> for success and C<1> for error; other values are subject to interpretation depending on the environment in which the Perl program is running. For example, exiting 69 (EX_UNAVAILABLE) from a I<sendmail> incoming-mail filter will cause the mailer to return the item undelivered, but that's not true everywhere. -Don't use C<exit()> to abort a subroutine if there's any chance that -someone might want to trap whatever error happened. Use C<die()> instead, -which can be trapped by an C<eval()>. +Don't use C<exit> to abort a subroutine if there's any chance that +someone might want to trap whatever error happened. Use C<die> instead, +which can be trapped by an C<eval>. -The exit() function does not always exit immediately. It calls any +The exit() function does not always exit immediately. It calls any defined C<END> routines first, but these C<END> routines may not -themselves abort the exit. Likewise any object destructors that need to +themselves abort the exit. Likewise any object destructors that need to be called are called before the real exit. If this is a problem, you can call C<POSIX::_exit($status)> to avoid END and destructor processing. See L<perlsub> for details. @@ -1423,20 +1460,20 @@ Implements the fcntl(2) function. You'll probably have to say use Fcntl; first to get the correct constant definitions. Argument processing and -value return works just like C<ioctl()> below. +value return works just like C<ioctl> below. For example: use Fcntl; fcntl($filehandle, F_GETFL, $packed_return_buffer) or die "can't fcntl F_GETFL: $!"; -You don't have to check for C<defined()> on the return from C<fnctl()>. -Like C<ioctl()>, it maps a C<0> return from the system call into "C<0> -but true" in Perl. +You don't have to check for C<defined> on the return from C<fcntl>. +Like C<ioctl>, it maps a C<0> return from the system call into C<"0 +but true"> in Perl.
This string is true in boolean context and C<0> in numeric context. It is also exempt from the normal B<-w> warnings on improper numeric conversions. -Note that C<fcntl()> will produce a fatal error if used on a machine that +Note that C<fcntl> will produce a fatal error if used on a machine that doesn't implement fcntl(2). See the Fcntl module or your fcntl(2) manpage to learn what functions are available on your system. @@ -1444,7 +1481,7 @@ manpage to learn what functions are available on your system. Returns the file descriptor for a filehandle, or undefined if the filehandle is not open. This is mainly useful for constructing -bitmaps for C<select()> and low-level POSIX tty-handling operations. +bitmaps for C<select> and low-level POSIX tty-handling operations. If FILEHANDLE is an expression, the value is taken as an indirect filehandle, generally its name. @@ -1457,17 +1494,17 @@ same underlying descriptor: =item flock FILEHANDLE,OPERATION -Calls flock(2), or an emulation of it, on FILEHANDLE. Returns TRUE -for success, FALSE on failure. Produces a fatal error if used on a +Calls flock(2), or an emulation of it, on FILEHANDLE. Returns true +for success, false on failure. Produces a fatal error if used on a machine that doesn't implement flock(2), fcntl(2) locking, or lockf(3). -C<flock()> is Perl's portable file locking interface, although it locks +C<flock> is Perl's portable file locking interface, although it locks only entire files, not records. Two potentially non-obvious but traditional C<flock> semantics are that it waits indefinitely until the lock is granted, and that its locks are B<merely advisory>. Such discretionary locks are more flexible, but offer -fewer guarantees. This means that files locked with C<flock()> may be -modified by programs that do not also use C<flock()>. +fewer guarantees. This means that files locked with C<flock> may be +modified by programs that do not also use C<flock>.
See L<perlport>, your port's specific documentation, or your system-specific local manpages for details. It's best to assume traditional behavior if you're writing portable programs. (But if you're not, you should as always feel perfectly @@ -1481,7 +1518,7 @@ you can use the symbolic names if import them from the Fcntl module, either individually, or as a group using the ':flock' tag. LOCK_SH requests a shared lock, LOCK_EX requests an exclusive lock, and LOCK_UN releases a previously requested lock. If LOCK_NB is added to LOCK_SH or -LOCK_EX then C<flock()> will return immediately rather than blocking +LOCK_EX then C<flock> will return immediately rather than blocking waiting for the lock (check the return status to see if you got it). To avoid the possibility of miscoordination, Perl now flushes FILEHANDLE @@ -1493,8 +1530,8 @@ are the semantics that lockf(3) implements. Most if not all systems implement lockf(3) in terms of fcntl(2) locking, though, so the differing semantics shouldn't bite too many people. -Note also that some versions of C<flock()> cannot lock things over the -network; you would need to use the more system-specific C<fcntl()> for +Note also that some versions of C<flock> cannot lock things over the +network; you would need to use the more system-specific C<fcntl> for that. If you like you can force Perl to ignore your system's flock(2) function, and so provide its own fcntl(2)-based emulation, by passing the switch C<-Ud_flock> to the F<Configure> program when you configure @@ -1541,7 +1578,7 @@ dominant paradigm for multitasking over the last few decades. All files opened for output are flushed before forking the child process. -If you C<fork()> without ever waiting on your children, you will +If you C<fork> without ever waiting on your children, you will accumulate zombies. On some systems, you can avoid this by setting C<$SIG{CHLD}> to C<"IGNORE">. See also L<perlipc> for more examples of forking and reaping moribund children. 
@@ -1549,12 +1586,12 @@ forking and reaping moribund children. Note that if your forked child inherits system file descriptors like STDIN and STDOUT that are actually connected by a pipe or socket, even if you exit, then the remote server (such as, say, a CGI script or a -backgrounded job launced from a remote shell) won't think you're done. +backgrounded job launched from a remote shell) won't think you're done. You should reopen those to F</dev/null> if it's any issue. =item format -Declare a picture format for use by the C<write()> function. For +Declare a picture format for use by the C<write> function. For example: format Something = @@ -1575,18 +1612,18 @@ This is an internal function used by C<format>s, though you may call it, too. It formats (see L<perlform>) a list of values according to the contents of PICTURE, placing the output into the format output accumulator, C<$^A> (or C<$ACCUMULATOR> in English). -Eventually, when a C<write()> is done, the contents of +Eventually, when a C<write> is done, the contents of C<$^A> are written to some filehandle, but you could also read C<$^A> yourself and then set C<$^A> back to C<"">. Note that a format typically -does one C<formline()> per line of form, but the C<formline()> function itself +does one C<formline> per line of form, but the C<formline> function itself doesn't care how many newlines are embedded in the PICTURE. This means that the C<~> and C<~~> tokens will treat the entire PICTURE as a single line. You may therefore need to use multiple formlines to implement a single record format, just like the format compiler. -Be careful if you put double quotes around the picture, because an "C<@>" +Be careful if you put double quotes around the picture, because an C<@> character may be taken to mean the beginning of an array name. -C<formline()> always returns TRUE. See L<perlform> for other examples. +C<formline> always returns true. See L<perlform> for other examples. 
=item getc FILEHANDLE @@ -1619,7 +1656,7 @@ something more like: Determination of whether $BSD_STYLE should be set is left as an exercise to the reader. -The C<POSIX::getattr()> function can do this more portably on +The C<POSIX::getattr> function can do this more portably on systems purporting POSIX compliance. See also the C<Term::ReadKey> module from your nearest CPAN site; details on CPAN can be found on L<perlmodlib/CPAN>. @@ -1628,12 +1665,12 @@ L<perlmodlib/CPAN>. Implements the C library function of the same name, which on most systems returns the current login from F</etc/utmp>, if any. If null, -use C<getpwuid()>. +use C<getpwuid>. $login = getlogin || getpwuid($<) || "Kilroy"; -Do not consider C<getlogin()> for authentication: it is not as -secure as C<getpwuid()>. +Do not consider C<getlogin> for authentication: it is not as +secure as C<getpwuid>. =item getpeername SOCKET @@ -1641,7 +1678,7 @@ Returns the packed sockaddr address of other end of the SOCKET connection. use Socket; $hersockaddr = getpeername(SOCK); - ($port, $iaddr) = unpack_sockaddr_in($hersockaddr); + ($port, $iaddr) = sockaddr_in($hersockaddr); $herhostname = gethostbyaddr($iaddr, AF_INET); $herstraddr = inet_ntoa($iaddr); @@ -1651,7 +1688,7 @@ Returns the current process group for the specified PID. Use a PID of C<0> to get the current process group for the current process. Will raise an exception if used on a machine that doesn't implement getpgrp(2). If PID is omitted, returns process -group of current process. Note that the POSIX version of C<getpgrp()> +group of current process. Note that the POSIX version of C<getpgrp> does not accept a PID argument, so only C<PID==0> is truly portable. =item getppid @@ -1750,20 +1787,20 @@ lookup by name, in which case you get the other thing, whatever it is. $name = getgrent(); #etc. 
-In I<getpw*()> the fields C<$quota>, C<$comment>, and C<$expire> are +In I<getpw*()> the fields $quota, $comment, and $expire are special cases in the sense that in many systems they are unsupported. -If the C<$quota> is unsupported, it is an empty scalar. If it is -supported, it usually encodes the disk quota. If the C<$comment> +If the $quota is unsupported, it is an empty scalar. If it is +supported, it usually encodes the disk quota. If the $comment field is unsupported, it is an empty scalar. If it is supported it usually encodes some administrative comment about the user. In some -systems the $quota field may be C<$change> or C<$age>, fields that have -to do with password aging. In some systems the C<$comment> field may -be C<$class>. The C<$expire> field, if present, encodes the expiration +systems the $quota field may be $change or $age, fields that have +to do with password aging. In some systems the $comment field may +be $class. The $expire field, if present, encodes the expiration period of the account or the password. For the availability and the exact meaning of these fields in your system, please consult your getpwnam(3) documentation and your F<pwd.h> file. You can also find -out from within Perl what your C<$quota> and C<$comment> fields mean -and whether you have the C<$expire> field by using the C<Config> module +out from within Perl what your $quota and $comment fields mean +and whether you have the $expire field by using the C<Config> module and the values C<d_pwquota>, C<d_pwage>, C<d_pwchange>, C<d_pwcomment>, and C<d_pwexpire>. Shadow password files are only supported if your vendor has implemented them in the intuitive fashion that calling the @@ -1771,7 +1808,7 @@ regular C library routines gets the shadow versions if you're running under privilege. Those that incorrectly implement a separate library call are not supported. 
-The C<$members> value returned by I<getgr*()> is a space separated list of +The $members value returned by I<getgr*()> is a space separated list of the login names of the members of the group. For the I<gethost*()> functions, if the C<h_errno> variable is supported in @@ -1790,29 +1827,36 @@ The Socket library makes this slightly easier: $name = gethostbyaddr($iaddr, AF_INET); # or going the other way - $straddr = inet_ntoa($iaddr"); + $straddr = inet_ntoa($iaddr); -If you get tired of remembering which element of the return list contains -which return value, by-name interfaces are also provided in modules: -C<File::stat>, C<Net::hostent>, C<Net::netent>, C<Net::protoent>, C<Net::servent>, -C<Time::gmtime>, C<Time::localtime>, and C<User::grent>. These override the -normal built-in, replacing them with versions that return objects with -the appropriate names for each field. For example: +If you get tired of remembering which element of the return list +contains which return value, by-name interfaces are provided +in standard modules: C<File::stat>, C<Net::hostent>, C<Net::netent>, +C<Net::protoent>, C<Net::servent>, C<Time::gmtime>, C<Time::localtime>, +and C<User::grent>. These override the normal built-ins, supplying +versions that return objects with the appropriate names +for each field. For example: use File::stat; use User::pwent; $is_his = (stat($filename)->uid == pwent($whoever)->uid); Even though it looks like they're the same method calls (uid), -they aren't, because a C<File::stat> object is different from a C<User::pwent> object. +they aren't, because a C<File::stat> object is different from +a C<User::pwent> object. =item getsockname SOCKET -Returns the packed sockaddr address of this end of the SOCKET connection. +Returns the packed sockaddr address of this end of the SOCKET connection, +in case you don't know the address because you have several different +IPs that the connection might have come in on. 
use Socket; $mysockaddr = getsockname(SOCK); - ($port, $myaddr) = unpack_sockaddr_in($mysockaddr); + ($port, $myaddr) = sockaddr_in($mysockaddr); + printf "Connect to %s [%s]\n", + scalar gethostbyaddr($myaddr, AF_INET), + inet_ntoa($myaddr); =item getsockopt SOCKET,LEVEL,OPTNAME @@ -1830,7 +1874,7 @@ discussed in more detail in L<perlop/"I/O Operators">. =item gmtime EXPR -Converts a time as returned by the time function to a 9-element array +Converts a time as returned by the time function to a 9-element list with the time localized for the standard Greenwich time zone. Typically used as follows: @@ -1838,10 +1882,10 @@ Typically used as follows: ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime(time); -All array elements are numeric, and come straight out of a struct tm. -In particular this means that C<$mon> has the range C<0..11> and C<$wday> -has the range C<0..6> with sunday as day C<0>. Also, C<$year> is the -number of years since 1900, that is, C<$year> is C<123> in year 2023, +All list elements are numeric, and come straight out of a struct tm. +In particular this means that $mon has the range C<0..11> and $wday +has the range C<0..6> with sunday as day C<0>. Also, $year is the +number of years since 1900, that is, $year is C<123> in year 2023, I<not> simply the last two digits of the year. If you assume it is, then you create non-Y2K-compliant programs--and you wouldn't want to do that, would you? @@ -1852,7 +1896,7 @@ In scalar context, returns the ctime(3) value: $now_string = gmtime; # e.g., "Thu Oct 13 04:54:34 1994" -Also see the C<timegm()> function provided by the C<Time::Local> module, +Also see the C<timegm> function provided by the C<Time::Local> module, and the strftime(3) function available via the POSIX module. This scalar value is B<not> locale dependent (see L<perllocale>), but @@ -1879,10 +1923,10 @@ The C<goto-LABEL> form finds the statement labeled with LABEL and resumes execution there. 
It may not be used to go into any construct that requires initialization, such as a subroutine or a C<foreach> loop. It also can't be used to go into a construct that is optimized away, -or to get out of a block or subroutine given to C<sort()>. +or to get out of a block or subroutine given to C<sort>. It can be used to go almost anywhere else within the dynamic scope, including out of subroutines, but it's usually better to use some other -construct such as C<last> or C<die()>. The author of Perl has never felt the +construct such as C<last> or C<die>. The author of Perl has never felt the need to use this form of C<goto> (in Perl, that is--C is another matter). The C<goto-EXPR> form expects a label name, whose scope will be resolved @@ -1896,7 +1940,7 @@ named subroutine for the currently running subroutine. This is used by C<AUTOLOAD> subroutines that wish to load another subroutine and then pretend that the other subroutine had been called in the first place (except that any modifications to C<@_> in the current subroutine are -propagated to the other subroutine.) After the C<goto>, not even C<caller()> +propagated to the other subroutine.) After the C<goto>, not even C<caller> will be able to tell that this routine was called first. =item grep BLOCK LIST @@ -1908,8 +1952,8 @@ relatives. In particular, it is not limited to using regular expressions. Evaluates the BLOCK or EXPR for each element of LIST (locally setting C<$_> to each element) and returns the list value consisting of those -elements for which the expression evaluated to TRUE. In scalar -context, returns the number of times the expression was TRUE. +elements for which the expression evaluated to true. In scalar +context, returns the number of times the expression was true. @foo = grep(!/^#/, @bar); # weed out comments @@ -1922,11 +1966,11 @@ be used to modify the elements of the array. While this is useful and supported, it can cause bizarre results if the LIST is not a named array. 
Similarly, grep returns aliases into the original list, much as a for loop's index variable aliases the list elements. That is, modifying an -element of a list returned by grep (for example, in a C<foreach>, C<map()> -or another C<grep()>) actually modifies the element in the original list. +element of a list returned by grep (for example, in a C<foreach>, C<map> +or another C<grep>) actually modifies the element in the original list. This is usually something to be avoided when writing clear code. -See also L</map> for an array composed of the results of the BLOCK or EXPR. +See also L</map> for a list composed of the results of the BLOCK or EXPR. =item hex EXPR @@ -1939,11 +1983,14 @@ L</oct>.) If EXPR is omitted, uses C<$_>. print hex '0xAf'; # prints '175' print hex 'aF'; # same +Hex strings may only represent integers. Strings that would cause +integer overflow trigger a mandatory error message. + =item import -There is no builtin C<import()> function. It is just an ordinary +There is no builtin C<import> function. It is just an ordinary method (subroutine) defined (or inherited) by modules that wish to export -names to another module. The C<use()> function calls the C<import()> method +names to another module. The C<use> function calls the C<import> method for the package used. See also L</use()>, L<perlmod>, and L<Exporter>. =item index STR,SUBSTR,POSITION @@ -1968,7 +2015,7 @@ towards C<0>, and two because machine representations of floating point numbers can sometimes produce counterintuitive results. For example, C<int(-6.725/0.025)> produces -268 rather than the correct -269; that's because it's really more like -268.99999999999994315658 instead. Usually, -the C<sprintf()>, C<printf()>, or the C<POSIX::floor> and C<POSIX::ceil> +the C<sprintf>, C<printf>, or the C<POSIX::floor> and C<POSIX::ceil> functions will serve you better than will int(). 
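The following sketch compares C<int> with the alternatives the text recommends. C<int> truncates toward C<0>. C<POSIX::floor> and C<POSIX::ceil> round toward minus and plus infinity. C<sprintf "%.0f"> rounds to the nearest integer. The helper name C<rounding_report> is hypothetical. The second call repeats the quotient from the text. Its exact binary value is the text's claim and is not verified here.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Hypothetical helper: show one value under each conversion.
sub rounding_report {
    my ($x) = @_;
    return sprintf "x=%s int=%d floor=%d ceil=%d nearest=%.0f",
        $x, int($x), floor($x), ceil($x), $x;
}

# Negative numbers show the difference: int() moves toward zero,
# floor() moves toward minus infinity.
print rounding_report(-6.5), "\n";

# The example from the text: per the text, the stored quotient is
# slightly greater than -269, so int() truncates it to -268 even
# though it prints as -269.
print rounding_report(-6.725 / 0.025), "\n";
```
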
=item ioctl FILEHANDLE,FUNCTION,SCALAR @@ -1983,33 +2030,21 @@ own, based on your C header files such as F<E<lt>sys/ioctl.hE<gt>>. (There is a Perl script called B<h2ph> that comes with the Perl kit that may help you in this, but it's nontrivial.) SCALAR will be read and/or written depending on the FUNCTION--a pointer to the string value of SCALAR -will be passed as the third argument of the actual C<ioctl()> call. (If SCALAR +will be passed as the third argument of the actual C<ioctl> call. (If SCALAR has no string value but does have a numeric value, that value will be passed rather than a pointer to the string value. To guarantee this to be -TRUE, add a C<0> to the scalar before using it.) The C<pack()> and C<unpack()> -functions are useful for manipulating the values of structures used by -C<ioctl()>. The following example sets the erase character to DEL. - - require 'ioctl.ph'; - $getp = &TIOCGETP; - die "NO TIOCGETP" if $@ || !$getp; - $sgttyb_t = "ccccs"; # 4 chars and a short - if (ioctl(STDIN,$getp,$sgttyb)) { - @ary = unpack($sgttyb_t,$sgttyb); - $ary[2] = 127; - $sgttyb = pack($sgttyb_t,@ary); - ioctl(STDIN,&TIOCSETP,$sgttyb) - || die "Can't ioctl: $!"; - } +true, add a C<0> to the scalar before using it.) The C<pack> and C<unpack> +functions may be needed to manipulate the values of structures used by +C<ioctl>. -The return value of C<ioctl()> (and C<fcntl()>) is as follows: +The return value of C<ioctl> (and C<fcntl>) is as follows: if OS returns: then Perl returns: -1 undefined value 0 string "0 but true" anything else that number -Thus Perl returns TRUE on success and FALSE on failure, yet you can +Thus Perl returns true on success and false on failure, yet you can still easily determine the actual value returned by the operating system: @@ -2019,6 +2054,18 @@ system: The special string "C<0> but true" is exempt from B<-w> complaints about improper numeric conversions. 
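To show why the table above gives both a usable truth value and the real system result, the following sketch simulates the mapping in pure Perl. It makes no actual C<ioctl> or C<fcntl> call. The helper C<perl_style_return> is hypothetical and only mirrors the table as a stand-in for what Perl does internally. The sketch shows that C<"0 but true"> tests true yet adds as C<0> without a B<-w> warning.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the table: -1 => undef,
# 0 => "0 but true", anything else => that number.
sub perl_style_return {
    my ($os_rv) = @_;
    return undef        if $os_rv == -1;
    return "0 but true" if $os_rv == 0;
    return $os_rv;
}

for my $os_rv (-1, 0, 5) {
    my $rv = perl_style_return($os_rv);
    if ($rv) {
        # Numeric use of "0 but true" is exempt from warnings.
        printf "OS returned %d: success, numeric value %d\n", $os_rv, $rv + 0;
    }
    else {
        printf "OS returned %d: failure\n", $os_rv;
    }
}
```
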
+Here's an example of setting a filehandle named C<REMOTE> to be +non-blocking at the system level. You'll have to negotiate C<$|> +on your own, though. + + use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK); + + $flags = fcntl(REMOTE, F_GETFL, 0) + or die "Can't get flags for the socket: $!\n"; + + $flags = fcntl(REMOTE, F_SETFL, $flags | O_NONBLOCK) + or die "Can't set flags for the socket: $!\n"; + =item join EXPR,LIST Joins the separate strings of LIST into a single string with fields @@ -2030,11 +2077,11 @@ See L</split>. =item keys HASH -Returns a list consisting of all the keys of the named hash. (In a +Returns a list consisting of all the keys of the named hash. (In scalar context, returns the number of keys.) The keys are returned in an apparently random order. The actual random order is subject to change in future versions of perl, but it is guaranteed to be the same -order as either the C<values()> or C<each()> function produces (given +order as either the C<values> or C<each> function produces (given that the hash has not been modified). As a side effect, it resets HASH's iterator. @@ -2042,7 +2089,7 @@ Here is yet another way to print your environment: @keys = keys %ENV; @values = values %ENV; - while ($#keys >= 0) { + while (@keys) { print pop(@keys), '=', pop(@values), "\n"; } @@ -2052,14 +2099,14 @@ or how about sorted by key: print $key, '=', $ENV{$key}, "\n"; } -To sort a hash by value, you'll need to use a C<sort()> function. +To sort a hash by value, you'll need to use a C<sort> function. Here's a descending numeric sort of a hash by its values: foreach $key (sort { $hash{$b} <=> $hash{$a} } keys %hash) { printf "%4d %s\n", $hash{$key}, $key; } -As an lvalue C<keys()> allows you to increase the number of hash buckets +As an lvalue C<keys> allows you to increase the number of hash buckets allocated for the given hash. This can gain you a measure of efficiency if you know the hash is going to get big. 
(This is similar to pre-extending an array by assigning a larger number to $#array.) If you say @@ -2071,10 +2118,10 @@ in fact, since it rounds up to the next power of two. These buckets will be retained even if you do C<%hash = ()>; use C<undef %hash> if you want to free the storage while C<%hash> is still in scope. You can't shrink the number of buckets allocated for the hash using -C<keys()> in this way (but you needn't worry about doing this by accident, +C<keys> in this way (but you needn't worry about doing this by accident, as trying has no effect). -See also C<each()>, C<values()> and C<sort()>. +See also C<each>, C<values> and C<sort>. =item kill LIST @@ -2118,7 +2165,8 @@ C<redo> work. Returns a lowercased version of EXPR. This is the internal function implementing the C<\L> escape in double-quoted strings. -Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>. +Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale> +and L<utf8>. If EXPR is omitted, uses C<$_>. @@ -2143,17 +2191,17 @@ For that, use C<scalar @array> and C<scalar keys %hash> respectively. =item link OLDFILE,NEWFILE -Creates a new filename linked to the old filename. Returns TRUE for -success, FALSE otherwise. +Creates a new filename linked to the old filename. Returns true for +success, false otherwise. =item listen SOCKET,QUEUESIZE -Does the same thing that the listen system call does. Returns TRUE if -it succeeded, FALSE otherwise. See the example in L<perlipc/"Sockets: Client/Server Communication">. +Does the same thing that the listen system call does. Returns true if +it succeeded, false otherwise. See the example in L<perlipc/"Sockets: Client/Server Communication">. =item local EXPR -You really probably want to be using C<my()> instead, because C<local()> isn't +You really probably want to be using C<my> instead, because C<local> isn't what most people think of as "local". See L<perlsub/"Private Variables via my()"> for details.
@@ -2164,7 +2212,7 @@ for details, including issues with tied arrays and hashes. =item localtime EXPR -Converts a time as returned by the time function to a 9-element array +Converts a time as returned by the time function to a 9-element list with the time analyzed for the local time zone. Typically used as follows: @@ -2172,10 +2220,10 @@ follows: ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time); -All array elements are numeric, and come straight out of a struct tm. -In particular this means that C<$mon> has the range C<0..11> and C<$wday> -has the range C<0..6> with sunday as day C<0>. Also, C<$year> is the -number of years since 1900, that is, C<$year> is C<123> in year 2023, +All list elements are numeric, and come straight out of a struct tm. +In particular this means that $mon has the range C<0..11> and $wday +has the range C<0..6> with sunday as day C<0>. Also, $year is the +number of years since 1900, that is, $year is C<123> in year 2023, and I<not> simply the last two digits of the year. If you assume it is, then you create non-Y2K-compliant programs--and you wouldn't want to do that, would you? @@ -2199,13 +2247,24 @@ and try for example: Note that the C<%a> and C<%b>, the short forms of the day of the week and the month of the year, may not necessarily be three characters wide. +=item lock + + lock I<THING> + +This function places an advisory lock on a variable, subroutine, +or referenced object contained in I<THING> until the lock goes out +of scope. This is a built-in function only if your version of Perl +was built with threading enabled, and if you've said C<use Thread>. +Otherwise a user-defined function by this name will be called. See +L<Thread>. + =item log EXPR =item log Returns the natural logarithm (base I<e>) of EXPR. If EXPR is omitted, returns log of C<$_>.
To get the log of another base, use basic algebra: -The base-N log of a number is is equal to the natural log of that number +The base-N log of a number is equal to the natural log of that number divided by the natural log of N. For example: sub log10 { @@ -2221,10 +2280,10 @@ See also L</exp> for the inverse operation. =item lstat -Does the same thing as the C<stat()> function (including setting the +Does the same thing as the C<stat> function (including setting the special C<_> filehandle) but stats a symbolic link instead of the file the symbolic link points to. If symbolic links are unimplemented on -your system, a normal C<stat()> is done. +your system, a normal C<stat> is done. If EXPR is omitted, stats C<$_>. @@ -2236,12 +2295,12 @@ The match operator. See L<perlop>. =item map EXPR,LIST -Evaluates the BLOCK or EXPR for each element of LIST (locally setting C<$_> to each -element) and returns the list value composed of the results of each such -evaluation. Evaluates BLOCK or EXPR in a list context, so each element of LIST -may produce zero, one, or more elements in the returned value. - -In scalar context, returns the total number of elements so generated. +Evaluates the BLOCK or EXPR for each element of LIST (locally setting +C<$_> to each element) and returns the list value composed of the +results of each such evaluation. In scalar context, returns the +total number of elements so generated. Evaluates BLOCK or EXPR in +list context, so each element of LIST may produce zero, one, or +more elements in the returned value. @chars = map(chr, @nums); @@ -2263,18 +2322,18 @@ Using a regular C<foreach> loop for this purpose would be clearer in most cases. See also L</grep> for an array composed of those items of the original list for which the BLOCK or EXPR evaluates to true. -=item mkdir FILENAME,MODE +=item mkdir FILENAME,MASK Creates the directory specified by FILENAME, with permissions -specified by MODE (as modified by C<umask>). 
If it succeeds it -returns TRUE, otherwise it returns FALSE and sets C<$!> (errno). +specified by MASK (as modified by C<umask>). If it succeeds it +returns true, otherwise it returns false and sets C<$!> (errno). -In general, it is better to create directories with permissive MODEs, +In general, it is better to create directories with permissive MASK, and let the user modify that with their C<umask>, than it is to supply -a restrictive MODE and give the user no way to be more permissive. +a restrictive MASK and give the user no way to be more permissive. The exceptions to this rule are when the file or directory should be kept private (mail files, for instance). The perlfunc(1) entry on -C<umask> discusses the choice of MODE in more detail. +C<umask> discusses the choice of MASK in more detail. =item msgctl ID,CMD,ARG @@ -2284,22 +2343,22 @@ Calls the System V IPC function msgctl(2). You'll probably have to say first to get the correct constant definitions. If CMD is C<IPC_STAT>, then ARG must be a variable which will hold the returned C<msqid_ds> -structure. Returns like C<ioctl()>: the undefined value for error, "C<0> but -true" for zero, or the actual return value otherwise. See also -C<IPC::SysV> and C<IPC::Semaphore::Msg> documentation. +structure. Returns like C<ioctl>: the undefined value for error, C<"0 but +true"> for zero, or the actual return value otherwise. See also +C<IPC::SysV> and C<IPC::Semaphore> documentation. =item msgget KEY,FLAGS Calls the System V IPC function msgget(2). Returns the message queue id, or the undefined value if there is an error. See also C<IPC::SysV> -and C<IPC::SysV::Msg> documentation. +and C<IPC::Msg> documentation. =item msgsnd ID,MSG,FLAGS Calls the System V IPC function msgsnd to send the message MSG to the message queue ID. MSG must begin with the long integer message type, -which may be created with C<pack("l", $type)>. Returns TRUE if -successful, or FALSE if there is an error. 
See also C<IPC::SysV> +which may be created with C<pack("l", $type)>. Returns true if +successful, or false if there is an error. See also C<IPC::SysV> and C<IPC::SysV::Msg> documentation. =item msgrcv ID,VAR,SIZE,TYPE,FLAGS @@ -2308,13 +2367,13 @@ Calls the System V IPC function msgrcv to receive a message from message queue ID into variable VAR with a maximum message size of SIZE. Note that if a message is received, the message type will be the first thing in VAR, and the maximum length of VAR is SIZE plus the -size of the message type. Returns TRUE if successful, or FALSE if +size of the message type. Returns true if successful, or false if there is an error. See also C<IPC::SysV> and C<IPC::SysV::Msg> documentation. =item my EXPR -A C<my()> declares the listed variables to be local (lexically) to the -enclosing block, file, or C<eval()>. If +A C<my> declares the listed variables to be local (lexically) to the +enclosing block, file, or C<eval>. If more than one value is listed, the list must be placed in parentheses. See L<perlsub/"Private Variables via my()"> for details. @@ -2357,10 +2416,16 @@ hex in the standard Perl or C notation: $val = oct($val) if $val =~ /^0/; -If EXPR is omitted, uses C<$_>. This function is commonly used when -a string such as C<644> needs to be converted into a file mode, for -example. (Although perl will automatically convert strings into -numbers as needed, this automatic conversion assumes base 10.) +If EXPR is omitted, uses C<$_>. To go the other way (produce a number +in octal), use sprintf() or printf(): + + $perms = (stat("filename"))[2] & 07777; + $oct_perms = sprintf "%lo", $perms; + +The oct() function is commonly used when a string such as C<644> needs +to be converted into a file mode, for example. (Although perl will +automatically convert strings into numbers as needed, this automatic +conversion assumes base 10.) 
=item open FILEHANDLE,EXPR @@ -2370,14 +2435,14 @@ Opens the file whose filename is given by EXPR, and associates it with FILEHANDLE. If FILEHANDLE is an expression, its value is used as the name of the real filehandle wanted. If EXPR is omitted, the scalar variable of the same name as the FILEHANDLE contains the filename. -(Note that lexical variables--those declared with C<my()>--will not work -for this purpose; so if you're using C<my()>, specify EXPR in your call +(Note that lexical variables--those declared with C<my>--will not work +for this purpose; so if you're using C<my>, specify EXPR in your call to open.) See L<perlopentut> for a kinder, gentler explanation of opening files. If the filename begins with C<'E<lt>'> or nothing, the file is opened for input. If the filename begins with C<'E<gt>'>, the file is truncated and opened for -output, being created if necessary. If the filename begins with C<'E<gt>E<gt>'>, +output, being created if necessary. If the filename begins with C<'E<gt>E<gt>'>, the file is opened for appending, again being created if necessary. You can put a C<'+'> in front of the C<'E<gt>'> or C<'E<lt>'> to indicate that you want both read and write access to the file; thus C<'+E<lt>'> is almost @@ -2395,29 +2460,29 @@ If the filename begins with C<'|'>, the filename is interpreted as a command to which output is to be piped, and if the filename ends with a C<'|'>, the filename is interpreted as a command which pipes output to us. See L<perlipc/"Using open() for IPC"> -for more examples of this. (You are not allowed to C<open()> to a command +for more examples of this. (You are not allowed to C<open> to a command that pipes both in I<and> out, but see L<IPC::Open2>, L<IPC::Open3>, and L<perlipc/"Bidirectional Communication"> for alternatives.) Opening C<'-'> opens STDIN and opening C<'E<gt>-'> opens STDOUT. Open returns -nonzero upon success, the undefined value otherwise. 
If the C<open()> +nonzero upon success, the undefined value otherwise. If the C<open> involved a pipe, the return value happens to be the pid of the subprocess. If you're unfortunate enough to be running Perl on a system that distinguishes between text files and binary files (modern operating systems don't care), then you should check out L</binmode> for tips for -dealing with this. The key distinction between systems that need C<binmode()> +dealing with this. The key distinction between systems that need C<binmode> and those that don't is their text file formats. Systems like Unix, MacOS, and Plan9, which delimit lines with a single character, and which encode that -character in C as C<"\n">, do not need C<binmode()>. The rest need it. +character in C as C<"\n">, do not need C<binmode>. The rest need it. When opening a file, it's usually a bad idea to continue normal execution -if the request failed, so C<open()> is frequently used in connection with -C<die()>. Even if C<die()> won't do what you want (say, in a CGI script, +if the request failed, so C<open> is frequently used in connection with +C<die>. Even if C<die> won't do what you want (say, in a CGI script, where you want to make a nicely formatted error message (but there are modules that can help with that problem)) you should always check -the return value from opening a file. The infrequent exception is when +the return value from opening a file. The infrequent exception is when working with an unopened filehandle is actually what you want to do. Examples: @@ -2496,7 +2561,7 @@ STDERR: print STDERR "stderr 2\n"; If you specify C<'E<lt>&=N'>, where C<N> is a number, then Perl will do an -equivalent of C's C<fdopen()> of that file descriptor; this is more +equivalent of C's C<fdopen> of that file descriptor; this is more parsimonious of file descriptors. 
For example: open(FILEHANDLE, "<&=$fd") @@ -2545,8 +2610,8 @@ necessary to protect any leading and trailing whitespace: $file =~ s#^(\s)#./$1#; open(FOO, "< $file\0"); -If you want a "real" C C<open()> (see L<open(2)> on your system), then you -should use the C<sysopen()> function, which involves no such magic. This is +If you want a "real" C C<open> (see L<open(2)> on your system), then you +should use the C<sysopen> function, which involves no such magic. This is another way to protect your filenames from interpretation. For example: use IO::Handle; @@ -2579,8 +2644,8 @@ See L</seek> for some details about mixing reading and writing. =item opendir DIRHANDLE,EXPR -Opens a directory named EXPR for processing by C<readdir()>, C<telldir()>, -C<seekdir()>, C<rewinddir()>, and C<closedir()>. Returns TRUE if successful. +Opens a directory named EXPR for processing by C<readdir>, C<telldir>, +C<seekdir>, C<rewinddir>, and C<closedir>. Returns true if successful. DIRHANDLEs have their own namespace separate from FILEHANDLEs. =item ord EXPR @@ -2593,7 +2658,7 @@ See L<utf8> for more about Unicode. =item pack TEMPLATE,LIST -Takes an array or list of values and packs it into a binary structure, +Takes a list of values and packs it into a binary structure, returning the string containing the structure. The TEMPLATE is a sequence of characters that give the order and type of values, as follows: @@ -2617,7 +2682,7 @@ follows: i A signed integer value. I An unsigned integer value. - (This 'integer' is _at least_ 32 bits wide. Its exact + (This 'integer' is _at_least_ 32 bits wide. Its exact size depends on what a local C compiler calls 'int', and may even be larger than the 'long' described in the next item.) @@ -2692,7 +2757,7 @@ The C<"p"> type packs a pointer to a null-terminated string. You are responsible for ensuring the string is not a temporary value (which can potentially get deallocated before you get around to using the packed result). 
The C<"P"> type packs a pointer to a structure of the size indicated by the -length. A NULL pointer is created if the corresponding value for C<"p"> or +length. A NULL pointer is created if the corresponding value for C<"p"> or C<"P"> is C<undef>. =item * @@ -2713,10 +2778,15 @@ they are identical to C<"i"> and C<"I">. The actual sizes (in bytes) of native shorts, ints, and longs on the platform where Perl was built are also available via L<Config>: - use Config; - print $Config{shortsize}, "\n"; - print $Config{intsize}, "\n"; - print $Config{longsize}, "\n"; +The actual sizes (in bytes) of native shorts, ints, longs, and long +longs on the platform where Perl was built are also available via +L<Config>: + + use Config; + print $Config{shortsize}, "\n"; + print $Config{intsize}, "\n"; + print $Config{longsize}, "\n"; + print $Config{longlongsize}, "\n"; =item * @@ -2732,12 +2802,12 @@ because they obey the native byteorder and endianness. For example a Basically, the Intel, Alpha, and VAX CPUs are little-endian, while everybody else, for example Motorola m68k/88k, PPC, Sparc, HP PA, Power, and Cray are big-endian. MIPS can be either: Digital used it -in little-endian mode, SGI uses it in big-endian mode. +in little-endian mode; SGI uses it in big-endian mode. -The names `big-endian' and `little-endian' are joking references to +The names `big-endian' and `little-endian' are comic references to the classic "Gulliver's Travels" (via the paper "On Holy Wars and a Plea for Peace" by Danny Cohen, USC/ISI IEN 137, April 1, 1980) and -the egg-eating habits of the lilliputs. +the egg-eating habits of the Lilliputians. Some systems may even have weird byte orders such as @@ -2774,7 +2844,7 @@ of the IEEE spec). Note that Perl uses doubles internally for all numeric calculation, and converting from double into float and thence back to double again will lose precision (i.e., C<unpack("f", pack("f", $foo))>) will not in general -equal C<$foo>). +equal $foo).
=back @@ -2825,11 +2895,11 @@ The same template may generally also be used in unpack(). Declares the compilation unit as being in the given namespace. The scope of the package declaration is from the declaration itself through the end -of the enclosing block, file, or eval (the same as the C<my()> operator). +of the enclosing block, file, or eval (the same as the C<my> operator). All further unqualified dynamic identifiers will be in this namespace. A package statement affects only dynamic variables--including those -you've used C<local()> on--but I<not> lexical variables, which are created -with C<my()>. Typically it would be the first declaration in a file to +you've used C<local> on--but I<not> lexical variables, which are created +with C<my>. Typically it would be the first declaration in a file to be included by the C<require> or C<use> operator. You can switch into a package in more than one place; it merely influences which symbol table is used by the compiler for the rest of that block. You can refer to @@ -2866,13 +2936,14 @@ See L<perlvar/$^F>. =item pop Pops and returns the last value of the array, shortening the array by -one element. Has a similar effect to +one element. Has an effect similar to - $tmp = $ARRAY[$#ARRAY--]; + $ARRAY[$#ARRAY--] -If there are no elements in the array, returns the undefined value. -If ARRAY is omitted, pops the C<@ARGV> array in the main program, and -the C<@_> array in subroutines, just like C<shift()>. +If there are no elements in the array, returns the undefined value +(although this may happen at other times as well). If ARRAY is +omitted, pops the C<@ARGV> array in the main program, and the C<@_> +array in subroutines, just like C<shift>. =item pos SCALAR @@ -2890,22 +2961,26 @@ L<perlop>. =item print -Prints a string or a comma-separated list of strings. Returns TRUE -if successful. 
FILEHANDLE may be a scalar variable name, in which case -the variable contains the name of or a reference to the filehandle, thus -introducing one level of indirection. (NOTE: If FILEHANDLE is a variable -and the next token is a term, it may be misinterpreted as an operator +Prints a string or a list of strings. Returns true if successful. +FILEHANDLE may be a scalar variable name, in which case the variable +contains the name of or a reference to the filehandle, thus introducing +one level of indirection. (NOTE: If FILEHANDLE is a variable and +the next token is a term, it may be misinterpreted as an operator unless you interpose a C<+> or put parentheses around the arguments.) -If FILEHANDLE is omitted, prints by default to standard output (or to the -last selected output channel--see L</select>). If LIST is also omitted, -prints C<$_> to the currently selected output channel. To set the default -output channel to something other than STDOUT use the select operation. -Note that, because print takes a LIST, anything in the LIST is evaluated -in list context, and any subroutine that you call will have one or -more of its expressions evaluated in list context. Also be careful -not to follow the print keyword with a left parenthesis unless you want -the corresponding right parenthesis to terminate the arguments to the -print--interpose a C<+> or put parentheses around all the arguments. +If FILEHANDLE is omitted, prints by default to standard output (or +to the last selected output channel--see L</select>). If LIST is +also omitted, prints C<$_> to the currently selected output channel. +To set the default output channel to something other than STDOUT +use the select operation. The current value of C<$,> (if any) is +printed between each LIST item. The current value of C<$\> (if +any) is printed after the entire LIST has been printed. 
Because +print takes a LIST, anything in the LIST is evaluated in list +context, and any subroutine that you call will have one or more of +its expressions evaluated in list context. Also be careful not to +follow the print keyword with a left parenthesis unless you want +the corresponding right parenthesis to terminate the arguments to +the print--interpose a C<+> or put parentheses around all the +arguments. Note that if you're storing FILEHANDLES in an array or other expression, you will have to use a block returning its value instead: @@ -2919,12 +2994,12 @@ you will have to use a block returning its value instead: Equivalent to C<print FILEHANDLE sprintf(FORMAT, LIST)>, except that C<$\> (the output record separator) is not appended. The first argument -of the list will be interpreted as the C<printf()> format. If C<use locale> is +of the list will be interpreted as the C<printf> format. If C<use locale> is in effect, the character used for the decimal point in formatted real numbers is affected by the LC_NUMERIC locale. See L<perllocale>. -Don't fall into the trap of using a C<printf()> when a simple -C<print()> would do. The C<print()> is more efficient and less +Don't fall into the trap of using a C<printf> when a simple +C<print> would do. The C<print> is more efficient and less error prone. =item prototype FUNCTION @@ -2936,7 +3011,7 @@ the function whose prototype you want to retrieve. If FUNCTION is a string starting with C<CORE::>, the rest is taken as a name for Perl builtin. If the builtin is not I<overridable> (such as C<qw//>) or its arguments cannot be expressed by a prototype (such as -C<system()>) returns C<undef> because the builtin does not really behave +C<system>) returns C<undef> because the builtin does not really behave like a Perl function. Otherwise, the string describing the equivalent prototype is returned. @@ -2983,8 +3058,8 @@ If EXPR is omitted, uses C<$_>. 
Returns a random fractional number greater than or equal to C<0> and less than the value of EXPR. (EXPR should be positive.) If EXPR is -omitted, the value C<1> is used. Automatically calls C<srand()> unless -C<srand()> has already been called. See also C<srand()>. +omitted, the value C<1> is used. Automatically calls C<srand> unless +C<srand> has already been called. See also C<srand>. (Note: If your rand function consistently returns numbers that are too large or too small, then your version of Perl was probably compiled @@ -3000,18 +3075,18 @@ C<0> at end of file, or undef if there was an error. SCALAR will be grown or shrunk to the length actually read. An OFFSET may be specified to place the read data at some other place than the beginning of the string. This call is actually implemented in terms of stdio's fread(3) -call. To get a true read(2) system call, see C<sysread()>. +call. To get a true read(2) system call, see C<sysread>. =item readdir DIRHANDLE -Returns the next directory entry for a directory opened by C<opendir()>. +Returns the next directory entry for a directory opened by C<opendir>. If used in list context, returns all the rest of the entries in the directory. If there are no more entries, returns an undefined value in scalar context or a null list in list context. -If you're planning to filetest the return values out of a C<readdir()>, you'd +If you're planning to filetest the return values out of a C<readdir>, you'd better prepend the directory in question. Otherwise, because we didn't -C<chdir()> there, it would have been testing the wrong file. +C<chdir> there, it would have been testing the wrong file. opendir(DIR, $some_dir) || die "can't opendir $some_dir: $!"; @dots = grep { /^\./ && -f "$some_dir/$_" } readdir(DIR); @@ -3061,7 +3136,7 @@ operator is discussed in more detail in L<perlop/"I/O Operators">. Receives a message on a socket. Attempts to receive LENGTH bytes of data into variable SCALAR from the specified SOCKET filehandle. 
-Actually does a C C<recvfrom()>, so that it can return the address of the +Actually does a C C<recvfrom>, so that it can return the address of the sender. Returns the undefined value if there's an error. SCALAR will be grown or shrunk to the length actually read. Takes the same flags as the system call of the same name. @@ -3105,20 +3180,21 @@ C<redo> work. =item ref -Returns a TRUE value if EXPR is a reference, FALSE otherwise. If EXPR +Returns a true value if EXPR is a reference, false otherwise. If EXPR is not specified, C<$_> will be used. The value returned depends on the type of thing the reference is a reference to. Builtin types include: - REF SCALAR ARRAY HASH CODE + REF GLOB + LVALUE If the referenced object has been blessed into a package, then that package -name is returned instead. You can think of C<ref()> as a C<typeof()> operator. +name is returned instead. You can think of C<ref> as a C<typeof> operator. if (ref($r) eq "HASH") { print "r is a reference to a hash.\n"; @@ -3134,7 +3210,9 @@ See also L<perlref>. =item rename OLDNAME,NEWNAME -Changes the name of a file. Returns C<1> for success, C<0> otherwise. +Changes the name of a file; an existing file NEWNAME will be +clobbered. Returns true for success, false otherwise. + Behavior of this function varies wildly depending on your system implementation. For example, it will usually not work across file system boundaries, even though the system I<mv> command sometimes compensates @@ -3152,7 +3230,7 @@ supplied. If EXPR is numeric, demands that the current version of Perl Otherwise, demands that a library file be included if it hasn't already been included. The file is included via the do-FILE mechanism, which is -essentially just a variety of C<eval()>. Has semantics similar to the following +essentially just a variety of C<eval>. 
Has semantics similar to the following subroutine: sub require { @@ -3176,10 +3254,10 @@ subroutine: } Note that the file will not be included twice under the same specified -name. The file must return TRUE as the last statement to indicate +name. The file must return true as the last statement to indicate successful execution of any initialization code, so it's customary to -end such a file with "C<1;>" unless you're sure it'll return TRUE -otherwise. But it's better just to put the "C<1;>", in case you add more +end such a file with C<1;> unless you're sure it'll return true +otherwise. But it's better just to put the C<1;>, in case you add more statements. If EXPR is a bareword, the require assumes a "F<.pm>" extension and @@ -3202,7 +3280,7 @@ But if you try this: require "Foo::Bar"; # not a bareword because of the "" The require function will look for the "F<Foo::Bar>" file in the @INC array and -will complain about not finding "F<Foo::Bar>" there. In this case you can do: +will complain about not finding "F<Foo::Bar>" there. In this case you can do: eval "require $class"; @@ -3235,10 +3313,10 @@ See L</my>. =item return -Returns from a subroutine, C<eval()>, or C<do FILE> with the value +Returns from a subroutine, C<eval>, or C<do FILE> with the value given in EXPR. Evaluation of EXPR may be in list, scalar, or void context, depending on how the return value will be used, and the context -may vary from one execution to the next (see C<wantarray()>). If no EXPR +may vary from one execution to the next (see C<wantarray>). If no EXPR is given, returns an empty list in list context, the undefined value in scalar context, and (of course) nothing at all in a void context. @@ -3269,7 +3347,7 @@ on a large hash, such as from a DBM file. =item rewinddir DIRHANDLE Sets the current position to the beginning of the directory for the -C<readdir()> routine on DIRHANDLE. +C<readdir> routine on DIRHANDLE. 
=item rindex STR,SUBSTR,POSITION @@ -3284,7 +3362,7 @@ last occurrence at or before that position. =item rmdir Deletes the directory specified by FILENAME if that directory is empty. If it -succeeds it returns TRUE, otherwise it returns FALSE and sets C<$!> (errno). If +succeeds it returns true, otherwise it returns false and sets C<$!> (errno). If FILENAME is omitted, uses C<$_>. =item s/// @@ -3304,7 +3382,7 @@ needed. If you really wanted to do so, however, you could use the construction C<@{[ (some expression) ]}>, but usually a simple C<(some expression)> suffices. -Wince C<scalar> is unary operator, if you accidentally use for EXPR a +Because C<scalar> is unary operator, if you accidentally use for EXPR a parenthesized list, this behaves as a scalar comma expression, evaluating all but the last element in void context and returning the final element evaluated in scalar context. This is seldom what you want. @@ -3322,7 +3400,7 @@ See L<perlop> for more details on unary operators and the comma operator. =item seek FILEHANDLE,POSITION,WHENCE -Sets FILEHANDLE's position, just like the C<fseek()> call of C<stdio()>. +Sets FILEHANDLE's position, just like the C<fseek> call of C<stdio>. FILEHANDLE may be an expression whose value gives the name of the filehandle. The values for WHENCE are C<0> to set the new position to POSITION, C<1> to set it to the current position plus POSITION, and C<2> to @@ -3330,9 +3408,9 @@ set it to EOF plus POSITION (typically negative). For WHENCE you may use the constants C<SEEK_SET>, C<SEEK_CUR>, and C<SEEK_END> from either the C<IO::Seekable> or the POSIX module. Returns C<1> upon success, C<0> otherwise. -If you want to position file for C<sysread()> or C<syswrite()>, don't use -C<seek()> -- buffering makes its effect on the file's system position -unpredictable and non-portable. Use C<sysseek()> instead. 
+If you want to position file for C<sysread> or C<syswrite>, don't use +C<seek>--buffering makes its effect on the file's system position +unpredictable and non-portable. Use C<sysseek> instead. Due to the rules and rigors of ANSI C, on some systems you have to do a seek whenever you switch between reading and writing. Amongst other @@ -3343,7 +3421,7 @@ A WHENCE of C<1> (C<SEEK_CUR>) is useful for not moving the file position: This is also useful for applications emulating C<tail -f>. Once you hit EOF on your read, and then sleep for a while, you might have to stick in a -seek() to reset things. The C<seek()> doesn't change the current position, +seek() to reset things. The C<seek> doesn't change the current position, but it I<does> clear the end-of-file condition on the handle, so that the next C<E<lt>FILEE<gt>> makes Perl try again to read something. We hope. @@ -3361,8 +3439,8 @@ you may need something more like this: =item seekdir DIRHANDLE,POS -Sets the current position for the C<readdir()> routine on DIRHANDLE. POS -must be a value returned by C<telldir()>. Has the same caveats about +Sets the current position for the C<readdir> routine on DIRHANDLE. POS +must be a value returned by C<telldir>. Has the same caveats about possible directory compaction as the corresponding system library routine. @@ -3372,7 +3450,7 @@ routine. Returns the currently selected filehandle. Sets the current default filehandle for output, if FILEHANDLE is supplied. This has two -effects: first, a C<write()> or a C<print()> without a filehandle will +effects: first, a C<write> or a C<print> without a filehandle will default to this FILEHANDLE. Second, references to variables related to output will refer to this output channel. 
For example, if you have to set the top of form format for more than one output channel, you might @@ -3397,7 +3475,7 @@ methods, preferring to write the last example as: =item select RBITS,WBITS,EBITS,TIMEOUT This calls the select(2) system call with the bit masks specified, which -can be constructed using C<fileno()> and C<vec()>, along these lines: +can be constructed using C<fileno> and C<vec>, along these lines: $rin = $win = $ein = ''; vec($rin,fileno(STDIN),1) = 1; @@ -3426,32 +3504,32 @@ or to block until something becomes ready just do this $nfound = select($rout=$rin, $wout=$win, $eout=$ein, undef); -Most systems do not bother to return anything useful in C<$timeleft>, so -calling select() in scalar context just returns C<$nfound>. +Most systems do not bother to return anything useful in $timeleft, so +calling select() in scalar context just returns $nfound. Any of the bit masks can also be undef. The timeout, if specified, is in seconds, which may be fractional. Note: not all implementations are -capable of returning theC<$timeleft>. If not, they always return -C<$timeleft> equal to the supplied C<$timeout>. +capable of returning the$timeleft. If not, they always return +$timeleft equal to the supplied $timeout. You can effect a sleep of 250 milliseconds this way: select(undef, undef, undef, 0.25); -B<WARNING>: One should not attempt to mix buffered I/O (like C<read()> -or E<lt>FHE<gt>) with C<select()>, except as permitted by POSIX, and even -then only on POSIX systems. You have to use C<sysread()> instead. +B<WARNING>: One should not attempt to mix buffered I/O (like C<read> +or E<lt>FHE<gt>) with C<select>, except as permitted by POSIX, and even +then only on POSIX systems. You have to use C<sysread> instead. =item semctl ID,SEMNUM,CMD,ARG -Calls the System V IPC function C<semctl()>. You'll probably have to say +Calls the System V IPC function C<semctl>. You'll probably have to say use IPC::SysV; first to get the correct constant definitions. 
If CMD is IPC_STAT or GETALL, then ARG must be a variable which will hold the returned -semid_ds structure or semaphore value array. Returns like C<ioctl()>: the -undefined value for error, "C<0> but true" for zero, or the actual return +semid_ds structure or semaphore value array. Returns like C<ioctl>: the +undefined value for error, "C<0 but true>" for zero, or the actual return value otherwise. See also C<IPC::SysV> and C<IPC::Semaphore> documentation. =item semget KEY,NSEMS,FLAGS @@ -3466,9 +3544,9 @@ Calls the System V IPC function semop to perform semaphore operations such as signaling and waiting. OPSTRING must be a packed array of semop structures. Each semop structure can be generated with C<pack("sss", $semnum, $semop, $semflag)>. The number of semaphore -operations is implied by the length of OPSTRING. Returns TRUE if -successful, or FALSE if there is an error. As an example, the -following code waits on semaphore C<$semnum> of semaphore id C<$semid>: +operations is implied by the length of OPSTRING. Returns true if +successful, or false if there is an error. As an example, the +following code waits on semaphore $semnum of semaphore id $semid: $semop = pack("sss", $semnum, -1, 0); die "Semaphore trouble: $!\n" unless semop($semid, $semop); @@ -3482,7 +3560,7 @@ and C<IPC::SysV::Semaphore> documentation. Sends a message on a socket. Takes the same flags as the system call of the same name. On unconnected sockets you must specify a -destination to send TO, in which case it does a C C<sendto()>. Returns +destination to send TO, in which case it does a C C<sendto>. Returns the number of characters sent, or the undefined value if there is an error. The C system call sendmsg(2) is currently unimplemented. See L<perlipc/"UDP: Message Passing"> for examples. @@ -3492,7 +3570,7 @@ See L<perlipc/"UDP: Message Passing"> for examples. Sets the current process group for the specified PID, C<0> for the current process. 
Will produce a fatal error if used on a machine that doesn't implement setpgrp(2). If the arguments are omitted, it defaults to -C<0,0>. Note that the POSIX version of C<setpgrp()> does not accept any +C<0,0>. Note that the POSIX version of C<setpgrp> does not accept any arguments, so only C<setpgrp(0,0)> is portable. See also C<POSIX::setsid()>. =item setpriority WHICH,WHO,PRIORITY @@ -3517,8 +3595,8 @@ array, returns the undefined value. If ARRAY is omitted, shifts the C<@_> array within the lexical scope of subroutines and formats, and the C<@ARGV> array at file scopes or within the lexical scopes established by the C<eval ''>, C<BEGIN {}>, C<END {}>, and C<INIT {}> constructs. -See also C<unshift()>, C<push()>, and C<pop()>. C<Shift()> and C<unshift()> do the -same thing to the left end of an array that C<pop()> and C<push()> do to the +See also C<unshift>, C<push>, and C<pop>. C<Shift()> and C<unshift> do the +same thing to the left end of an array that C<pop> and C<push> do to the right end. =item shmctl ID,CMD,ARG @@ -3548,7 +3626,7 @@ position POS for size SIZE by attaching to it, copying in/out, and detaching from it. When reading, VAR must be a variable that will hold the data read. When writing, if STRING is too long, only SIZE bytes are used; if STRING is too short, nulls are written to fill out -SIZE bytes. Return TRUE if successful, or FALSE if there is an error. +SIZE bytes. Return true if successful, or false if there is an error. See also C<IPC::SysV> documentation and the C<IPC::Shareable> module from CPAN. @@ -3564,7 +3642,7 @@ has the same interpretation as in the system call of the same name. This is useful with sockets when you want to tell the other side you're done writing but not done reading, or vice versa. It's also a more insistent form of close because it also -disables the filedescriptor in any forked copies in other +disables the file descriptor in any forked copies in other processes. =item sin EXPR @@ -3574,7 +3652,7 @@ processes. 
Returns the sine of EXPR (expressed in radians). If EXPR is omitted, returns sine of C<$_>. -For the inverse sine operation, you may use the C<POSIX::asin()> +For the inverse sine operation, you may use the C<POSIX::asin> function, or use this relation: sub asin { atan2($_[0], sqrt(1 - $_[0] * $_[0])) } @@ -3586,8 +3664,8 @@ function, or use this relation: Causes the script to sleep for EXPR seconds, or forever if no EXPR. May be interrupted if the process receives a signal such as C<SIGALRM>. Returns the number of seconds actually slept. You probably cannot -mix C<alarm()> and C<sleep()> calls, because C<sleep()> is often implemented -using C<alarm()>. +mix C<alarm> and C<sleep> calls, because C<sleep> is often implemented +using C<alarm>. On some older systems, it may sleep up to a full second less than what you requested, depending on how it counts seconds. Most modern systems @@ -3596,26 +3674,27 @@ however, because your process might not be scheduled right away in a busy multitasking system. For delays of finer granularity than one second, you may use Perl's -C<syscall()> interface to access setitimer(2) if your system supports it, +C<syscall> interface to access setitimer(2) if your system supports it, or else see L</select> above. -See also the POSIX module's C<sigpause()> function. +See also the POSIX module's C<sigpause> function. =item socket SOCKET,DOMAIN,TYPE,PROTOCOL Opens a socket of the specified kind and attaches it to filehandle -SOCKET. DOMAIN, TYPE, and PROTOCOL are specified the same as for the -system call of the same name. You should "C<use Socket;>" first to get -the proper definitions imported. See the examples in L<perlipc/"Sockets: Client/Server Communication">. +SOCKET. DOMAIN, TYPE, and PROTOCOL are specified the same as for +the system call of the same name. You should C<use Socket> first +to get the proper definitions imported. See the examples in +L<perlipc/"Sockets: Client/Server Communication">. 
=item socketpair SOCKET1,SOCKET2,DOMAIN,TYPE,PROTOCOL Creates an unnamed pair of sockets in the specified domain, of the specified type. DOMAIN, TYPE, and PROTOCOL are specified the same as for the system call of the same name. If unimplemented, yields a fatal -error. Returns TRUE if successful. +error. Returns true if successful. -Some systems defined C<pipe()> in terms of C<socketpair()>, in which a call +Some systems defined C<pipe> in terms of C<socketpair>, in which a call to C<pipe(Rdr, Wtr)> is essentially: use Socket; @@ -3632,10 +3711,10 @@ See L<perlipc> for an example of socketpair use. =item sort LIST Sorts the LIST and returns the sorted list value. If SUBNAME or BLOCK -is omitted, C<sort()>s in standard string comparison order. If SUBNAME is +is omitted, C<sort>s in standard string comparison order. If SUBNAME is specified, it gives the name of a subroutine that returns an integer less than, equal to, or greater than C<0>, depending on how the elements -of the array are to be ordered. (The C<E<lt>=E<gt>> and C<cmp> +of the list are to be ordered. (The C<E<lt>=E<gt>> and C<cmp> operators are extremely useful in such routines.) SUBNAME may be a scalar variable name (unsubscripted), in which case the value provides the name of (or a reference to) the actual subroutine to use. In place @@ -3645,12 +3724,12 @@ subroutine. In the interests of efficiency the normal calling code for subroutines is bypassed, with the following effects: the subroutine may not be a recursive subroutine, and the two elements to be compared are passed into -the subroutine not via C<@_> but as the package global variables C<$a> and -C<$b> (see example below). They are passed by reference, so don't -modify C<$a> and C<$b>. And don't try to declare them as lexicals either. +the subroutine not via C<@_> but as the package global variables $a and +$b (see example below). They are passed by reference, so don't +modify $a and $b. And don't try to declare them as lexicals either. 
You also cannot exit out of the sort block or subroutine using any of the -loop control operators described in L<perlsyn> or with C<goto()>. +loop control operators described in L<perlsyn> or with C<goto>. When C<use locale> is in effect, C<sort LIST> sorts LIST according to the current collation locale. See L<perllocale>. @@ -3675,19 +3754,19 @@ Examples: # sort numerically descending @articles = sort {$b <=> $a} @files; + # this sorts the %age hash by value instead of key + # using an in-line function + @eldest = sort { $age{$b} <=> $age{$a} } keys %age; + # sort using explicit subroutine name sub byage { $age{$a} <=> $age{$b}; # presuming numeric } @sortedclass = sort byage @class; - # this sorts the %age hash by value instead of key - # using an in-line function - @eldest = sort { $age{$b} <=> $age{$a} } keys %age; - - sub backwards { $b cmp $a; } - @harry = ('dog','cat','x','Cain','Abel'); - @george = ('gone','chased','yz','Punished','Axed'); + sub backwards { $b cmp $a } + @harry = qw(dog cat x Cain Abel); + @george = qw(gone chased yz Punished Axed); print sort @harry; # prints AbelCaincatdogx print sort backwards @harry; @@ -3721,15 +3800,15 @@ Examples: } 0..$#old ]; - # same thing using a Schwartzian Transform (no temps) + # same thing, but without any temps @new = map { $_->[0] } - sort { $b->[1] <=> $a->[1] - || - $a->[2] cmp $b->[2] - } map { [$_, /=(\d+)/, uc($_)] } @old; + sort { $b->[1] <=> $a->[1] + || + $a->[2] cmp $b->[2] + } map { [$_, /=(\d+)/, uc($_)] } @old; -If you're using strict, you I<MUST NOT> declare C<$a> -and C<$b> as lexicals. They are package globals. That means +If you're using strict, you I<must not> declare $a +and $b as lexicals. They are package globals. That means if you're in the C<main> package, it's @articles = sort {$main::b <=> $main::a} @files; @@ -3758,7 +3837,7 @@ replaces them with the elements of LIST, if any. In list context, returns the elements removed from the array. 
In scalar context, returns the last element removed, or C<undef> if no elements are removed. The array grows or shrinks as necessary. -If OFFSET is negative then it start that far from the end of the array. +If OFFSET is negative then it starts that far from the end of the array. If LENGTH is omitted, removes everything from OFFSET onward. If LENGTH is negative, leave that many elements off the end of the array. The following equivalences hold (assuming C<$[ == 0>): @@ -3790,7 +3869,7 @@ Example, assuming array lengths are passed before arrays: =item split -Splits a string into an array of strings, and returns it. By default, +Splits a string into a list of strings and returns that list. By default, empty leading fields are preserved, and empty trailing ones are deleted. If not in list context, returns the number of fields found and splits into @@ -3807,7 +3886,7 @@ that the delimiter may be longer than one character.) If LIMIT is specified and positive, splits into no more than that many fields (though it may split into fewer). If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users -of C<pop()> would do well to remember). If LIMIT is negative, it is +of C<pop> would do well to remember). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified. A pattern matching the null string (not to be confused with @@ -3829,7 +3908,7 @@ unnecessary work. For the list above LIMIT would have been 4 by default. In time critical applications it behooves you not to split into more fields than you really need. -If the PATTERN contains parentheses, additional array elements are +If the PATTERN contains parentheses, additional list elements are created from each matching substring in the delimiter. 
split(/([,-])/, "1-10,20", 3); @@ -3838,7 +3917,7 @@ produces the list value (1, '-', 10, ',', 20) -If you had the entire header of a normal Unix email message in C<$header>, +If you had the entire header of a normal Unix email message in $header, you could split it up into fields and their values this way: $header =~ s/\n\s+/ /g; # fix continuation lines @@ -3849,11 +3928,11 @@ patterns that vary at runtime. (To do runtime compilation only once, use C</$variable/o>.) As a special case, specifying a PATTERN of space (C<' '>) will split on -white space just as C<split()> with no arguments does. Thus, C<split(' ')> can +white space just as C<split> with no arguments does. Thus, C<split(' ')> can be used to emulate B<awk>'s default behavior, whereas C<split(/ /)> will give you as many null initial fields as there are leading spaces. -A C<split()> on C</\s+/> is like a C<split(' ')> except that any leading -whitespace produces a null first field. A C<split()> with no arguments +A C<split> on C</\s+/> is like a C<split(' ')> except that any leading +whitespace produces a null first field. A C<split> with no arguments really does a C<split(' ', $_)> internally. Example: @@ -3865,22 +3944,22 @@ Example: #... } -(Note that C<$shell> above will still have a newline on it. See L</chop>, +(Note that $shell above will still have a newline on it. See L</chop>, L</chomp>, and L</join>.) =item sprintf FORMAT, LIST -Returns a string formatted by the usual C<printf()> conventions of the -C library function C<sprintf()>. See L<sprintf(3)> or L<printf(3)> +Returns a string formatted by the usual C<printf> conventions of the +C library function C<sprintf>. See L<sprintf(3)> or L<printf(3)> on your system for an explanation of the general principles. 
-Perl does its own C<sprintf()> formatting -- it emulates the C -function C<sprintf()>, but it doesn't use it (except for floating-point +Perl does its own C<sprintf> formatting--it emulates the C +function C<sprintf>, but it doesn't use it (except for floating-point numbers, and even then only the standard modifiers are allowed). As a -result, any non-standard extensions in your local C<sprintf()> are not +result, any non-standard extensions in your local C<sprintf> are not available from Perl. -Perl's C<sprintf()> permits the following universally-known conversions: +Perl's C<sprintf> permits the following universally-known conversions: %% a percent sign %c a character with the given number @@ -3931,11 +4010,11 @@ There is also one Perl-specific flag: V interpret integer as Perl's standard integer type -Where a number would appear in the flags, an asterisk ("C<*>") may be +Where a number would appear in the flags, an asterisk (C<*>) may be used instead, in which case Perl uses the next item in the parameter list as the given number (that is, as the field width or precision). -If a field width obtained through "C<*>" is negative, it has the same -effect as the "C<->" flag: left-justification. +If a field width obtained through C<*> is negative, it has the same +effect as the C<-> flag: left-justification. If C<use locale> is in effect, the character used for the decimal point in formatted real numbers is affected by the LC_NUMERIC locale. @@ -3956,19 +4035,19 @@ loaded the standard Math::Complex module. =item srand -Sets the random number seed for the C<rand()> operator. If EXPR is +Sets the random number seed for the C<rand> operator. If EXPR is omitted, uses a semi-random value supplied by the kernel (if it supports the F</dev/urandom> device) or based on the current time and process ID, among other things. In versions of Perl prior to 5.004 the default -seed was just the current C<time()>. This isn't a particularly good seed, +seed was just the current C<time>. 
This isn't a particularly good seed, so many old programs supply their own seed value (often C<time ^ $$> or C<time ^ ($$ + ($$ E<lt>E<lt> 15))>), but that isn't necessary any more. -In fact, it's usually not necessary to call C<srand()> at all, because if +In fact, it's usually not necessary to call C<srand> at all, because if it is not called explicitly, it is called implicitly at the first use of -the C<rand()> operator. However, this was not the case in version of Perl +the C<rand> operator. However, this was not the case in version of Perl before 5.004, so if your script will run under older Perl versions, it -should call C<srand()>. +should call C<srand>. Note that you need something much more random than the default seed for cryptographic purposes. Checksumming the compressed output of one or more @@ -3980,11 +4059,11 @@ example: If you're particularly concerned with this, see the C<Math::TrulyRandom> module in CPAN. -Do I<not> call C<srand()> multiple times in your program unless you know +Do I<not> call C<srand> multiple times in your program unless you know exactly what you're doing and why you're doing it. The point of the -function is to "seed" the C<rand()> function so that C<rand()> can produce +function is to "seed" the C<rand> function so that C<rand> can produce a different sequence each time you run your program. Just do it once at the -top of your program, or you I<won't> get random numbers out of C<rand()>! +top of your program, or you I<won't> get random numbers out of C<rand>! Frequently called programs (like CGI scripts) that simply use @@ -4047,8 +4126,7 @@ if you want to see the real permissions. $mode = (stat($filename))[2]; printf "Permissions are %04o\n", $mode & 07777; - -In scalar context, C<stat()> returns a boolean value indicating success +In scalar context, C<stat> returns a boolean value indicating success or failure, and, if successful, sets the information associated with the special filehandle C<_>. 
@@ -4068,12 +4146,12 @@ Takes extra time to study SCALAR (C<$_> if unspecified) in anticipation of doing many pattern matches on the string before it is next modified. This may or may not save time, depending on the nature and number of patterns you are searching on, and on the distribution of character -frequencies in the string to be searched -- you probably want to compare +frequencies in the string to be searched--you probably want to compare run times with and without it to see which runs faster. Those loops which scan for many short constant strings (including the constant parts of more complex patterns) will benefit most. You may have only -one C<study()> active at a time -- if you study a different scalar the first -is "unstudied". (The way C<study()> works is this: a linked list of every +one C<study> active at a time--if you study a different scalar the first +is "unstudied". (The way C<study> works is this: a linked list of every character in the string to be searched is made, so we know, for example, where all the C<'k'> characters are. From each search string, the rarest character is selected, based on some static frequency tables @@ -4099,7 +4177,7 @@ it saves you more time than it took to build the linked list in the first place. Note that if you have to look for strings that you don't know till -runtime, you can build an entire loop as a string and C<eval()> that to +runtime, you can build an entire loop as a string and C<eval> that to avoid recompiling all your patterns all the time. Together with undefining C<$/> to input entire files as one record, this can be very fast, often faster than specialized programs like fgrep(1). The following @@ -4152,7 +4230,7 @@ You can use the substr() function as an lvalue, in which case EXPR must itself be an lvalue. If you assign something shorter than LEN, the string will shrink, and if you assign something longer than LEN, the string will grow to accommodate it. 
To keep the string the same -length you may need to pad or chop your value using C<sprintf()>. +length you may need to pad or chop your value using C<sprintf>. An alternative to using substr() as an lvalue is to specify the replacement string as the 4th argument. This allows you to replace @@ -4177,12 +4255,12 @@ as follows: if a given argument is numeric, the argument is passed as an int. If not, the pointer to the string value is passed. You are responsible to make sure a string is pre-extended long enough to receive any result that might be written into a string. You can't use a -string literal (or other read-only string) as an argument to C<syscall()> +string literal (or other read-only string) as an argument to C<syscall> because Perl has to assume that any string pointer might be written through. If your integer arguments are not literals and have never been interpreted in a numeric context, you may need to add C<0> to them to force them to look -like numbers. This emulates the C<syswrite()> function (or vice versa): +like numbers. This emulates the C<syswrite> function (or vice versa): require 'syscall.ph'; # may need to run h2ph $s = "hi there\n"; @@ -4192,7 +4270,7 @@ Note that Perl supports passing of up to only 14 arguments to your system call, which in practice should usually suffice. Syscall returns whatever value returned by the system call it calls. -If the system call fails, C<syscall()> returns C<-1> and sets C<$!> (errno). +If the system call fails, C<syscall> returns C<-1> and sets C<$!> (errno). Note that some system calls can legitimately return C<-1>. The proper way to handle such calls is to assign C<$!=0;> before the call and check the value of C<$!> if syscall returns C<-1>. @@ -4200,7 +4278,7 @@ check the value of C<$!> if syscall returns C<-1>. There's a problem with C<syscall(&SYS_pipe)>: it returns the file number of the read end of the pipe it creates. There is no way to retrieve the file number of the other end. 
You can avoid this -problem by using C<pipe()> instead. +problem by using C<pipe> instead. =item sysopen FILEHANDLE,FILENAME,MODE @@ -4209,7 +4287,7 @@ problem by using C<pipe()> instead. Opens the file whose filename is given by FILENAME, and associates it with FILEHANDLE. If FILEHANDLE is an expression, its value is used as the name of the real filehandle wanted. This function calls the -underlying operating system's C<open()> function with the parameters +underlying operating system's C<open> function with the parameters FILENAME, MODE, PERMS. The possible values and flag bits of the MODE parameter are @@ -4220,14 +4298,14 @@ means read/write. We know that these values do I<not> work under OS/390 & VM/ESA Unix and on the Macintosh; you probably don't want to use them in new code. -If the file named by FILENAME does not exist and the C<open()> call creates +If the file named by FILENAME does not exist and the C<open> call creates it (typically because MODE includes the C<O_CREAT> flag), then the value of PERMS specifies the permissions of the newly created file. If you omit -the PERMS argument to C<sysopen()>, Perl uses the octal value C<0666>. +the PERMS argument to C<sysopen>, Perl uses the octal value C<0666>. These permission values need to be in octal, and are modified by your process's current C<umask>. -You should seldom if ever use C<0644> as argument to C<sysopen()>, because +You should seldom if ever use C<0644> as argument to C<sysopen>, because that takes away the user's option to have a more permissive umask. Better to omit it. See the perlfunc(1) entry on C<umask> for more on this. @@ -4240,8 +4318,8 @@ See L<perlopentut> for a kinder, gentler explanation of opening files. Attempts to read LENGTH bytes of data into variable SCALAR from the specified FILEHANDLE, using the system call read(2). 
It bypasses stdio, -so mixing this with other kinds of reads, C<print()>, C<write()>, -C<seek()>, C<tell()>, or C<eof()> can cause confusion because stdio +so mixing this with other kinds of reads, C<print>, C<write>, +C<seek>, C<tell>, or C<eof> can cause confusion because stdio usually buffers data. Returns the number of bytes actually read, C<0> at end of file, or undef if there was an error. SCALAR will be grown or shrunk so that the last byte actually read is the last byte of the @@ -4256,13 +4334,13 @@ the result of the read is appended. There is no syseof() function, which is ok, since eof() doesn't work very well on device files (like ttys) anyway. Use sysread() and check -ofr a return value for 0 to decide whether you're done. +for a return value for 0 to decide whether you're done. =item sysseek FILEHANDLE,POSITION,WHENCE Sets FILEHANDLE's system position using the system call lseek(2). It -bypasses stdio, so mixing this with reads (other than C<sysread()>), -C<print()>, C<write()>, C<seek()>, C<tell()>, or C<eof()> may cause +bypasses stdio, so mixing this with reads (other than C<sysread>), +C<print>, C<write>, C<seek>, C<tell>, or C<eof> may cause confusion. FILEHANDLE may be an expression whose value gives the name of the filehandle. The values for WHENCE are C<0> to set the new position to POSITION, C<1> to set the it to the current position plus @@ -4271,37 +4349,40 @@ For WHENCE, you may use the constants C<SEEK_SET>, C<SEEK_CUR>, and C<SEEK_END> from either the C<IO::Seekable> or the POSIX module. Returns the new position, or the undefined value on failure. A position -of zero is returned as the string "C<0> but true"; thus C<sysseek()> returns -TRUE on success and FALSE on failure, yet you can still easily determine +of zero is returned as the string C<"0 but true">; thus C<sysseek> returns +true on success and false on failure, yet you can still easily determine the new position. 
=item system LIST =item system PROGRAM LIST -Does exactly the same thing as "C<exec LIST>", except that a fork is done -first, and the parent process waits for the child process to complete. -Note that argument processing varies depending on the number of -arguments. If there is more than one argument in LIST, or if LIST is -an array with more than one value, starts the program given by the -first element of the list with arguments given by the rest of the list. -If there is only one scalar argument, the argument is -checked for shell metacharacters, and if there are any, the entire -argument is passed to the system's command shell for parsing (this is -C</bin/sh -c> on Unix platforms, but varies on other platforms). If -there are no shell metacharacters in the argument, it is split into -words and passed directly to C<execvp()>, which is more efficient. +Does exactly the same thing as C<exec LIST>, except that a fork is +done first, and the parent process waits for the child process to +complete. Note that argument processing varies depending on the +number of arguments. If there is more than one argument in LIST, +or if LIST is an array with more than one value, starts the program +given by the first element of the list with arguments given by the +rest of the list. If there is only one scalar argument, the argument +is checked for shell metacharacters, and if there are any, the +entire argument is passed to the system's command shell for parsing +(this is C</bin/sh -c> on Unix platforms, but varies on other +platforms). If there are no shell metacharacters in the argument, +it is split into words and passed directly to C<execvp>, which is +more efficient. + +All files opened for output are flushed before attempting the exec(). The return value is the exit status of the program as -returned by the C<wait()> call. To get the actual exit value divide by -256. See also L</exec>. This is I<NOT> what you want to use to capture +returned by the C<wait> call. 
To get the actual exit value divide by +256. See also L</exec>. This is I<not> what you want to use to capture the output from a command, for that you should use merely backticks or C<qx//>, as described in L<perlop/"`STRING`">. -Like C<exec()>, C<system()> allows you to lie to a program about its name if -you use the "C<system PROGRAM LIST>" syntax. Again, see L</exec>. +Like C<exec>, C<system> allows you to lie to a program about its name if +you use the C<system PROGRAM LIST> syntax. Again, see L</exec>. -Because C<system()> and backticks block C<SIGINT> and C<SIGQUIT>, killing the +Because C<system> and backticks block C<SIGINT> and C<SIGQUIT>, killing the program they're running doesn't actually interrupt your program. @args = ("command", "arg1", "arg2"); @@ -4326,14 +4407,14 @@ See L<perlop/"`STRING`"> and L</exec> for details. =item syswrite FILEHANDLE,SCALAR Attempts to write LENGTH bytes of data from variable SCALAR to the -specified FILEHANDLE, using the system call write(2). If LENGTH is -not specified, writes whole SCALAR. It bypasses -stdio, so mixing this with reads (other than C<sysread())>, C<print()>, -C<write()>, C<seek()>, C<tell()>, or C<eof()> may cause confusion -because stdio usually buffers data. Returns the number of bytes -actually written, or C<undef> if there was an error. If the LENGTH is -greater than the available data in the SCALAR after the OFFSET, only as -much data as is available will be written. +specified FILEHANDLE, using the system call write(2). If LENGTH +is not specified, writes whole SCALAR. It bypasses stdio, so mixing +this with reads (other than C<sysread())>, C<print>, C<write>, +C<seek>, C<tell>, or C<eof> may cause confusion because stdio +usually buffers data. Returns the number of bytes actually written, +or C<undef> if there was an error. If the LENGTH is greater than +the available data in the SCALAR after the OFFSET, only as much +data as is available will be written. 
An OFFSET may be specified to write the data from some part of the string other than the beginning. A negative OFFSET specifies writing @@ -4348,12 +4429,12 @@ Returns the current position for FILEHANDLE. FILEHANDLE may be an expression whose value gives the name of the actual filehandle. If FILEHANDLE is omitted, assumes the file last read. -There is no C<systell()> function. Use C<sysseek(FH, 0, 1)> for that. +There is no C<systell> function. Use C<sysseek(FH, 0, 1)> for that. =item telldir DIRHANDLE -Returns the current position of the C<readdir()> routines on DIRHANDLE. -Value may be given to C<seekdir()> to access a particular location in a +Returns the current position of the C<readdir> routines on DIRHANDLE. +Value may be given to C<seekdir> to access a particular location in a directory. Has the same caveats about possible directory compaction as the corresponding system library routine. @@ -4362,16 +4443,16 @@ the corresponding system library routine. This function binds a variable to a package class that will provide the implementation for the variable. VARIABLE is the name of the variable to be enchanted. CLASSNAME is the name of a class implementing objects -of correct type. Any additional arguments are passed to the "C<new()>" +of correct type. Any additional arguments are passed to the C<new> method of the class (meaning C<TIESCALAR>, C<TIEHANDLE>, C<TIEARRAY>, or C<TIEHASH>). Typically these are arguments such as might be passed -to the C<dbm_open()> function of C. The object returned by the "C<new()>" -method is also returned by the C<tie()> function, which would be useful +to the C<dbm_open()> function of C. The object returned by the C<new> +method is also returned by the C<tie> function, which would be useful if you want to access other methods in CLASSNAME. 
-Note that functions such as C<keys()> and C<values()> may return huge lists +Note that functions such as C<keys> and C<values> may return huge lists when used on large objects, like DBM files. You may prefer to use the -C<each()> function to iterate over such. Example: +C<each> function to iterate over such. Example: # print out history file offsets use NDBM_File; @@ -4431,16 +4512,16 @@ A class implementing a scalar should have the following methods: Not all methods indicated above need be implemented. See L<perltie>, L<Tie::Hash>, L<Tie::Array>, L<Tie::Scalar>, and L<Tie::Handle>. -Unlike C<dbmopen()>, the C<tie()> function will not use or require a module +Unlike C<dbmopen>, the C<tie> function will not use or require a module for you--you need to do that explicitly yourself. See L<DB_File> -or the F<Config> module for interesting C<tie()> implementations. +or the F<Config> module for interesting C<tie> implementations. For further details see L<perltie>, L<"tied VARIABLE">. =item tied VARIABLE Returns a reference to the object underlying VARIABLE (the same value -that was originally returned by the C<tie()> call that bound the variable +that was originally returned by the C<tie> call that bound the variable to a package.) Returns the undefined value if VARIABLE isn't tied to a package. @@ -4449,7 +4530,7 @@ package. Returns the number of non-leap seconds since whatever time the system considers to be the epoch (that's 00:00:00, January 1, 1904 for MacOS, and 00:00:00 UTC, January 1, 1970 for most other systems). -Suitable for feeding to C<gmtime()> and C<localtime()>. +Suitable for feeding to C<gmtime> and C<localtime>. =item times @@ -4460,7 +4541,7 @@ seconds, for this process and the children of this process. =item tr/// -The transliteration operator. Same as C<y///>. See L<perlop>. +The transliteration operator. Same as C<y///>. See L<perlop>. =item truncate FILEHANDLE,LENGTH @@ -4468,7 +4549,7 @@ The transliteration operator. Same as C<y///>. 
See L<perlop>. Truncates the file opened on FILEHANDLE, or named by EXPR, to the specified length. Produces a fatal error if truncate isn't implemented -on your system. Returns TRUE if successful, the undefined value +on your system. Returns true if successful, the undefined value otherwise. =item uc EXPR @@ -4479,7 +4560,7 @@ Returns an uppercased version of EXPR. This is the internal function implementing the C<\U> escape in double-quoted strings. Respects current LC_CTYPE locale if C<use locale> in force. See L<perllocale>. Under Unicode (C<use utf8>) it uses the standard Unicode uppercase mappings. (It -does not attempt to do titlecase mapping on initial letters. See C<ucfirst()> for that.) +does not attempt to do titlecase mapping on initial letters. See C<ucfirst> for that.) If EXPR is omitted, uses C<$_>. @@ -4511,12 +4592,12 @@ even if you tell C<sysopen> to create a file with permissions C<0777>, if your umask is C<0022> then the file will actually be created with permissions C<0755>. If your C<umask> were C<0027> (group can't write; others can't read, write, or execute), then passing -C<sysopen()> C<0666> would create a file with mode C<0640> (C<0666 &~ +C<sysopen> C<0666> would create a file with mode C<0640> (C<0666 &~ 027> is C<0640>). Here's some advice: supply a creation mode of C<0666> for regular -files (in C<sysopen()>) and one of C<0777> for directories (in -C<mkdir()>) and executable files. This gives users the freedom of +files (in C<sysopen>) and one of C<0777> for directories (in +C<mkdir>) and executable files. This gives users the freedom of choice: if they want protected files, they might choose process umasks of C<022>, C<027>, or even the particularly antisocial mask of C<077>. Programs should rarely if ever make policy decisions better left to @@ -4537,8 +4618,8 @@ string of octal digits. See also L</oct>, if all you have is a string. =item undef Undefines the value of EXPR, which must be an lvalue. 
Use only on a -scalar value, an array (using "C<@>"), a hash (using "C<%>"), a subroutine -(using "C<&>"), or a typeglob (using "<*>"). (Saying C<undef $hash{$key}> +scalar value, an array (using C<@>), a hash (using C<%>), a subroutine +(using C<&>), or a typeglob (using <*>). (Saying C<undef $hash{$key}> will probably not do what you expect on most predefined variables or DBM list values, so don't do that; see L<delete>.) Always returns the undefined value. You can omit the EXPR, in which case nothing is @@ -4569,19 +4650,19 @@ deleted. unlink @goners; unlink <*.bak>; -Note: C<unlink()> will not delete directories unless you are superuser and +Note: C<unlink> will not delete directories unless you are superuser and the B<-U> flag is supplied to Perl. Even if these conditions are met, be warned that unlinking a directory can inflict damage on your -filesystem. Use C<rmdir()> instead. +filesystem. Use C<rmdir> instead. If LIST is omitted, uses C<$_>. =item unpack TEMPLATE,EXPR -C<Unpack()> does the reverse of C<pack()>: it takes a string representing a -structure and expands it out into a list value, returning the array -value. (In scalar context, it returns merely the first value -produced.) The TEMPLATE has the same format as in the C<pack()> function. +C<unpack> does the reverse of C<pack>: it takes a string +representing a structure and expands it out into a list of values. +(In scalar context, it returns merely the first value produced.) +The TEMPLATE has the same format as in the C<pack> function. Here's a subroutine that does substring: sub substr { @@ -4598,10 +4679,10 @@ you want a E<lt>numberE<gt>-bit checksum of the items instead of the items themselves. Default is a 16-bit checksum. For example, the following computes the same number as the System V sum program: - while (<>) { - $checksum += unpack("%32C*", $_); - } - $checksum %= 65535; + $checksum = do { + local $/; # slurp! 
+ unpack("%32C*",<>) % 65535; + }; The following efficiently counts the number of set bits in a bit vector: @@ -4616,18 +4697,18 @@ See L</pack> for more examples. =item untie VARIABLE -Breaks the binding between a variable and a package. (See C<tie()>.) +Breaks the binding between a variable and a package. (See C<tie>.) =item unshift ARRAY,LIST -Does the opposite of a C<shift()>. Or the opposite of a C<push()>, +Does the opposite of a C<shift>. Or the opposite of a C<push>, depending on how you look at it. Prepends list to the front of the array, and returns the new number of elements in the array. unshift(ARGV, '-e') unless $ARGV[0] =~ /^-/; Note the LIST is prepended whole, not one element at a time, so the -prepended elements stay in the same order. Use C<reverse()> to do the +prepended elements stay in the same order. Use C<reverse> to do the reverse. =item use Module LIST @@ -4654,14 +4735,14 @@ Perl version before C<use>ing library modules that have changed in incompatible ways from older versions of Perl. (We try not to do this more than we have to.) -The C<BEGIN> forces the C<require> and C<import()> to happen at compile time. The +The C<BEGIN> forces the C<require> and C<import> to happen at compile time. The C<require> makes sure the module is loaded into memory if it hasn't been -yet. The C<import()> is not a builtin--it's just an ordinary static method -call into the "C<Module>" package to tell the module to import the list of +yet. The C<import> is not a builtin--it's just an ordinary static method +call into the C<Module> package to tell the module to import the list of features back into the current package. The module can implement its -C<import()> method any way it likes, though most modules just choose to -derive their C<import()> method via inheritance from the C<Exporter> class that -is defined in the C<Exporter> module. See L<Exporter>. 
If no C<import()> +C<import> method any way it likes, though most modules just choose to +derive their C<import> method via inheritance from the C<Exporter> class that +is defined in the C<Exporter> module. See L<Exporter>. If no C<import> method can be found then the error is currently silently ignored. This may change to a fatal error in a future version. @@ -4689,18 +4770,18 @@ are also implemented this way. Currently implemented pragmas are: use strict qw(subs vars refs); use subs qw(afunc blurfl); -Some of these these pseudo-modules import semantics into the current +Some of these pseudo-modules import semantics into the current block scope (like C<strict> or C<integer>, unlike ordinary modules, which import symbols into the current package (which are effective through the end of the file). -There's a corresponding "C<no>" command that unimports meanings imported -by C<use>, i.e., it calls C<unimport Module LIST> instead of C<import()>. +There's a corresponding C<no> command that unimports meanings imported +by C<use>, i.e., it calls C<unimport Module LIST> instead of C<import>. no integer; no strict 'refs'; -If no C<unimport()> method can be found the call fails with a fatal error. +If no C<unimport> method can be found the call fails with a fatal error. See L<perlmod> for a list of standard modules and pragmas. @@ -4710,7 +4791,7 @@ Changes the access and modification times on each file of a list of files. The first two elements of the list must be the NUMERICAL access and modification times, in that order. Returns the number of files successfully changed. The inode modification time of each file is set -to the current time. This code has the same effect as the "C<touch>" +to the current time. This code has the same effect as the C<touch> command if the files already exist: #!/usr/bin/perl @@ -4723,7 +4804,7 @@ Returns a list consisting of all the values of the named hash. (In a scalar context, returns the number of values.) 
The values are returned in an apparently random order. The actual random order is subject to change in future versions of perl, but it is guaranteed to -be the same order as either the C<keys()> or C<each()> function would +be the same order as either the C<keys> or C<each> function would produce on the same (unmodified) hash. Note that you cannot modify the values of a hash this way, because the @@ -4734,25 +4815,25 @@ since it's lvaluable in a way that values() is not. for (@hash{keys %hash}) { s/foo/bar/g } # ok As a side effect, calling values() resets the HASH's internal iterator. -See also C<keys()>, C<each()>, and C<sort()>. +See also C<keys>, C<each>, and C<sort>. =item vec EXPR,OFFSET,BITS Treats the string in EXPR as a vector of unsigned integers, and returns the value of the bit field specified by OFFSET. BITS specifies the number of bits that are reserved for each entry in the bit -vector. This must be a power of two from 1 to 32. C<vec()> may also be +vector. This must be a power of two from 1 to 32. C<vec> may also be assigned to, in which case parentheses are needed to give the expression the correct precedence as in vec($image, $max_x * $x + $y, 8) = 3; -Vectors created with C<vec()> can also be manipulated with the logical +Vectors created with C<vec> can also be manipulated with the logical operators C<|>, C<&>, and C<^>, which will assume a bit vector operation is desired when both operands are strings. See L<perlop/"Bitwise String Operators">. The following code will build up an ASCII string saying C<'PerlPerlPerl'>. -The comments show the string after each step. Note that this code works +The comments show the string after each step. Note that this code works in the same way on big-endian or little-endian machines. my $foo = ''; @@ -4769,7 +4850,7 @@ in the same way on big-endian or little-endian machines. 
vec($foo, 94, 1) = 1; # 'PerlPerlPerl' # 'l' is "\x6c" -To transform a bit vector into a string or array of 0's and 1's, use these: +To transform a bit vector into a string or list of 0's and 1's, use these: $bits = unpack("b*", $vector); @bits = split(//, unpack("b*", $vector)); @@ -4780,7 +4861,7 @@ If you know the exact length in bits, it can be used in place of the C<*>. Behaves like the wait(2) system call on your system: it waits for a child process to terminate and returns the pid of the deceased process, or -C<-1> if there are no child processes. The status is rketurned in C<$?>. +C<-1> if there are no child processes. The status is returned in C<$?>. Note that a return value of C<-1> could mean that child processes are being automatically reaped, as described in L<perlipc>. @@ -4810,8 +4891,8 @@ and for other examples. =item wantarray -Returns TRUE if the context of the currently executing subroutine is -looking for a list value. Returns FALSE if the context is looking +Returns true if the context of the currently executing subroutine is +looking for a list value. Returns false if the context is looking for a scalar. Returns the undefined value if the context is looking for no value (void context). @@ -4819,30 +4900,32 @@ for no value (void context). my @a = complex_calculation(); return wantarray ? @a : "@a"; +This function should have been named wantlist() instead. + =item warn LIST -Produces a message on STDERR just like C<die()>, but doesn't exit or throw +Produces a message on STDERR just like C<die>, but doesn't exit or throw an exception. If LIST is empty and C<$@> already contains a value (typically from a previous eval) that value is used after appending C<"\t...caught"> -to C<$@>. This is useful for staying almost, but not entirely similar to -C<die()>. +to C<$@>. This is useful for staying almost, but not entirely similar to +C<die>. If C<$@> is empty then the string C<"Warning: Something's wrong"> is used. 
No message is printed if there is a C<$SIG{__WARN__}> handler installed. It is the handler's responsibility to deal with the message -as it sees fit (like, for instance, converting it into a C<die()>). Most +as it sees fit (like, for instance, converting it into a C<die>). Most handlers must therefore make arrangements to actually display the -warnings that they are not prepared to deal with, by calling C<warn()> +warnings that they are not prepared to deal with, by calling C<warn> again in the handler. Note that this is quite safe and will not produce an endless loop, since C<__WARN__> hooks are not called from inside one. You will find this behavior is slightly different from that of C<$SIG{__DIE__}> handlers (which don't suppress the error text, but can -instead call C<die()> again to change it). +instead call C<die> again to change it). Using a C<__WARN__> handler provides a powerful way to silence all warnings (even the so-called mandatory ones). An example: @@ -4871,7 +4954,7 @@ carp() and cluck() functions. Writes a formatted record (possibly multi-line) to the specified FILEHANDLE, using the format associated with that file. By default the format for a file is the one having the same name as the filehandle, but the -format for the current output channel (see the C<select()> function) may be set +format for the current output channel (see the C<select> function) may be set explicitly by assigning the name of the format to the C<$~> variable. Top of form processing is handled automatically: if there is @@ -4886,11 +4969,11 @@ variable C<$->, which can be set to C<0> to force a new page. If FILEHANDLE is unspecified, output goes to the current default output channel, which starts out as STDOUT but may be changed by the -C<select()> operator. If the FILEHANDLE is an EXPR, then the expression +C<select> operator. If the FILEHANDLE is an EXPR, then the expression is evaluated and the resulting string is used to look up the name of the FILEHANDLE at run time. 
For more on formats, see L<perlform>. -Note that write is I<NOT> the opposite of C<read()>. Unfortunately. +Note that write is I<not> the opposite of C<read>. Unfortunately. =item y/// diff --git a/pod/perlhist.pod b/pod/perlhist.pod index af5971f719..7cee85c3d9 100644 --- a/pod/perlhist.pod +++ b/pod/perlhist.pod @@ -310,7 +310,7 @@ the strings?). 5.005_03-MT3 1999-Jan-17 5.005_03-MT4 1999-Jan-26 5.005_03-MT5 1999-Jan-28 - 5.005_03 1999-***-** + 5.005_03 1999-Mar-28 Sarathy 5.005_50 1998-Jul-26 The 5.006 development track. 5.005_51 1998-Aug-10 @@ -319,6 +319,7 @@ the strings?). 5.005_54 1998-Nov-30 5.005_55 1999-Feb-16 5.005_56 1999-Mar-01 + 5.005_57 1999-May-25 =head2 SELECTED RELEASE SIZES diff --git a/pod/perlipc.pod b/pod/perlipc.pod index 1492ccfc31..e687304510 100644 --- a/pod/perlipc.pod +++ b/pod/perlipc.pod @@ -58,7 +58,7 @@ You may also choose to assign the strings C<'IGNORE'> or C<'DEFAULT'> as the handler, in which case Perl will try to discard the signal or do the default thing. -On most UNIX platforms, the C<CHLD> (sometimes also known as C<CLD>) signal +On most Unix platforms, the C<CHLD> (sometimes also known as C<CLD>) signal has special behavior with respect to a value of C<'IGNORE'>. Setting C<$SIG{CHLD}> to C<'IGNORE'> on such a platform has the effect of not creating zombie processes when the parent process fails to C<wait()> @@ -276,7 +276,7 @@ same effect as opening a pipe for reading: While this is true on the surface, it's much more efficient to process the file one line or record at a time because then you don't have to read the -whole thing into memory at once. It also gives you finer control of the +whole thing into memory at once. It also gives you finer control of the whole process, letting you to kill off the child process early if you'd like. @@ -1157,7 +1157,7 @@ server. (Under Unix, ports under 1024 are restricted to the superuser.) 
In our sample, we'll use port 9000, but you can use any port that's not currently in use on your system. If you try to use one already in used, you'll get an "Address already in use" -message. Under Unix, the C<netstat -a> command will show +message. Under Unix, the C<netstat -a> command will show which services current have servers. =item Listen diff --git a/pod/perllol.pod b/pod/perllol.pod index 56f08c2090..f015a20bc4 100644 --- a/pod/perllol.pod +++ b/pod/perllol.pod @@ -1,62 +1,62 @@ =head1 NAME -perlLoL - Manipulating Lists of Lists in Perl +perllol - Manipulating Arrays of Arrays in Perl =head1 DESCRIPTION -=head1 Declaration and Access of Lists of Lists +=head1 Declaration and Access of Arrays of Arrays -The simplest thing to build is a list of lists (sometimes called an array -of arrays). It's reasonably easy to understand, and almost everything -that applies here will also be applicable later on with the fancier data -structures. +The simplest thing to build an array of arrays (sometimes imprecisely +called a list of lists). It's reasonably easy to understand, and +almost everything that applies here will also be applicable later +on with the fancier data structures. -A list of lists, or an array of an array if you would, is just a regular -old array @LoL that you can get at with two subscripts, like C<$LoL[3][2]>. Here's -a declaration of the array: +An array of an array is just a regular old array @AoA that you can +get at with two subscripts, like C<$AoA[3][2]>. Here's a declaration +of the array: - # assign to our array a list of list references - @LoL = ( + # assign to our array, an array of array references + @AoA = ( [ "fred", "barney" ], [ "george", "jane", "elroy" ], [ "homer", "marge", "bart" ], ); - print $LoL[2][2]; + print $AoA[2][2]; bart Now you should be very careful that the outer bracket type is a round one, that is, a parenthesis. That's because you're assigning to -an @list, so you need parentheses. 
If you wanted there I<not> to be an @LoL, +an @array, so you need parentheses. If you wanted there I<not> to be an @AoA, but rather just a reference to it, you could do something more like this: - # assign a reference to list of list references - $ref_to_LoL = [ + # assign a reference to array of array references + $ref_to_AoA = [ [ "fred", "barney", "pebbles", "bambam", "dino", ], [ "homer", "bart", "marge", "maggie", ], [ "george", "jane", "elroy", "judy", ], ]; - print $ref_to_LoL->[2][2]; + print $ref_to_AoA->[2][2]; Notice that the outer bracket type has changed, and so our access syntax has also changed. That's because unlike C, in perl you can't freely -interchange arrays and references thereto. $ref_to_LoL is a reference to an -array, whereas @LoL is an array proper. Likewise, C<$LoL[2]> is not an +interchange arrays and references thereto. $ref_to_AoA is a reference to an +array, whereas @AoA is an array proper. Likewise, C<$AoA[2]> is not an array, but an array ref. So how come you can write these: - $LoL[2][2] - $ref_to_LoL->[2][2] + $AoA[2][2] + $ref_to_AoA->[2][2] instead of having to write these: - $LoL[2]->[2] - $ref_to_LoL->[2]->[2] + $AoA[2]->[2] + $ref_to_AoA->[2]->[2] Well, that's because the rule is that on adjacent brackets only (whether square or curly), you are free to omit the pointer dereferencing arrow. But you cannot do so for the very first one if it's a scalar containing -a reference, which means that $ref_to_LoL always needs it. +a reference, which means that $ref_to_AoA always needs it. =head1 Growing Your Own @@ -67,81 +67,81 @@ it up entirely from scratch? First, let's look at reading it in from a file. This is something like adding a row at a time. We'll assume that there's a flat file in which each line is a row and each word an element. 
If you're trying to develop an -@LoL list containing all these, here's the right way to do that: +@AoA array containing all these, here's the right way to do that: while (<>) { @tmp = split; - push @LoL, [ @tmp ]; + push @AoA, [ @tmp ]; } You might also have loaded that from a function: for $i ( 1 .. 10 ) { - $LoL[$i] = [ somefunc($i) ]; + $AoA[$i] = [ somefunc($i) ]; } Or you might have had a temporary variable sitting around with the -list in it. +array in it. for $i ( 1 .. 10 ) { @tmp = somefunc($i); - $LoL[$i] = [ @tmp ]; + $AoA[$i] = [ @tmp ]; } -It's very important that you make sure to use the C<[]> list reference +It's very important that you make sure to use the C<[]> array reference constructor. That's because this will be very wrong: - $LoL[$i] = @tmp; + $AoA[$i] = @tmp; -You see, assigning a named list like that to a scalar just counts the +You see, assigning a named array like that to a scalar just counts the number of elements in @tmp, which probably isn't what you want. If you are running under C<use strict>, you'll have to add some declarations to make it happy: use strict; - my(@LoL, @tmp); + my(@AoA, @tmp); while (<>) { @tmp = split; - push @LoL, [ @tmp ]; + push @AoA, [ @tmp ]; } Of course, you don't need the temporary array to have a name at all: while (<>) { - push @LoL, [ split ]; + push @AoA, [ split ]; } You also don't have to use push(). You could just make a direct assignment if you knew where you wanted to put it: - my (@LoL, $i, $line); + my (@AoA, $i, $line); for $i ( 0 .. 10 ) { $line = <>; - $LoL[$i] = [ split ' ', $line ]; + $AoA[$i] = [ split ' ', $line ]; } or even just - my (@LoL, $i); + my (@AoA, $i); for $i ( 0 .. 10 ) { - $LoL[$i] = [ split ' ', <> ]; + $AoA[$i] = [ split ' ', <> ]; } -You should in general be leery of using potential list functions -in a scalar context without explicitly stating such. 
-This would be clearer to the casual reader: +You should in general be leery of using functions that could +potentially return lists in scalar context without explicitly stating +such. This would be clearer to the casual reader: - my (@LoL, $i); + my (@AoA, $i); for $i ( 0 .. 10 ) { - $LoL[$i] = [ split ' ', scalar(<>) ]; + $AoA[$i] = [ split ' ', scalar(<>) ]; } -If you wanted to have a $ref_to_LoL variable as a reference to an array, +If you wanted to have a $ref_to_AoA variable as a reference to an array, you'd have to do something like this: while (<>) { - push @$ref_to_LoL, [ split ]; + push @$ref_to_AoA, [ split ]; } Now you can add new rows. What about adding new columns? If you're @@ -149,12 +149,12 @@ dealing with just matrices, it's often easiest to use simple assignment: for $x (1 .. 10) { for $y (1 .. 10) { - $LoL[$x][$y] = func($x, $y); + $AoA[$x][$y] = func($x, $y); } } for $x ( 3, 7, 9 ) { - $LoL[$x][20] += func2($x); + $AoA[$x][20] += func2($x); } It doesn't matter whether those elements are already @@ -165,11 +165,11 @@ If you wanted just to append to a row, you'd have to do something a bit funnier looking: # add new columns to an existing row - push @{ $LoL[0] }, "wilma", "betty"; + push @{ $AoA[0] }, "wilma", "betty"; Notice that I I<couldn't> say just: - push $LoL[0], "wilma", "betty"; # WRONG! + push $AoA[0], "wilma", "betty"; # WRONG! In fact, that wouldn't even compile. How come? Because the argument to push() must be a real array, not just a reference to such. @@ -180,12 +180,12 @@ Now it's time to print your data structure out. How are you going to do that? Well, if you want only one of the elements, it's trivial: - print $LoL[0][0]; + print $AoA[0][0]; If you want to print the whole thing, though, you can't say - print @LoL; # WRONG + print @AoA; # WRONG because you'll get just references listed, and perl will never automatically dereference things for you. Instead, you have to @@ -193,41 +193,41 @@ roll yourself a loop or two. 
This prints the whole structure, using the shell-style for() construct to loop across the outer set of subscripts. - for $aref ( @LoL ) { + for $aref ( @AoA ) { print "\t [ @$aref ],\n"; } If you wanted to keep track of subscripts, you might do this: - for $i ( 0 .. $#LoL ) { - print "\t elt $i is [ @{$LoL[$i]} ],\n"; + for $i ( 0 .. $#AoA ) { + print "\t elt $i is [ @{$AoA[$i]} ],\n"; } or maybe even this. Notice the inner loop. - for $i ( 0 .. $#LoL ) { - for $j ( 0 .. $#{$LoL[$i]} ) { - print "elt $i $j is $LoL[$i][$j]\n"; + for $i ( 0 .. $#AoA ) { + for $j ( 0 .. $#{$AoA[$i]} ) { + print "elt $i $j is $AoA[$i][$j]\n"; } } As you can see, it's getting a bit complicated. That's why sometimes it is easier to take a temporary on your way through: - for $i ( 0 .. $#LoL ) { - $aref = $LoL[$i]; + for $i ( 0 .. $#AoA ) { + $aref = $AoA[$i]; for $j ( 0 .. $#{$aref} ) { - print "elt $i $j is $LoL[$i][$j]\n"; + print "elt $i $j is $AoA[$i][$j]\n"; } } Hmm... that's still a bit ugly. How about this: - for $i ( 0 .. $#LoL ) { - $aref = $LoL[$i]; + for $i ( 0 .. $#AoA ) { + $aref = $AoA[$i]; $n = @$aref - 1; for $j ( 0 .. $n ) { - print "elt $i $j is $LoL[$i][$j]\n"; + print "elt $i $j is $AoA[$i][$j]\n"; } } @@ -240,49 +240,49 @@ pointer arrow for dereferencing, no such convenience exists for slices. (Remember, of course, that you can always write a loop to do a slice operation.) -Here's how to do one operation using a loop. We'll assume an @LoL +Here's how to do one operation using a loop. We'll assume an @AoA variable as before. @part = (); $x = 4; for ($y = 7; $y < 13; $y++) { - push @part, $LoL[$x][$y]; + push @part, $AoA[$x][$y]; } That same loop could be replaced with a slice operation: - @part = @{ $LoL[4] } [ 7..12 ]; + @part = @{ $AoA[4] } [ 7..12 ]; but as you might well imagine, this is pretty rough on the reader. Ah, but what if you wanted a I<two-dimensional slice>, such as having $x run from 4..8 and $y run from 7 to 12? Hmm...
here's the simple way: - @newLoL = (); + @newAoA = (); for ($startx = $x = 4; $x <= 8; $x++) { for ($starty = $y = 7; $y <= 12; $y++) { - $newLoL[$x - $startx][$y - $starty] = $LoL[$x][$y]; + $newAoA[$x - $startx][$y - $starty] = $AoA[$x][$y]; } } We can reduce some of the looping through slices for ($x = 4; $x <= 8; $x++) { - push @newLoL, [ @{ $LoL[$x] } [ 7..12 ] ]; + push @newAoA, [ @{ $AoA[$x] } [ 7..12 ] ]; } If you were into Schwartzian Transforms, you would probably have selected map for that - @newLoL = map { [ @{ $LoL[$_] } [ 7..12 ] ] } 4 .. 8; + @newAoA = map { [ @{ $AoA[$_] } [ 7..12 ] ] } 4 .. 8; Although if your manager accused you of seeking job security (or rapid insecurity) through inscrutable code, it would be hard to argue. :-) If I were you, I'd put that in a function: - @newLoL = splice_2D( \@LoL, 4 => 8, 7 => 12 ); + @newAoA = splice_2D( \@AoA, 4 => 8, 7 => 12 ); sub splice_2D { - my $lrr = shift; # ref to list of list refs! + my $lrr = shift; # ref to array of array refs! my ($x_lo, $x_hi, $y_lo, $y_hi) = @_; diff --git a/pod/perlmod.pod b/pod/perlmod.pod index 48ebf23711..0031d6e0e6 100644 --- a/pod/perlmod.pod +++ b/pod/perlmod.pod @@ -6,25 +6,27 @@ perlmod - Perl modules (packages and symbol tables) =head2 Packages -Perl provides a mechanism for alternative namespaces to protect packages -from stomping on each other's variables. In fact, there's really no such -thing as a global variable in Perl (although some identifiers default -to the main package instead of the current one). The package statement -declares the compilation unit as -being in the given namespace. The scope of the package declaration -is from the declaration itself through the end of the enclosing block, -C<eval>, C<sub>, or end of file, whichever comes first (the same scope -as the my() and local() operators). All further unqualified dynamic -identifiers will be in this namespace.
A package statement only affects -dynamic variables--including those you've used local() on--but -I<not> lexical variables created with my(). Typically it would be -the first declaration in a file to be included by the C<require> or -C<use> operator. You can switch into a package in more than one place; -it merely influences which symbol table is used by the compiler for the -rest of that block. You can refer to variables and filehandles in other -packages by prefixing the identifier with the package name and a double -colon: C<$Package::Variable>. If the package name is null, the C<main> -package is assumed. That is, C<$::sail> is equivalent to C<$main::sail>. +Perl provides a mechanism for alternative namespaces to protect +packages from stomping on each other's variables. In fact, there's +really no such thing as a global variable in Perl. The package +statement declares the compilation unit as being in the given +namespace. The scope of the package declaration is from the +declaration itself through the end of the enclosing block, C<eval>, +or file, whichever comes first (the same scope as the my() and +local() operators). Unqualified dynamic identifiers will be in +this namespace, except for those few identifiers that, if unqualified, +default to the main package instead of the current one as described +below. A package statement affects only dynamic variables--including +those you've used local() on--but I<not> lexical variables created +with my(). Typically it would be the first declaration in a file +included by the C<do>, C<require>, or C<use> operators. You can +switch into a package in more than one place; it merely influences +which symbol table is used by the compiler for the rest of that +block. You can refer to variables and filehandles in other packages +by prefixing the identifier with the package name and a double +colon: C<$Package::Variable>. If the package name is null, the +C<main> package is assumed.
That is, C<$::sail> is equivalent to +C<$main::sail>. The old package delimiter was a single quote, but double colon is now the preferred delimiter, in part because it's more readable to humans, and @@ -37,35 +39,38 @@ C<"This is $owner's house">, you'll be accessing C<$owner::s>; that is, the $s variable in package C<owner>, which is probably not what you meant. Use braces to disambiguate, as in C<"This is ${owner}'s house">. -Packages may be nested inside other packages: C<$OUTER::INNER::var>. This -implies nothing about the order of name lookups, however. All symbols +Packages may themselves contain package separators, as in +C<$OUTER::INNER::var>. This implies nothing about the order of +name lookups, however. There are no relative packages: all symbols are either local to the current package, or must be fully qualified from the outer package name down. For instance, there is nowhere -within package C<OUTER> that C<$INNER::var> refers to C<$OUTER::INNER::var>. -It would treat package C<INNER> as a totally separate global package. - -Only identifiers starting with letters (or underscore) are stored in a -package's symbol table. All other symbols are kept in package C<main>, -including all of the punctuation variables like $_. In addition, when -unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, -INC, and SIG are forced to be in package C<main>, even when used for other -purposes than their builtin one. Note also that, if you have a package -called C<m>, C<s>, or C<y>, then you can't use the qualified form of an -identifier because it will be interpreted instead as a pattern match, -a substitution, or a transliteration. - -(Variables beginning with underscore used to be forced into package +within package C<OUTER> that C<$INNER::var> refers to +C<$OUTER::INNER::var>. It would treat package C<INNER> as a totally +separate global package. + +Only identifiers starting with letters (or underscore) are stored +in a package's symbol table. 
All other symbols are kept in package +C<main>, including all punctuation variables, like $_. In addition, +when unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV, +ARGVOUT, ENV, INC, and SIG are forced to be in package C<main>, +even when used for other purposes than their built-in one. If you +have a package called C<m>, C<s>, or C<y>, then you can't use the +qualified form of an identifier because it would instead be interpreted +as a pattern match, a substitution, or a transliteration. + +Variables beginning with underscore used to be forced into package main, but we decided it was more useful for package writers to be able to use leading underscore to indicate private variables and method names. -$_ is still global though.) +$_ is still global though. See also L<perlvar/"Technical Note on the +Syntax of Variable Names">. -Eval()ed strings are compiled in the package in which the eval() was +C<eval>ed strings are compiled in the package in which the eval() was compiled. (Assignments to C<$SIG{}>, however, assume the signal handler specified is in the C<main> package. Qualify the signal handler name if you wish to have a signal handler in a package.) For an example, examine F<perldb.pl> in the Perl library. It initially switches to the C<DB> package so that the debugger doesn't interfere with variables -in the script you are trying to debug. At various points, however, it +in the program you are trying to debug. At various points, however, it temporarily switches back to the C<main> package to evaluate various expressions in the context of the C<main> package (or wherever you came from). See L<perldebug>. @@ -92,8 +97,8 @@ table lookups at compile time: local $main::{foo} = $main::{bar}; You can use this to print out all the variables in a package, for -instance. The standard F<dumpvar.pl> library and the CPAN module -Devel::Symdump make use of this. +instance.
The standard but antiquated F<dumpvar.pl> library and +the CPAN module Devel::Symdump make use of this. Assignment to a typeglob performs an aliasing operation, i.e., @@ -102,7 +107,7 @@ Assignment to a typeglob performs an aliasing operation, i.e., causes variables, subroutines, formats, and file and directory handles accessible via the identifier C<richard> also to be accessible via the identifier C<dick>. If you want to alias only a particular variable or -subroutine, you can assign a reference instead: +subroutine, assign a reference instead: *dick = \$richard; @@ -130,7 +135,7 @@ is a somewhat tricky way of passing around references cheaply when you won't want to have to remember to dereference variables explicitly. -Another use of symbol tables is for making "constant" scalars. +Another use of symbol tables is for making "constant" scalars. *PI = \3.14159265358979; @@ -157,14 +162,59 @@ This prints You gave me main::foo You gave me bar::baz -The *foo{THING} notation can also be used to obtain references to the +The C<*foo{THING}> notation can also be used to obtain references to the individual elements of *foo; see L<perlref>. +Subroutine definitions (and declarations, for that matter) need +not necessarily be situated in the package whose symbol table they +occupy. You can define a subroutine outside its package by +explicitly qualifying the name of the subroutine: + + package main; + sub Some_package::foo { ... } # &foo defined in Some_package + +This is just a shorthand for a typeglob assignment at compile time: + + BEGIN { *Some_package::foo = sub { ... } } + +and is I<not> the same as writing: + + { + package Some_package; + sub foo { ... } + } + +In the first two versions, the body of the subroutine is +lexically in the main package, I<not> in Some_package.
So +something like this: + + package main; + + $Some_package::name = "fred"; + $main::name = "barney"; + + sub Some_package::foo { + print "in ", __PACKAGE__, ": \$name is '$name'\n"; + } + + Some_package::foo(); + +prints: + + in main: $name is 'barney' + +rather than: + + in Some_package: $name is 'fred' + +This also has implications for the use of the SUPER:: qualifier +(see L<perlobj>). + =head2 Package Constructors and Destructors -There are two special subroutine definitions that function as package -constructors and destructors. These are the C<BEGIN> and C<END> -routines. The C<sub> is optional for these routines. +Three special subroutines act as package +constructors and destructors. These are the C<BEGIN>, C<INIT>, and +C<END> routines. The C<sub> is optional for these routines. A C<BEGIN> subroutine is executed as soon as possible, that is, the moment it is completely defined, even before the rest of the containing file @@ -175,6 +225,11 @@ files in time to be visible to the rest of the file. Once a C<BEGIN> has run, it is immediately undefined and any code it used is returned to Perl's memory pool. This means you can't ever explicitly call a C<BEGIN>. +Similar to C<BEGIN> blocks, C<INIT> blocks are run just before the +Perl runtime begins execution. For example, the code generators +documented in L<perlcc> make use of C<INIT> blocks to initialize +and resolve pointers to XSUBs. + An C<END> subroutine is executed as late as possible, that is, when the interpreter is being exited, even if it is exiting as a result of a die() function. (But not if it's polymorphing into another program @@ -183,39 +238,40 @@ trap that yourself (if you can).) You may have multiple C<END> blocks within a file--they will execute in reverse order of definition; that is: last in, first out (LIFO). -Inside an C<END> subroutine, C<$?> contains the value that the script is +Inside an C<END> subroutine, C<$?> contains the value that the program is going to pass to C<exit()>. 
You can modify C<$?> to change the exit -value of the script. Beware of changing C<$?> by accident (e.g. by +value of the program. Beware of changing C<$?> by accident (e.g. by running something via C<system>). -Note that when you use the B<-n> and B<-p> switches to Perl, C<BEGIN> and +When you use the B<-n> and B<-p> switches to Perl, C<BEGIN> and C<END> work just as they do in B<awk>, as a degenerate case. As currently implemented (and subject to change, since it's inconvenient at best), -both C<BEGIN> I<and> C<END> blocks are run when you use the B<-c> switch +both C<BEGIN> and C<END> blocks are run when you use the B<-c> switch for a compile-only syntax check, although your main code is not. =head2 Perl Classes -There is no special class syntax in Perl, but a package may function +There is no special class syntax in Perl, but a package may act as a class if it provides subroutines to act as methods. Such a package may also derive some of its methods from another class (package) -by listing the other package name in its global @ISA array (which +by listing the other package name(s) in its global @ISA array (which must be a package global, not a lexical). For more on this, see L<perltoot> and L<perlobj>. =head2 Perl Modules -A module is just a package that is defined in a library file of -the same name, and is designed to be reusable. It may do this by -providing a mechanism for exporting some of its symbols into the symbol -table of any package using it. Or it may function as a class -definition and make its semantics available implicitly through method -calls on the class and its objects, without explicit exportation of any -symbols. Or it can do a little of both. +A module is just a set of related functions in a library file, i.e., a Perl +package with the same name as the file. It is specifically designed +to be reusable by other modules or programs.
It may do this by +providing a mechanism for exporting some of its symbols into the +symbol table of any package using it. Or it may function as a class +definition and make its semantics available implicitly through +method calls on the class and its objects, without explicitly +exporting anything. Or it can do a little of both. -For example, to start a normal module called Some::Module, create -a file called Some/Module.pm and start with this template: +For example, to start a traditional, non-OO module called Some::Module, +create a file called F<Some/Module.pm> and start with this template: package Some::Module; # assumes Some/Module.pm @@ -275,10 +331,13 @@ a file called Some/Module.pm and start with this template: END { } # module clean-up code here (global destructor) -Then go on to declare and use your variables in functions -without any qualifications. -See L<Exporter> and the L<perlmodlib> for details on -mechanics and style issues in module creation. + ## YOUR CODE GOES HERE + + 1; # don't forget to return a true value from the file + +Then go on to declare and use your variables in functions without +any qualifications. See L<Exporter> and L<perlmodlib> for +details on mechanics and style issues in module creation. Perl modules are included into your program by saying @@ -304,12 +363,13 @@ is exactly equivalent to BEGIN { require Module; } -All Perl module files have the extension F<.pm>. C<use> assumes this so -that you don't have to spell out "F<Module.pm>" in quotes. This also -helps to differentiate new modules from old F<.pl> and F<.ph> files. -Module names are also capitalized unless they're functioning as pragmas, -"Pragmas" are in effect compiler directives, and are sometimes called -"pragmatic modules" (or even "pragmata" if you're a classicist). +All Perl module files have the extension F<.pm>. The C<use> operator +assumes this so you don't have to spell out "F<Module.pm>" in quotes.
+This also helps to differentiate new modules from old F<.pl> and +F<.ph> files. Module names are also capitalized unless they're +functioning as pragmas; pragmas are in effect compiler directives, +and are sometimes called "pragmatic modules" (or even "pragmata" +if you're a classicist). The two statements: @@ -319,18 +379,19 @@ The two statements: differ from each other in two ways. In the first case, any double colons in the module name, such as C<Some::Module>, are translated into your system's directory separator, usually "/". The second -case does not, and would have to be specified literally. The other difference -is that seeing the first C<require> clues in the compiler that uses of -indirect object notation involving "SomeModule", as in C<$ob = purge SomeModule>, -are method calls, not function calls. (Yes, this really can make a difference.) - -Because the C<use> statement implies a C<BEGIN> block, the importation -of semantics happens at the moment the C<use> statement is compiled, +case does not, and would have to be specified literally. The other +difference is that seeing the first C<require> clues in the compiler +that uses of indirect object notation involving "SomeModule", as +in C<$ob = purge SomeModule>, are method calls, not function calls. +(Yes, this really can make a difference.) + +Because the C<use> statement implies a C<BEGIN> block, the importing +of semantics happens as soon as the C<use> statement is compiled, before the rest of the file is compiled. This is how it is able to function as a pragma mechanism, and also how modules are able to -declare subroutines that are then visible as list operators for +declare subroutines that are then visible as list or unary operators for the rest of the current file. This will not work if you use C<require> -instead of C<use>. With require you can get into this problem: +instead of C<use>. 
With C<require> you can get into this problem: require Cwd; # make Cwd:: accessible $here = Cwd::getcwd(); @@ -354,22 +415,22 @@ filenames on some systems. Therefore, if a module's name is, say, C<Text::Soundex>, then its definition is actually found in the library file F<Text/Soundex.pm>. -Perl modules always have a F<.pm> file, but there may also be dynamically -linked executables or autoloaded subroutine definitions associated with -the module. If so, these will be entirely transparent to the user of -the module. It is the responsibility of the F<.pm> file to load (or -arrange to autoload) any additional functionality. The POSIX module -happens to do both dynamic loading and autoloading, but the user can -say just C<use POSIX> to get it all. - -For more information on writing extension modules, see L<perlxstut> -and L<perlguts>. +Perl modules always have a F<.pm> file, but there may also be +dynamically linked executables (often ending in F<.so>) or autoloaded +subroutine definitions (often ending in F<.al>) associated with the +module. If so, these will be entirely transparent to the user of +the module. It is the responsibility of the F<.pm> file to load +(or arrange to autoload) any additional functionality. For example, +although the POSIX module happens to do both dynamic loading and +autoloading, the user can say just C<use POSIX> to get it all. =head1 SEE ALSO See L<perlmodlib> for general style issues related to building Perl -modules and classes as well as descriptions of the standard library and -CPAN, L<Exporter> for how Perl's standard import/export mechanism works, -L<perltoot> for an in-depth tutorial on creating classes, L<perlobj> -for a hard-core reference document on objects, and L<perlsub> for an -explanation of functions and scoping.
+modules and classes, as well as descriptions of the standard library +and CPAN, L<Exporter> for how Perl's standard import/export mechanism +works, L<perltoot> and L<perltootc> for an in-depth tutorial on +creating classes, L<perlobj> for a hard-core reference document on +objects, L<perlsub> for an explanation of functions and scoping, +and L<perlxstut> and L<perlguts> for more information on writing +extension modules. diff --git a/pod/perlmodinstall.pod b/pod/perlmodinstall.pod index b6176f0927..4076254c62 100644 --- a/pod/perlmodinstall.pod +++ b/pod/perlmodinstall.pod @@ -5,21 +5,23 @@ perlmodinstall - Installing CPAN Modules =head1 DESCRIPTION You can think of a module as the fundamental unit of reusable Perl -code; see L<perlmod> for details. Whenever anyone creates a chunk of -Perl code that they think will be useful to the world, they register -as a Perl developer at http://www.perl.com/CPAN/modules/04pause.html -so that they can then upload their code to the CPAN. The CPAN is the -Comprehensive Perl Archive Network and can be accessed at -http://www.perl.com/CPAN/. +code; see L<perlmod> for details. Whenever anyone creates a chunk +of Perl code that they think will be useful to the world, they +register as a Perl developer at +http://www.perl.com/CPAN/modules/04pause.html so that they can then +upload their code to CPAN. CPAN is the Comprehensive Perl Archive +Network and can be accessed at http://www.perl.com/CPAN/, or searched +via http://cpan.perl.com/ and +http://theory.uwinnipeg.ca/mod_perl/cpan-search.pl . This documentation is for people who want to download CPAN modules and install them on their own computer. =head2 PREAMBLE -You have a file ending in .tar.gz (or, less often, .zip). You know -there's a tasty module inside. There are four steps you must now -take: +You have a file ending in F<.tar.gz> (or, less often, F<.zip>). +You know there's a tasty module inside.
You must now take four +steps: =over 5 @@ -44,8 +46,8 @@ say C<perl Makefile.PL>, you can substitute C<perl Makefile.PL PREFIX=/my/perl_directory> to install the modules into C</my/perl_directory>. Then you can use the modules from your Perl programs with C<use lib -"/my/perl_directory/lib/site_perl";> or sometimes just C<use -"/my/perl_directory";>. +"/my/perl_directory/lib/site_perl"> or sometimes just C<use lib +"/my/perl_directory">. =over 4 @@ -54,7 +56,8 @@ from your Perl programs with C<use lib B<If you're on Unix,> You can use Andreas Koenig's CPAN module -( http://www.perl.com/CPAN/modules/by-module/CPAN ) +(which comes standard with Perl, or can itself be downloaded +from http://www.perl.com/CPAN/modules/by-module/CPAN) to automate the following steps, from DECOMPRESS through INSTALL. A. DECOMPRESS @@ -85,12 +88,12 @@ While still in that directory, type: make install -Make sure you have the appropriate permissions to install the module +Make sure you have appropriate permissions to install the module in your Perl 5 library directory. Often, you'll need to be root. That's all you need to do on Unix systems with dynamic linking. -Most Unix systems have dynamic linking -- if yours doesn't, or if for -another reason you have a statically-linked perl, B<and> the +Most Unix systems have dynamic linking--if yours doesn't, or if for +another reason you have a statically-linked perl, I<and> the module requires compilation, you'll need to build a new Perl binary that includes the module. Again, you'll probably need to be root. @@ -100,7 +103,7 @@ B<If you're running Windows 95 or NT with the ActiveState port of Perl> A. DECOMPRESS -You can use the shareware Winzip ( http://www.winzip.com ) to +You can use the shareware B<WinZip> program ( http://www.winzip.com ) to decompress and unpack modules. B. UNPACK @@ -112,7 +115,7 @@ If you used WinZip, this was already done for you. Does the module require compilation (i.e.
does it have files that end in .xs, .c, .h, .y, .cc, .cxx, or .C)? If it does, you're on your own. You can try compiling it yourself if you have a C compiler. -If you're successful, consider uploading the resulting binary to the +If you're successful, consider uploading the resulting binary to CPAN for others to use. If it doesn't, go to INSTALL. D. INSTALL @@ -129,11 +132,11 @@ B<If you're running Windows 95 or NT with the core Windows distribution of Perl, A. DECOMPRESS When you download the module, make sure it ends in either -C<.tar.gz> or C<.zip>. Windows browsers sometimes +F<.tar.gz> or F<.zip>. Windows browsers sometimes download C<.tar.gz> files as C<_tar.tar>, because early versions of Windows prohibited more than one dot in a filename. -You can use the shareware WinZip ( http://www.winzip.com ) to +You can use the shareware B<WinZip> program ( http://www.winzip.com ) to decompress and unpack modules. Or, you can use InfoZip's C<unzip> utility ( @@ -151,7 +154,7 @@ UNPACK your module as well. B. UNPACK -All of the methods in DECOMPRESS will have done this for you. +The methods in DECOMPRESS will have done this for you. C. BUILD @@ -185,18 +188,18 @@ Specifically the "Compress & Translate" listing ( http://hyperarchive.lcs.mit.edu/HyperArchive/Abstracts/cmp/HyperArchive.html ). -You can either use the shareware StuffIt Expander +You can either use the shareware B<StuffIt Expander> program ( http://hyperarchive.lcs.mit.edu/HyperArchive/Archive/cmp/stuffit-expander-401.hqx ) in combination with I<DropStuff with Expander Enhancer> ( http://hyperarchive.lcs.mit.edu/HyperArchive/Archive/cmp/drop-stuff-with-ee-40.hqx ) -or the freeware MacGzip ( +or the freeware B<MacGzip> program ( http://persephone.cps.unizar.es/general/gente/spd/gzip/gzip.html ). B. UNPACK If you're using DropStuff or Stuffit, you can just extract the tar -archive.
Otherwise, you can use the freeware B<suntar> ( http://hyperarchive.lcs.mit.edu/HyperArchive/Archive/cmp/suntar-221.hqx ) or I<Tar> ( http://hyperarchive.lcs.mit.edu/HyperArchive/Archive/cmp/tar-40b.hqx ). @@ -208,9 +211,9 @@ Does the module require compilation? Overview: You need MPW and a combination of new and old CodeWarrior compilers for MPW and libraries. Makefiles created for building under -MPW use the Metrowerks compilers. It's most likely possible to build +MPW use Metrowerks compilers. It's most likely possible to build without other compilers, but it has not been done successfully, to our -knowledge. Read the documentation in MacPerl: Power and Ease ( +knowledge. Read the documentation in I<MacPerl: Power and Ease> ( http://www.ptf.com/macperl/ ) on porting/building extensions, or find an existing precompiled binary, or hire someone to build it for you. @@ -226,9 +229,10 @@ Make sure the newlines for the modules are in Mac format, not Unix format. If they are not, then you might have decompressed them incorrectly. Check your decompression and unpacking utilities' settings to make sure they are translating text files properly. -As a last resort, you can use the perl one-liner: - perl -i.bak -pe 's/(?:\015)?\012/\015/g' filenames +As a last resort, you can use the perl one-liner: + + perl -i.bak -pe 's/(?:\015)?\012/\015/g' <filenames> on the source files. @@ -275,7 +279,7 @@ Go into the newly-created directory and type: make make test -You will need the packages mentioned in C<Readme.dos> +You will need the packages mentioned in F<README.dos> in the Perl distribution. D. INSTALL @@ -284,7 +288,7 @@ While still in that directory, type: make install -You will need the packages mentioned in Readme.dos in the Perl distribution. +You will need the packages mentioned in F<README.dos> in the Perl distribution. =item * @@ -298,8 +302,8 @@ the instructions for Unix.
B<If you're on VMS,> -When downloading from CPAN, save your file with a C<.tgz> -extension instead of C<.tar.gz>. All other periods in the +When downloading from CPAN, save your file with a F<.tgz> +extension instead of F<.tar.gz>. All other periods in the filename should be replaced with underscores. For example, C<Your-Module-1.33.tar.gz> should be downloaded as C<Your-Module-1_33.tgz>. @@ -361,7 +365,7 @@ Substitute C<mmk> for C<mms> above if you're using MMK. B<If you're on MVS>, -Introduce the .tar.gz file into an HFS as binary; don't translate from +Introduce the F<.tar.gz> file into an HFS as binary; don't translate from ASCII to EBCDIC. A. DECOMPRESS diff --git a/pod/perlmodlib.pod b/pod/perlmodlib.pod index 2dc38dfd80..4cee4556b6 100644 --- a/pod/perlmodlib.pod +++ b/pod/perlmodlib.pod @@ -6,54 +6,76 @@ perlmodlib - constructing new Perl modules and finding existing ones =head1 THE PERL MODULE LIBRARY -A number of modules are included the Perl distribution. These are -described below, and all end in F<.pm>. You may also discover files in -the library directory that end in either F<.pl> or F<.ph>. These are old -libraries supplied so that old programs that use them still run. The -F<.pl> files will all eventually be converted into standard modules, and -the F<.ph> files made by B<h2ph> will probably end up as extension modules -made by B<h2xs>. (Some F<.ph> values may already be available through the -POSIX module.) The B<pl2pm> file in the distribution may help in your -conversion, but it's just a mechanical process and therefore far from -bulletproof. +Many modules are included the Perl distribution. These are described +below, and all end in F<.pm>. You may discover compiled library +file (usually ending in F<.so>) or small pieces of modules to be +autoloaded (ending in F<.al>); these were automatically generated +by the installation process. You may also discover files in the +library directory that end in either F<.pl> or F<.ph>. 
These are +old libraries supplied so that old programs that use them still +run. The F<.pl> files will all eventually be converted into standard +modules, and the F<.ph> files made by B<h2ph> will probably end up +as extension modules made by B<h2xs>. (Some F<.ph> values may +already be available through the POSIX, Errno, or Fcntl modules.) +The B<pl2pm> file in the distribution may help in your conversion, +but it's just a mechanical process and therefore far from bulletproof. =head2 Pragmatic Modules -They work somewhat like pragmas in that they tend to affect the compilation of -your program, and thus will usually work well only when used within a -C<use>, or C<no>. Most of these are lexically scoped, so an inner BLOCK -may countermand any of these by saying: +They work somewhat like compiler directives (pragmata) in that they +tend to affect the compilation of your program, and thus will usually +work well only when used within a C<use>, or C<no>. Most of these +are lexically scoped, so an inner BLOCK may countermand them +by saying: no integer; no strict 'refs'; which lasts until the end of that BLOCK. -Unlike the pragmas that effect the C<$^H> hints variable, the C<use -vars> and C<use subs> declarations are not BLOCK-scoped. They allow -you to predeclare a variables or subroutines within a particular -I<file> rather than just a block. Such declarations are effective -for the entire file for which they were declared. You cannot rescind -them with C<no vars> or C<no subs>. +Some pragmas are lexically scoped--typically those that affect the +C<$^H> hints variable. Others affect the current package instead, +like C<use vars> and C<use subs>, whic allow you to predeclare a +variables or subroutines within a particular I<file> rather than +just a block. Such declarations are effective for the entire file +for which they were declared. You cannot rescind them with C<no +vars> or C<no subs>. The following pragmas are defined (and have their own documentation). 
=over 12 -=item use autouse MODULE => qw(sub1 sub2 sub3) +=item attrs -Defers C<require MODULE> until someone calls one of the specified -subroutines (which must be exported by MODULE). This pragma should be -used with caution, and only when necessary. +set/get attributes of a subroutine + +=item autouse + +postpone load of modules until a function is used + +=item base + +Establish IS-A relationship with base class at compile time =item blib -manipulate @INC at compile time to use MakeMaker's uninstalled version -of a package +Use MakeMaker's uninstalled version of a package + +=item constant + +declare constants =item diagnostics -force verbose warning diagnostics +Perl compiler pragma to force verbose warning diagnostics + +=item fields + +compile-time class fields + +=item filetest + +control the filetest permission operators =item integer @@ -61,7 +83,7 @@ compute arithmetic in integer instead of double =item less -request less of something from the compiler +perl pragma to request less of something from the compiler =item lib @@ -69,19 +91,19 @@ manipulate @INC at compile time =item locale -use or ignore current locale for builtin operations (see L<perllocale>) +use and avoid POSIX locales for built-in operations =item ops -restrict named opcodes when compiling or running Perl code +restrict unsafe operations when compiling =item overload -overload basic Perl operations +Package for overloading perl operations =item re -alter behaviour of regular expressions +alter regular expression behavior =item sigtrap @@ -95,14 +117,22 @@ restrict unsafe constructs predeclare sub names -=item vmsish +=item utf8 -adopt certain VMS-specific behaviors +turn on UTF-8 and Unicode support =item vars predeclare global variable names +=item vmsish + +control VMS-specific language features + +=item warning + +control optional warnings + =back =head2 Standard Modules @@ -119,27 +149,115 @@ provide framework for multiple DBMs =item AutoLoader -load functions only on demand +load 
subroutines only on demand =item AutoSplit split a package for autoloading +=item B + +The Perl Compiler; See also L<perlcc>. + +=item B::Asmdata + +Autogenerated data about Perl ops, used to generate bytecode + +=item B::Assembler + +Assemble Perl bytecode + +=item B::Bblock + +Walk basic blocks + +=item B::Bytecode + +Perl compiler's bytecode backend + +=item B::C + +Perl compiler's C backend + +=item B::CC + +Perl compiler's optimized C translation backend + +=item B::Debug + +Walk Perl syntax tree, printing debug info about ops + +=item B::Deparse + +Perl compiler backend to produce perl code + +=item B::Disassembler + +Disassemble Perl bytecode + +=item B::Lint + +Perl lint + +=item B::Showlex + +Show lexical variables used in functions or files + +=item B::Stackobj + +Helper module for CC backend + +=item B::Terse + +Walk Perl syntax tree, printing terse info about ops + +=item B::Xref + +Generates cross reference reports for Perl programs + =item Benchmark benchmark running times of code +=item CGI + +Simple Common Gateway Interface Class + +=item CGI::Apache + +Make things work with CGI.pm against Perl-Apache API + +=item CGI::Carp + +CGI routines for writing to the HTTPD (or other) error log + +=item CGI::Cookie + +Interface to Netscape Cookies + +=item CGI::Fast + +CGI Interface for Fast CGI + +=item CGI::Push + +Simple Interface to Server Push + +=item CGI::Switch + +Try more than one constructors and return the first object available + =item CPAN -interface to Comprehensive Perl Archive Network +query, download and build perl modules from CPAN sites =item CPAN::FirstTime -create a CPAN configuration file +Utility for CPAN::Config file Initialization =item CPAN::Nox -run CPAN while avoiding compiled extensions +Wrapper around CPAN.pm without using any XS module =item Carp @@ -147,7 +265,7 @@ warn of errors (from perspective of caller) =item Class::Struct -declare struct-like datatypes +declare struct-like datatypes as Perl classes =item Config @@ -157,13 
+275,21 @@ access Perl configuration information get pathname of current working directory +=item DB + +programmatic interface to the Perl debugging API + =item DB_File -access to Berkeley DB +Perl5 access to Berkeley DB version 1.x + +=item Data::Dumper + +stringified perl data structures, suitable for both printing and C<eval> =item Devel::Peek -data debugging tool for the XS programmer +A data debugging tool for the XS programmer =item Devel::SelfStubber @@ -173,9 +299,13 @@ generate stubs for a SelfLoading module supply object methods for directory handles +=item Dumpvalue + +provides screen dump of Perl data. + =item DynaLoader -dynamically load C libraries into Perl code +Dynamically load C libraries into Perl code =item English @@ -183,27 +313,39 @@ use nice English (or awk) names for ugly punctuation variables =item Env -import environment variables +perl module that imports environment variables + +=item Errno + +System errno constants =item Exporter -implements default import method for modules +Implements default import method for modules + +=item ExtUtils::Command + +utilities to replace common UNIX commands in Makefiles etc. 
=item ExtUtils::Embed -utilities for embedding Perl in C/C++ applications +Utilities for embedding Perl in C/C++ applications =item ExtUtils::Install install files from here to there +=item ExtUtils::Installed + +Inventory management of installed modules + =item ExtUtils::Liblist determine libraries to use and how to use them =item ExtUtils::MM_OS2 -methods to override Unix behaviour in ExtUtils::MakeMaker +methods to override UN*X behavior in ExtUtils::MakeMaker =item ExtUtils::MM_Unix @@ -211,7 +353,11 @@ methods used by ExtUtils::MakeMaker =item ExtUtils::MM_VMS -methods to override Unix behaviour in ExtUtils::MakeMaker +methods to override UN*X behavior in ExtUtils::MakeMaker + +=item ExtUtils::MM_Win32 + +methods to override UN*X behavior in ExtUtils::MakeMaker =item ExtUtils::MakeMaker @@ -221,6 +367,10 @@ create an extension Makefile utilities to write and check a MANIFEST file +=item ExtUtils::Miniperl + +write the C code for perlmain.c + =item ExtUtils::Mkbootstrap make a bootstrap file for use by DynaLoader @@ -229,13 +379,17 @@ make a bootstrap file for use by DynaLoader write linker options files for dynamic extension +=item ExtUtils::Packlist + +manage .packlist files + =item ExtUtils::testlib add blib/* directories to @INC =item Fatal -make errors in builtins or Perl functions fatal +replace functions with equivalents which succeed or die =item Fcntl @@ -245,17 +399,17 @@ load the C Fcntl.h defines split a pathname into pieces -=item File::CheckTree - -run many filetest checks on a tree - =item File::Compare -compare files or filehandles +Compare files or filehandles =item File::Copy -copy files or filehandles +Copy files or filehandles + +=item File::DosGlob + +DOS like globbing and then some =item File::Find @@ -271,11 +425,31 @@ portably perform operations on file names =item File::Spec::Functions -function call interface to File::Spec module +portably perform operations on file names + +=item File::Spec::Mac + +File::Spec for MacOS + +=item 
File::Spec::OS2 + +methods for OS/2 file specs + +=item File::Spec::Unix + +methods used by File::Spec + +=item File::Spec::VMS + +methods for VMS file specs + +=item File::Spec::Win32 + +methods for Win32 file specs =item File::stat -by-name interface to Perl's builtin stat() functions +by-name interface to Perl's built-in stat() functions =item FileCache @@ -287,11 +461,11 @@ supply object methods for filehandles =item FindBin -locate directory of original Perl script +Locate directory of original perl script =item GDBM_File -access to the gdbm library +Perl5 access to the gdbm library. =item Getopt::Long @@ -299,7 +473,7 @@ extended processing of command line options =item Getopt::Std -process single-character switches with switch clustering +Process single-character switches with switch clustering =item I18N::Collate @@ -309,6 +483,10 @@ compare 8-bit scalar data according to the current locale load various IO modules +=item IO::Dir + +supply object methods for directory handles + =item IO::File supply object methods for filehandles @@ -321,6 +499,10 @@ supply object methods for I/O handles supply object methods for pipes +=item IO::Poll + +Object interface to system poll call + =item IO::Seekable supply seek based methods for I/O objects @@ -331,7 +513,19 @@ OO interface to the select system call =item IO::Socket -object interface to socket communications +Object interface to socket communications + +=item IO::Socket::INET + +Object interface for AF_INET domain sockets + +=item IO::Socket::UNIX + +Object interface for AF_UNIX domain sockets + +=item IPC::Msg + +SysV Msg IPC object class =item IPC::Open2 @@ -341,13 +535,21 @@ open a process for both reading and writing open a process for reading, writing, and error handling +=item IPC::Semaphore + +SysV Semaphore IPC object class + +=item IPC::SysV + +SysV IPC constants + =item Math::BigFloat -arbitrary length float math package +Arbitrary length float math package =item Math::BigInt -arbitrary size integer 
math package +Arbitrary size integer math package =item Math::Complex @@ -355,52 +557,59 @@ complex numbers and associated mathematical functions =item Math::Trig -simple interface to parts of Math::Complex for those who -need trigonometric functions only for real numbers +trigonometric functions =item NDBM_File -tied access to ndbm files +Tied access to ndbm files =item Net::Ping -Hello, anybody home? +check a remote host for reachability =item Net::hostent -by-name interface to Perl's builtin gethost*() functions +by-name interface to Perl's built-in gethost*() functions =item Net::netent -by-name interface to Perl's builtin getnet*() functions +by-name interface to Perl's built-in getnet*() functions =item Net::protoent -by-name interface to Perl's builtin getproto*() functions +by-name interface to Perl's built-in getproto*() functions =item Net::servent -by-name interface to Perl's builtin getserv*() functions +by-name interface to Perl's built-in getserv*() functions -=item Opcode +=item O -disable named opcodes when compiling or running Perl code +Generic interface to Perl Compiler backends -=item Pod::Text +=item Opcode -convert POD data to formatted ASCII text +Disable named opcodes when compiling perl code =item POSIX -interface to IEEE Standard 1003.1 +Perl interface to IEEE Std 1003.1 + +=item Pod::Html + +module to convert pod files to HTML + +=item Pod::Text + +convert POD data to formatted ASCII text =item SDBM_File -tied access to sdbm files +Tied access to sdbm files =item Safe -compile and execute code in restricted compartments +Compile and execute code in restricted compartments =item Search::Dict @@ -416,7 +625,7 @@ load functions only on demand =item Shell -run shell commands transparently within Perl +run shell commands transparently within perl =item Socket @@ -428,27 +637,31 @@ manipulate Perl symbols and their names =item Sys::Hostname -try every conceivable way to get hostname +Try every conceivable way to get hostname =item Sys::Syslog 
-interface to the Unix syslog(3) calls +Perl interface to the UNIX syslog(3) calls =item Term::Cap -termcap interface +Perl termcap interface =item Term::Complete -word completion module +Perl word completion module =item Term::ReadLine -interface to various C<readline> packages +Perl interface to various C<readline> packages. + +=item Test + +provides a simple framework for writing test scripts =item Test::Harness -run Perl standard test scripts with statistics +run perl standard test scripts with statistics =item Text::Abbrev @@ -456,35 +669,61 @@ create an abbreviation table from a list =item Text::ParseWords -parse text into an array of tokens +parse text into an array of tokens or array of arrays =item Text::Soundex -implementation of the Soundex Algorithm as described by Knuth - -=item Text::Tabs +Implementation of the Soundex Algorithm as Described by Knuth -expand and unexpand tabs per the Unix expand(1) and unexpand(1) +=item Text::Tabs -- expand and unexpand tabs per the unix expand(1) and unexpand(1) =item Text::Wrap line wrapping to form simple paragraphs -=item Tie::Hash +=item Thread + +multithreading + +=item Thread::Queue + +thread-safe queues + +=item Thread::Semaphore + +thread-safe semaphores + +=item Thread::Signal + +Start a thread which runs signal handlers reliably + +=item Thread::Specific + +thread-specific keys + +=item Tie::Array + +base class for tied arrays + +=item Tie::Handle + +base class definitions for tied handles + +=item Tie::Hash, Tie::StdHash base class definitions for tied hashes =item Tie::RefHash -base class definitions for tied hashes with references as keys +use references as hash keys -=item Tie::Scalar +=item Tie::Scalar, Tie::StdScalar base class definitions for tied scalars =item Tie::SubstrHash -fixed-table-size, fixed-key-length hashing +Fixed-table-size, fixed-key-length hashing =item Time::Local @@ -492,11 +731,11 @@ efficiently compute time from local and GMT time =item Time::gmtime -by-name interface to Perl's 
builtin gmtime() function +by-name interface to Perl's built-in gmtime() function =item Time::localtime -by-name interface to Perl's builtin localtime() function +by-name interface to Perl's built-in localtime() function =item Time::tm @@ -508,42 +747,54 @@ base class for ALL classes (blessed references) =item User::grent -by-name interface to Perl's builtin getgr*() functions +by-name interface to Perl's built-in getgr*() functions =item User::pwent -by-name interface to Perl's builtin getpw*() functions +by-name interface to Perl's built-in getpw*() functions =back -To find out I<all> the modules installed on your system, including -those without documentation or outside the standard release, do this: +To find out I<all> modules installed on your system, including +those without documentation or outside the standard release, +jus tdo this: % find `perl -e 'print "@INC"'` -name '*.pm' -print -They should all have their own documentation installed and accessible via -your system man(1) command. If that fails, try the I<perldoc> program. +They should all have their own documentation installed and accessible +via your system man(1) command. If you do not have a B<find> +program, you can use the Perl B<find2perl> program instead, which +generates Perl code as output you can run through perl. If you +have a B<man> program but it doesn't find your modules, you'll have +to fix your manpath. See L<perl> for details. If you have no +system B<man> command, you might try the B<perldoc> program. =head2 Extension Modules -Extension modules are written in C (or a mix of Perl and C) and may be -statically linked or in general are -dynamically loaded into Perl if and when you need them. Supported -extension modules include the Socket, Fcntl, and POSIX modules. +Extension modules are written in C (or a mix of Perl and C). They +are usually dynamically loaded into Perl if and when you need them, +but may also be be linked in statically. 
Supported extension modules +include Socket, Fcntl, and POSIX. Many popular C extension modules do not come bundled (at least, not -completely) due to their sizes, volatility, or simply lack of time for -adequate testing and configuration across the multitude of platforms on -which Perl was beta-tested. You are encouraged to look for them in -archie(1L), the Perl FAQ or Meta-FAQ, the WWW page, and even with their -authors before randomly posting asking for their present condition and -disposition. +completely) due to their sizes, volatility, or simply lack of time +for adequate testing and configuration across the multitude of +platforms on which Perl was beta-tested. You are encouraged to +look for them on CPAN (described below), or using web search engines +like Alta Vista or Deja News. =head1 CPAN -CPAN stands for the Comprehensive Perl Archive Network. This is a globally -replicated collection of all known Perl materials, including hundreds -of unbundled modules. Here are the major categories of modules: +CPAN stands for Comprehensive Perl Archive Network; it's a globally +replicated trove of Perl materials, including documentation, style +guides, tricks and trap, alternate ports to non-Unix systems and +occasional binary distributions for these. Search engines for +CPAN can be found at http://cpan.perl.com/ and at +http://theory.uwinnipeg.ca/mod_perl/cpan-search.pl . + +Most importantly, CPAN includes around a thousand unbundled modules, +some of which require a C compiler to build. Major categories of +modules are: =over @@ -612,21 +863,18 @@ Miscellaneous Modules =back -The registered CPAN sites as of this writing include the following. +Registered CPAN sites as of this writing include the following. 
You should try to choose one close to you: =over -=item * -Africa +=item Africa South Africa ftp://ftp.is.co.za/programming/perl/CPAN/ ftp://ftpza.co.za/pub/mirrors/cpan/ -=item * -Asia +=item Asia - Armenia ftp://sunsite.aua.am/pub/CPAN/ China ftp://freesoft.cei.gov.cn/pub/languages/perl/CPAN/ Hong Kong ftp://ftp.hkstar.com/pub/CPAN/ Israel ftp://bioinfo.weizmann.ac.il/pub/software/perl/CPAN/ @@ -634,6 +882,7 @@ Asia ftp://ftp.jaist.ac.jp/pub/lang/perl/CPAN/ ftp://ftp.lab.kdd.co.jp/lang/perl/CPAN/ ftp://ftp.meisei-u.ac.jp/pub/CPAN/ + ftp://ftp.ring.gr.jp/pub/lang/perl/CPAN/ ftp://mirror.nucba.ac.jp/mirror/Perl/ Singapore ftp://ftp.nus.edu.sg/pub/unix/perl/CPAN/ South Korea ftp://ftp.bora.net/pub/CPAN/ @@ -643,8 +892,7 @@ Asia Thailand ftp://ftp.cs.riubon.ac.th/pub/mirrors/CPAN/ ftp://ftp.nectec.or.th/pub/mirrors/CPAN/ -=item * -Australasia +=item Australasia Australia ftp://cpan.topend.com.au/pub/CPAN/ ftp://ftp.labyrinth.net.au/pub/perl/CPAN/ @@ -653,13 +901,11 @@ Australasia New Zealand ftp://ftp.auckland.ac.nz/pub/perl/CPAN/ ftp://sunsite.net.nz/pub/languages/perl/CPAN/ -=item * Central America Costa Rica ftp://ftp.ucr.ac.cr/pub/Unix/CPAN/ -=item * -Europe +=item Europe Austria ftp://ftp.tuwien.ac.at/pub/languages/perl/CPAN/ Belgium ftp://ftp.kulnet.kuleuven.ac.be/pub/mirror/CPAN/ @@ -686,8 +932,10 @@ Europe Ireland ftp://sunsite.compapp.dcu.ie/pub/perl/ Italy ftp://cis.uniRoma2.it/CPAN/ ftp://ftp.flashnet.it/pub/CPAN/ + ftp://ftp.unina.it/pub/Other/CPAN/ ftp://ftp.unipi.it/pub/mirror/perl/CPAN/ Netherlands ftp://ftp.cs.uu.nl/mirror/CPAN/ + ftp://ftp.EU.net/packages/cpan/ ftp://ftp.nluug.nl/pub/languages/perl/CPAN/ Norway ftp://ftp.uit.no/pub/languages/perl/cpan/ ftp://sunsite.uio.no/pub/languages/perl/CPAN/ @@ -696,10 +944,11 @@ Europe ftp://ftp.pk.edu.pl/pub/lang/perl/CPAN/ ftp://sunsite.icm.edu.pl/pub/CPAN/ Portugal ftp://ftp.ci.uminho.pt/pub/mirrors/cpan/ + ftp://ftp.ist.utl.pt/pub/CPAN/ ftp://ftp.ua.pt/pub/CPAN/ Romania 
ftp://ftp.dntis.ro/pub/mirrors/perl-cpan/ ftp://ftp.dnttm.ro/pub/CPAN/ - Russia ftp://cpan.npi.msu.su/CPAN/ + Russia ftp://ftp.chg.ru/pub/lang/perl/CPAN/ ftp://ftp.sai.msu.su/pub/lang/perl/CPAN/ Slovakia ftp://ftp.entry.sk/pub/languages/perl/CPAN/ Slovenia ftp://ftp.arnes.si/software/perl/CPAN/ @@ -714,11 +963,11 @@ Europe ftp://sunsite.doc.ic.ac.uk/packages/CPAN/ ftp://unix.hensa.ac.uk/mirrors/perl-CPAN/ -=item * -North America +=item North America Alberta ftp://sunsite.ualberta.ca/pub/Mirror/CPAN/ - California ftp://ftp.cdrom.com/pub/perl/CPAN/ + California ftp://cpan.nas.nasa.gov/pub/perl/CPAN/ + ftp://ftp.cdrom.com/pub/perl/CPAN/ ftp://ftp.digital.com/pub/plan/perl/CPAN/ Colorado ftp://ftp.cs.colorado.edu/pub/perl/CPAN/ Florida ftp://ftp.cise.ufl.edu/pub/perl/CPAN/ @@ -728,30 +977,30 @@ North America Manitoba ftp://theory.uwinnipeg.ca/pub/CPAN/ Massachusetts ftp://ftp.ccs.neu.edu/net/mirrors/ftp.funet.fi/pub/languages/perl/CPAN/ ftp://ftp.iguide.com/pub/mirrors/packages/perl/CPAN/ - Mexico D.F. 
ftp://ftp.msg.com.mx/pub/CPAN/ + Mexico ftp://ftp.msg.com.mx/pub/CPAN/ + Minnesota ftp://ftp.midearthbbs.com/CPAN/ New York ftp://ftp.rge.com/pub/languages/perl/ North Carolina ftp://ftp.duke.edu/pub/perl/ Oklahoma ftp://ftp.ou.edu/mirrors/CPAN/ - Ontario ftp://ftp.crc.ca/pub/packages/perl/CPAN/ + Ontario ftp://ftp.crc.ca/pub/packages/lang/perl/CPAN/ Oregon ftp://ftp.orst.edu/pub/packages/CPAN/ Pennsylvania ftp://ftp.epix.net/pub/languages/perl/ Texas ftp://ftp.sedl.org/pub/mirrors/CPAN/ Utah ftp://mirror.xmission.com/CPAN/ Virginia ftp://ftp.perl.org/pub/perl/CPAN/ ftp://ruff.cs.jmu.edu/pub/CPAN/ - Washington ftp://ftp.spu.edu/pub/CPAN/ + Washington ftp://ftp-mirror.internap.com/pub/CPAN/ + ftp://ftp.spu.edu/pub/CPAN/ -=item * -South America +=item South America Brazil ftp://cpan.if.usp.br/pub/mirror/CPAN/ - Chile ftp://ftp.ing.puc.cl/pub/unix/perl/CPAN/ - ftp://sunsite.dcc.uchile.cl/pub/Lang/perl/CPAN/ + Chile ftp://sunsite.dcc.uchile.cl/pub/Lang/perl/CPAN/ =back For an up-to-date listing of CPAN sites, -see F<http://www.perl.com/perl/CPAN> or F<ftp://ftp.perl.com/perl/>. +see http://www.perl.com/perl/CPAN or ftp://www.perl.com/perl/ . =head1 Modules: Creation, Use, and Abuse @@ -795,6 +1044,8 @@ scheme as the original author. =item Try to design the new module to be easy to extend and reuse. +Always use B<-w>. + Use blessed references. Use the two argument form of bless to bless into the class name given as the first parameter of the constructor, e.g.,: @@ -819,7 +1070,7 @@ appropriate. Split large methods into smaller more flexible ones. Inherit methods from other modules if appropriate. Avoid class name tests like: C<die "Invalid" unless ref $ref eq 'FOO'>. -Generally you can delete the "C<eq 'FOO'>" part with no harm at all. +Generally you can delete the C<eq 'FOO'> part with no harm at all. Let the objects look after themselves! Generally, avoid hard-wired class names as far as possible. 
@@ -833,7 +1084,7 @@ the module after __END__ either using AutoSplit or by saying: eval join('',<main::DATA>) || die $@ unless caller(); Does your module pass the 'empty subclass' test? If you say -"C<@SUBCLASS::ISA = qw(YOURCLASS);>" your applications should be able +C<@SUBCLASS::ISA = qw(YOURCLASS);> your applications should be able to use SUBCLASS in exactly the same way as YOURCLASS. For example, does your application still work if you change: C<$obj = new YOURCLASS;> into: C<$obj = new SUBCLASS;> ? @@ -842,11 +1093,18 @@ Avoid keeping any state information in your packages. It makes it difficult for multiple other packages to use yours. Keep state information in objects. -Always use B<-w>. Try to C<use strict;> (or C<use strict qw(...);>). +Always use B<-w>. + +Try to C<use strict;> (or C<use strict qw(...);>). Remember that you can add C<no strict qw(...);> to individual blocks -of code that need less strictness. Always use B<-w>. Always use B<-w>! +of code that need less strictness. + +Always use B<-w>. + Follow the guidelines in the perlstyle(1) manual. +Always use B<-w>. + =item Some simple style guidelines The perlstyle manual supplied with Perl has many helpful points. @@ -1016,7 +1274,7 @@ should store your module's version number in a non-my package variable called $VERSION. This should be a floating point number with at least two digits after the decimal (i.e., hundredths, e.g, C<$VERSION = "0.01">). Don't use a "1.3.2" style version. -See Exporter.pm in Perl5.001m or later for details. +See L<Exporter> for details. It may be handy to add a function or method to retrieve the number. Use the number in announcements and archive file names when @@ -1030,7 +1288,7 @@ module (or the module itself if small) to the comp.lang.perl.announce Usenet newsgroup. This will at least ensure very wide once-off distribution. -If possible you should place the module into a major ftp archive and +If possible, register the module with CPAN. 
You should include details of its location in your announcement. Some notes about ftp archives: Please use a long descriptive file @@ -1065,7 +1323,7 @@ Please remember to send me an updated entry for the Module list! Always strive to remain compatible with previous released versions. Otherwise try to add a mechanism to revert to the -old behaviour if people rely on it. Document incompatible changes. +old behavior if people rely on it. Document incompatible changes. =back @@ -1091,8 +1349,8 @@ it worth it unless you plan to make other changes at the same time? =item Make the most of the opportunity. If you are going to convert the script to a module you can use the -opportunity to redesign the interface. The 'Guidelines for Module -Creation' above include many of the issues you should consider. +opportunity to redesign the interface. The guidelines for module +creation above include many of the issues you should consider. =item The pl2pm utility will get you started. diff --git a/pod/perlobj.pod b/pod/perlobj.pod index a997ae0de3..21073a795a 100644 --- a/pod/perlobj.pod +++ b/pod/perlobj.pod @@ -4,10 +4,10 @@ perlobj - Perl objects =head1 DESCRIPTION -First of all, you need to understand what references are in Perl. +First you need to understand what references are in Perl. See L<perlref> for that. Second, if you still find the following reference work too complicated, a tutorial on object-oriented programming -in Perl can be found in L<perltoot>. +in Perl can be found in L<perltoot> and L<perltootc>. If you're still with us, then here are three very simple definitions that you should find reassuring. @@ -50,7 +50,7 @@ a construct this way, too: package Critter; sub spawn { bless {} } -In fact, this might even be preferable, because the C++ programmers won't +This might even be preferable, because the C++ programmers won't be tricked into thinking that C<new> works in Perl as it does in C++. It doesn't. 
We recommend that you name your constructors whatever makes sense in the context of the problem you're solving. For example, @@ -73,7 +73,7 @@ have been returned directly, like this: return $self; } -In fact, you often see such a thing in more complicated constructors +You often see such a thing in more complicated constructors that wish to call methods in the class as part of the construction: sub new { @@ -115,12 +115,13 @@ reference as an ordinary reference. Outside the class package, the reference is generally treated as an opaque value that may be accessed only through the class's methods. -A constructor may re-bless a referenced object currently belonging to -another class, but then the new class is responsible for all cleanup -later. The previous blessing is forgotten, as an object may belong -to only one class at a time. (Although of course it's free to -inherit methods from many classes.) If you find yourself having to -do this, the parent class is probably misbehaving, though. +Although a constructor can in theory re-bless a referenced object +currently belonging to another class, this is almost certainly going +to get you into trouble. The new class is responsible for all +cleanup later. The previous blessing is forgotten, as an object +may belong to only one class at a time. (Although of course it's +free to inherit methods from many classes.) If you find yourself +having to do this, the parent class is probably misbehaving, though. A clarification: Perl objects are blessed. References are not. Objects know which package they belong to. References do not. The bless() @@ -154,7 +155,7 @@ last base class. Several commonly used methods are automatically supplied in the UNIVERSAL class; see L<"Default UNIVERSAL methods"> for more details. -If a missing method is found in one of the base classes, it is cached +If a missing method is found in a base class, it is cached in the current class for efficiency. 
Changing @ISA or defining new subroutines invalidates the cache and causes Perl to do the lookup again. @@ -186,16 +187,16 @@ is to prepend your fieldname in the hash with the package name. Unlike say C++, Perl doesn't provide any special syntax for method definition. (It does provide a little syntax for method invocation though. More on that later.) A method expects its first argument -to be the object (reference) or package (string) it is being invoked on. There are just two -types of methods, which we'll call class and instance. -(Sometimes you'll hear these called static and virtual, in honor of -the two C++ method types they most closely resemble.) +to be the object (reference) or package (string) it is being invoked +on. There are two ways of calling methods, which we'll call class +methods and instance methods. A class method expects a class name as the first argument. It -provides functionality for the class as a whole, not for any individual -object belonging to the class. Constructors are typically class -methods. Many class methods simply ignore their first argument, because -they already know what package they're in, and don't care what package +provides functionality for the class as a whole, not for any +individual object belonging to the class. Constructors are often +class methods, but see L<perltoot> and L<perltootc> for alternatives. +Many class methods simply ignore their first argument, because they +already know what package they're in and don't care what package they were invoked via. (These aren't necessarily the same, because class methods follow the inheritance tree just like ordinary instance methods.) Another typical use for class methods is to look up an @@ -284,13 +285,13 @@ For more reasons why the indirect object syntax is ambiguous, see L<"WARNING"> below. There are times when you wish to specify which class's method to use. 
-In this case, you can call your method as an ordinary subroutine +Here you can call your method as an ordinary subroutine call, being sure to pass the requisite first argument explicitly: $fred = MyCritter::find("Critter", "Fred"); MyCritter::display($fred, 'Height', 'Weight'); -Note however, that this does not do any inheritance. If you wish +Unlike method calls, function calls don't consider inheritance. If you wish merely to specify that Perl should I<START> looking for a method in a particular package, use an ordinary method call, but qualify the method name with the package like this: @@ -310,10 +311,59 @@ class. Sometimes you want to call a method when you don't know the method name ahead of time. You can use the arrow form, replacing the method name -with a simple scalar variable containing the method name: +with a simple scalar variable containing the method name or a +reference to the function. $method = $fast ? "findfirst" : "findbest"; - $fred->$method(@args); + $fred->$method(@args); # call by name + + if ($coderef = $fred->can($parent . "::findbest")) { + $self->$coderef(@args); # call by coderef + } + +=head2 WARNING + +While indirect object syntax may well be appealing to English speakers and +to C++ programmers, be not seduced! It suffers from two grave problems. + +The first problem is that an indirect object is limited to a name, +a scalar variable, or a block, because it would have to do too much +lookahead otherwise, just like any other postfix dereference in the +language. (These are the same quirky rules as are used for the filehandle +slot in functions like C<print> and C<printf>.) This can lead to horribly +confusing precedence problems, as in these next two lines: + + move $obj->{FIELD}; # probably wrong! + move $ary[$i]; # probably wrong! + +Those actually parse as the very surprising: + + $obj->move->{FIELD}; # Well, lookee here + $ary->move->[$i]; # Didn't expect this one, eh? 
+ +Rather than what you might have expected: + + $obj->{FIELD}->move(); # You should be so lucky. + $ary[$i]->move; # Yeah, sure. + +The left side of ``-E<gt>'' is not so limited, because it's an infix operator, +not a postfix operator. + +As if that weren't bad enough, think about this: Perl must guess I<at +compile time> whether C<name> and C<move> above are functions or methods. +Usually Perl gets it right, but when it doesn't it, you get a function +call compiled as a method, or vice versa. This can introduce subtle +bugs that are hard to unravel. For example, calling a method C<new> +in indirect notation--as C++ programmers are so wont to do--can +be miscompiled into a subroutine call if there's already a C<new> +function in scope. You'd end up calling the current package's C<new> +as a subroutine, rather than the desired class's method. The compiler +tries to cheat by remembering bareword C<require>s, but the grief if it +messes up just isn't worth the years of debugging it would likely take +you to track such subtle bugs down. + +The infix arrow notation using ``C<-E<gt>>'' doesn't suffer from either +of these disturbing ambiguities, so we recommend you use it exclusively. =head2 Default UNIVERSAL methods @@ -361,7 +411,7 @@ C<isa> uses a very similar method and cache-ing strategy. This may cause strange effects if the Perl code dynamically changes @ISA in any package. You may add other methods to the UNIVERSAL class via Perl or XS code. -You do not need to C<use UNIVERSAL> in order to make these methods +You do not need to C<use UNIVERSAL> to make these methods available to your program. This is necessary only if you wish to have C<isa> available as a plain subroutine in the current package. @@ -386,55 +436,11 @@ object destruction, or for ensuring that destructors in the base classes of your choosing get called. Explicitly calling DESTROY is also possible, but is usually never needed. 
-Do not confuse the foregoing with how objects I<CONTAINED> in the current +Do not confuse the previous discussion with how objects I<CONTAINED> in the current one are destroyed. Such objects will be freed and destroyed automatically when the current object is freed, provided no other references to them exist elsewhere. -=head2 WARNING - -While indirect object syntax may well be appealing to English speakers and -to C++ programmers, be not seduced! It suffers from two grave problems. - -The first problem is that an indirect object is limited to a name, -a scalar variable, or a block, because it would have to do too much -lookahead otherwise, just like any other postfix dereference in the -language. (These are the same quirky rules as are used for the filehandle -slot in functions like C<print> and C<printf>.) This can lead to horribly -confusing precedence problems, as in these next two lines: - - move $obj->{FIELD}; # probably wrong! - move $ary[$i]; # probably wrong! - -Those actually parse as the very surprising: - - $obj->move->{FIELD}; # Well, lookee here - $ary->move->[$i]; # Didn't expect this one, eh? - -Rather than what you might have expected: - - $obj->{FIELD}->move(); # You should be so lucky. - $ary[$i]->move; # Yeah, sure. - -The left side of ``-E<gt>'' is not so limited, because it's an infix operator, -not a postfix operator. - -As if that weren't bad enough, think about this: Perl must guess I<at -compile time> whether C<name> and C<move> above are functions or methods. -Usually Perl gets it right, but when it doesn't it, you get a function -call compiled as a method, or vice versa. This can introduce subtle -bugs that are hard to unravel. For example, calling a method C<new> -in indirect notation--as C++ programmers are so wont to do--can -be miscompiled into a subroutine call if there's already a C<new> -function in scope. You'd end up calling the current package's C<new> -as a subroutine, rather than the desired class's method. 
The compiler -tries to cheat by remembering bareword C<require>s, but the grief if it -messes up just isn't worth the years of debugging it would likely take -you to to track such subtle bugs down. - -The infix arrow notation using ``C<-E<gt>>'' doesn't suffer from either -of these disturbing ambiguities, so we recommend you use it exclusively. - =head2 Summary That's about all there is to it. Now you need just to go off and buy a @@ -443,8 +449,8 @@ with it for the next six months or so. =head2 Two-Phased Garbage Collection -For most purposes, Perl uses a fast and simple reference-based -garbage collection system. For this reason, there's an extra +For most purposes, Perl uses a fast and simple, reference-based +garbage collection system. That means there's an extra dereference going on at some level, so if you haven't built your Perl executable using your C compiler's C<-O> flag, performance will suffer. If you I<have> built Perl with C<cc -O>, then this @@ -529,8 +535,8 @@ When run as F</tmp/test>, the following output is produced: Notice that "global destruction" bit there? That's the thread garbage collector reaching the unreachable. -Objects are always destructed, even when regular refs aren't and in fact -are destructed in a separate pass before ordinary refs just to try to +Objects are always destructed, even when regular refs aren't. Objects +are destructed in a separate pass before ordinary refs just to prevent object destructors from using refs that have been themselves destructed. Plain refs are only garbage-collected if the destruct level is greater than 0. You can test the higher levels of global destruction @@ -547,8 +553,8 @@ breaks the circularities in the self-referential structure. =head1 SEE ALSO -A kinder, gentler tutorial on object-oriented programming in Perl can -be found in L<perltoot>. 
-You should also check out L<perlbot> for other object tricks, traps, and tips, -as well as L<perlmodlib> for some style guides on constructing both modules +A kinder, gentler tutorial on object-oriented programming in Perl +can be found in L<perltoot> and L<perltootc>. You should also check +out L<perlbot> for other object tricks, traps, and tips, as well +as L<perlmodlib> for some style guides on constructing both modules and classes. diff --git a/pod/perlop.pod b/pod/perlop.pod index 106b9a9a87..0f8117ced9 100644 --- a/pod/perlop.pod +++ b/pod/perlop.pod @@ -5,11 +5,11 @@ perlop - Perl operators and precedence =head1 SYNOPSIS Perl operators have the following associativity and precedence, -listed from highest precedence to lowest. Note that all operators -borrowed from C keep the same precedence relationship with each other, -even where C's precedence is slightly screwy. (This makes learning -Perl easier for C folks.) With very few exceptions, these all -operate on scalar values only, not array values. +listed from highest precedence to lowest. Operators borrowed from +C keep the same precedence relationship with each other, even where +C's precedence is slightly screwy. (This makes learning Perl easier +for C folks.) With very few exceptions, these all operate on scalar +values only, not array values. left terms and list operators (leftward) left -> @@ -64,11 +64,11 @@ For example, in @ary = (1, 3, sort 4, 2); print @ary; # prints 1324 -the commas on the right of the sort are evaluated before the sort, but -the commas on the left are evaluated after. In other words, list -operators tend to gobble up all the arguments that follow them, and +the commas on the right of the sort are evaluated before the sort, +but the commas on the left are evaluated after. In other words, +list operators tend to gobble up all arguments that follow, and then act like a simple TERM with regard to the preceding expression. 
-Note that you have to be careful with parentheses: +Be careful with parentheses: # These evaluate exit before doing the print: print($foo, exit); # Obviously not what you want. @@ -95,16 +95,18 @@ as well as L<"I/O Operators">. =head2 The Arrow Operator -Just as in C and C++, "C<-E<gt>>" is an infix dereference operator. If the -right side is either a C<[...]> or C<{...}> subscript, then the left side -must be either a hard or symbolic reference to an array or hash (or -a location capable of holding a hard reference, if it's an lvalue (assignable)). -See L<perlref>. +"C<-E<gt>>" is an infix dereference operator, just as it is in C +and C++. If the right side is either a C<[...]>, C<{...}>, or a +C<(...)> subscript, then the left side must be either a hard or +symbolic reference to an array, a hash, or a subroutine respectively. +(Or technically speaking, a location capable of holding a hard +reference, if it's an array or hash reference being used for +assignment.) See L<perlreftut> and L<perlref>. -Otherwise, the right side is a method name or a simple scalar variable -containing the method name, and the left side must either be an object -(a blessed reference) or a class name (that is, a package name). -See L<perlobj>. +Otherwise, the right side is a method name or a simple scalar +variable containing either the method name or a subroutine reference, +and the left side must be either an object (a blessed reference) +or a class name (that is, a package name). See L<perlobj>. =head2 Auto-increment and Auto-decrement @@ -129,7 +131,7 @@ The auto-decrement operator is not magical. =head2 Exponentiation -Binary "**" is the exponentiation operator. Note that it binds even more +Binary "**" is the exponentiation operator. It binds even more tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is implemented using C's pow(3) function, which actually works on doubles internally.) 
@@ -155,10 +157,10 @@ syntactically for separating a function name from a parenthesized expression that would otherwise be interpreted as the complete list of function arguments. (See examples above under L<Terms and List Operators (Leftward)>.) -Unary "\" creates a reference to whatever follows it. See L<perlref>. -Do not confuse this behavior with the behavior of backslash within a -string, although both forms do convey the notion of protecting the next -thing from interpretation. +Unary "\" creates a reference to whatever follows it. See L<perlreftut> +and L<perlref>. Do not confuse this behavior with the behavior of +backslash within a string, although both forms do convey the notion +of protecting the next thing from interpolation. =head2 Binding Operators @@ -384,23 +386,26 @@ of B<sed>, B<awk>, and various editors. Each ".." operator maintains its own boolean state. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, I<AFTER> which the range operator becomes false -again. (It doesn't become false till the next time the range operator is +again. It doesn't become false till the next time the range operator is evaluated. It can test the right operand and become false on the same evaluation it became true (as in B<awk>), but it still returns true once. -If you don't want it to test the right operand till the next evaluation -(as in B<sed>), use three dots ("...") instead of two.) The right -operand is not evaluated while the operator is in the "false" state, and -the left operand is not evaluated while the operator is in the "true" -state. The precedence is a little lower than || and &&. The value -returned is either the empty string for false, or a sequence number -(beginning with 1) for true. The sequence number is reset for each range -encountered. 
The final sequence number in a range has the string "E0" -appended to it, which doesn't affect its numeric value, but gives you -something to search for if you want to exclude the endpoint. You can -exclude the beginning point by waiting for the sequence number to be -greater than 1. If either operand of scalar ".." is a constant expression, -that operand is implicitly compared to the C<$.> variable, the current -line number. Examples: +If you don't want it to test the right operand till the next +evaluation, as in B<sed>, just use three dots ("...") instead of +two. In all other regards, "..." behaves just like ".." does. + +The right operand is not evaluated while the operator is in the +"false" state, and the left operand is not evaluated while the +operator is in the "true" state. The precedence is a little lower +than || and &&. The value returned is either the empty string for +false, or a sequence number (beginning with 1) for true. The +sequence number is reset for each range encountered. The final +sequence number in a range has the string "E0" appended to it, which +doesn't affect its numeric value, but gives you something to search +for if you want to exclude the endpoint. You can exclude the +beginning point by waiting for the sequence number to be greater +than 1. If either operand of scalar ".." is a constant expression, +that operand is implicitly compared to the C<$.> variable, the +current line number. Examples: As a scalar operator: @@ -429,7 +434,7 @@ can say @alphabet = ('A' .. 'Z'); -to get all the letters of the alphabet, or +to get all normal letters of the alphabet, or $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15]; @@ -464,8 +469,6 @@ legal lvalues (meaning that you can assign to them): ($a_or_b ? $a : $b) = $c; -This is not necessarily guaranteed to contribute to the readability of your program. - Because this operator produces an assignable result, using assignments without parentheses will get you in trouble. 
For example, this: @@ -479,6 +482,10 @@ Rather than this: ($a % 2) ? ($a += 10) : ($a += 2) +That should probably be written more simply as: + + $a += ($a % 2) ? 10 : 2; + =head2 Assignment Operators "=" is the ordinary assignment operator. @@ -500,7 +507,7 @@ The following are recognized: .= %= ^= x= -Note that while these are grouped by family, they all have the precedence +Although these are grouped by family, they all have the precedence of assignment. Unlike in C, the assignment operator produces a valid lvalue. Modifying @@ -573,14 +580,14 @@ probably avoid using this for assignment, only for control flow. ($a = $b) or $c; # really means this $a = $b || $c; # better written this way -However, when it's a list context assignment and you're trying to use +However, when it's a list-context assignment and you're trying to use "||" for control flow, you probably need "or" so that the assignment takes higher precedence. @info = stat($file) || die; # oops, scalar sense of stat! @info = stat($file) or die; # better, now @info gets its due -Then again, you could always use parentheses. +Then again, you could always use parentheses. Binary "xor" returns the exclusive-OR of the two surrounding expressions. It cannot short circuit, of course. @@ -602,7 +609,7 @@ operators are typed: $, @, %, and &.) =item (TYPE) -Type casting operator. +Type-casting operator. =back @@ -627,17 +634,17 @@ the same character fore and aft, but the 4 sorts of brackets s{}{} Substitution yes (unless '' is delimiter) tr{}{} Transliteration no (but see below) -Note that there can be whitespace between the operator and the quoting +There can be whitespace between the operator and the quoting characters, except when C<#> is being used as the quoting character. -C<q#foo#> is parsed as being the string C<foo>, while C<q #foo#> is the -operator C<q> followed by a comment. Its argument will be taken from the -next line. 
This allows you to write: +C<q#foo#> is parsed as the string C<foo>, while C<q #foo#> is the +operator C<q> followed by a comment. Its argument will be taken +from the next line. This allows you to write: s {foo} # Replace foo {bar} # with bar. -For constructs that do interpolation, variables beginning with "C<$>" -or "C<@>" are interpolated, as are the following sequences. Within +For constructs that do interpolate, variables beginning with "C<$>" +or "C<@>" are interpolated, as are the following escape sequences. Within a transliteration, the first eleven of these sequences may be used. \t tab (HT, TAB) @@ -650,7 +657,7 @@ a transliteration, the first eleven of these sequences may be used. \033 octal char (ESC) \x1b hex char (ESC) \x{263a} wide hex char (SMILEY) - \c[ control char + \c[ control char (ESC) \l lowercase next char \u uppercase next char @@ -664,7 +671,7 @@ and C<\U> is taken from the current locale. See L<perllocale>. All systems use the virtual C<"\n"> to represent a line terminator, called a "newline". There is no such thing as an unvarying, physical -newline character. It is an illusion that the operating system, +newline character. It is only an illusion that the operating system, device drivers, C libraries, and Perl all conspire to preserve. Not all systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF. For example, on a Mac, these are reversed, and on systems without line terminator, @@ -687,28 +694,17 @@ interpolated, so that regular expressions may be incorporated into the pattern from the variables. If this is not what you want, use C<\Q> to interpolate a variable literally. -Apart from the above, there are no multiple levels of interpolation. In -particular, contrary to the expectations of shell programmers, back-quotes -do I<NOT> interpolate within double quotes, nor do single quotes impede -evaluation of variables when used within double quotes. 
+Apart from the behavior described above, Perl does not expand +multiple levels of interpolation. In particular, contrary to the +expectations of shell programmers, back-quotes do I<NOT> interpolate +within double quotes, nor do single quotes impede evaluation of +variables when used within double quotes. =head2 Regexp Quote-Like Operators Here are the quote-like operators that apply to pattern matching and related activities. -Most of this section is related to use of regular expressions from Perl. -Such a use may be considered from two points of view: Perl handles a -a string and a "pattern" to RE (regular expression) engine to match, -RE engine finds (or does not find) the match, and Perl uses the findings -of RE engine for its operation, possibly asking the engine for other matches. - -RE engine has no idea what Perl is going to do with what it finds, -similarly, the rest of Perl has no idea what a particular regular expression -means to RE engine. This creates a clean separation, and in this section -we discuss matching from Perl point of view only. The other point of -view may be found in L<perlre>. - =over 8 =item ?PATTERN? @@ -727,21 +723,22 @@ patterns local to the current package are reset. reset if eof; # clear ?? status for next file } -This usage is vaguely deprecated, and may be removed in some future -version of Perl. +This usage is vaguely depreciated, which means it just might possibly +be removed in some distant future version of Perl, perhaps somewhere +around the year 2168. =item m/PATTERN/cgimosx =item /PATTERN/cgimosx Searches a string for a pattern match, and in scalar context returns -true (1) or false (''). If no string is specified via the C<=~> or -C<!~> operator, the $_ string is searched. (The string specified with -C<=~> need not be an lvalue--it may be the result of an expression -evaluation, but remember the C<=~> binds rather tightly.) See also -L<perlre>. 
-See L<perllocale> for discussion of additional considerations that apply -when C<use locale> is in effect. +true if it succeeds, false if it fails. If no string is specified +via the C<=~> or C<!~> operator, the $_ string is searched. (The +string specified with C<=~> need not be an lvalue--it may be the +result of an expression evaluation, but remember the C<=~> binds +rather tightly.) See also L<perlre>. See L<perllocale> for +discussion of additional considerations that apply when C<use locale> +is in effect. Options are: @@ -755,11 +752,10 @@ Options are: If "/" is the delimiter then the initial C<m> is optional. With the C<m> you can use any pair of non-alphanumeric, non-whitespace characters -as delimiters. This is particularly useful for matching Unix path names -that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is +as delimiters. This is particularly useful for matching path names +that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is the delimiter, then the match-only-once rule of C<?PATTERN?> applies. -If "'" is the delimiter, no variable interpolation is performed on the -PATTERN. +If "'" is the delimiter, no interpolation is performed on the PATTERN. PATTERN may contain variables, which will be interpolated (and the pattern recompiled) every time the pattern search is evaluated, except @@ -770,12 +766,12 @@ the trailing delimiter. This avoids expensive run-time recompilations, and is useful when the value you are interpolating won't change over the life of the script. However, mentioning C</o> constitutes a promise that you won't change the variables in the pattern. If you change them, -Perl won't even notice. +Perl won't even notice. See also L<qr//>. If the PATTERN evaluates to the empty string, the last I<successfully> matched regular expression is used instead. 
-If the C</g> option is not used, C<m//> in a list context returns a +If the C</g> option is not used, C<m//> in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., (C<$1>, C<$2>, C<$3>...). (Note that here C<$1> etc. are also set, and that this differs from Perl 4's behavior.) When there are @@ -805,15 +801,16 @@ remainder of the line, and assigns those three fields to $F1, $F2, and $Etc. The conditional is true if any variables were assigned, i.e., if the pattern matched. -The C</g> modifier specifies global pattern matching--that is, matching -as many times as possible within the string. How it behaves depends on -the context. In list context, it returns a list of all the -substrings matched by all the parentheses in the regular expression. -If there are no parentheses, it returns a list of all the matched -strings, as if there were parentheses around the whole pattern. +The C</g> modifier specifies global pattern matching--that is, +matching as many times as possible within the string. How it behaves +depends on the context. In list context, it returns a list of the +substrings matched by any capturing parentheses in the regular +expression. If there are no parentheses, it returns a list of all +the matched strings, as if there were parentheses around the whole +pattern. In scalar context, each execution of C<m//g> finds the next match, -returning TRUE if it matches, and FALSE if there is no further match. +returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function; see L<perlfunc/pos>. A failed match normally resets the search position to the beginning of the string, but you can avoid that @@ -823,8 +820,8 @@ string also resets the search position. You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a zero-width assertion that matches the exact position where the previous C<m//g>, if any, left off. 
The C<\G> assertion is not supported without -the C</g> modifier; currently, without C</g>, C<\G> behaves just like -C<\A>, but that's accidental and may change in the future. +the C</g> modifier. (Currently, without C</g>, C<\G> behaves just like +C<\A>, but that's accidental and may change in the future.) Examples: @@ -832,12 +829,10 @@ Examples: ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g); # scalar context - { - local $/ = ""; - while (defined($paragraph = <>)) { - while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) { - $sentences++; - } + $/ = ""; $* = 1; # $* deprecated in modern perls + while (defined($paragraph = <>)) { + while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) { + $sentences++; } } print "$sentences\n"; @@ -893,7 +888,7 @@ Here is the output (split into several lines): =item C<'STRING'> -A single-quoted, literal string. A backslash represents a backslash +A single-quoted, literal string. A backslash represents a backslash unless followed by the delimiter or another backslash, in which case the delimiter or backslash is interpolated. @@ -909,15 +904,16 @@ A double-quoted, interpolated string. $_ .= qq (*** The previous line contains the naughty word "$1".\n) - if /(tcl|rexx|python)/; # :-) + if /\b(tcl|java|python)\b/i; # :-) $baz = "\n"; # a one-character string =item qr/STRING/imosx -Quote-as-a-regular-expression operator. I<STRING> is interpolated the -same way as I<PATTERN> in C<m/PATTERN/>. If "'" is used as the -delimiter, no variable interpolation is done. Returns a Perl value -which may be used instead of the corresponding C</STRING/imosx> expression. +This operators quotes--and compiles--its I<STRING> as a regular +expression. I<STRING> is interpolated the same way as I<PATTERN> +in C<m/PATTERN/>. If "'" is used as the delimiter, no interpolation +is done. Returns a Perl value which may be used instead of the +corresponding C</STRING/imosx> expression. 
For example, @@ -936,7 +932,7 @@ The result may be used as a subpattern in a match: $string =~ /$re/; # or this way Since Perl may compile the pattern at the moment of execution of qr() -operator, using qr() may have speed advantages in I<some> situations, +operator, using qr() may have speed advantages in some situations, notably if the result of qr() is used standalone: sub match { @@ -951,11 +947,11 @@ notably if the result of qr() is used standalone: } @_; } -Precompilation of the pattern into an internal representation at the -moment of qr() avoids a need to recompile the pattern every time a -match C</$pat/> is attempted. (Note that Perl has many other -internal optimizations, but none would be triggered in the above -example if we did not use qr() operator.) +Precompilation of the pattern into an internal representation at +the moment of qr() avoids a need to recompile the pattern every +time a match C</$pat/> is attempted. (Perl has many other internal +optimizations, but none would be triggered in the above example if +we did not use qr() operator.) Options are: @@ -1012,7 +1008,7 @@ double-quote interpolation, passing it on to the shell instead: $perl_info = qx(ps $$); # that's Perl's $$ $shell_info = qx'ps $$'; # that's the new shell's $$ -Note that how the string gets evaluated is entirely subject to the command +How that string gets evaluated is entirely subject to the command interpreter on your system. On most platforms, you will have to protect shell metacharacters if you want them treated literally. This is in practice difficult to do, as it's unclear how to escape which characters. @@ -1064,10 +1060,10 @@ Some frequently seen examples: use POSIX qw( setlocale localeconv ) @EXPORT = qw( foo bar baz ); -A common mistake is to try to separate the words with comma or to put -comments into a multi-line C<qw>-string. For this reason the C<-w> -switch produce warnings if the STRING contains the "," or the "#" -character. 
+A common mistake is to try to separate the words with comma or to +put comments into a multi-line C<qw>-string. For this reason, the +B<-w> switch (that is, the C<$^W> variable) produces warnings if +the STRING contains the "," or the "#" character. =item s/PATTERN/REPLACEMENT/egimosx @@ -1080,7 +1076,7 @@ variable is searched and modified. (The string specified with C<=~> must be scalar variable, an array element, a hash element, or an assignment to one of those, i.e., an lvalue.) -If the delimiter chosen is a single quote, no variable interpolation is +If the delimiter chosen is a single quote, no interpolation is done on either the PATTERN or the REPLACEMENT. Otherwise, if the PATTERN contains a $ that looks like a variable rather than an end-of-string test, the variable will be interpolated into the pattern @@ -1163,16 +1159,14 @@ B<sed>, we use the \E<lt>I<digit>E<gt> form in only the left hand side. Anywhere else it's $E<lt>I<digit>E<gt>. Occasionally, you can't use just a C</g> to get all the changes -to occur. Here are two common cases: +to occur that you might want. Here are two common cases: # put commas in the right places in an integer - 1 while s/(.*\d)(\d\d\d)/$1,$2/g; # perl4 - 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g; # perl5 + 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g; # expand tabs to 8-column spacing 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e; - =item tr/SEARCHLIST/REPLACEMENTLIST/cdsUC =item y/SEARCHLIST/REPLACEMENTLIST/cdsUC @@ -1206,14 +1200,14 @@ Options: U Translate to/from UTF-8. C Translate to/from 8-bit char (octet). -If the C</c> modifier is specified, the SEARCHLIST character set is -complemented. If the C</d> modifier is specified, any characters specified -by SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note -that this is slightly more flexible than the behavior of some B<tr> -programs, which delete anything they find in the SEARCHLIST, period.) 
-If the C</s> modifier is specified, sequences of characters that were
-transliterated to the same character are squashed down to a single instance of the
-character.
+If the C</c> modifier is specified, the SEARCHLIST character set
+is complemented. If the C</d> modifier is specified, any characters
+specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
+(Note that this is slightly more flexible than the behavior of some
+B<tr> programs, which delete anything they find in the SEARCHLIST,
+period.) If the C</s> modifier is specified, sequences of characters
+that were transliterated to the same character are squashed down
+to a single instance of the character.

 If the C</d> modifier is used, the REPLACEMENTLIST is always
 interpreted exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter
@@ -1245,19 +1239,20 @@ Examples:
     tr [\200-\377]
        [\000-\177];          # delete 8th bit

-    tr/\0-\xFF//CU;          # translate Latin-1 to Unicode
-    tr/\0-\x{FF}//UC;        # translate Unicode to Latin-1
+    tr/\0-\xFF//CU;          # change Latin-1 to Unicode
+    tr/\0-\x{FF}//UC;        # change Unicode to Latin-1

-If multiple transliterations are given for a character, only the first one is used:
+If multiple transliterations are given for a character, only the
+first one is used:

     tr/AAA/XYZ/

 will transliterate any A to X.

-Note that because the transliteration table is built at compile time, neither
+Because the transliteration table is built at compile time, neither
 the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote
-interpolation. That means that if you want to use variables, you must use
-an eval():
+interpolation. That means that if you want to use variables, you
+must use an eval():

     eval "tr/$oldlist/$newlist/";
     die $@ if $@;
@@ -1268,52 +1263,52 @@ an eval():

 =head2 Gory details of parsing quoted constructs

-When presented with something which may have several different
-interpretations, Perl uses the principle B<DWIM> (expanded to Do What I Mean
-- not what I wrote) to pick up the most probable interpretation of the
-source. This strategy is so successful that Perl users usually do not
-suspect ambivalence of what they write. However, time to time Perl's ideas
-differ from what the author meant.
-
-The target of this section is to clarify the Perl's way of interpreting
-quoted constructs. The most frequent reason one may have to want to know the
-details discussed in this section is hairy regular expressions. However, the
-first steps of parsing are the same for all Perl quoting operators, so here
-they are discussed together.
-
-The most important detail of Perl parsing rules is the first one
-discussed below; when processing a quoted construct, Perl I<first>
-finds the end of the construct, then it interprets the contents of the
-construct. If you understand this rule, you may skip the rest of this
-section on the first reading. The other rules would
-contradict user's expectations much less frequently than the first one.
-
-Some of the passes discussed below are performed concurrently, but as
-far as results are the same, we consider them one-by-one. For different
-quoting constructs Perl performs different number of passes, from
-one to five, but they are always performed in the same order.
+When presented with something that might have several different
+interpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
+principle to pick the most probable interpretation. This strategy
+is so successful that Perl programmers often do not suspect the
+ambiguity of what they write. But from time to time, Perl's
+notions differ substantially from what the author honestly meant.
+
+This section hopes to clarify how Perl handles quoted constructs.
+Although the most common reason to learn this is to unravel labyrinthine
+regular expressions, the initial steps of parsing are the same for
+all quoting operators, so they are all discussed together.
+
+The most important Perl parsing rule is the first one discussed
+below: when processing a quoted construct, Perl first finds the end
+of that construct, then interprets its contents. If you understand
+this rule, you may skip the rest of this section on the first
+reading. The other rules are likely to contradict the user's
+expectations much less frequently than this first one.
+
+Some passes discussed below are performed concurrently, but because
+their results are the same, we consider them individually. For different
+quoting constructs, Perl performs different numbers of passes, from
+one to five, but these passes are always performed in the same order.

 =over

 =item Finding the end

-First pass is finding the end of the quoted construct, be it
-a multichar delimiter
-C<"\nEOF\n"> of C<<<EOF> construct, C</> which terminates C<qq/> construct,
-C<]> which terminates C<qq[> construct, or C<E<gt>> which terminates a
-fileglob started with C<<>.
+The first pass is finding the end of the quoted construct, whether
+it be a multicharacter delimiter C<"\nEOF\n"> in the C<<<EOF>
+construct, a C</> that terminates a C<qq//> construct, a C<]> which
+terminates a C<qq[]> construct, or a C<E<gt>> which terminates a
+fileglob started with C<E<lt>>.

-When searching for one-char non-matching delimiter, such as C</>, combinations
-C<\\> and C<\/> are skipped. When searching for one-char matching delimiter,
-such as C<]>, combinations C<\\>, C<\]> and C<\[> are skipped, and
-nested C<[>, C<]> are skipped as well. When searching for multichar delimiter
-no skipping is performed.
+When searching for single-character non-pairing delimiters, such
+as C</>, combinations of C<\\> and C<\/> are skipped. However,
+when searching for single-character pairing delimiters like C<[>,
+combinations of C<\\>, C<\]>, and C<\[> are all skipped, and nested
+C<[>, C<]> are skipped as well. When searching for multicharacter
+delimiters, nothing is skipped.

-For constructs with 3-part delimiters (C<s///> etc.) the search is
-repeated once more.
+For constructs with three-part delimiters (C<s///>, C<y///>, and
+C<tr///>), the search is repeated once more.

-During this search no attention is paid to the semantic of the construct,
-thus:
+During this search no attention is paid to the semantics of the construct.
+Thus:

     "$hash{"$foo/$bar"}"

@@ -1323,30 +1318,28 @@ or:
       bar       # NOT a comment, this slash / terminated m//!
      /x

-do not form legal quoted expressions, the quoted part ends on the first C<">
-and C</>, and the rest happens to be a syntax error. Note that since the slash
-which terminated C<m//> was followed by a C<SPACE>, the above is not C<m//x>,
-but rather C<m//> with no 'x' switch. So the embedded C<#> is interpreted
-as a literal C<#>.
+do not form legal quoted expressions. The quoted part ends on the
+first C<"> and C</>, and the rest happens to be a syntax error.
+Because the slash that terminated C<m//> was followed by a C<SPACE>,
+the example above is not C<m//x>, but rather C<m//> with no C</x>
+modifier. So the embedded C<#> is interpreted as a literal C<#>.

 =item Removal of backslashes before delimiters

-During the second pass the text between the starting delimiter and
-the ending delimiter is copied to a safe location, and the C<\> is
-removed from combinations consisting of C<\> and delimiter(s) (both starting
-and ending delimiter if they differ).
-
-The removal does not happen for multi-char delimiters.
-
-Note that the combination C<\\> is left as it was!
+During the second pass, text between the starting and ending
+delimiters is copied to a safe location, and the C<\> is removed
+from combinations consisting of C<\> and delimiter--or delimiters,
+meaning both starting and ending delimiters, should these differ.
+This removal does not happen for multi-character delimiters.
+Note that the combination C<\\> is left intact, just as it was.

-Starting from this step no information about the delimiter(s) is used in the
-parsing.
+Starting from this step no information about the delimiters is
+used in parsing.

 =item Interpolation

-Next step is interpolation in the obtained delimiter-independent text.
-There are four different cases.
+The next step is interpolation in the text obtained, which is now
+delimiter-independent. There are four different cases.

 =over

@@ -1360,44 +1353,40 @@ The only interpolation is removal of C<\> from pairs C<\\>.

 =item C<"">, C<``>, C<qq//>, C<qx//>, C<<file*globE<gt>>

-C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are converted
-to corresponding Perl constructs, thus C<"$foo\Qbaz$bar"> is converted to :
-
-    $foo . (quotemeta("baz" . $bar));
-
-Other combinations of C<\> with following chars are substituted with
-appropriate expansions.
+C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are
+converted to corresponding Perl constructs. Thus, C<"$foo\Qbaz$bar">
+is converted to C<$foo . (quotemeta("baz" . $bar))> internally.
+The other combinations are replaced with appropriate expansions.

-Let it be stressed that I<whatever is between C<\Q> and C<\E>> is interpolated
-in the usual way. Say, C<"\Q\\E"> has no C<\E> inside: it has C<\Q>, C<\\>,
-and C<E>, thus the result is the same as for C<"\\\\E">. Generally speaking,
-having backslashes between C<\Q> and C<\E> may lead to counterintuitive
-results. So, C<"\Q\t\E"> is converted to:
-
-    quotemeta("\t")
-
-which is the same as C<"\\\t"> (since TAB is not alphanumerical). Note also
-that:
+Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
+is interpolated in the usual way. Something like C<"\Q\\E"> has
+no C<\E> inside. Instead, it has C<\Q>, C<\\>, and C<E>, so the
+result is the same as for C<"\\\\E">. As a general rule, backslashes
+between C<\Q> and C<\E> may lead to counterintuitive results. So,
+C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
+as C<"\\\t"> (since TAB is not alphanumeric). Note also that:

     $str = '\t';
     return "\Q$str";

 may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.

-Interpolated scalars and arrays are internally converted to the C<join> and
-C<.> Perl operations, thus C<"$foo >>> '@arr'"> becomes:
+Interpolated scalars and arrays are converted internally to the C<join> and
+C<.> catenation operations. Thus, C<"$foo XXX '@arr'"> becomes:

-    $foo . " >>> '" . (join $", @arr) . "'";
+    $foo . " XXX '" . (join $", @arr) . "'";

-All the operations in the above are performed simultaneously left-to-right.
+All operations above are performed simultaneously, left to right.

-Since the result of "\Q STRING \E" has all the metacharacters quoted
-there is no way to insert a literal C<$> or C<@> inside a C<\Q\E> pair: if
-protected by C<\> C<$> will be quoted to became "\\\$", if not, it is
-interpreted as starting an interpolated scalar.
+Because the result of C<"\Q STRING \E"> has all metacharacters
+quoted, there is no way to insert a literal C<$> or C<@> inside a
+C<\Q\E> pair. If protected by C<\>, C<$> will be quoted to become
+C<"\\\$">; if not, it is interpreted as the start of an interpolated
+scalar.

-Note also that the interpolating code needs to make a decision on where the
-interpolated scalar ends. For instance, whether C<"a $b -E<gt> {c}"> means:
+Note also that the interpolation code needs to make a decision on
+where the interpolated scalar ends. For instance, whether
+C<"a $b -E<gt> {c}"> really means:

     "a " . $b . " -> {c}";

@@ -1405,99 +1394,108 @@ or:

     "a " . $b -> {c};

-I<Most of the time> the decision is to take the longest possible text which
-does not include spaces between components and contains matching
-braces/brackets. Since the outcome may be determined by I<voting> based
-on heuristic estimators, the result I<is not strictly predictable>, but
-is usually correct for the ambiguous cases.
+Most of the time, Perl takes the longest possible text that does not include
+spaces between components and which contains matching braces or
+brackets. Because the outcome may be determined by voting based
+on heuristic estimators, the result is not strictly predictable.
+Fortunately, it's usually correct for ambiguous cases.

 =item C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>,

-Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> and interpolation happens
-(almost) as with C<qq//> constructs, but I<the substitution of C<\> followed by
-RE-special chars (including C<\>) is not performed>! Moreover,
-inside C<(?{BLOCK})>, C<(?# comment )>, and C<#>-comment of
-C<//x>-regular expressions no processing is performed at all.
-This is the first step where presence of the C<//x> switch is relevant.
-
-Interpolation has several quirks: C<$|>, C<$(> and C<$)> are not interpolated, and
-constructs C<$var[SOMETHING]> are I<voted> (by several different estimators)
-to be an array element or C<$var> followed by a RE alternative. This is
-the place where the notation C<${arr[$bar]}> comes handy: C</${arr[0-9]}/>
-is interpreted as an array element C<-9>, not as a regular expression from
-variable C<$arr> followed by a digit, which is the interpretation of
-C</$arr[0-9]/>. Since voting among different estimators may be performed,
-the result I<is not predictable>.
-
-It is on this step that C<\1> is converted to C<$1> in the replacement
-text of C<s///>.
-
-Note that absence of processing of C<\\> creates specific restrictions on the
-post-processed text: if the delimiter is C</>, one cannot get the combination
-C<\/> into the result of this step: C</> will finish the regular expression,
-C<\/> will be stripped to C</> on the previous step, and C<\\/> will be left
-as is. Since C</> is equivalent to C<\/> inside a regular expression, this
-does not matter unless the delimiter is a special character for the RE engine,
-as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>, or an alphanumeric char, as in:
+Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, and interpolation
+happens (almost) as with C<qq//> constructs, but the substitution
+of C<\> followed by RE-special chars (including C<\>) is not
+performed. Moreover, inside C<(?{BLOCK})>, C<(?# comment )>, and
+a C<#>-comment in a C<//x>-regular expression, no processing is
+performed whatsoever. This is the first step at which the presence
+of the C<//x> modifier is relevant.
+
+Interpolation has several quirks: C<$|>, C<$(>, and C<$)> are not
+interpolated, and constructs C<$var[SOMETHING]> are voted (by several
+different estimators) to be either an array element or C<$var>
+followed by an RE alternative. This is where the notation
+C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is interpreted as
+array element C<-9>, not as a regular expression from the variable
+C<$arr> followed by a digit, which would be the interpretation of
+C</$arr[0-9]/>. Since voting among different estimators may occur,
+the result is not predictable.
+
+It is at this step that C<\1> is begrudgingly converted to C<$1> in
+the replacement text of C<s///> to correct the incorrigible
+I<sed> hackers who haven't picked up the saner idiom yet. A warning
+is emitted if the B<-w> command-line flag (that is, the C<$^W> variable)
+was set.
+
+The lack of processing of C<\\> creates specific restrictions on
+the post-processed text. If the delimiter is C</>, one cannot get
+the combination C<\/> into the result of this step. C</> will
+finish the regular expression, C<\/> will be stripped to C</> on
+the previous step, and C<\\/> will be left as is. Because C</> is
+equivalent to C<\/> inside a regular expression, this does not
+matter unless the delimiter happens to be a character special to the
+RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>; or an
+alphanumeric char, as in:

     m m ^ a \s* b mmx;

-In the above RE, which is intentionally obfuscated for illustration, the
+In the RE above, which is intentionally obfuscated for illustration, the
 delimiter is C<m>, the modifier is C<mx>, and after backslash-removal the
-RE is the same as for C<m/ ^ a s* b /mx>).
+RE is the same as for C<m/ ^ a s* b /mx>. There's more than one
+reason you're encouraged to restrict your delimiters to non-alphanumeric,
+non-whitespace choices.

 =back

-This step is the last one for all the constructs except regular expressions,
+This step is the last one for all constructs except regular expressions,
 which are processed further.

 =item Interpolation of regular expressions

-All the previous steps were performed during the compilation of Perl code,
-this one happens in run time (though it may be optimized to be calculated
-at compile time if appropriate). After all the preprocessing performed
-above (and possibly after evaluation if catenation, joining, up/down-casing
-and C<quotemeta()>ing are involved) the resulting I<string> is passed to RE
-engine for compilation.
-
-Whatever happens in the RE engine is better be discussed in L<perlre>,
-but for the sake of continuity let us do it here.
-
-This is another step where presence of the C<//x> switch is relevant.
-The RE engine scans the string left-to-right, and converts it to a finite
-automaton.
-
-Backslashed chars are either substituted by corresponding literal
-strings (as with C<\{>), or generate special nodes of the finite automaton
-(as with C<\b>). Characters which are special to the RE engine (such as
-C<|>) generate corresponding nodes or groups of nodes. C<(?#...)>
-comments are ignored. All the rest is either converted to literal strings
-to match, or is ignored (as is whitespace and C<#>-style comments if
-C<//x> is present).
-
-Note that the parsing of the construct C<[...]> is performed using
-rather different rules than for the rest of the regular expression.
-The terminator of this construct is found using the same rules as for
-finding a terminator of a C<{}>-delimited construct, the only exception
-being that C<]> immediately following C<[> is considered as if preceded
-by a backslash. Similarly, the terminator of C<(?{...})> is found using
-the same rules as for finding a terminator of a C<{}>-delimited construct.
-
-It is possible to inspect both the string given to RE engine, and the
-resulting finite automaton. See arguments C<debug>/C<debugcolor>
-of C<use L<re>> directive, and/or B<-Dr> option of Perl in
-L<perlrun/Switches>.
+Previous steps were performed during the compilation of Perl code,
+but this one happens at run time--although it may be optimized to
+be calculated at compile time if appropriate. After preprocessing
+described above, and possibly after evaluation if catenation,
+joining, casing translation, or metaquoting are involved, the
+resulting I<string> is passed to the RE engine for compilation.
+
+Whatever happens in the RE engine might be better discussed in L<perlre>,
+but for the sake of continuity, we shall do so here.
+
+This is another step where the presence of the C<//x> modifier is
+relevant. The RE engine scans the string from left to right and
+converts it to a finite automaton.
+
+Backslashed characters are either replaced with corresponding
+literal strings (as with C<\{>), or else they generate special nodes
+in the finite automaton (as with C<\b>). Characters special to the
+RE engine (such as C<|>) generate corresponding nodes or groups of
+nodes. C<(?#...)> comments are ignored. All the rest is either
+converted to literal strings to match, or else is ignored (as is
+whitespace and C<#>-style comments if C<//x> is present).
+
+Parsing of the bracketed character class construct, C<[...]>, is
+rather different from the rule used for the rest of the pattern.
+The terminator of this construct is found using the same rules as
+for finding the terminator of a C<{}>-delimited construct, the only
+exception being that C<]> immediately following C<[> is treated as
+though preceded by a backslash. Similarly, the terminator of
+C<(?{...})> is found using the same rules as for finding the
+terminator of a C<{}>-delimited construct.
+
+It is possible to inspect both the string given to the RE engine and the
+resulting finite automaton. See the arguments C<debug>/C<debugcolor>
+in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line
+switch documented in L<perlrun/Switches>.

 =item Optimization of regular expressions

 This step is listed for completeness only. Since it does not change
 semantics, details of this step are not documented and are subject
-to change. This step is performed over the finite automaton generated
-during the previous pass.
+to change without notice. This step is performed over the finite
+automaton that was generated during the previous pass.

-However, in older versions of Perl C<L<split>> used to silently
-optimize C</^/> to mean C</^/m>. This behaviour, though present
-in current versions of Perl, may be deprecated in future.
+It is at this stage that C<split()> silently optimizes C</^/> to
+mean C</^/m>.

 =back

@@ -1506,39 +1504,40 @@ in current versions of Perl, may be deprecated in future.
 There are several I/O operators you should know about.

 A string enclosed by backticks (grave accents) first undergoes
-variable substitution just like a double quoted string. It is then
-interpreted as a command, and the output of that command is the value
-of the pseudo-literal, like in a shell. In scalar context, a single
-string consisting of all the output is returned. In list context,
-a list of values is returned, one for each line of output. (You can
-set C<$/> to use a different line terminator.) The command is executed
+double-quote interpolation. It is then interpreted as an external
+command, and the output of that command is the value of the
+pseudo-literal, like in a shell. In scalar context, a single
+string consisting of all output is returned. In list context, a
+list of values is returned, one per line of output. (You can set
+C<$/> to use a different line terminator.) The command is executed
 each time the pseudo-literal is evaluated. The status value of the
 command is returned in C<$?> (see L<perlvar> for the interpretation
 of C<$?>). Unlike in B<csh>, no translation is done on the return
 data--newlines remain newlines. Unlike in any of the shells, single
 quotes do not hide variable names in the command from interpretation.
-To pass a $ through to the shell you need to hide it with a backslash.
-The generalized form of backticks is C<qx//>. (Because backticks
-always undergo shell expansion as well, see L<perlsec> for
-security concerns.)
-
-In a scalar context, evaluating a filehandle in angle brackets yields the
-next line from that file (newline, if any, included), or C<undef> at
-end-of-file. When C<$/> is set to C<undef> (i.e. file slurp mode),
-and the file is empty, it returns C<''> the first time, followed by
-C<undef> subsequently.
-
-Ordinarily you must assign the returned value to a variable, but there is one
-situation where an automatic assignment happens. I<If and ONLY if> the
-input symbol is the only thing inside the conditional of a C<while> or
-C<for(;;)> loop, the value is automatically assigned to the variable
-C<$_>. In these loop constructs, the assigned value (whether assignment
-is automatic or explicit) is then tested to see if it is defined.
-The defined test avoids problems where line has a string value
-that would be treated as false by perl e.g. "" or "0" with no trailing
-newline. (This may seem like an odd thing to you, but you'll use the
-construct in almost every Perl script you write.) Anyway, the following
-lines are equivalent to each other:
+To pass a literal dollar-sign through to the shell you need to hide
+it with a backslash. The generalized form of backticks is C<qx//>.
+(Because backticks always undergo shell expansion as well, see
+L<perlsec> for security concerns.)
+
+In scalar context, evaluating a filehandle in angle brackets yields
+the next line from that file (the newline, if any, included), or
+C<undef> at end-of-file or on error. When C<$/> is set to C<undef>
+(sometimes known as file-slurp mode) and the file is empty, it
+returns C<''> the first time, followed by C<undef> subsequently.
+
+Ordinarily you must assign the returned value to a variable, but
+there is one situation where an automatic assignment happens. If
+and only if the input symbol is the only thing inside the conditional
+of a C<while> statement (even if disguised as a C<for(;;)> loop),
+the value is automatically assigned to the global variable $_,
+destroying whatever was there previously. (This may seem like an
+odd thing to you, but you'll use the construct in almost every Perl
+script you write.) The $_ variable is not implicitly localized.
+You'll have to put a C<local $_;> before the loop if you want that
+to happen.
+
+The following lines are equivalent:

     while (defined($_ = <STDIN>)) { print; }
     while ($_ = <STDIN>) { print; }
@@ -1548,34 +1547,40 @@ lines are equivalent to each other:
     print while ($_ = <STDIN>);
     print while <STDIN>;

-and this also behaves similarly, but avoids the use of $_ :
+This also behaves similarly, but avoids $_ :

     while (my $line = <STDIN>) { print $line }

-If you really mean such values to terminate the loop they should be
-tested for explicitly:
+In these loop constructs, the assigned value (whether assignment
+is automatic or explicit) is then tested to see whether it is
+defined. The defined test avoids problems where a line has a string
+value that would be treated as false by Perl, for example a "" or
+a "0" with no trailing newline. If you really mean for such values
+to terminate the loop, they should be tested for explicitly:

     while (($_ = <STDIN>) ne '0') { ... }
     while (<STDIN>) { last unless $_; ... }

-In other boolean contexts, C<E<lt>I<filehandle>E<gt>> without explicit C<defined>
-test or comparison will solicit a warning if C<-w> is in effect.
+In other boolean contexts, C<E<lt>I<filehandle>E<gt>> without an
+explicit C<defined> test or comparison elicits a warning if the B<-w>
+command-line switch (the C<$^W> variable) is in effect.

 The filehandles STDIN, STDOUT, and STDERR are predefined. (The
-filehandles C<stdin>, C<stdout>, and C<stderr> will also work except in
-packages, where they would be interpreted as local identifiers rather
-than global.) Additional filehandles may be created with the open()
-function. See L<perlfunc/open> for details on this.
+filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
+in packages, where they would be interpreted as local identifiers
+rather than global.) Additional filehandles may be created with
+the open() function, amongst others. See L<perlopentut> and
+L<perlfunc/open> for details on this.

-If a E<lt>FILEHANDLEE<gt> is used in a context that is looking for a list, a
-list consisting of all the input lines is returned, one line per list
-element. It's easy to make a I<LARGE> data space this way, so use with
-care.
+If a E<lt>FILEHANDLEE<gt> is used in a context that is looking for
+a list, a list comprising all input lines is returned, one line per
+list element. It's easy to grow to a rather large data space this
+way, so use with care.

-E<lt>FILEHANDLEE<gt> may also be spelt readline(FILEHANDLE). See
-L<perlfunc/readline>.
+E<lt>FILEHANDLEE<gt> may also be spelled C<readline(*FILEHANDLE)>.
+See L<perlfunc/readline>.

-The null filehandle E<lt>E<gt> is special and can be used to emulate the
+The null filehandle E<lt>E<gt> is special: it can be used to emulate the
 behavior of B<sed> and B<awk>. Input from E<lt>E<gt> comes either from
 standard input, or from each file listed on the command line. Here's
 how it works: the first time E<lt>E<gt> is evaluated, the @ARGV array is
@@ -1597,16 +1602,17 @@ is equivalent to the following Perl-like pseudo code:
         }
     }

-except that it isn't so cumbersome to say, and will actually work. It
-really does shift array @ARGV and put the current filename into variable
-$ARGV. It also uses filehandle I<ARGV> internally--E<lt>E<gt> is just a
-synonym for E<lt>ARGVE<gt>, which is magical. (The pseudo code above
-doesn't work because it treats E<lt>ARGVE<gt> as non-magical.)
+except that it isn't so cumbersome to say, and will actually work.
+It really does shift the @ARGV array and put the current filename
+into the $ARGV variable. It also uses filehandle I<ARGV>
+internally--E<lt>E<gt> is just a synonym for E<lt>ARGVE<gt>, which
+is magical. (The pseudo code above doesn't work because it treats
+E<lt>ARGVE<gt> as non-magical.)

 You can modify @ARGV before the first E<lt>E<gt> as long as the array ends up
 containing the list of filenames you really want. Line numbers (C<$.>)
-continue as if the input were one big happy file. (But see example
-under C<eof> for how to reset line numbers on each file.)
+continue as though the input were one big happy file. See the example
+in L<perlfunc/eof> for how to reset line numbers on each file.

 If you want to set @ARGV to your own list of files, go right ahead.
 This sets @ARGV to all plain text files if no @ARGV was given:
@@ -1634,12 +1640,13 @@ Getopts modules or put a loop on the front like this:
     }

 The E<lt>E<gt> symbol will return C<undef> for end-of-file only once.
-If you call it again after this it will assume you are processing another
-@ARGV list, and if you haven't set @ARGV, will input from STDIN.
+If you call it again after this, it will assume you are processing another
+@ARGV list, and if you haven't set @ARGV, will read input from STDIN.
+If you call it again after this, it will assume you are processing another +@ARGV list, and if you haven't set @ARGV, will read input from STDIN. -If the string inside the angle brackets is a reference to a scalar -variable (e.g., E<lt>$fooE<gt>), then that variable contains the name of the -filehandle to input from, or its typeglob, or a reference to the same. For example: +If angle brackets contain is a simple scalar variable (e.g., +E<lt>$fooE<gt>), then that variable contains the name of the +filehandle to input from, or its typeglob, or a reference to the +same. For example: $fh = \*STDIN; $line = <$fh>; @@ -1648,9 +1655,9 @@ If what's within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of filenames or the next filename in the list is returned, -depending on context. This distinction is determined on syntactic -grounds alone. That means C<E<lt>$xE<gt>> is always a readline from -an indirect handle, but C<E<lt>$hash{key}E<gt>> is always a glob. +depending on context. This distinction is determined on syntactic +grounds alone. That means C<E<lt>$xE<gt>> is always a readline() from +an indirect handle, but C<E<lt>$hash{key}E<gt>> is always a glob(). That's because $x is a simple scalar variable, but C<$hash{key}> is not--it's a hash element. @@ -1660,7 +1667,7 @@ in the previous paragraph. (In older versions of Perl, programmers would insert curly brackets to force interpretation as a filename glob: C<E<lt>${foo}E<gt>>. These days, it's considered cleaner to call the internal function directly as C<glob($foo)>, which is probably the right -way to have done it in the first place.) Example: +way to have done it in the first place.) For example: while (<*.c>) { chmod 0644, $_; @@ -1674,27 +1681,31 @@ is equivalent to chmod 0644, $_; } -In fact, it's currently implemented that way. 
(Which means it will not -work on filenames with spaces in them unless you have csh(1) on your -machine.) Of course, the shortest way to do the above is: +In fact, it's currently implemented that way, but this is expected +to be made completely internal in the near future. (Which means +it will not work on filenames with spaces in them unless you have +csh(1) on your machine.) Of course, the shortest way to do the +above is: chmod 0644, <*.c>; -Because globbing invokes a shell, it's often faster to call readdir() yourself -and do your own grep() on the filenames. Furthermore, due to its current -implementation of using a shell, the glob() routine may get "Arg list too -long" errors (unless you've installed tcsh(1L) as F</bin/csh>). - -A glob evaluates its (embedded) argument only when it is starting a new -list. All values must be read before it will start over. In a list -context this isn't important, because you automatically get them all -anyway. In scalar context, however, the operator returns the next value -each time it is called, or a C<undef> value if you've just run out. As -for filehandles an automatic C<defined> is generated when the glob -occurs in the test part of a C<while> or C<for> - because legal glob returns -(e.g. a file called F<0>) would otherwise terminate the loop. -Again, C<undef> is returned only once. So if you're expecting a single value -from a glob, it is much better to say +Because globbing currently invokes a shell, it's often faster to +call readdir() yourself and do your own grep() on the filenames. +Furthermore, due to its current implementation of using a shell, +the glob() routine may get "Arg list too long" errors (unless you've +installed tcsh(1L) as F</bin/csh> or hacked your F<config.sh>). + +A (file)glob evaluates its (embedded) argument only when it is +starting a new list. All values must be read before it will start +over. In list context, this isn't important because you automatically +get them all anyway. 
However, in scalar context the operator returns +the next value each time it's called, or C +run out. As with filehandle reads, an automatic C<defined> is +generated when the glob occurs in the test part of a C<while>, +because legal glob returns (e.g. a file called F<0>) would otherwise +terminate the loop. Again, C<undef> is returned only once. So if +you're expecting a single value from a glob, it is much better to +say ($file) = <blurch*>; @@ -1703,7 +1714,7 @@ than $file = <blurch*>; because the latter will alternate between returning a filename and -returning FALSE. +returning false. It you're trying to do variable interpolation, it's definitely better to use the glob() function, because the older notation can cause people @@ -1715,10 +1726,10 @@ to become confused with the indirect filehandle notation. =head2 Constant Folding Like C, Perl does a certain amount of expression evaluation at -compile time, whenever it determines that all arguments to an +compile time whenever it determines that all arguments to an operator are static and have no side effects. In particular, string concatenation happens at compile time between literals that don't do -variable substitution. Backslash interpretation also happens at +variable substitution. Backslash interpolation also happens at compile time. You can say 'Now is the time for all' . "\n" . @@ -1731,20 +1742,20 @@ you say if (-s $file > 5 + 100 * 2**16) { } } -the compiler will precompute the number that -expression represents so that the interpreter -won't have to. +the compiler will precompute the number which that expression +represents so that the interpreter won't have to. =head2 Bitwise String Operators Bitstrings of any size may be manipulated by the bitwise operators (C<~ | & ^>). 
-If the operands to a binary bitwise op are strings of different sizes, -B<|> and B<^> ops will act as if the shorter operand had additional -zero bits on the right, while the B<&> op will act as if the longer -operand were truncated to the length of the shorter. Note that the -granularity for such extension or truncation is one or more I<bytes>. +If the operands to a binary bitwise op are strings of different +sizes, B<|> and B<^> ops act as though the shorter operand had +additional zero bits on the right, while the B<&> op acts as though +the longer operand were truncated to the length of the shorter. +The granularity for such extension or truncation is one or more +bytes. # ASCII-based examples print "j p \n" ^ " a h"; # prints "JAPH\n" @@ -1752,9 +1763,9 @@ granularity for such extension or truncation is one or more I<bytes>. print "japh\nJunk" & '_____'; # prints "JAPH\n"; print 'p N$' ^ " E<H\n"; # prints "Perl\n"; -If you are intending to manipulate bitstrings, you should be certain that +If you are intending to manipulate bitstrings, be certain that you're supplying bitstrings: If an operand is a number, that will imply -a B<numeric> bitwise operation. You may explicitly show which type of +a B<numeric> bitwise operation. You may explicitly show which type of operation you intend by using C<""> or C<0+>, as in the examples below. $foo = 150 | 105 ; # yields 255 (0x96 | 0x69 is 0xFF) @@ -1770,33 +1781,39 @@ in a bit vector. =head2 Integer Arithmetic -By default Perl assumes that it must do most of its arithmetic in +By default, Perl assumes that it must do most of its arithmetic in floating point. But by saying use integer; you may tell the compiler that it's okay to use integer operations -from here to the end of the enclosing BLOCK. An inner BLOCK may -countermand this by saying +(if it feels like it) from here to the end of the enclosing BLOCK. +An inner BLOCK may countermand this by saying no integer; -which lasts until the end of that BLOCK. 
- -The bitwise operators ("&", "|", "^", "~", "<<", and ">>") always -produce integral results. (But see also L<Bitwise String Operators>.) -However, C<use integer> still has meaning -for them. By default, their results are interpreted as unsigned -integers. However, if C<use integer> is in effect, their results are -interpreted as signed integers. For example, C<~0> usually evaluates -to a large integral value. However, C<use integer; ~0> is -1 on twos-complement machines. +which lasts until the end of that BLOCK. Note that this doesn't +mean everything is only an integer, merely that Perl may use integer +operations if it is so inclined. For example, even under C<use +integer>, if you take the C<sqrt(2)>, you'll still get C<1.4142135623731> +or so. + +Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<", +and ">>") always produce integral results. (But see also L<Bitwise +String Operators>.) However, C<use integer> still has meaning for +them. By default, their results are interpreted as unsigned integers, but +if C<use integer> is in effect, their results are interpreted +as signed integers. For example, C<~0> usually evaluates to a large +integral value. However, C<use integer; ~0> is C<-1> on twos-complement +machines. =head2 Floating-point Arithmetic While C<use integer> provides integer-only arithmetic, there is no -similar ways to provide rounding or truncation at a certain number of -decimal places. For rounding to a certain number of digits, sprintf() -or printf() is usually the easiest route. +analogous mechanism to provide automatic rounding or truncation to a +certain number of decimal places. For rounding to a certain number +of digits, sprintf() or printf() is usually the easiest route. +See L<perlfaq4>. Floating-point numbers are only approximations to what a mathematician would call real numbers. There are infinitely more reals than floats, @@ -1820,10 +1837,10 @@ this topic. 
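The following sketch shows the scope of `use integer` on a twos-complement machine. Inside the block, `~0` becomes -1 and division truncates, but `sqrt` is not affected. Outside the block, ordinary floating-point arithmetic resumes. Rounding to a fixed number of decimal places is left to sprintf(), as the text recommends. The variable names are illustrative.

```perl
use strict;
use warnings;

my $plain = ~0;    # a large unsigned value, e.g. 18446744073709551615 with 64-bit integers

my ($signed, $quotient, $root);
{
    use integer;               # integer operations allowed until the end of this block
    $signed   = ~0;            # -1 on twos-complement machines
    $quotient = 10 / 3;        # 3: the division is done on integers
    $root     = sqrt(2);       # still about 1.4142135623731; sqrt() is unaffected
}

my $float = 10 / 3;            # back to floating point: 3.33333333333333

# No pragma rounds to a fixed number of decimal places; sprintf() does it.
my $rounded = sprintf("%.2f", $float);    # "3.33"

print "$plain $signed $quotient $root $rounded\n";
```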
} The POSIX module (part of the standard perl distribution) implements -ceil(), floor(), and a number of other mathematical and trigonometric -functions. The Math::Complex module (part of the standard perl -distribution) defines a number of mathematical functions that can also -work on real numbers. Math::Complex not as efficient as POSIX, but +ceil(), floor(), and other mathematical and trigonometric functions. +The Math::Complex module (part of the standard perl distribution) +defines mathematical functions that work on both the reals and the +imaginary numbers. Math::Complex is not as efficient as POSIX, but POSIX can't work with complex numbers. Rounding in financial applications can have serious implications, and @@ -1835,13 +1852,17 @@ need yourself. =head2 Bigger Numbers The standard Math::BigInt and Math::BigFloat modules provide -variable precision arithmetic and overloaded operators. -At the cost of some space and considerable speed, they -avoid the normal pitfalls associated with limited-precision -representations. +variable-precision arithmetic and overloaded operators, although +they're currently pretty slow. At the cost of some space and +considerable speed, they avoid the normal pitfalls associated with +limited-precision representations. use Math::BigInt; $x = Math::BigInt->new('123456789123456789'); print $x * $x; # prints +15241578780673678515622620750190521 + +The non-standard modules SSLeay::BN and Math::Pari provide +equivalent functionality (and much more) with a substantial +performance savings. diff --git a/pod/perlopentut.pod b/pod/perlopentut.pod index 6e6091ab49..ae622a6e41 100644 --- a/pod/perlopentut.pod +++ b/pod/perlopentut.pod @@ -73,7 +73,7 @@ from a different file, and forget to trim it before opening: This is not a bug, but a feature. Because C<open> mimics the shell in its style of using redirection arrows to specify how to open the file, it also does so with respect to extra white space around the filename itself -as well.
For accessing files with naughty names, see L</"Dispelling +as well. For accessing files with naughty names, see L<"Dispelling the Dweomer">. =head2 Pipe Opens diff --git a/pod/perlpod.pod b/pod/perlpod.pod index 7fa8290f1d..731a0fbd3d 100644 --- a/pod/perlpod.pod +++ b/pod/perlpod.pod @@ -11,7 +11,6 @@ L<verbatim|/"Verbatim Paragraph">, L<command|/"Command Paragraph">, and L<ordinary text|/"Ordinary Block of Text">. - =head2 Verbatim Paragraph A verbatim paragraph, distinguished by being indented (that is, @@ -20,7 +19,6 @@ with tabs assumed to be on 8-column boundaries. There are no special formatting escapes, so you can't italicize or anything like that. A \ means \, and nothing else. - =head2 Command Paragraph All command paragraphs start with "=", followed by an @@ -75,7 +73,6 @@ or use "=item 1.", "=item 2.", etc., to produce numbered lists, or use or numbers. If you start with bullets or numbers, stick with them, as many formatters use the first "=item" type to decide how to format the list. - =item =for =item =begin @@ -149,10 +146,8 @@ Some examples of lists include: =back - =back - =head2 Ordinary Block of Text It will be filled, and maybe even @@ -193,7 +188,6 @@ here and in commands: E<html> Some non-numeric HTML entity, such as E<Agrave> - =head2 The Intent That's it. The intent is simplicity, not power. I wanted paragraphs @@ -223,7 +217,6 @@ TeX, and other markup languages, as used for online documentation. Translators exist for B<pod2man> (that's for nroff(1) and troff(1)), B<pod2text>, B<pod2html>, B<pod2latex>, and B<pod2fm>. - =head2 Embedding Pods in Perl Modules You can embed pod documentation in your Perl scripts. Start your @@ -236,7 +229,6 @@ directive. __END__ - =head1 NAME modern - I am a modern module @@ -244,7 +236,6 @@ directive. If you had not had that empty line there, then the translators wouldn't have seen it. 
- =head2 Common Pod Pitfalls =over 4 diff --git a/pod/perlport.pod b/pod/perlport.pod index a2c798f8cc..7c73cd25a8 100644 --- a/pod/perlport.pod +++ b/pod/perlport.pod @@ -2,47 +2,46 @@ perlport - Writing portable Perl - =head1 DESCRIPTION -Perl runs on a variety of operating systems. While most of them share -a lot in common, they also have their own very particular and unique -features. +Perl runs on numerous operating systems. While most of them share +much in common, they also have their own unique features. This document is meant to help you to find out what constitutes portable -Perl code, so that once you have made your decision to write portably, +Perl code. That way once you make a decision to write portably, you know where the lines are drawn, and you can stay within them. -There is a tradeoff between taking full advantage of one particular type -of computer, and taking advantage of a full range of them. Naturally, -as you make your range bigger (and thus more diverse), the common -denominators drop, and you are left with fewer areas of common ground in -which you can operate to accomplish a particular task. Thus, when you -begin attacking a problem, it is important to consider which part of the -tradeoff curve you want to operate under. Specifically, whether it is -important to you that the task that you are coding needs the full -generality of being portable, or if it is sufficient to just get the job -done. This is the hardest choice to be made. The rest is easy, because -Perl provides lots of choices, whichever way you want to approach your +There is a tradeoff between taking full advantage of one particular +type of computer and taking advantage of a full range of them. +Naturally, as you broaden your range and become more diverse, the +common factors drop, and you are left with an increasingly smaller +area of common ground in which you can operate to accomplish a +particular task. 
Thus, when you begin attacking a problem, it is +important to consider under which part of the tradeoff curve you +want to operate. Specifically, you must decide whether it is +important that the task that you are coding have the full generality +of being portable, or whether to just get the job done right now. +This is the hardest choice to be made. The rest is easy, because +Perl provides many choices, whichever way you want to approach your problem. Looking at it another way, writing portable code is usually about -willfully limiting your available choices. Naturally, it takes discipline -to do that. +willfully limiting your available choices. Naturally, it takes +discipline and sacrifice to do that. The product of portability +and convenience may be a constant. You have been warned. Be aware of two important points: - =over 4 =item Not all Perl programs have to be portable -There is no reason why you should not use Perl as a language to glue Unix +There is no reason you should not use Perl as a language to glue Unix tools together, or to prototype a Macintosh application, or to manage the Windows registry. If it makes no sense to aim for portability for one reason or another in a given program, then don't bother. -=item The vast majority of Perl I<is> portable +=item Nearly all of Perl already I<is> portable Don't be fooled into thinking that it is hard to create portable Perl code. It isn't. Perl tries its level-best to bridge the gaps between @@ -53,9 +52,8 @@ writing portable code, and this document is entirely about those issues. =back - -Here's the general rule: When you approach a task that is commonly done -using a whole range of platforms, think in terms of writing portable +Here's the general rule: When you approach a task commonly done +using a whole range of platforms, think about writing portable code. 
That way, you don't sacrifice much by way of the implementation choices you can avail yourself of, and at the same time you can give your users lots of platform choices. On the other hand, when you have to @@ -63,47 +61,46 @@ take advantage of some unique feature of a particular platform, as is often the case with systems programming (whether for Unix, Windows, S<Mac OS>, VMS, etc.), consider writing platform-specific code. -When the code will run on only two or three operating systems, then you -may only need to consider the differences of those particular systems. -The important thing is to decide where the code will run, and to be +When the code will run on only two or three operating systems, you +may need to consider only the differences of those particular systems. +The important thing is to decide where the code will run and to be deliberate in your decision. The material below is separated into three main sections: main issues of portability (L<"ISSUES">, platform-specific issues (L<"PLATFORMS">, and -builtin perl functions that behave differently on various ports +built-in perl functions that behave differently on various ports (L<"FUNCTION IMPLEMENTATIONS">. This information should not be considered complete; it includes possibly transient information about idiosyncrasies of some of the ports, almost -all of which are in a state of constant evolution. Thus this material +all of which are in a state of constant evolution. Thus, this material should be considered a perpetual work in progress (E<lt>IMG SRC="yellow_sign.gif" ALT="Under Construction"E<gt>). - =head1 ISSUES =head2 Newlines In most operating systems, lines in files are terminated by newlines. Just what is used as a newline may vary from OS to OS. Unix -traditionally uses C<\012>, one kind of Windows I/O uses C<\015\012>, +traditionally uses C<\012>, one type of DOSish I/O uses C<\015\012>, and S<Mac OS> uses C<\015>. 
-Perl uses C<\n> to represent the "logical" newline, where what -is logical may depend on the platform in use. In MacPerl, C<\n> -always means C<\015>. In DOSish perls, C<\n> usually means C<\012>, but -when accessing a file in "text" mode, STDIO translates it to (or from) -C<\015\012>. C<\015\012> is commonly referred to as CRLF. - -Due to the "text" mode translation, DOSish perls have limitations -of using C<seek> and C<tell> when a file is being accessed in "text" -mode. Specifically, if you stick to C<seek>-ing to locations you got -from C<tell> (and no others), you are usually free to use C<seek> and -C<tell> even in "text" mode. In general, using C<seek> or C<tell> or -other file operations that count bytes instead of characters, without -considering the length of C<\n>, may be non-portable. If you use -C<binmode> on a file, however, you can usually use C<seek> and C<tell> -with arbitrary values quite safely. +Perl uses C<\n> to represent the "logical" newline, where what is +logical may depend on the platform in use. In MacPerl, C<\n> always +means C<\015>. In DOSish perls, C<\n> usually means C<\012>, but +when accessing a file in "text" mode, STDIO translates it to (or +from) C<\015\012>, depending on whether you're reading or writing. +Unix does the same thing on ttys in canonical mode. C<\015\012> +is commonly referred to as CRLF. + +Because of the "text" mode translation, DOSish perls have limitations +in using C<seek> and C<tell> on a file accessed in "text" mode. +Stick to C<seek>-ing to locations you got from C<tell> (and no +others), and you are usually free to use C<seek> and C<tell> even +in "text" mode. Using C<seek> or C<tell> or other file operations +that count bytes instead of characters, without considering the +length of C<\n>, may be non-portable. If you use C<binmode> on a +file, however, you can usually C<seek> and C<tell> with arbitrary +values in safety. A common misconception in socket programming is that C<\n> eq C<\012> everywhere.
When using protocols such as common Internet protocols, @@ -121,15 +118,15 @@ such, the Socket module supplies the Right Thing for those who want it. print SOCKET "Hi there, client!$CRLF" # RIGHT When reading from a socket, remember that the default input record -separator C<$/> is C<\n>, but code like this should recognize C<$/> as -C<\012> or C<\015\012>: +separator C<$/> is C<\n>, but robust socket code will recognize +either C<\012> or C<\015\012> as end of line: while (<SOCKET>) { # ... } -Since both CRLF and LF end in LF, the input record separator can -be set to LF, and the CR can be stripped later, if present. Better: +Because both CRLF and LF end in LF, the input record separator can +be set to LF and any CR stripped later. Better to write: use Socket qw(:DEFAULT :crlf); local($/) = LF; # not needed if $/ is already \012 @@ -139,17 +136,17 @@ be set to LF, and the CR can be stripped later, if present. Better: # s/\015?\012/\n/; # same thing } -And this example is actually better than the previous one even for Unix -platforms, because now any C<\015>'s (C<\cM>'s) are stripped out +This example is preferred over the previous one--even for Unix +platforms--because now any C<\015>'s (C<\cM>'s) are stripped out (and there was much rejoicing). Similarly, functions that return text data--such as a function that -fetches a web page--should, in some cases, translate newlines before -returning the data, if they've not yet been trsnalted to the local -newline. Often one line of code will suffice: +fetches a web page--should sometimes translate newlines before +returning the data, if they've not yet been translated to the local +newline representation. A single line of code will often suffice: - $data =~ s/\015?\012/\n/g; - return $data; + $data =~ s/\015?\012/\n/g; + return $data; Some of this may be confusing. Here's a handy reference to the ASCII CR and LF characters.
You can print it out and stick it in your wallet. --------------------------- * text-mode STDIO +The Unix column assumes that you are not accessing a serial line +(like a tty) in canonical mode. If you are, then CR on input becomes +"\n", and "\n" on output becomes CRLF. + These are just the most common definitions of C<\n> and C<\r> in Perl. There may well be others. - =head2 Numbers endianness and Width Different CPUs store integers and floating point numbers in different orders (called I<endianness>) and widths (32-bit and 64-bit being the -most common). This affects your programs if they attempt to transfer -numbers in binary format from one CPU architecture to another over some -channel, usually either "live" via network connection, or by storing the -numbers to secondary storage such as a disk file. +most common today). This affects your programs when they attempt to transfer +numbers in binary format from one CPU architecture to another, +usually either "live" via network connection, or by storing the +numbers to secondary storage such as a disk file or tape. -Conflicting storage orders make utter mess out of the numbers: if a +Conflicting storage orders make utter mess out of the numbers. If a little-endian host (Intel, Alpha) stores 0x12345678 (305419896 in decimal), a big-endian host (Motorola, MIPS, Sparc, PA) reads it as 0x78563412 (2018915346 in decimal). To avoid this problem in network (socket) connections use the C<pack> and C<unpack> formats C<n> -and C<N>, the "network" orders. They are guaranteed to be portable. +and C<N>, the "network" orders. These are guaranteed to be portable. -Different widths can cause truncation even between platforms of equal -endianness: the platform of shorter width loses the upper parts of the +Differing widths can cause truncation even between platforms of equal +endianness. The platform of shorter width loses the upper parts of the number. 
There is no good solution for this problem except to avoid transferring or storing raw binary numbers. -One can circumnavigate both these problems in two ways: either +One can circumnavigate both these problems in two ways. Either transfer and store numbers always in text format, instead of raw -binary, or consider using modules like Data::Dumper (included in -the standard distribution as of Perl 5.005) and Storable. - +binary, or else consider using modules like Data::Dumper (included in +the standard distribution as of Perl 5.005) and Storable. Keeping +all data as text significantly simplifies matters. =head2 Files and Filesystems Most platforms these days structure files in a hierarchical fashion. -So, it is reasonably safe to assume that any platform supports the +So, it is reasonably safe to assume that all platforms support the notion of a "path" to uniquely identify a file on the system. How -that path is actually written differs. +that path is really written, though, differs considerably. -While they are similar, file path specifications differ between Unix, -Windows, S<Mac OS>, OS/2, VMS, VOS, S<RISC OS> and probably others. -Unix, for example, is one of the few OSes that has the idea of a single -root directory. +Although similar, file path specifications differ between Unix, +Windows, S<Mac OS>, OS/2, VMS, VOS, S<RISC OS>, and probably others. +Unix, for example, is one of the few OSes that has the elegant idea +of a single root directory. DOS, OS/2, VMS, VOS, and Windows can work similarly to Unix with C</> as path separator, or in their own idiosyncratic ways (such as having @@ -232,10 +232,10 @@ S<RISC OS> perl can emulate Unix filenames with C</> as path separator, or go native and use C<.> for path separator and C<:> to signal filesystems and disk names. -If all this is intimidating, have no (well, maybe only a little) fear. -There are modules that can help.
The File::Spec modules provide -methods to do the Right Thing on whatever -platform happens to be running the program. +If all this is intimidating, have no (well, maybe only a little) +fear. There are modules that can help. The File::Spec modules +provide methods to do the Right Thing on whatever platform happens +to be running the program. use File::Spec::Functions; chdir(updir()); # go up one directory @@ -243,71 +243,72 @@ platform happens to be running the program. # on Unix and Win32, './temp/file.txt' # on Mac OS, ':temp:file.txt' -File::Spec is available in the standard distribution, as of version +File::Spec is available in the standard distribution as of version 5.004_05. -In general, production code should not have file paths hardcoded; making -them user supplied or from a configuration file is better, keeping in mind -that file path syntax varies on different machines. +In general, production code should not have file paths hardcoded. +Making them user-supplied or read from a configuration file is +better, keeping in mind that file path syntax varies on different +machines. This is especially noticeable in scripts like Makefiles and test suites, which often assume C</> as a path separator for subdirectories. -Also of use is File::Basename, from the standard distribution, which +Also of use is File::Basename from the standard distribution, which splits a pathname into pieces (base filename, full path to directory, and file suffix). -Even when on a single platform (if you can call UNIX a single platform), -remember not to count on the existence or the contents of +Even when on a single platform (if you can call Unix a single platform), +remember not to count on the existence or the contents of particular system-specific files or directories, like F</etc/passwd>, -F</etc/sendmail.conf>, F</etc/resolv.conf>, or even F</tmp/>. 
For -example, F</etc/passwd> may exist but it may not contain the encrypted -passwords because the system is using some form of enhanced security, -or it may not contain all the accounts because the system is using NIS. +F</etc/sendmail.conf>, F</etc/resolv.conf>, or even F</tmp/>. For +example, F</etc/passwd> may exist but not contain the encrypted +passwords, because the system is using some form of enhanced security. +Or it may not contain all the accounts, because the system is using NIS. If code does need to rely on such a file, include a description of the -file and its format in the code's documentation, and make it easy for +file and its format in the code's documentation, then make it easy for the user to override the default location of the file. -Don't assume a text file will end with a newline. +Don't assume a text file will end with a newline. They should, +but people forget. Do not have two files of the same name with different case, like F<test.pl> and F<Test.pl>, as many platforms have case-insensitive filenames. Also, try not to have non-word characters (except for C<.>) in the names, and keep them to the 8.3 convention, for maximum -portability. +portability, onerous a burden though this may appear. -Likewise, if using the AutoSplit module, try to keep the split functions to -8.3 naming and case-insensitive conventions; or, at the very least, +Likewise, when using the AutoSplit module, try to keep your functions to +8.3 naming and case-insensitive conventions; or, at the least, make it so the resulting files have a unique (case-insensitively) first 8 characters. -There certainly can be whitespace in filenames on most systems, but -some may not allow it. Many systems (DOS, VMS) cannot have more than -one C<.> in their filenames. +Whitespace in filenames is tolerated on most systems, but not all. +Many systems (DOS, VMS) cannot have more than one C<.> in their filenames. Don't assume C<E<gt>> won't be the first character of a filename. 
-Always use C<E<lt>> explicitly to open a file for reading. +Always use C<E<lt>> explicitly to open a file for reading, +unless you want the user to be able to specify a pipe open. open(FILE, "< $existing_file") or die $!; If filenames might use strange characters, it is safest to open it with C<sysopen> instead of C<open>. C<open> is magic and can translate characters like C<E<gt>>, C<E<lt>>, and C<|>, which may -be the wrong thing to do. - +be the wrong thing to do. (Sometimes, though, it's the right thing.) =head2 System Interaction -Not all platforms provide for the notion of a command line, necessarily. -These are usually platforms that rely on a Graphical User Interface (GUI) -for user interaction. So a program requiring command lines might not work -everywhere. But this is probably for the user of the program to deal -with, so don't stay up late worrying about it. +Not all platforms provide a command line. These are usually platforms +that rely primarily on a Graphical User Interface (GUI) for user +interaction. A program requiring a command line interface might +not work everywhere. This is probably for the user of the program +to deal with, so don't stay up late worrying about it. -Some platforms can't delete or rename files that are being held open by -the system. Remember to C<close> files when you are done with them. -Don't C<unlink> or C<rename> an open file. Don't C<tie> or C<open> a -file that is already tied or opened; C<untie> or C<close> first. +Some platforms can't delete or rename files held open by the system. +Remember to C<close> files when you are done with them. Don't +C<unlink> or C<rename> an open file. Don't C<tie> or C<open> a +file already tied or opened; C<untie> or C<close> it first. Don't open the same file more than once at a time for writing, as some operating systems put mandatory locks on such files. @@ -326,60 +327,59 @@ directories. Don't count on specific values of C<$!>. 
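Here is a minimal sketch of the file-handling advice above. It assumes a modern perl (5.6 or later, for lexical filehandles and three-argument open) and a writable temporary directory. The demo filename is hypothetical. The sketch builds the path with File::Spec::Functions and creates the file with sysopen(), so nothing in the name is interpreted. It then reads the file back with an explicit "<" mode, splits the path with File::Basename, and closes every handle before unlinking.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_TRUNC);
use File::Basename qw(fileparse);
use File::Spec::Functions qw(catfile tmpdir);

# Build the path from components; File::Spec picks the separator for this OS.
# The filename is a hypothetical example.
my $path = catfile(tmpdir(), 'perlport-demo-' . $$ . '.txt');

# sysopen() takes the name literally: no mode characters or whitespace are parsed out.
sysopen(my $out, $path, O_WRONLY | O_CREAT | O_TRUNC)
    or die "cannot create $path: $!";
print {$out} "hello\n";
close($out) or die "cannot close $path: $!";

# With open(), state the read mode explicitly.
open(my $in, '<', $path) or die "cannot open $path: $!";
my $line = <$in>;
close($in);

# Split the path into base name, directory, and suffix.
my ($name, $dir, $suffix) = fileparse($path, qr/\.[^.]*/);
print "name=$name suffix=$suffix dir=$dir\n";

# Close before unlinking: some platforms refuse to delete open files.
unlink($path) or warn "cannot remove $path: $!";
```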
- =head2 Interprocess Communication (IPC) -In general, don't directly access the system in code that is meant to be -portable. That means, no C<system>, C<exec>, C<fork>, C<pipe>, C<``>, -C<qx//>, C<open> with a C<|>, nor any of the other things that makes being -a Unix perl hacker worth being. +In general, don't directly access the system in code meant to be +portable. That means, no C<system>, C<exec>, C<fork>, C<pipe>, +C<``>, C<qx//>, C<open> with a C<|>, nor any of the other things +that make being a perl hacker worth being. Commands that launch external processes are generally supported on -most platforms (though many of them do not support any type of forking), -but the problem with using them arises from what you invoke with them. -External tools are often named differently on different platforms, often -not available in the same location, often accept different arguments, -often behave differently, and often represent their results in a -platform-dependent way. Thus you should seldom depend on them to produce -consistent results. - -The UNIX System V IPC (msg*(), sem*(), shm*()) is not available -even in all UNIX platforms. +most platforms (though many of them do not support any type of +forking). The problem with using them arises from what you invoke +them on. External tools are often named differently on different +platforms, may not be available in the same location, might accept +different arguments, can behave differently, and often present their +results in a platform-dependent way. Thus, you should seldom depend +on them to produce consistent results. (Then again, if you're calling +I<netstat -a>, you probably don't expect it to run on both Unix and CP/M.)
-One especially common bit of Perl code is opening a pipe to sendmail: +One especially common bit of Perl code is opening a pipe to B<sendmail>: - open(MAIL, '| /usr/lib/sendmail -t') or die $!; + open(MAIL, '|/usr/lib/sendmail -t') + or die "cannot fork sendmail: $!"; This is fine for systems programming when sendmail is known to be available. But it is not fine for many non-Unix systems, and even some Unix systems that may not have sendmail installed. If a portable -solution is needed, see the various distributions on CPAN that deal with -it. Mail::Mailer and Mail::Send in the MailTools distribution -are commonly used, and provide several mailing methods, including mail, +solution is needed, see the various distributions on CPAN that deal +with it. Mail::Mailer and Mail::Send in the MailTools distribution are +commonly used, and provide several mailing methods, including mail, sendmail, and direct SMTP (via Net::SMTP) if a mail transfer agent is not available. Mail::Sendmail is a standalone module that provides simple, platform-independent mailing. +The Unix System V IPC (C<msg*(), sem*(), shm*()>) is not available +even on all Unix platforms. + The rule of thumb for portable code is: Do it all in portable Perl, or use a module (that may internally implement it with platform-specific code, but expose a common interface). - =head2 External Subroutines (XS) -XS code, in general, can be made to work with any platform; but dependent +XS code can usually be made to work with any platform, but dependent libraries, header files, etc., might not be readily available or portable, or the XS code itself might be platform-specific, just as Perl code might be. If the libraries and headers are portable, then it is normally reasonable to make sure the XS code is portable, too. -There is a different kind of portability issue with writing XS -code: availability of a C compiler on the end-user's system. 
C brings -with it its own portability issues, and writing XS code will expose you to -some of those. Writing purely in perl is a comparatively easier way to +A different type of portability issue arises when writing XS code: +availability of a C compiler on the end-user's system. C brings +with it its own portability issues, and writing XS code will expose +you to some of those. Writing purely in Perl is an easier way to achieve portability. - =head2 Standard Modules In general, the standard modules work across platforms. Notable @@ -387,22 +387,21 @@ exceptions are the CPAN module (which currently makes connections to external programs that may not be available), platform-specific modules (like ExtUtils::MM_VMS), and DBM modules. -There is no one DBM module that is available on all platforms. +There is no one DBM module available on all platforms. SDBM_File and the others are generally available on all Unix and DOSish ports, but not in MacPerl, where only NBDM_File and DB_File are available. The good news is that at least some DBM module should be available, and AnyDBM_File will use whichever module it can find. Of course, then -the code needs to be fairly strict, dropping to the lowest common -denominator (e.g., not exceeding 1K for each record), so that it will +the code needs to be fairly strict, dropping to the greatest common +factor (e.g., not exceeding 1K for each record), so that it will work with any DBM module. See L<AnyDBM_File> for more details. - =head2 Time and Date The system's notion of time of day and calendar date is controlled in -widely different ways. Don't assume the timezone is stored in C<$ENV{TZ}>, +widely different ways. Don't assume the timezone is stored in C<$ENV{TZ}>, and even if it is, don't assume that you can control the timezone through that variable. @@ -415,29 +414,36 @@ Date::Parse. An array of values, such as those returned by C<localtime>, can be converted to an OS-specific representation using Time::Local. 
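The next sketch converts between broken-down time and the system's epoch count, using the timegm() function from Time::Local, which ships with Perl. The example passes a four-digit year. Time::Local reads two-digit years relative to a rolling window of 50 years around the current year, so a year of 70 no longer maps to 1970. On Unix-like systems, the epoch offset computed here is 0.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# gmtime() breaks an epoch count into (sec, min, hour, mday, mon, year, ...);
# timegm() turns the first six fields back into the system's own epoch count.
my $now     = time();
my @fields  = gmtime($now);
my $rebuilt = timegm(@fields[0 .. 5]);    # equals $now

# Offset of this system's epoch from 1970-01-01 00:00:00 UTC.  Use a four-digit
# year: two-digit years are read relative to a rolling century window.
my $epoch_offset = timegm(0, 0, 0, 1, 0, 1970);    # 0 on Unix-like systems

# Shift a Unix timestamp onto the local epoch before handing it to gmtime().
my $unix_2000 = 946_684_800;                        # 2000-01-01 00:00:00 UTC in Unix time
my @check     = gmtime($unix_2000 + $epoch_offset);
printf "%04d-%02d-%02d\n", $check[5] + 1900, $check[4] + 1, $check[3];    # 2000-01-01
```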
+When calculating specific times, such as for tests in time or date modules, +it may be appropriate to calculate an offset for the epoch. + + require Time::Local; + $offset = Time::Local::timegm(0, 0, 0, 1, 0, 70); + +The value for C<$offset> in Unix will be C<0>, but in Mac OS will be +some large number. C<$offset> can then be added to a Unix time value +to get what should be the proper value on any system. =head2 Character sets and character encoding -Assume very little about character sets. Do not assume anything about -the numerical values (C<ord>, C<chr>) of characters. Do not +Assume little about character sets. Assume nothing about +numerical values (C<ord>, C<chr>) of characters. Do not assume that the alphabetic characters are encoded contiguously (in -numerical sense). Do not assume anything about the ordering of the +the numeric sense). Do not assume anything about the ordering of the characters. The lowercase letters may come before or after the -uppercase letters, the lowercase and uppercase may be interlaced so -that both 'a' and 'A' come before the 'b', the accented and other +uppercase letters; the lowercase and uppercase may be interlaced so +that both `a' and `A' come before `b'; the accented and other international characters may be interlaced so that E<auml> comes -before the 'b'. - +before `b'. =head2 Internationalisation -If you may assume POSIX (a rather large assumption, that in practice -means UNIX), you may read more about the POSIX locale system (see -L<perllocale>. The locale system at least attempts to make things a -little bit more portable, or at least more convenient and -native-friendly for non-English users. The system affects character -sets and encoding, and date and time formatting, among other things. - +If you may assume POSIX (a rather large assumption), you may read +more about the POSIX locale system from L<perllocale>. 
The locale +system at least attempts to make things a little bit more portable, +or at least more convenient and native-friendly for non-English +users. The system affects character sets and encoding, and date +and time formatting--amongst other things. =head2 System Resources @@ -454,22 +460,21 @@ of avoiding wasteful constructs such as: while (<FILE>) {$file .= $_} # sometimes bad $file = join('', <FILE>); # better -The last two may appear unintuitive to most people. The first of those -two constructs repeatedly grows a string, while the second allocates a -large chunk of memory in one go. On some systems, the latter is more -efficient that the former. - +The last two constructs may appear unintuitive to most people. The +first repeatedly grows a string, whereas the second allocates a +large chunk of memory in one go. On some systems, the second is +more efficient than the first. =head2 Security -Most multi-user platforms provide basic levels of security that is usually -felt at the file-system level. Other platforms usually don't -(unfortunately). Thus the notion of user id, or "home" directory, or even -the state of being logged-in, may be unrecognizable on many platforms. If -you write programs that are security-conscious, it is usually best to know -what type of system you will be operating under, and write code explicitly -for that platform (or class of platforms). - +Most multi-user platforms provide basic levels of security, usually +implemented at the filesystem level. Some, however, do +not--unfortunately. Thus the notion of user id, or "home" directory, +or even the state of being logged-in, may be unrecognizable on many +platforms. If you write programs that are security-conscious, it +is usually best to know what type of system you will be running +under so that you can write code explicitly for that platform (or +class of platforms). =head2 Style @@ -479,15 +484,16 @@ to other platforms easier.
Use the Config module and the special variable C<$^O> to differentiate platforms, as described in L<"PLATFORMS">. -Be careful not to depend on a specific output style for errors, -such as when checking C<$@> after an C<eval>. Some platforms -expect a certain output format, and perl on those platforms may -have been adjusted accordingly. Most specifically, don't anchor -a regex when testing an error value. - - $@ =~ /^I got an error!/ # may fail - $@ =~ /I got an error!/ # probably better - +Be careful in the tests you supply with your module or programs. +Module code may be fully portable, but its tests might not be. This +often happens when tests spawn off other processes or call external +programs to aid in the testing, or when (as noted above) the tests +assume certain things about the filesystem and paths. Be careful +not to depend on a specific output style for errors, such as when +checking C<$!> after a system call. Some platforms expect a certain +output format, and perl on those platforms may have been adjusted +accordingly. Most specifically, don't anchor a regex when testing +an error value. =head1 CPAN Testers @@ -498,7 +504,7 @@ this platform), or UNKNOWN (unknown), along with any relevant notations. The purpose of the testing is twofold: one, to help developers fix any problems in their code that crop up because of lack of testing on other -platforms; two, to provide users with information about whether or not +platforms; two, to provide users with information about whether a given module works on a given platform. =over 4 @@ -509,21 +515,19 @@ a given module works on a given platform. =back - =head1 PLATFORMS As of version 5.002, Perl is built with a C<$^O> variable that indicates the operating system it was built on. This was implemented -to help speed up code that would otherwise have to C<use Config;> and -use the value of C<$Config{'osname'}>. Of course, to get
Of course, to get +to help speed up code that would otherwise have to C<use Config> +and use the value of C<$Config{osname}>. Of course, to get more detailed information about the system, looking into C<%Config> is certainly recommended. -C<%Config> cannot always be trusted, however, -because it is built at compile time, and if perl was built in once -place and transferred elsewhere, some values may be off, or the -values may have been edited after the fact. - +C<%Config> cannot always be trusted, however, because it was built +at compile time. If perl was built in one place, then transferred +elsewhere, some values may be wrong. The values may even have been +edited after the fact. =head2 Unix @@ -555,21 +559,19 @@ Unix flavors: sn4609 unicos CRAY_C90-unicos sn6521 unicosmk t3e-unicosmk sn9617 unicos CRAY_J90-unicos - sn9716 unicos CRAY_J90-unicos SunOS solaris sun4-solaris SunOS solaris i86pc-solaris SunOS4 sunos sun4-sunos -Note that because the C<$Config{'archname'}> may depend on the hardware -architecture it may vary quite a lot, much more than the C<$^O>. - +Because the value of C<$Config{archname}> may depend on the +hardware architecture, it can vary more than the value of C<$^O>. =head2 DOS and Derivatives -Perl has long been ported to PC style microcomputers running under +Perl has long been ported to Intel-style microcomputers running under systems like PC-DOS, MS-DOS, OS/2, and most Windows platforms you can bring yourself to mention (except for Windows CE, if you count that). -Users familiar with I<COMMAND.COM> and/or I<CMD.EXE> style shells should +Users familiar with I<COMMAND.COM> or I<CMD.EXE> style shells should be aware that each of these file specifications may have subtle differences: @@ -578,35 +580,39 @@ differences: $filespec2 = 'c:\foo\bar\file.txt'; $filespec3 = 'c:\\foo\\bar\\file.txt'; -System calls accept either C</> or C<\> as the path separator. 
However, -many command-line utilities of DOS vintage treat C</> as the option -prefix, so they may get confused by filenames containing C</>. Aside -from calling any external programs, C</> will work just fine, and -probably better, as it is more consistent with popular usage, and avoids -the problem of remembering what to backwhack and what not to. +System calls accept either C</> or C<\> as the path separator. +However, many command-line utilities of DOS vintage treat C</> as +the option prefix, so may get confused by filenames containing C</>. +Aside from calling any external programs, C</> will work just fine, +and probably better, as it is more consistent with popular usage, +and avoids the problem of remembering what to backwhack and what +not to. -The DOS FAT filesystem can only accommodate "8.3" style filenames. Under -the "case insensitive, but case preserving" HPFS (OS/2) and NTFS (NT) +The DOS FAT filesystem can accommodate only "8.3" style filenames. Under +the "case-insensitive, but case-preserving" HPFS (OS/2) and NTFS (NT) filesystems you may have to be careful about case returned with functions like C<readdir> or used with functions like C<open> or C<opendir>. -DOS also treats several filenames as special, such as AUX, PRN, NUL, CON, -COM1, LPT1, LPT2 etc. Unfortunately these filenames won't even work -if you include an explicit directory prefix, in some cases. It is best -to avoid such filenames, if you want your code to be portable to DOS -and its derivatives. +DOS also treats several filenames as special, such as AUX, PRN, +NUL, CON, COM1, LPT1, LPT2, etc. Unfortunately, sometimes these +filenames won't even work if you include an explicit directory +prefix. It is best to avoid such filenames, if you want your code +to be portable to DOS and its derivatives. It's hard to know what +these all are, unfortunately. 
Users of these operating systems may also wish to make use of -scripts such as F<pl2bat.bat> or F<pl2cmd> as appropriate to +scripts such as I<pl2bat.bat> or I<pl2cmd> to put wrappers around your scripts. Newline (C<\n>) is translated as C<\015\012> by STDIO when reading from and writing to files (see L<"Newlines">). C<binmode(FILEHANDLE)> will keep C<\n> translated as C<\012> for that filehandle. Since it is a no-op on other systems, C<binmode> should be used for cross-platform code -that deals with binary data. +that deals with binary data. That's assuming you realize in advance +that your data is in binary. General-purpose programs should +often assume nothing about their data. -The C<$^O> variable and the C<$Config{'archname'}> values for various +The C<$^O> variable and the C<$Config{archname}> values for various DOSish perls are as follows: OS $^O $Config{'archname'} @@ -636,7 +642,6 @@ C<ftp://hobbes.nmsu.edu/pub/os2/dev/emx> =back - =head2 S<Mac OS> Any module requiring XS compilation is right out for most people, because @@ -653,9 +658,9 @@ Directories are specified as: :file for relative pathnames file for relative pathnames -Files in a directory are stored in alphabetical order. Filenames are +Files are stored in the directory in alphabetical order. Filenames are limited to 31 characters, and may include any character except for -null and C<:>, which is reserved as path separator. +null and C<:>, which is reserved as the path separator. Instead of C<flock>, see C<FSpSetFLock> and C<FSpRstFLock> in the Mac::Files module, or C<chmod(0444, ...)> and C<chmod(0666, ...)>. @@ -669,13 +674,13 @@ line arguments. @ARGV = split /\s+/, MacPerl::Ask('Arguments?'); } -A MacPerl script saved as a droplet will populate C<@ARGV> with the full +A MacPerl script saved as a "droplet" will populate C<@ARGV> with the full pathnames of the files dropped onto the script. 
-Mac users can use programs on a kind of command line under MPW (Macintosh -Programmer's Workshop, a free development environment from Apple). -MacPerl was first introduced as an MPW tool, and MPW can be used like a -shell: +Mac users can run programs under a type of command line interface +under MPW (Macintosh Programmer's Workshop, a free development +environment from Apple). MacPerl was first introduced as an MPW +tool, and MPW can be used like a shell: perl myscript.plx some arguments @@ -699,10 +704,9 @@ environment. The new "Cocoa" environment (formerly called the "Yellow Box") may run a slightly modified version of MacPerl, using the Carbon interfaces. S<Mac OS X Server> and its Open Source version, Darwin, both run Unix -perl natively (with a small number of patches). Full support for these +perl natively (with a few patches). Full support for these is slated for perl5.006. - Also see: =over 4 @@ -715,11 +719,10 @@ Also see: =back - =head2 VMS Perl on VMS is discussed in F<vms/perlvms.pod> in the perl distribution. -Note that perl on VMS can accept either VMS- or Unix-style file +Perl on VMS can accept either VMS- or Unix-style file specifications as in either of the following: $ perl -ne "print if /perl_setup/i" SYS$LOGIN:LOGIN.COM @@ -737,7 +740,7 @@ For example: $ perl -e "print ""Hello, world.\n""" Hello, world. -There are a number of ways to wrap your perl scripts in DCL .COM files if +There are several ways to wrap your perl scripts in DCL F<.COM> files, if you are so inclined. For example: $ write sys$output "Hello from DCL!" @@ -760,9 +763,9 @@ length for filenames is 39 characters, and the maximum length for extensions is also 39 characters. Version is a number from 1 to 32767. Valid characters are C</[A-Z0-9$_-]/>. -VMS' RMS filesystem is case insensitive and does not preserve case. +VMS's RMS filesystem is case-insensitive and does not preserve case. 
C<readdir> returns lowercased filenames, but specifying a file for -opening remains case insensitive. Files without extensions have a +opening remains case-insensitive. Files without extensions have a trailing period on them, so doing a C<readdir> with a file named F<A.;5> will return F<a.> (though that file could be opened with C<open(FH, 'A')>). @@ -779,7 +782,7 @@ process on VMS, is a pure Perl module that can easily be installed on non-VMS platforms and can be helpful for conversions to and from RMS native formats. -What C<\n> represents depends on the type of file that is open. It could +What C<\n> represents depends on the type of file opened. It could be C<\015>, C<\012>, C<\015\012>, or nothing. Reading from a file translates newlines to C<\012>, unless C<binmode> was executed on that handle, just like DOSish perls. @@ -801,10 +804,10 @@ you can examine the content of the C<@INC> array like so: print "I'm not so sure about where $^O is...\n"; } -On VMS perl determines the UTC offset from the C<SYS$TIMEZONE_DIFFERENTIAL> -logical name. Though the VMS epoch began at 17-NOV-1858 00:00:00.00, +On VMS, perl determines the UTC offset from the C<SYS$TIMEZONE_DIFFERENTIAL> +logical name. Although the VMS epoch began at 17-NOV-1858 00:00:00.00, calls to C<localtime> are adjusted to count offsets from -01-JAN-1970 00:00:00.00 just like Unix. +01-JAN-1970 00:00:00.00, just like Unix. Also see: @@ -820,11 +823,10 @@ Put the words C<subscribe vmsperl> in message body. =back - =head2 VOS Perl on VOS is discussed in F<README.vos> in the perl distribution. 
-Note that perl on VOS can accept either VOS- or Unix-style file +Perl on VOS can accept either VOS- or Unix-style file specifications as in either of the following: $ perl -ne "print if /perl_setup/i" >system>notices @@ -834,7 +836,7 @@ or even a mixture of both as in: $ perl -ne "print if /perl_setup/i" >system/notices -Note that even though VOS allows the slash character to appear in object +Even though VOS allows the slash character to appear in object names, because the VOS port of Perl interprets it as a pathname delimiting character, VOS files, directories, or links whose names contain a slash character cannot be processed. Such files must be @@ -888,13 +890,12 @@ the message body to majordomo@list.stratagy.com. =back - =head2 EBCDIC Platforms Recent versions of Perl have been ported to platforms such as OS/400 on AS/400 minicomputers as well as OS/390 & VM/ESA for IBM Mainframes. Such computers use EBCDIC character sets internally (usually Character Code -Set ID 00819 for OS/400 and IBM-1047 for OS/390 & VM/ESA). Note that on +Set ID 00819 for OS/400 and IBM-1047 for OS/390 & VM/ESA). On the mainframe perl currently works under the "Unix system services for OS/390" (formerly known as OpenEdition) and VM/ESA OpenEdition. @@ -910,7 +911,7 @@ similar to the following simple script: print "Hello from perl!\n"; -On the AS/400, assuming that PERL5 is in your library list, you may need +On the AS/400, if PERL5 is in your library list, you may need to wrap your perl scripts in a CL procedure to invoke them like so: BEGIN @@ -928,9 +929,9 @@ well as bit-fiddling with ASCII constants using operators like C<^>, C<&> and C<|>, not to mention dealing with socket interfaces to ASCII computers (see L<"Newlines">). 
-Fortunately, most web servers for the mainframe will correctly translate -the C<\n> in the following statement to its ASCII equivalent (note that -C<\r> is the same under both Unix and OS/390 & VM/ESA): +Fortunately, most web servers for the mainframe will correctly +translate the C<\n> in the following statement to its ASCII equivalent +(C<\r> is the same under both Unix and OS/390 & VM/ESA): print "Content-type: text/html\r\n\r\n"; @@ -947,7 +948,7 @@ platform could include any of the following (perhaps all): if (chr(169) eq 'z') { print "EBCDIC may be spoken here!\n"; } -Note that one thing you may not want to rely on is the EBCDIC encoding +One thing you may not want to rely on is the EBCDIC encoding of punctuation characters since these may differ from code page to code page (and once your module or script is rumoured to work with EBCDIC, folks will want it to work with all EBCDIC character sets). @@ -966,15 +967,14 @@ general usage issues for all EBCDIC Perls. Send a message body of =back - =head2 Acorn RISC OS -As Acorns use ASCII with newlines (C<\n>) in text files as C<\012> like -Unix and Unix filename emulation is turned on by default, it is quite -likely that most simple scripts will work "out of the box". The native +Because Acorns use ASCII with newlines (C<\n>) in text files as C<\012> like +Unix, and because Unix filename emulation is turned on by default, +most simple scripts will probably work "out of the box". The native filesystem is modular, and individual filesystems are free to be case-sensitive or insensitive, and are usually case-preserving. Some -native filesystems have name length limits which file and directory +native filesystems have name length limits, which file and directory names are silently truncated to fit. 
Scripts should be aware that the standard filesystem currently has a name length limit of B<10> characters, with up to 77 items in a directory, but other filesystems @@ -1002,10 +1002,10 @@ the second stage of C<$> interpolation in regular expressions will fall foul of the C<$.> if scripts are not careful. Logical paths specified by system variables containing comma-separated -search lists are also allowed, hence C<System:Modules> is a valid +search lists are also allowed; hence C<System:Modules> is a valid filename, and the filesystem will prefix C<Modules> with each section of C<System$Path> until a name is made that points to an object on disk. -Writing to a new file C<System:Modules> would only be allowed if +Writing to a new file C<System:Modules> would be allowed only if C<System$Path> contains a single item list. The filesystem will also expand system variables in filenames if enclosed in angle brackets, so C<E<lt>System$DirE<gt>.Modules> would look for the file @@ -1017,7 +1017,7 @@ Because C<.> was in use as a directory separator and filenames could not be assumed to be unique after 10 characters, Acorn implemented the C compiler to strip the trailing C<.c> C<.h> C<.s> and C<.o> suffix from filenames specified in source code and store the respective files in -subdirectories named after the suffix. Hence files are translated: +subdirectories named after the suffix. Hence files are translated: foo.h h.foo C:foo.h C:h.foo (logical path variable) @@ -1027,25 +1027,25 @@ subdirectories named after the suffix. Hence files are translated: 11charname_.c c.11charname (assuming filesystem truncates at 10) The Unix emulation library's translation of filenames to native assumes -that this sort of translation is required, and allows a user defined list -of known suffixes which it will transpose in this fashion. 
This may -appear transparent, but consider that with these rules C<foo/bar/baz.h> +that this sort of translation is required, and it allows a user-defined list +of known suffixes that it will transpose in this fashion. This may +seem transparent, but consider that with these rules C<foo/bar/baz.h> and C<foo/bar/h/baz> both map to C<foo.bar.h.baz>, and that C<readdir> and C<glob> cannot and do not attempt to emulate the reverse mapping. Other C<.>'s in filenames are translated to C</>. -As implied above the environment accessed through C<%ENV> is global, and +As implied above, the environment accessed through C<%ENV> is global, and the convention is that program specific environment variables are of the form C<Program$Name>. Each filesystem maintains a current directory, and the current filesystem's current directory is the B<global> current -directory. Consequently, sociable scripts don't change the current -directory but rely on full pathnames, and scripts (and Makefiles) cannot +directory. Consequently, sociable programs don't change the current +directory but rely on full pathnames, and programs (and Makefiles) cannot assume that they can spawn a child process which can change the current directory without affecting its parent (and everyone else for that matter). -As native operating system filehandles are global and currently are -allocated down from 255, with 0 being a reserved value the Unix emulation +Because native operating system filehandles are global and are currently +allocated down from 255, with 0 being a reserved value, the Unix emulation library emulates Unix filehandles. Consequently, you can't rely on passing C<STDIN>, C<STDOUT>, or C<STDERR> to your children. @@ -1059,26 +1059,27 @@ right. Of course, the problem remains that scripts cannot rely on any Unix tools being available, or that any tools found have Unix-like command line arguments. -Extensions and XS are, in theory, buildable by anyone using free tools. 
-In practice, many don't, as users of the Acorn platform are used to binary -distribution. MakeMaker does run, but no available make currently copes -with MakeMaker's makefiles; even if/when this is fixed, the lack of a -Unix-like shell can cause problems with makefile rules, especially lines -of the form C<cd sdbm && make all>, and anything using quoting. +Extensions and XS are, in theory, buildable by anyone using free +tools. In practice, many don't, as users of the Acorn platform are +used to binary distributions. MakeMaker does run, but no available +make currently copes with MakeMaker's makefiles; even if and when +this should be fixed, the lack of a Unix-like shell will cause +problems with makefile rules, especially lines of the form C<cd +sdbm && make all>, and anything using quoting. "S<RISC OS>" is the proper name for the operating system, but the value in C<$^O> is "riscos" (because we don't like shouting). - =head2 Other perls -Perl has been ported to a variety of platforms that do not fit into any of -the above categories. Some, such as AmigaOS, Atari MiNT, BeOS, HP MPE/iX, -QNX, Plan 9, and VOS, have been well-integrated into the standard Perl source -code kit. You may need to see the F<ports/> directory on CPAN for -information, and possibly binaries, for the likes of: aos, Atari ST, lynxos, -riscos, Novell Netware, Tandem Guardian, I<etc.> (yes we know that some of -these OSes may fall under the Unix category, but we are not a standards body.) +Perl has been ported to many platforms that do not fit into any of +the categories listed above. Some, such as AmigaOS, Atari MiNT, +BeOS, HP MPE/iX, QNX, Plan 9, and VOS, have been well-integrated +into the standard Perl source code kit. 
You may need to see the +F<ports/> directory on CPAN for information, and possibly binaries, +for the likes of: aos, Atari ST, lynxos, riscos, Novell Netware, +Tandem Guardian, I<etc.> (Yes, we know that some of these OSes may +fall under the Unix category, but we are not a standards body.) See also: @@ -1096,24 +1097,24 @@ as well as from CPAN. =back - =head1 FUNCTION IMPLEMENTATIONS -Listed below are functions unimplemented or implemented differently on -various platforms. Following each description will be, in parentheses, a -list of platforms that the description applies to. +Listed below are functions that are either completely unimplemented +or else have been implemented differently on various platforms. +Following each description will be, in parentheses, a list of +platforms that the description applies to. -The list may very well be incomplete, or wrong in some places. When in -doubt, consult the platform-specific README files in the Perl source -distribution, and other documentation resources for a given port. +The list may well be incomplete, or even wrong in some places. When +in doubt, consult the platform-specific README files in the Perl +source distribution, and any other documentation resources accompanying +a given port. Be aware, moreover, that even among Unix-ish systems there are variations. -For many functions, you can also query C<%Config>, exported by default -from the Config module. For example, to check if the platform has the C<lstat> -call, check C<$Config{'d_lstat'}>. See L<Config> for a full -description of available variables. - +For many functions, you can also query C<%Config>, exported by +default from the Config module. For example, to check whether the +platform has the C<lstat> call, check C<$Config{d_lstat}>. See +L<Config> for a full description of available variables. =head2 Alphabetical Listing of Perl Functions @@ -1125,19 +1126,19 @@ description of available variables. 
=item -X -C<-r>, C<-w>, and C<-x> have only a very limited meaning; directories +C<-r>, C<-w>, and C<-x> have a limited meaning only; directories and applications are executable, and there are no uid/gid -considerations. C<-o> is not supported. (S<Mac OS>) +considerations. C<-o> is not supported. (S<Mac OS>) -C<-r>, C<-w>, C<-x>, and C<-o> tell whether or not file is accessible, -which may not reflect UIC-based file protections. (VMS) +C<-r>, C<-w>, C<-x>, and C<-o> tell whether the file is accessible, +which may not reflect UIC-based file protections. (VMS) C<-s> returns the size of the data fork, not the total size of data fork plus resource fork. (S<Mac OS>). C<-s> by name on an open file will return the space reserved on disk, rather than the current extent. C<-s> on an open filehandle returns the -current size. (S<RISC OS>) +current size. (S<RISC OS>) C<-R>, C<-W>, C<-X>, C<-O> are indistinguishable from C<-r>, C<-w>, C<-x>, C<-o>. (S<Mac OS>, Win32, VMS, S<RISC OS>) @@ -1153,17 +1154,17 @@ C<-d> is true if passed a device spec without an explicit directory. C<-T> and C<-B> are implemented, but might misclassify Mac text files with foreign characters; this is the case will all platforms, but may -affect S<Mac OS> often. (S<Mac OS>) +affect S<Mac OS> often. (S<Mac OS>) C<-x> (or C<-X>) determine if a file ends in one of the executable -suffixes. C<-S> is meaningless. (Win32) +suffixes. C<-S> is meaningless. (Win32) C<-x> (or C<-X>) determine if a file has an executable file type. (S<RISC OS>) =item binmode FILEHANDLE -Meaningless. (S<Mac OS>, S<RISC OS>) +Meaningless. (S<Mac OS>, S<RISC OS>) Reopens file and restores pointer; if function fails, underlying filehandle may be closed, or pointer may be in a different position. @@ -1174,7 +1175,7 @@ the filehandle may be flushed. (Win32) =item chmod LIST -Only limited meaning. Disabling/enabling write permission is mapped to +Only limited meaning. 
Disabling/enabling write permission is mapped to locking/unlocking the file. (S<Mac OS>) Only good for changing "owner" read-write access, "group", and "other" @@ -1374,7 +1375,7 @@ Not implemented. (S<Mac OS>, Plan9) Globbing built-in, but only C<*> and C<?> metacharacters are supported. (S<Mac OS>) -Features depend on external perlglob.exe or perlglob.bat. May be +Features depend on external perlglob.exe or perlglob.bat. May be overridden with something like File::DosGlob, which is recommended. (Win32) @@ -1431,7 +1432,7 @@ Not implemented. (S<Mac OS>, Win32, VMS, Plan9, S<RISC OS>, VOS) =item open FILEHANDLE -The C<|> variants are only supported if ToolServer is installed. +The C<|> variants are supported only if ToolServer is installed. (S<Mac OS>) open to C<|-> and C<-|> are unsupported. (S<Mac OS>, Win32, S<RISC OS>) @@ -1524,7 +1525,7 @@ OS>, OS/390, VM/ESA) Only implemented if ToolServer is installed. (S<Mac OS>) As an optimization, may not call the command shell specified in -C<$ENV{PERL5SHELL}>. C<system(1, @args)> spawns an external +C<$ENV{PERL5SHELL}>. C<system(1, @args)> spawns an external process and immediately returns its process designator, without waiting for it to terminate. Return value may be used subsequently in C<wait> or C<waitpid>. (Win32) @@ -1573,8 +1574,8 @@ should not be held open elsewhere. (Win32) Returns undef where unavailable, as of version 5.005. -C<umask()> works but the correct permissions are only set when the file -is finally close()d. (AmigaOS) +C<umask> works but the correct permissions are set only when the file +is finally closed. (AmigaOS) =item utime LIST @@ -1603,6 +1604,14 @@ Not useful. (S<RISC OS>) =over 4 +=item v1.43, 24 May 1999 + +Added a lot of cleaning up from Tom Christiansen. + +=item v1.42, 22 May 1999 + +Added notes about tests, sprintf/printf, and epoch offsets. + =item v1.41, 19 May 1999 Lots more little changes to formatting and content. 
@@ -1675,6 +1684,7 @@ Nick Ing-Simmons E<lt>nick@ni-s.u-net.comE<gt>, Andreas J. KE<ouml>nig E<lt>koenig@kulturbox.deE<gt>, Markus Laker E<lt>mlaker@contax.co.ukE<gt>, Andrew M. Langmead E<lt>aml@world.std.comE<gt>, +Larry Moore E<lt>ljmoore@freespace.netE<gt>, Paul Moore E<lt>Paul.Moore@uk.origin-it.comE<gt>, Chris Nandor E<lt>pudge@pobox.comE<gt>, Matthias Neeracher E<lt>neeri@iis.ee.ethz.chE<gt>, @@ -1693,4 +1703,4 @@ E<lt>pudge@pobox.comE<gt>. =head1 VERSION -Version 1.41, last modified 19 May 1999 +Version 1.43, last modified 24 May 1999 diff --git a/pod/perlre.pod b/pod/perlre.pod index 95d473439e..ca95638605 100644 --- a/pod/perlre.pod +++ b/pod/perlre.pod @@ -6,13 +6,13 @@ perlre - Perl regular expressions This page describes the syntax of regular expressions in Perl. For a description of how to I<use> regular expressions in matching -operations, plus various examples of the same, see discussion +operations, plus various examples of the same, see discussions of C<m//>, C<s///>, C<qr//> and C<??> in L<perlop/"Regexp Quote-Like Operators">. -The matching operations can have various modifiers. The modifiers +Matching operations can have various modifiers. Modifiers that relate to the interpretation of the regular expression inside -are listed below. For the modifiers that alter the way a regular expression -is used by Perl, see L<perlop/"Regexp Quote-Like Operators"> and +are listed below. Modifiers that alter the way a regular expression +is used by Perl are detailed in L<perlop/"Regexp Quote-Like Operators"> and L<perlop/"Gory details of parsing quoted constructs">. =over 4 @@ -27,20 +27,21 @@ locale. See L<perllocale>. =item m Treat string as multiple lines. That is, change "^" and "$" from matching -at only the very start or end of the string to the start or end of any +the start or end of the string to matching the start or end of any line anywhere within the string. =item s Treat string as single line. That is, change "." 
to match any character -whatsoever, even a newline, which it normally would not match. +whatsoever, even a newline, which normally it would not match. -The C</s> and C</m> modifiers both override the C<$*> setting. That is, no matter -what C<$*> contains, C</s> without C</m> will force "^" to match only at the -beginning of the string and "$" to match only at the end (or just before a -newline at the end) of the string. Together, as /ms, they let the "." match -any character whatsoever, while yet allowing "^" and "$" to match, -respectively, just after and just before newlines within the string. +The C</s> and C</m> modifiers both override the C<$*> setting. That +is, no matter what C<$*> contains, C</s> without C</m> will force +"^" to match only at the beginning of the string and "$" to match +only at the end (or just before a newline at the end) of the string. +Together, as /ms, they let the "." match any character whatsoever, +while yet allowing "^" and "$" to match, respectively, just after +and just before newlines within the string. =item x @@ -49,9 +50,9 @@ Extend your pattern's legibility by permitting whitespace and comments. =back These are usually written as "the C</x> modifier", even though the delimiter -in question might not actually be a slash. In fact, any of these +in question might not really be a slash. Any of these modifiers may also be embedded within the regular expression itself using -the new C<(?...)> construct. See below. +the C<(?...)> construct. See below. The C</x> modifier itself needs a little more explanation. It tells the regular expression parser to ignore whitespace that is neither @@ -59,7 +60,7 @@ backslashed nor within a character class. You can use this to break up your regular expression into (slightly) more readable parts. The C<#> character is also treated as a metacharacter introducing a comment, just as in ordinary Perl code. 
This also means that if you want real -whitespace or C<#> characters in the pattern (outside of a character +whitespace or C<#> characters in the pattern (outside a character class, where they are unaffected by C</x>), that you'll either have to escape them or encode them using octal or hex escapes. Taken together, these features go a long way towards making Perl's regular expressions @@ -70,11 +71,11 @@ in L<perlop>. =head2 Regular Expressions -The patterns used in pattern matching are regular expressions such as -those supplied in the Version 8 regex routines. (In fact, the -routines are derived (distantly) from Henry Spencer's freely -redistributable reimplementation of the V8 routines.) -See L<Version 8 Regular Expressions> for details. +The patterns used in Perl pattern matching derive from those supplied in +the Version 8 regex routines. (The routines are derived +(distantly) from Henry Spencer's freely redistributable reimplementation +of the V8 routines.) See L<Version 8 Regular Expressions> for +details. In particular the following metacharacters have their standard I<egrep>-ish meanings: @@ -87,9 +88,9 @@ meanings: () Grouping [] Character class -By default, the "^" character is guaranteed to match at only the -beginning of the string, the "$" character at only the end (or before the -newline at the end) and Perl does certain optimizations with the +By default, the "^" character is guaranteed to match only the +beginning of the string, the "$" character only the end (or before the +newline at the end), and Perl does certain optimizations with the assumption that the string contains only one line. Embedded newlines will not be matched by "^" or "$". You may, however, wish to treat a string as a multi-line buffer, such that the "^" will match after any @@ -98,7 +99,7 @@ cost of a little more overhead, you can do this by using the /m modifier on the pattern match operator. (Older programs did this by setting C<$*>, but this practice is now deprecated.)
-To facilitate multi-line substitutions, the "." character never matches a +To simplify multi-line substitutions, the "." character never matches a newline unless you use the C</s> modifier, which in effect tells Perl to pretend the string is a single line--even if it isn't. The C</s> modifier also overrides the setting of C<$*>, in case you have some (badly behaved) older @@ -177,12 +178,13 @@ In addition, Perl defines the following: equivalent to C<(?:\PM\pM*)> \C Match a single C char (octet) even under utf8. -A C<\w> matches a single alphanumeric character, not a whole -word. To match a word you'd need to say C<\w+>. If C<use locale> is in -effect, the list of alphabetic characters generated by C<\w> is taken -from the current locale. See L<perllocale>. You may use C<\w>, C<\W>, -C<\s>, C<\S>, C<\d>, and C<\D> within character classes (though not as -either end of a range). +A C<\w> matches a single alphanumeric character, not a whole word. +Use C<\w+> to match a string of Perl-identifier characters (which isn't +the same as matching an English word). If C<use locale> is in effect, the +list of alphabetic characters generated by C<\w> is taken from the +current locale. See L<perllocale>. You may use C<\w>, C<\W>, C<\s>, C<\S>, +C<\d>, and C<\D> within character classes (though not as either end of +a range). See L<utf8> for details about C<\pP>, C<\PP>, and C<\X>. Perl defines the following zero-width assertions: @@ -193,99 +195,154 @@ Perl defines the following zero-width assertions: \z Match only at end of string \G Match only where previous m//g left off (works only with /g) -A word boundary (C<\b>) is defined as a spot between two characters that -has a C<\w> on one side of it and a C<\W> on the other side of it (in -either order), counting the imaginary characters off the beginning and -end of the string as matching a C<\W>. (Within character classes C<\b> -represents backspace rather than a word boundary.) 
The C<\A> and C<\Z> are -just like "^" and "$", except that they won't match multiple times when the -C</m> modifier is used, while "^" and "$" will match at every internal line -boundary. To match the actual end of the string, not ignoring newline, -you can use C<\z>. The C<\G> assertion can be used to chain global -matches (using C<m//g>), as described in -L<perlop/"Regexp Quote-Like Operators">. - -It is also useful when writing C<lex>-like scanners, when you have several -patterns that you want to match against consequent substrings of your -string, see the previous reference. -The actual location where C<\G> will match can also be influenced -by using C<pos()> as an lvalue. See L<perlfunc/pos>. - -When the bracketing construct C<( ... )> is used, \E<lt>digitE<gt> matches the -digit'th substring. Outside of the pattern, always use "$" instead of "\" -in front of the digit. (While the \E<lt>digitE<gt> notation can on rare occasion work -outside the current pattern, this should not be relied upon. See the -WARNING below.) The scope of $E<lt>digitE<gt> (and C<$`>, C<$&>, and C<$'>) -extends to the end of the enclosing BLOCK or eval string, or to the next -successful pattern match, whichever comes first. If you want to use -parentheses to delimit a subpattern (e.g., a set of alternatives) without -saving it as a subpattern, follow the ( with a ?:. - -You may have as many parentheses as you wish. If you have more -than 9 substrings, the variables $10, $11, ... refer to the -corresponding substring. Within the pattern, \10, \11, etc. refer back -to substrings if there have been at least that many left parentheses before -the backreference. Otherwise (for backward compatibility) \10 is the -same as \010, a backspace, and \11 the same as \011, a tab. And so -on. (\1 through \9 are always backreferences.) - -C<$+> returns whatever the last bracket match matched. C<$&> returns the -entire matched string. (C<$0> used to return the same thing, but not any -more.) 
C<$`> returns everything before the matched string. C<$'> returns -everything after the matched string. Examples: +A word boundary (C<\b>) is a spot between two characters +that has a C<\w> on one side of it and a C<\W> on the other side +of it (in either order), counting the imaginary characters off the +beginning and end of the string as matching a C<\W>. (Within +character classes C<\b> represents backspace rather than a word +boundary, just as it normally does in any double-quoted string.) +The C<\A> and C<\Z> are just like "^" and "$", except that they +won't match multiple times when the C</m> modifier is used, while +"^" and "$" will match at every internal line boundary. To match +the actual end of the string and not ignore an optional trailing +newline, use C<\z>. + +The C<\G> assertion can be used to chain global matches (using +C<m//g>), as described in L<perlop/"Regexp Quote-Like Operators">. +It is also useful when writing C<lex>-like scanners, when you have +several patterns that you want to match against consecutive substrings +of your string; see the previous reference. The actual location +where C<\G> will match can also be influenced by using C<pos()> as +an lvalue. See L<perlfunc/pos>. + +The bracketing construct C<( ... )> creates capture buffers. To +refer to the digit'th buffer, use \E<lt>digitE<gt> within the +match. Outside the match use "$" instead of "\". (The +\E<lt>digitE<gt> notation works in certain circumstances outside +the match. See the warning below about \1 vs $1 for details.) +Referring back to another part of the match is called a +I<backreference>. + +There is no limit to the number of captured substrings that you may +use. However, Perl also uses \10, \11, etc. as aliases for \010, +\011, etc. (Recall that 0 means octal, so \011 is the 9th ASCII +character, a tab.) Perl resolves this ambiguity by interpreting +\10 as a backreference only if at least 10 left parentheses have +opened before it.
Likewise \11 is a backreference only if at least +11 left parentheses have opened before it. And so on. \1 through +\9 are always interpreted as backreferences. + +Examples: s/^([^ ]*) *([^ ]*)/$2 $1/; # swap first two words - if (/Time: (..):(..):(..)/) { + if (/(.)\1/) { # find first doubled char + print "'$1' is the first doubled character\n"; + } + + if (/Time: (..):(..):(..)/) { # parse out values $hours = $1; $minutes = $2; $seconds = $3; } - -Once perl sees that you need one of C<$&>, C<$`> or C<$'> anywhere in -the program, it has to provide them on each and every pattern match. -This can slow your program down. The same mechanism that handles -these provides for the use of $1, $2, etc., so you pay the same price -for each pattern that contains capturing parentheses. But if you never -use $&, etc., in your script, then patterns I<without> capturing -parentheses won't be penalized. So avoid $&, $', and $` if you can, -but if you can't (and some algorithms really appreciate them), once -you've used them once, use them at will, because you've already paid -the price. As of 5.005, $& is not so costly as the other two. - -Backslashed metacharacters in Perl are -alphanumeric, such as C<\b>, C<\w>, C<\n>. Unlike some other regular -expression languages, there are no backslashed symbols that aren't -alphanumeric. So anything that looks like \\, \(, \), \E<lt>, \E<gt>, -\{, or \} is always interpreted as a literal character, not a -metacharacter. This was once used in a common idiom to disable or -quote the special meanings of regular expression metacharacters in a -string that you want to use for a pattern. Simply quote all -non-alphanumeric characters: + +Several special variables also refer back to portions of the previous +match. C<$+> returns whatever the last bracket match matched. +C<$&> returns the entire matched string. (At one point C<$0> did +also, but now it returns the name of the program.) C<$`> returns +everything before the matched string.
And C<$'> returns everything +after the matched string. + +The numbered variables ($1, $2, $3, etc.) and the related punctuation +set (C<$+>, C<$&>, C<$`>, and C<$'>) are all dynamically scoped +until the end of the enclosing block or until the next successful +match, whichever comes first. (See L<perlsyn/"Compound Statements">.) + +B<WARNING>: Once Perl sees that you need one of C<$&>, C<$`>, or +C<$'> anywhere in the program, it has to provide them for every +pattern match. This may substantially slow your program. Perl +uses the same mechanism to produce $1, $2, etc., so you also pay a +price for each pattern that contains capturing parentheses. (To +avoid this cost while retaining the grouping behaviour, use the +extended regular expression C<(?: ... )> instead.) But if you never +use C<$&>, C<$`> or C<$'>, then patterns I<without> capturing +parentheses will not be penalized. So avoid C<$&>, C<$'>, and C<$`> +if you can, but if you can't (and some algorithms really appreciate +them), once you've used them once, use them at will, because you've +already paid the price. As of 5.005, C<$&> is not so costly as the +other two. + +Backslashed metacharacters in Perl are alphanumeric, such as C<\b>, +C<\w>, C<\n>. Unlike some other regular expression languages, there +are no backslashed symbols that aren't alphanumeric. So anything +that looks like \\, \(, \), \E<lt>, \E<gt>, \{, or \} is always +interpreted as a literal character, not a metacharacter. This was +once used in a common idiom to disable or quote the special meanings +of regular expression metacharacters in a string that you want to +use for a pattern.
Simply quote all non-alphanumeric characters: $pattern =~ s/(\W)/\\$1/g; -Now it is much more common to see either the quotemeta() function or -the C<\Q> escape sequence used to disable all metacharacters' special +Today it is more common to use the quotemeta() function or the C<\Q> +metaquoting escape sequence to disable all metacharacters' special meanings like this: /$unquoted\Q$quoted\E$unquoted/ -Perl defines a consistent extension syntax for regular expressions. -The syntax is a pair of parentheses with a question mark as the first -thing within the parentheses (this was a syntax error in older -versions of Perl). The character after the question mark gives the -function of the extension. Several extensions are already supported: +=head2 Extended Patterns + +Perl also defines a consistent extension syntax for features not +found in standard tools like B<awk> and B<lex>. The syntax is a +pair of parentheses with a question mark as the first thing within +the parentheses. The character after the question mark indicates +the extension. + +The stability of these extensions varies widely. Some have been +part of the core language for many years. Others are experimental +and may change without warning or be completely removed. Check +the documentation on an individual feature to verify its current +status. + +A question mark was chosen for this and for the minimal-matching +construct because 1) question marks are rare in older regular +expressions, and 2) whenever you see one, you should stop and +"question" exactly what is going on. That's psychology... =over 10 =item C<(?#text)> -A comment. The text is ignored. If the C</x> switch is used to enable -whitespace formatting, a simple C<#> will suffice. Note that perl closes +A comment. The text is ignored. If the C</x> modifier enables +whitespace formatting, a simple C<#> will suffice. Note that Perl closes the comment as soon as it sees a C<)>, so there is no way to put a literal C<)> in the comment. 
+=item C<(?imsx-imsx)> + +One or more embedded pattern-match modifiers. This is particularly +useful for dynamic patterns, such as those read in from a configuration +file, read in as an argument, or specified in a table somewhere. +Consider the case where some patterns want to be case sensitive +and some do not. The case insensitive ones need to include merely +C<(?i)> at the front of the pattern. For example: + + $pattern = "foobar"; + if ( /$pattern/i ) { } + + # more flexible: + + $pattern = "(?i)foobar"; + if ( /$pattern/ ) { } + +Letters after a C<-> turn those modifiers off. These modifiers are +localized inside an enclosing group (if any). For example, + + ( (?i) blah ) \s+ \1 + +will match a repeated (I<including the case>!) word C<blah> in any +case, assuming the C<x> modifier and no C<i> modifier outside this +group. + =item C<(?:pattern)> =item C<(?imsx-imsx:pattern)> @@ -299,28 +356,29 @@ is like @fields = split(/\b(a|b|c)\b/) -but doesn't spit out extra fields. +but doesn't spit out extra fields. It's also cheaper not to capture +characters if you don't need to. -The letters between C<?> and C<:> act as flags modifiers, see -L<C<(?imsx-imsx)>>. In particular, +Any letters between C<?> and C<:> act as flag modifiers as with +C<(?imsx-imsx)>. For example, /(?s-i:more.*than).*million/i -is equivalent to more verbose +is equivalent to the more verbose /(?:(?s-i)more.*than).*million/i =item C<(?=pattern)> -A zero-width positive lookahead assertion. For example, C</\w+(?=\t)/> +A zero-width positive look-ahead assertion. For example, C</\w+(?=\t)/> matches a word followed by a tab, without including the tab in C<$&>. =item C<(?!pattern)> -A zero-width negative lookahead assertion. For example C</foo(?!bar)/> +A zero-width negative look-ahead assertion. For example C</foo(?!bar)/> matches any occurrence of "foo" that isn't followed by "bar". Note -however that lookahead and lookbehind are NOT the same thing. You cannot -use this for lookbehind.
+however that look-ahead and look-behind are NOT the same thing. You cannot +use this for look-behind. If you are looking for a "bar" that isn't preceded by a "foo", C</(?!foo)bar/> will not do what you want. That's because the C<(?!foo)> is just saying that @@ -332,29 +390,32 @@ Sometimes it's still easier just to say: if (/bar/ && $` !~ /foo$/) -For lookbehind see below. +For look-behind see below. =item C<(?E<lt>=pattern)> -A zero-width positive lookbehind assertion. For example, C</(?E<lt>=\t)\w+/> -matches a word following a tab, without including the tab in C<$&>. -Works only for fixed-width lookbehind. +A zero-width positive look-behind assertion. For example, C</(?E<lt>=\t)\w+/> +matches a word that follows a tab, without including the tab in C<$&>. +Works only for fixed-width look-behind. =item C<(?<!pattern)> -A zero-width negative lookbehind assertion. For example C</(?<!bar)foo/> -matches any occurrence of "foo" that isn't following "bar". -Works only for fixed-width lookbehind. +A zero-width negative look-behind assertion. For example C</(?<!bar)foo/> +matches any occurrence of "foo" that does not follow "bar". Works +only for fixed-width look-behind. =item C<(?{ code })> -Experimental "evaluate any Perl code" zero-width assertion. Always -succeeds. C<code> is not interpolated. Currently the rules to -determine where the C<code> ends are somewhat convoluted. +B<WARNING>: This extended regular expression feature is considered +highly experimental, and may be changed or deleted without notice. -The C<code> is properly scoped in the following sense: if the assertion -is backtracked (compare L<"Backtracking">), all the changes introduced after -C<local>isation are undone, so +This zero-width assertion evaluates any embedded Perl code. It +always succeeds, and its C<code> is not interpolated. Currently, +the rules to determine where the C<code> ends are somewhat convoluted.
+ +The C<code> is properly scoped in the following sense: If the assertion +is backtracked (compare L<"Backtracking">), all changes introduced after +C<local>ization are undone, so that $_ = 'a' x 8; m< @@ -370,51 +431,55 @@ C<local>isation are undone, so # location. >x; -will set C<$res = 4>. Note that after the match $cnt returns to the globally -introduced value 0, since the scopes which restrict C<local> statements +will set C<$res = 4>. Note that after the match, $cnt returns to the globally +introduced value, because the scopes that restrict C<local> operators are unwound. -This assertion may be used as L<C<(?(condition)yes-pattern|no-pattern)>> -switch. If I<not> used in this way, the result of evaluation of C<code> -is put into variable $^R. This happens immediately, so $^R can be used from -other C<(?{ code })> assertions inside the same regular expression. +This assertion may be used as a C<(?(condition)yes-pattern|no-pattern)> +switch. If I<not> used in this way, the result of evaluation of +C<code> is put into the special variable C<$^R>. This happens +immediately, so C<$^R> can be used from other C<(?{ code })> assertions +inside the same regular expression. -The above assignment to $^R is properly localized, thus the old value of $^R -is restored if the assertion is backtracked (compare L<"Backtracking">). +The assignment to C<$^R> above is properly localized, so the old +value of C<$^R> is restored if the assertion is backtracked; compare +L<"Backtracking">. -Due to security concerns, this construction is not allowed if the regular -expression involves run-time interpolation of variables, unless -C<use re 'eval'> pragma is used (see L<re>), or the variables contain -results of qr() operator (see L<perlop/"qr/STRING/imosx">). 
+For reasons of security, this construct is forbidden if the regular +expression involves run-time interpolation of variables, unless the +perilous C<use re 'eval'> pragma has been used (see L<re>), or the +variables contain results of the C<qr//> operator (see +L<perlop/"qr/STRING/imosx">). -This restriction is due to the wide-spread (questionable) practice of -using the construct +This restriction is because of the wide-spread and remarkably convenient +custom of using run-time determined strings as patterns. For example: $re = <>; chomp $re; $string =~ /$re/; -without tainting. While this code is frowned upon from security point -of view, when C<(?{})> was introduced, it was considered bad to add -I<new> security holes to existing scripts. - -B<NOTE:> Use of the above insecure snippet without also enabling taint mode -is to be severely frowned upon. C<use re 'eval'> does not disable tainting -checks, thus to allow $re in the above snippet to contain C<(?{})> -I<with tainting enabled>, one needs both C<use re 'eval'> and untaint -the $re. +Before Perl knew how to execute interpolated code within a pattern, +this operation was completely safe from a security point of view, +although it could raise an exception from an illegal pattern. If +you turn on the C<use re 'eval'> pragma, though, it is no longer secure, +so you should only do so if you are also using taint checking. +Better yet, use the carefully constrained evaluation within a Safe +module. See L<perlsec> for details about both these mechanisms. =item C<(?p{ code })> -I<Very experimental> "postponed" regular subexpression. C<code> is evaluated -at runtime, at the moment this subexpression may match. The result of -evaluation is considered as a regular expression, and matched as if it -were inserted instead of this construct. +B<WARNING>: This extended regular expression feature is considered +highly experimental, and may be changed or deleted without notice. -C<code> is not interpolated.
Currently the rules to -determine where the C<code> ends are somewhat convoluted. +This is a "postponed" regular subexpression. The C<code> is evaluated +at run time, at the moment this subexpression may match. The result +of evaluation is considered as a regular expression and matched as +if it were inserted instead of this construct. -The following regular expression matches matching parenthesized group: +C<code> is not interpolated. As before, the rules to determine +where the C<code> ends are currently somewhat convoluted. + +The following pattern matches a parenthesized group: $re = qr{ \( @@ -428,31 +493,33 @@ The following regular expression matches matching parenthesized group: =item C<(?E<gt>pattern)> -An "independent" subexpression. Matches the substring that a -I<standalone> C<pattern> would match if anchored at the given position, -B<and only this substring>. - -Say, C<^(?E<gt>a*)ab> will never match, since C<(?E<gt>a*)> (anchored -at the beginning of string, as above) will match I<all> characters -C<a> at the beginning of string, leaving no C<a> for C<ab> to match. -In contrast, C<a*ab> will match the same as C<a+b>, since the match of -the subgroup C<a*> is influenced by the following group C<ab> (see -L<"Backtracking">). In particular, C<a*> inside C<a*ab> will match -fewer characters than a standalone C<a*>, since this makes the tail match. - -An effect similar to C<(?E<gt>pattern)> may be achieved by - - (?=(pattern))\1 - -since the lookahead is in I<"logical"> context, thus matches the same -substring as a standalone C<a+>. The following C<\1> eats the matched -string, thus making a zero-length assertion into an analogue of -C<(?E<gt>...)>. (The difference between these two constructs is that the -second one uses a catching group, thus shifting ordinals of -backreferences in the rest of a regular expression.) - -This construct is useful for optimizations of "eternal" -matches, because it will not backtrack (see L<"Backtracking">). 
+B<WARNING>: This extended regular expression feature is considered +highly experimental, and may be changed or deleted without notice. + +An "independent" subexpression, one which matches the substring +that a I<standalone> C<pattern> would match if anchored at the given +position--but it matches no more than this substring. This +construct is useful for optimizations of what would otherwise be +"eternal" matches, because it will not backtrack (see L<"Backtracking">). + +For example: C<^(?E<gt>a*)ab> will never match, since C<(?E<gt>a*)> +(anchored at the beginning of string, as above) will match I<all> +characters C<a> at the beginning of string, leaving no C<a> for +C<ab> to match. In contrast, C<a*ab> will match the same as C<a+b>, +since the match of the subgroup C<a*> is influenced by the following +group C<ab> (see L<"Backtracking">). In particular, C<a*> inside +C<a*ab> will match fewer characters than a standalone C<a*>, since +this makes the tail match. + +An effect similar to C<(?E<gt>pattern)> may be achieved by writing +C<(?=(pattern))\1>. This matches the same substring as a standalone +C<a+>, and the following C<\1> eats the matched string; it therefore +makes a zero-length assertion into an analogue of C<(?E<gt>...)>. +(The difference between these two constructs is that the second one +uses a capturing group, thus shifting ordinals of backreferences +in the rest of a regular expression.) + +Consider this pattern: m{ \( ( @@ -463,17 +530,16 @@ matches, because it will not backtrack (see L<"Backtracking">). \) }x -That will efficiently match a nonempty group with matching -two-or-less-level-deep parentheses. However, if there is no such group, -it will take virtually forever on a long string. That's because there are -so many different ways to split a long string into several substrings. -This is what C<(.+)+> is doing, and C<(.+)+> is similar to a subpattern -of the above pattern. 
Consider that the above pattern detects no-match -on C<((()aaaaaaaaaaaaaaaaaa> in several seconds, but that each extra -letter doubles this time. This exponential performance will make it -appear that your program has hung. - -However, a tiny modification of this pattern +That will efficiently match a nonempty group with matching parentheses +two levels deep or less. However, if there is no such group, it +will take virtually forever on a long string. That's because there +are so many different ways to split a long string into several +substrings. This is what C<(.+)+> is doing, and C<(.+)+> is similar +to a subpattern of the above pattern. Consider how the pattern +above detects no-match on C<((()aaaaaaaaaaaaaaaaaa> in several +seconds, but that each extra letter doubles this time. This +exponential performance will make it appear that your program has +hung. However, a tiny change to this pattern m{ \( ( @@ -491,18 +557,21 @@ however, that this pattern currently triggers a warning message under B<-w> saying it C<"matches the null string many times">): On simple groups, such as the pattern C<(?E<gt> [^()]+ )>, a comparable -effect may be achieved by negative lookahead, as in C<[^()]+ (?! [^()] )>. +effect may be achieved by negative look-ahead, as in C<[^()]+ (?! [^()] )>. This was only 4 times slower on a string with 1000000 C<a>s. =item C<(?(condition)yes-pattern|no-pattern)> =item C<(?(condition)yes-pattern)> +B<WARNING>: This extended regular expression feature is considered +highly experimental, and may be changed or deleted without notice. + Conditional expression. C<(condition)> should be either an integer in parentheses (which is valid if the corresponding pair of parentheses -matched), or lookahead/lookbehind/evaluate zero-width assertion. +matched), or look-ahead/look-behind/evaluate zero-width assertion. -Say, +For example: m{ ( \( )? [^()]+ @@ -512,39 +581,8 @@ Say, matches a chunk of non-parentheses, possibly included in parentheses themselves. 
-=item C<(?imsx-imsx)> - -One or more embedded pattern-match modifiers. This is particularly -useful for patterns that are specified in a table somewhere, some of -which want to be case sensitive, and some of which don't. The case -insensitive ones need to include merely C<(?i)> at the front of the -pattern. For example: - - $pattern = "foobar"; - if ( /$pattern/i ) { } - - # more flexible: - - $pattern = "(?i)foobar"; - if ( /$pattern/ ) { } - -Letters after C<-> switch modifiers off. - -These modifiers are localized inside an enclosing group (if any). Say, - - ( (?i) blah ) \s+ \1 - -(assuming C<x> modifier, and no C<i> modifier outside of this group) -will match a repeated (I<including the case>!) word C<blah> in any -case. - =back -A question mark was chosen for this and for the new minimal-matching -construct because 1) question mark is pretty rare in older regular -expressions, and 2) whenever you see one, you should stop and "question" -exactly what is going on. That's psychology... - =head2 Backtracking A fundamental feature of regular expression matching involves the @@ -589,7 +627,7 @@ Which perhaps unexpectedly yields: got <d is under the bar in the > That's because C<.*> was greedy, so you get everything between the -I<first> "foo" and the I<last> "bar". In this case, it's more effective +I<first> "foo" and the I<last> "bar". Here it's more effective to use minimal matching to make sure you get the text between a "foo" and the first "bar" thereafter. @@ -652,7 +690,7 @@ definition might succeed against a particular string. And if there are multiple ways it might succeed, you need to understand backtracking to know which variety of success you will achieve. -When using lookahead assertions and negations, this can all get even +When using look-ahead assertions and negations, this can all get even tricker. Imagine you'd like to find a sequence of non-digits not followed by "123". 
You might try to write that as @@ -688,8 +726,9 @@ that you've asked "Is it true that at the start of $x, following 0 or more non-digits, you have something that's not 123?" If the pattern matcher had let C<\D*> expand to "ABC", this would have caused the whole pattern to fail. + The search engine will initially match C<\D*> with "ABC". Then it will -try to match C<(?!123> with "123", which of course fails. But because +try to match C<(?!123)> with "123", which fails. But because a quantifier (C<\D*>) has been used in the regular expression, the search engine can backtrack and retry the match differently in the hope of matching the complete regular expression. @@ -697,13 +736,13 @@ in the hope of matching the complete regular expression. The pattern really, I<really> wants to succeed, so it uses the standard pattern back-off-and-retry and lets C<\D*> expand to just "AB" this time. Now there's indeed something following "AB" that is not -"123". It's in fact "C123", which suffices. +"123". It's "C123", which suffices. -We can deal with this by using both an assertion and a negation. We'll -say that the first part in $1 must be followed by a digit, and in fact, it -must also be followed by something that's not "123". Remember that the -lookaheads are zero-width expressions--they only look, but don't consume -any of the string in their match. So rewriting this way produces what +We can deal with this by using both an assertion and a negation. +We'll say that the first part in $1 must be followed both by a digit +and by something that's not "123". Remember that the look-aheads +are zero-width expressions--they only look, but don't consume any +of the string in their match.
So rewriting this way produces what you'd expect; that is, case 5 will fail, but case 6 succeeds: print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/ ; @@ -712,7 +751,7 @@ you'd expect; that is, case 5 will fail, but case 6 succeeds: 6: got ABC In other words, the two zero-width assertions next to each other work as though -they're ANDed together, just as you'd use any builtin assertions: C</^$/> +they're ANDed together, just as you'd use any built-in assertions: C</^$/> matches only if you're at the beginning of the line AND the end of the line simultaneously. The deeper underlying truth is that juxtaposition in regular expressions always means AND, except when you write an explicit OR @@ -720,22 +759,22 @@ using the vertical bar. C</ab/> means match "a" AND (then) match "b", although the attempted matches are made at different positions because "a" is not a zero-width assertion, but a one-width assertion. -One warning: particularly complicated regular expressions can take -exponential time to solve due to the immense number of possible ways they -can use backtracking to try match. For example this will take a very long -time to run +B<WARNING>: Particularly complicated regular expressions can take +exponential time to solve because of the immense number of possible +ways they can use backtracking to try to match. For example, this will +take a painfully long time to run /((a{0,5}){0,5}){0,5}/ -And if you used C<*>'s instead of limiting it to 0 through 5 matches, then -it would take literally forever--or until you ran out of stack space. +And if you used C<*>'s instead of limiting it to 0 through 5 matches, +then it would take forever--or until you ran out of stack space. A powerful tool for optimizing such beasts is "independent" groups, which do not backtrace (see L<C<(?E<gt>pattern)>>).
Note also that -zero-length lookahead/lookbehind assertions will not backtrace to make -the tail match, since they are in "logical" context: only the fact -whether they match or not is considered relevant. For an example -where side-effects of a lookahead I<might> have influenced the +zero-length look-ahead/look-behind assertions will not backtrack to make +the tail match, since they are in "logical" context: only +whether they match is considered relevant. For an example +where side-effects of a look-ahead I<might> have influenced the following match, see L<C<(?E<gt>pattern)>>. =head2 Version 8 Regular Expressions @@ -754,7 +793,7 @@ would match "blurfl" in the target string. You can specify a character class, by enclosing a list of characters in C<[]>, which will match any one character from the list. If the first character after the "[" is "^", the class matches any character not -in the list. Within a list, the "-" character is used to specify a +in the list. Within a list, the "-" character specifies a range, so that C<a-z> represents all characters between "a" and "z", inclusive. If you want "-" itself to be a member of a class, put it at the start or end of the list, or escape it with a backslash. (The @@ -784,8 +823,8 @@ or "foe" in the target string (as would C<f(e|i|o)e>). The first alternative includes everything from the last pattern delimiter ("(", "[", or the beginning of the pattern) up to the first "|", and the last alternative contains everything from the last "|" to the next -pattern delimiter. For this reason, it's common practice to include -alternatives in parentheses, to minimize confusion about where they +pattern delimiter. That's why it's common practice to include +alternatives in parentheses: to minimize confusion about where they start and end. Alternatives are tried from left to right, so the first @@ -799,18 +838,18 @@ important when you are capturing matched text using parentheses.)
Also remember that "|" is interpreted as a literal within square brackets, so if you write C<[fee|fie|foe]> you're really only matching C<[feio|]>. -Within a pattern, you may designate subpatterns for later reference by -enclosing them in parentheses, and you may refer back to the I<n>th -subpattern later in the pattern using the metacharacter \I<n>. -Subpatterns are numbered based on the left to right order of their -opening parenthesis. A backreference matches whatever -actually matched the subpattern in the string being examined, not the -rules for that subpattern. Therefore, C<(0|0x)\d*\s\1\d*> will -match "0x1234 0x4321", but not "0x1234 01234", because subpattern 1 -actually matched "0x", even though the rule C<0|0x> could -potentially match the leading 0 in the second number. +Within a pattern, you may designate subpatterns for later reference +by enclosing them in parentheses, and you may refer back to the +I<n>th subpattern later in the pattern using the metacharacter +\I<n>. Subpatterns are numbered based on the left to right order +of their opening parenthesis. A backreference matches whatever +actually matched the subpattern in the string being examined, not +the rules for that subpattern. Therefore, C<(0|0x)\d*\s\1\d*> will +match "0x1234 0x4321", but not "0x1234 01234", because subpattern +1 matched "0x", even though the rule C<0|0x> could potentially match +the leading 0 in the second number. -=head2 WARNING on \1 vs $1 +=head2 Warning on \1 vs $1 Some people get too used to writing things like: @@ -831,13 +870,13 @@ Or if you try to do s/(\d+)/\1000/; You can't disambiguate that by saying C<\{1}000>, whereas you can fix it with -C<${1}000>. Basically, the operation of interpolation should not be confused +C<${1}000>. The operation of interpolation should not be confused with the operation of matching a backreference. Certainly they mean two different things on the I<left> side of the C<s///>. 
=head2 Repeated patterns matching zero-length substring -WARNING: Difficult material (and prose) ahead. This section needs a rewrite. +B<WARNING>: Difficult material (and prose) ahead. This section needs a rewrite. Regular expressions provide a terse and powerful programming language. As with most other power tools, power comes together with the ability @@ -850,7 +889,7 @@ loops using regular expressions, with something as innocuous as: The C<o?> can match at the beginning of C<'foo'>, and since the position in the string is not moved by the match, C<o?> would match again and again -due to the C<*> modifier. Another common way to create a similar cycle +because of the C<*> modifier. Another common way to create a similar cycle is with the looping modifier C<//g>: @matches = ( 'foo' =~ m{ o? }xg ); @@ -862,8 +901,8 @@ or or the loop implied by split(). However, long experience has shown that many programming tasks may -be significantly simplified by using repeated subexpressions which -may match zero-length substrings, with a simple example being: +be significantly simplified by using repeated subexpressions that +may match zero-length substrings. Here's a simple example: @chars = split //, $string; # // is not magic in split ($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// / @@ -873,8 +912,9 @@ the infinite loop>. The rules for this are different for lower-level loops given by the greedy modifiers C<*+{}>, and for higher-level ones like the C</g> modifier or split() operator. -The lower-level loops are I<interrupted> when it is detected that a -repeated expression did match a zero-length substring, thus +The lower-level loops are I<interrupted> (that is, the loop is +broken) when Perl detects that a repeated expression matched a +zero-length substring.
Thus m{ (?: NON_ZERO_LENGTH | ZERO_LENGTH )* }x; @@ -892,7 +932,7 @@ This prohibition interacts with backtracking (see L<"Backtracking">), and so the I<second best> match is chosen if the I<best> match is of zero length. -Say, +For example: $_ = 'bar'; s/\w??/<$&>/g; @@ -905,7 +945,7 @@ alternate with one-character-long matches. Similarly, for repeated C<m/()/g> the second-best match is the match at the position one notch further in the string. -The additional state of being I<matched with zero-length> is associated to +The additional state of being I<matched with zero-length> is associated with the matched string, and is reset by each assignment to pos(). =head2 Creating custom RE engines @@ -955,14 +995,22 @@ part of this regular expression needs to be converted explicitly $re = customre::convert $re; /\Y|$re\Y|/; -=head2 SEE ALSO +=head1 BUGS + +This manpage varies from difficult to understand to completely +and utterly opaque. + +=head1 SEE ALSO L<perlop/"Regexp Quote-Like Operators">. L<perlop/"Gory details of parsing quoted constructs">. +L<perlfaq6>. + L<perlfunc/pos>. L<perllocale>. -I<Mastering Regular Expressions> (see L<perlbook>) by Jeffrey Friedl. +I<Mastering Regular Expressions> by Jeffrey Friedl, published +by O'Reilly and Associates. diff --git a/pod/perlref.pod b/pod/perlref.pod index 596ff72c1a..5958a7233c 100644 --- a/pod/perlref.pod +++ b/pod/perlref.pod @@ -21,7 +21,7 @@ hashes of arrays, arrays of hashes of functions, and so on. Hard references are smart--they keep track of reference counts for you, automatically freeing the thing referred to when its reference count goes -to zero. (Note: the reference counts for values in self-referential or +to zero. (Reference counts for values in self-referential or cyclic data structures may not go to zero without a little help; see L<perlobj/"Two-Phased Garbage Collection"> for a detailed explanation.) If that thing happens to be an object, the object is destructed.
See @@ -31,7 +31,7 @@ have been officially "blessed" into a class package.) Symbolic references are names of variables or other objects, just as a symbolic link in a Unix filesystem contains merely the name of a file. -The C<*glob> notation is a kind of symbolic reference. (Symbolic +The C<*glob> notation is something of a symbolic reference. (Symbolic references are sometimes called "soft references", but please don't call them that; references are confusing enough without useless synonyms.) @@ -56,8 +56,8 @@ References can be created in several ways. =item 1. By using the backslash operator on a variable, subroutine, or value. -(This works much like the & (address-of) operator in C.) Note -that this typically creates I<ANOTHER> reference to a variable, because +(This works much like the & (address-of) operator in C.) +This typically creates I<another> reference to a variable, because there's already a reference to the variable in the symbol table. But the symbol table reference might go away, and you'll still have the reference that the backslash returned. Here are some examples: @@ -87,7 +87,7 @@ elements. (The multidimensional syntax described later can be used to access this. For example, after the above, C<$arrayref-E<gt>[2][1]> would have the value "b".) -Note that taking a reference to an enumerated list is not the same +Taking a reference to an enumerated list is not the same as using square brackets--instead it's the same as creating a list of references! @@ -136,7 +136,7 @@ On the other hand, if you want the other meaning, you can do this: sub showem { {; @_ } } # ok sub showem { { return @_ } } # ok -Note how the leading C<+{> and C<{;> always serve to disambiguate +The leading C<+{> and C<{;> always serve to disambiguate the expression to mean either the HASH reference, or the BLOCK. =item 4. @@ -146,18 +146,18 @@ C<sub> without a subname: $coderef = sub { print "Boink!\n" }; -Note the presence of the semicolon.
Except for the fact that the code -inside isn't executed immediately, a C<sub {}> is not so much a +Note the semicolon. Except for the code +inside not being immediately executed, a C<sub {}> is not so much a declaration as it is an operator, like C<do{}> or C<eval{}>. (However, no matter how many times you execute that particular line (unless you're in an -C<eval("...")>), C<$coderef> will still have a reference to the I<SAME> +C<eval("...")>), $coderef will still have a reference to the I<same> anonymous subroutine.) Anonymous subroutines act as closures with respect to my() variables, -that is, variables visible lexically within the current scope. Closure +that is, variables lexically visible within the current scope. Closure is a notion out of the Lisp world that says if you define an anonymous function in a particular lexical context, it pretends to run in that -context even when it's called outside of the context. +context even when it's called outside the context. In human terms, it's a funny way of passing arguments to a subroutine when you define it as well as when you call it. It's useful for setting up @@ -165,11 +165,9 @@ little bits of code to run later, such as callbacks. You can even do object-oriented stuff with it, though Perl already provides a different mechanism to do that--see L<perlobj>. -You can also think of closure as a way to write a subroutine template without -using eval. (In fact, in version 5.000, eval was the I<only> way to get -closures. You may wish to use "require 5.001" if you use closures.) - -Here's a small example of how closures works: +You might also think of closure as a way to write a subroutine +template without using eval(). Here's a small example of how +closures work: sub newprint { my $x = shift; @@ -188,10 +186,10 @@ This prints Howdy, world! Greetings, earthlings! 
-Note particularly that $x continues to refer to the value passed into -newprint() I<despite> the fact that the "my $x" has seemingly gone out of -scope by the time the anonymous subroutine runs. That's what closure -is all about. +Note particularly that $x continues to refer to the value passed +into newprint() I<despite> "my $x" having gone out of scope by the +time the anonymous subroutine runs. That's what a closure is all +about. This applies only to lexical variables, by the way. Dynamic variables continue to work as they have always worked. Closure is not something @@ -200,7 +198,7 @@ that most Perl programmers need trouble themselves about to begin with. =item 5. References are often returned by special subroutines called constructors. -Perl objects are just references to a special kind of object that happens to know +Perl objects are just references to a special type of object that happens to know which package it's associated with. Constructors are just special subroutines that know how to create that association. They do so by starting with an ordinary reference, and it remains an ordinary reference @@ -241,35 +239,37 @@ known as foo). $ioref = *STDIN{IO}; $globref = *foo{GLOB}; -All of these are self-explanatory except for *foo{IO}. It returns the -IO handle, used for file handles (L<perlfunc/open>), sockets -(L<perlfunc/socket> and L<perlfunc/socketpair>), and directory handles -(L<perlfunc/opendir>). For compatibility with previous versions of -Perl, *foo{FILEHANDLE} is a synonym for *foo{IO}. +All of these are self-explanatory except for C<*foo{IO}>. It returns +the IO handle, used for file handles (L<perlfunc/open>), sockets +(L<perlfunc/socket> and L<perlfunc/socketpair>), and directory +handles (L<perlfunc/opendir>). For compatibility with previous +versions of Perl, C<*foo{FILEHANDLE}> is a synonym for C<*foo{IO}>. -*foo{THING} returns undef if that particular THING hasn't been used yet, -except in the case of scalars. 
*foo{SCALAR} returns a reference to an +C<*foo{THING}> returns undef if that particular THING hasn't been used yet, +except in the case of scalars. C<*foo{SCALAR}> returns a reference to an anonymous scalar if $foo hasn't been used yet. This might change in a future release. -*foo{IO} is an alternative to the \*HANDLE mechanism given in +C<*foo{IO}> is an alternative to the C<*HANDLE> mechanism given in L<perldata/"Typeglobs and Filehandles"> for passing filehandles into or out of subroutines, or storing into larger data structures. Its disadvantage is that it won't create a new filehandle for you. -Its advantage is that you have no risk of clobbering more than you want -to with a typeglob assignment, although if you assign to a scalar instead -of a typeglob, you're ok. +Its advantage is that you have less risk of clobbering more than +you want to with a typeglob assignment. (It still conflates file +and directory handles, though.) However, if you assign the incoming +value to a scalar instead of a typeglob as we do in the examples +below, there's no risk of that happening. - splutter(*STDOUT); - splutter(*STDOUT{IO}); + splutter(*STDOUT); # pass the whole glob + splutter(*STDOUT{IO}); # pass both file and dir handles sub splutter { my $fh = shift; print $fh "her um well a hmmm\n"; } - $rec = get_rec(*STDIN); - $rec = get_rec(*STDIN{IO}); + $rec = get_rec(*STDIN); # pass the whole glob + $rec = get_rec(*STDIN{IO}); # pass both file and dir handles sub get_rec { my $fh = shift; @@ -299,9 +299,9 @@ a simple scalar variable containing a reference of the correct type: &$coderef(1,2,3); print $globref "output\n"; -It's important to understand that we are specifically I<NOT> dereferencing +It's important to understand that we are specifically I<not> dereferencing C<$arrayref[0]> or C<$hashref{"KEY"}> there. The dereference of the -scalar variable happens I<BEFORE> it does any key lookups. Anything more +scalar variable happens I<before> it does any key lookups. 
Anything more complicated than a simple scalar variable must use methods 2 or 3 below. However, a "simple scalar" includes an identifier that itself uses method 1 recursively. Therefore, the following prints "howdy". @@ -334,7 +334,7 @@ people often make the mistake of viewing the dereferencing symbols as proper operators, and wonder about their precedence. If they were, though, you could use parentheses instead of braces. That's not the case. Consider the difference below; case 0 is a short-hand version of case 1, -I<NOT> case 2: +I<not> case 2: $$hashref{"KEY"} = "VALUE"; # CASE 0 ${$hashref}{"KEY"} = "VALUE"; # CASE 1 @@ -356,7 +356,7 @@ syntactic sugar, the examples for method 2 may be written: $coderef->(1,2,3); # Subroutine call The left side of the arrow can be any expression returning a reference, -including a previous dereference. Note that C<$array[$x]> is I<NOT> the +including a previous dereference. Note that C<$array[$x]> is I<not> the same thing as C<$array-E<gt>[$x]> here: $array[$x]->{"foo"}->[0] = "January"; @@ -369,7 +369,7 @@ C<{"foo"}> in it. Likewise C<$array[$x]-E<gt>{"foo"}> will automatically get defined with an array reference so that we can look up C<[0]> in it. This process is called I<autovivification>. -One more thing here. The arrow is optional I<BETWEEN> brackets +One more thing here. The arrow is optional I<between> brackets subscripts, so you can shrink the above down to $array[$x]{"foo"}[0] = "January"; @@ -394,14 +394,27 @@ civility though. =back -The ref() operator may be used to determine what type of thing the -reference is pointing to. See L<perlfunc>. +Using a string or number as a reference produces a symbolic reference, +as explained above. Using a reference as a number produces an +integer representing its storage location in memory. The only +useful thing to be done with this is to compare two references +numerically to see whether they refer to the same location. 
+ + if ($ref1 == $ref2) { # cheap numeric compare of references + print "refs 1 and 2 refer to the same thing\n"; + } + +Using a reference as a string produces both its referent's type, +including any package blessing as described in L<perlobj>, as well +as the numeric address expressed in hex. The ref() operator returns +just the type of thing the reference is pointing to, without the +address. See L<perlfunc/ref> for details and examples of its use. The bless() operator may be used to associate the object a reference points to with a package functioning as an object class. See L<perlobj>. A typeglob may be dereferenced the same way a reference can, because -the dereference syntax always indicates the kind of reference desired. +the dereference syntax always indicates the type of reference desired. So C<${*foo}> and C<${\$foo}> both indicate the same scalar variable. Here's a trick for interpolating a subroutine call into a string: @@ -421,9 +434,9 @@ chicanery is also useful for arbitrary expressions: We said that references spring into existence as necessary if they are undefined, but we didn't say what happens if a value used as a -reference is already defined, but I<ISN'T> a hard reference. If you -use it as a reference in this case, it'll be treated as a symbolic -reference. That is, the value of the scalar is taken to be the I<NAME> +reference is already defined, but I<isn't> a hard reference. If you +use it as a reference, it'll be treated as a symbolic +reference. That is, the value of the scalar is taken to be the I<name> of a variable, rather than a direct link to a (possibly) anonymous value. @@ -439,7 +452,7 @@ People frequently expect it to work like this. So it does. 
$pack = "THAT"; ${"${pack}::$name"} = 5; # Sets $THAT::foo without eval -This is very powerful, and slightly dangerous, in that it's possible +This is powerful, and slightly dangerous, in that it's possible to intend (with the utmost sincerity) to use a hard reference, and accidentally use a symbolic reference instead. To protect against that, you can say @@ -474,7 +487,7 @@ always have within a string. That is, $push = "pop on "; print "${push}over"; -has always meant to print "pop on over", despite the fact that push is +has always meant to print "pop on over", even though push is a reserved word. This has been generalized to work the same outside of quotes, so that @@ -485,7 +498,7 @@ and even print ${ push } . "over"; will have the same effect. (This would have been a syntax error in -Perl 5.000, though Perl 4 allowed it in the spaceless form.) Note that this +Perl 5.000, though Perl 4 allowed it in the spaceless form.) This construct is I<not> considered to be a symbolic reference when you're using strict refs: @@ -521,10 +534,10 @@ string is effectively quoted. =head2 Pseudo-hashes: Using an array as a hash -WARNING: This section describes an experimental feature. Details may +B<WARNING>: This section describes an experimental feature. Details may change without notice in future versions. -Beginning with release 5.005 of Perl you can use an array reference +Beginning with release 5.005 of Perl, you may use an array reference in some contexts that would normally require a hash reference. This allows you to access array elements using symbolic names, as if they were fields in a structure. @@ -550,7 +563,6 @@ or try to access nonexistent fields. For better performance, Perl can also do the translation from field names to array indices at compile time for typed object references. See L<fields>. 
- =head2 Function Templates As explained above, a closure is an anonymous function with access to the @@ -564,7 +576,7 @@ that generated HTML font changes for the various colors: print "Be ", red("careful"), "with that ", green("light"); -The red() and green() functions would be very similar. To create these, +The red() and green() functions would be similar. To create these, we'll assign a closure to a typeglob of the name of the function we're trying to build. @@ -598,7 +610,7 @@ above--only works with closures, not general subroutines. In the general case, then, named subroutines do not nest properly, although anonymous ones do. If you are accustomed to using nested subroutines in other programming languages with their own private variables, you'll have to -work at it a bit in Perl. The intuitive coding of this kind of thing +work at it a bit in Perl. The intuitive coding of this type of thing incurs mysterious warnings about ``will not stay shared''. For example, this won't work: @@ -646,7 +658,7 @@ The standard Tie::RefHash module provides a convenient workaround to this. =head1 SEE ALSO Besides the obvious documents, source code can be instructive. -Some rather pathological examples of the use of references can be found +Some pathological examples of the use of references can be found in the F<t/op/ref.t> regression test in the Perl source directory. See also L<perldsc> and L<perllol> for how to use references to create diff --git a/pod/perlrun.pod b/pod/perlrun.pod index 7cb9aed4c0..c71b9f3ca4 100644 --- a/pod/perlrun.pod +++ b/pod/perlrun.pod @@ -17,7 +17,11 @@ B<perl> S<[ B<-sTuU> ]> =head1 DESCRIPTION -Upon startup, Perl looks for your script in one of the following +The normal way to run a Perl program is by making it directly +executable, or else by passing the name of the source file as an +argument on the command line. (An interactive Perl environment +is also possible--see L<perldebug> for details on how to do that.) 
+Upon startup, Perl looks for your program in one of the following places: =over 4 @@ -35,61 +39,71 @@ way. See L<Location of Perl>.) =item 3. Passed in implicitly via standard input. This works only if there are -no filename arguments--to pass arguments to a STDIN script you -must explicitly specify a "-" for the script name. +no filename arguments--to pass arguments to a STDIN-read program you +must explicitly specify a "-" for the program name. =back With methods 2 and 3, Perl starts parsing the input file from the beginning, unless you've specified a B<-x> switch, in which case it scans for the first line starting with #! and containing the word -"perl", and starts there instead. This is useful for running a script +"perl", and starts there instead. This is useful for running a program embedded in a larger message. (In this case you would indicate the end -of the script using the C<__END__> token.) +of the program using the C<__END__> token.) The #! line is always examined for switches as the line is being parsed. Thus, if you're on a machine that allows only one argument with the #! line, or worse, doesn't even recognize the #! line, you still can get consistent switch behavior regardless of how Perl was -invoked, even if B<-x> was used to find the beginning of the script. - -Because many operating systems silently chop off kernel interpretation of -the #! line after 32 characters, some switches may be passed in on the -command line, and some may not; you could even get a "-" without its -letter, if you're not careful. You probably want to make sure that all -your switches fall either before or after that 32 character boundary. -Most switches don't actually care if they're processed redundantly, but -getting a - instead of a complete switch could cause Perl to try to -execute standard input instead of your script. And a partial B<-I> switch +invoked, even if B<-x> was used to find the beginning of the program. 
+ +Because historically some operating systems silently chopped off +kernel interpretation of the #! line after 32 characters, some +switches may be passed in on the command line, and some may not; +you could even get a "-" without its letter, if you're not careful. +You probably want to make sure that all your switches fall either +before or after that 32-character boundary. Most switches don't +actually care if they're processed redundantly, but getting a "-" +instead of a complete switch could cause Perl to try to execute +standard input instead of your program. And a partial B<-I> switch could also cause odd results. -Some switches do care if they are processed twice, for instance combinations -of B<-l> and B<-0>. Either put all the switches after the 32 character -boundary (if applicable), or replace the use of B<-0>I<digits> by -C<BEGIN{ $/ = "\0digits"; }>. +Some switches do care if they are processed twice, for instance +combinations of B<-l> and B<-0>. Either put all the switches after +the 32-character boundary (if applicable), or replace the use of +B<-0>I<digits> by C<BEGIN{ $/ = "\0digits"; }>. Parsing of the #! switches starts wherever "perl" is mentioned in the line. The sequences "-*" and "- " are specifically ignored so that you could, if you were so inclined, say #!/bin/sh -- # -*- perl -*- -p - eval 'exec /usr/bin/perl -wS $0 ${1+"$@"}' + eval 'exec perl -wS $0 ${1+"$@"}' if $running_under_some_shell; -to let Perl see the B<-p> switch. +to let Perl see the B<-p> switch. + +A similar trick involves the B<env> program, if you have it. + + #!/usr/bin/env perl + +The examples above use a relative path to the perl interpreter, +getting whatever version is first in the user's path. If you want +a specific version of Perl, say, perl5.005_57, you should place +that directly in the #! line's path. If the #! line does not contain the word "perl", the program named after the #! is executed instead of the Perl interpreter. 
This is slightly bizarre, but it helps people on machines that don't do #!, because they -can tell a program that their SHELL is /usr/bin/perl, and Perl will then +can tell a program that their SHELL is F</usr/bin/perl>, and Perl will then dispatch the program to the correct interpreter for them. -After locating your script, Perl compiles the entire script to an +After locating your program, Perl compiles the entire program to an internal form. If there are any compilation errors, execution of the -script is not attempted. (This is unlike the typical shell script, +program is not attempted. (This is unlike the typical shell script, which might run part-way through before finding a syntax error.) -If the script is syntactically correct, it is executed. If the script +If the program is syntactically correct, it is executed. If the program runs off the end without hitting an exit() or die() operator, an implicit C<exit(0)> is provided to indicate successful completion. @@ -105,12 +119,12 @@ Put extproc perl -S -your_switches -as the first line in C<*.cmd> file (C<-S> due to a bug in cmd.exe's +as the first line in C<*.cmd> file (B<-S> due to a bug in cmd.exe's `extproc' handling). =item MS-DOS -Create a batch file to run your script, and codify it in +Create a batch file to run your program, and codify it in C<ALTERNATIVE_SHEBANG> (see the F<dosish.h> file in the source distribution for more information). @@ -126,7 +140,7 @@ and a Perl library file. =item Macintosh -Macintosh perl scripts will have the appropriate Creator and +A Macintosh perl program will have the appropriate Creator and Type, so that double-clicking them will invoke the perl application. =item VMS @@ -136,10 +150,10 @@ Put $ perl -mysw 'f$env("procedure")' 'p1' 'p2' 'p3' 'p4' 'p5' 'p6' 'p7' 'p8' ! $ exit++ + ++$status != 0 and $exit = $status = undef; -at the top of your script, where C<-mysw> are any command line switches you -want to pass to Perl. 
You can now invoke the script directly, by saying -C<perl script>, or as a DCL procedure, by saying C<@script> (or implicitly -via F<DCL$PATH> by just using the name of the script). +at the top of your program, where B<-mysw> are any command line switches you +want to pass to Perl. You can now invoke the program directly, by saying +C<perl program>, or as a DCL procedure, by saying C<@program> (or implicitly +via F<DCL$PATH> by just using the name of the program). This incantation is a bit much to remember, but Perl will display it for you if you say C<perl "-V:startperl">. @@ -150,10 +164,10 @@ Command-interpreters on non-Unix systems have rather different ideas on quoting than Unix shells. You'll need to learn the special characters in your command-interpreter (C<*>, C<\> and C<"> are common) and how to protect whitespace and these characters to run -one-liners (see C<-e> below). +one-liners (see B<-e> below). On some systems, you may have to change single-quotes to double ones, -which you must I<NOT> do on Unix or Plan9 systems. You might also +which you must I<not> do on Unix or Plan9 systems. You might also have to change a single % to a %%. For example: @@ -171,13 +185,13 @@ For example: # VMS perl -e "print ""Hello world\n""" -The problem is that none of this is reliable: it depends on the command -and it is entirely possible neither works. If 4DOS was the command shell, this would -probably work better: +The problem is that none of this is reliable: it depends on the +command and it is entirely possible neither works. If B<4DOS> were +the command shell, this would probably work better: perl -e "print <Ctrl-x>"Hello world\n<Ctrl-x>"" -CMD.EXE in Windows NT slipped a lot of standard Unix functionality in +B<CMD.EXE> in Windows NT slipped a lot of standard Unix functionality in when nobody was looking, but just try to find documentation for its quoting rules. @@ -191,22 +205,30 @@ There is no general solution to all of this. It's just a mess. 
=head2 Location of Perl It may seem obvious to say, but Perl is useful only when users can -easily find it. When possible, it's good for both B</usr/bin/perl> and -B</usr/local/bin/perl> to be symlinks to the actual binary. If that -can't be done, system administrators are strongly encouraged to put -(symlinks to) perl and its accompanying utilities, such as perldoc, into -a directory typically found along a user's PATH, or in another obvious -and convenient place. +easily find it. When possible, it's good for both F</usr/bin/perl> +and F</usr/local/bin/perl> to be symlinks to the actual binary. If +that can't be done, system administrators are strongly encouraged +to put (symlinks to) perl and its accompanying utilities into a +directory typically found along a user's PATH, or in some other +obvious and convenient place. + +In this documentation, C<#!/usr/bin/perl> on the first line of the program +will stand in for whatever method works on your system. You are +advised to use a specific path if you care about a specific version. -In this documentation, C<#!/usr/bin/perl> on the first line of the script -will stand in for whatever method works on your system. + #!/usr/local/bin/perl5.00554 -=head2 Switches +or if you just want to be running at least that version, place a statement +like this at the top of your program: -A single-character switch may be combined with the following switch, if -any. + use 5.005_54; - #!/usr/bin/perl -spi.bak # same as -s -p -i.bak +=head2 Command Switches + +As with all standard commands, a single-character switch may be +clustered with the following switch, if any. + + #!/usr/bin/perl -spi.orig # same as -s -p -i.orig Switches include: @@ -220,7 +242,7 @@ precede or follow the digits. For example, if you have a version of B<find> which can print filenames terminated by the null character, you can say this: - find . -name '*.bak' -print0 | perl -n0e unlink + find . 
-name '*.orig' -print0 | perl -n0e unlink The special value 00 will cause Perl to slurp files in paragraph mode. The value 0777 will cause Perl to slurp files whole because there is no @@ -245,26 +267,26 @@ An alternate delimiter may be specified using B<-F>. =item B<-c> -causes Perl to check the syntax of the script and then exit without +causes Perl to check the syntax of the program and then exit without executing it. Actually, it I<will> execute C<BEGIN>, C<END>, and C<use> blocks, because these are considered as occurring outside the execution of -your program. +your program. C<INIT> blocks, however, will be skipped. =item B<-d> -runs the script under the Perl debugger. See L<perldebug>. +runs the program under the Perl debugger. See L<perldebug>. =item B<-d:>I<foo> -runs the script under the control of a debugging or tracing module -installed as Devel::foo. E.g., B<-d:DProf> executes the script using the -Devel::DProf profiler. See L<perldebug>. +runs the program under the control of a debugging, profiling, or +tracing module installed as Devel::foo. E.g., B<-d:DProf> executes +the program using the Devel::DProf profiler. See L<perldebug>. =item B<-D>I<letters> =item B<-D>I<number> -sets debugging flags. To watch how it executes your script, use +sets debugging flags. To watch how it executes your program, use B<-Dtls>. (This works only if debugging is compiled into your Perl.) Another nice value is B<-Dx>, which lists your compiled syntax tree. And B<-Dr> displays compiled regular expressions. As an @@ -283,24 +305,35 @@ equivalent to B<-Dtls>): 512 r Regular expression parsing and execution 1024 x Syntax tree dump 2048 u Tainting checks - 4096 L Memory leaks (needs C<-DLEAKTEST> when compiling Perl) + 4096 L Memory leaks (needs -DLEAKTEST when compiling Perl) 8192 H Hash dump -- usurps values() 16384 X Scratchpad allocation 32768 D Cleaning up 65536 S Thread synchronization -All these flags require C<-DDEBUGGING> when you compile the Perl -executable. 
This flag is automatically set if you include C<-g> +All these flags require B<-DDEBUGGING> when you compile the Perl +executable. See the F<INSTALL> file in the Perl source distribution +for how to do this. This flag is automatically set if you include B<-g> option when C<Configure> asks you about optimizer/debugger flags. +If you're just trying to get a print out of each line of Perl code +as it executes, the way that C<sh -x> provides for shell scripts, +you can't use Perl's B<-D> switch. Instead do this + + # Bourne shell syntax + $ PERLDB_OPTS="NonStop=1 AutoTrace=1 frame=2" perl -dS program + + # csh syntax + % (setenv PERLDB_OPTS "NonStop=1 AutoTrace=1 frame=2"; perl -dS program) + +See L<perldebug> for details and variations. + =item B<-e> I<commandline> -may be used to enter one line of script. -If B<-e> is given, Perl -will not look for a script filename in the argument list. -Multiple B<-e> commands may -be given to build up a multi-line script. -Make sure to use semicolons where you would in a normal program. +may be used to enter one line of program. If B<-e> is given, Perl +will not look for a filename in the argument list. Multiple B<-e> +commands may be given to build up a multi-line script. Make sure +to use semicolons where you would in a normal program. =item B<-F>I<pattern> @@ -324,47 +357,46 @@ rules: If no extension is supplied, no backup is made and the current file is overwritten. -If the extension doesn't contain a C<*> then it is appended to the end -of the current filename as a suffix. - -If the extension does contain one or more C<*> characters, then each C<*> -is replaced with the current filename. In perl terms you could think of -this as: +If the extension doesn't contain a C<*>, then it is appended to the +end of the current filename as a suffix. If the extension does +contain one or more C<*> characters, then each C<*> is replaced +with the current filename. 
In Perl terms, you could think of this +as: ($backup = $extension) =~ s/\*/$file_name/g; This allows you to add a prefix to the backup file, instead of (or in addition to) a suffix: - $ perl -pi'bak_*' -e 's/bar/baz/' fileA # backup to 'bak_fileA' + $ perl -pi 'orig_*' -e 's/bar/baz/' fileA # backup to 'orig_fileA' Or even to place backup copies of the original files into another directory (provided the directory already exists): - $ perl -pi'old/*.bak' -e 's/bar/baz/' fileA # backup to 'old/fileA.bak' + $ perl -pi 'old/*.orig' -e 's/bar/baz/' fileA # backup to 'old/fileA.orig' These sets of one-liners are equivalent: $ perl -pi -e 's/bar/baz/' fileA # overwrite current file - $ perl -pi'*' -e 's/bar/baz/' fileA # overwrite current file + $ perl -pi '*' -e 's/bar/baz/' fileA # overwrite current file - $ perl -pi'.bak' -e 's/bar/baz/' fileA # backup to 'fileA.bak' - $ perl -pi'*.bak' -e 's/bar/baz/' fileA # backup to 'fileA.bak' + $ perl -pi '.orig' -e 's/bar/baz/' fileA # backup to 'fileA.orig' + $ perl -pi '*.orig' -e 's/bar/baz/' fileA # backup to 'fileA.orig' From the shell, saying - $ perl -p -i.bak -e "s/foo/bar/; ... " + $ perl -p -i.orig -e "s/foo/bar/; ... " -is the same as using the script: +is the same as using the program: - #!/usr/bin/perl -pi.bak + #!/usr/bin/perl -pi.orig s/foo/bar/; which is equivalent to #!/usr/bin/perl - $extension = '.bak'; - while (<>) { + $extension = '.orig'; + LINE: while (<>) { if ($ARGV ne $oldargv) { if ($extension !~ /\*/) { $backup = $ARGV . $extension; @@ -392,9 +424,9 @@ output filehandle after the loop. As shown above, Perl creates the backup file whether or not any output is actually changed. So this is just a fancy way to copy files: - $ perl -p -i'/some/file/path/*' -e 1 file1 file2 file3... - or - $ perl -p -i'.bak' -e 1 file1 file2 file3... + $ perl -p -i '/some/file/path/*' -e 1 file1 file2 file3... +or + $ perl -p -i '.orig' -e 1 file1 file2 file3... 
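The backup-name rule for B<-i> can be sketched in a few lines of Perl. This is an illustration under stated assumptions, not Perl's own implementation: backup_name() is a hypothetical helper, an empty extension stands for a bare -i (no backup), and only the name is computed. No file is copied or renamed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): compute the backup filename that
# an -i extension would produce for $file_name, following the rules above.
sub backup_name {
    my ($extension, $file_name) = @_;
    return undef unless length $extension;   # bare -i: no backup is made
    my $backup = $extension;
    if ($backup =~ /\*/) {
        $backup =~ s/\*/$file_name/g;         # each * becomes the filename
    }
    else {
        $backup = $file_name . $backup;       # no *: append as a suffix
    }
    return $backup;
}

print backup_name('.orig',      'fileA'), "\n";   # fileA.orig
print backup_name('orig_*',     'fileA'), "\n";   # orig_fileA
print backup_name('old/*.orig', 'fileA'), "\n";   # old/fileA.orig
```

An extension of just '*' maps a file to its own name, which is why -i'*' behaves like a bare -i and overwrites the current file.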

 You can use C<eof> without parentheses to locate the end of each input
 file, in case you want to append to each file, or reset line numbering
@@ -404,15 +436,19 @@ If, for a given file, Perl is unable to create the backup file as
 specified in the extension then it will skip that file and continue on
 with the next one (if it exists).

-For a discussion of issues surrounding file permissions and C<-i>, see
-L<perlfaq5/Why does Perl let me delete read-only files? Why does -i clobber protected files? Isn't this a bug in Perl?>.
+For a discussion of issues surrounding file permissions and B<-i>,
+see L<perlfaq5/Why does Perl let me delete read-only files? Why
+does -i clobber protected files? Isn't this a bug in Perl?>.

 You cannot use B<-i> to create directories or to strip extensions from
 files.

-Perl does not expand C<~>, so don't do that.
+Perl does not expand C<~> in filenames, which is good, since some
+folks use it for their backup files:

-Finally, note that the B<-i> switch does not impede execution when no
+    $ perl -pi~ -e 's/foo/bar/' file1 file2 file3...
+
+Finally, the B<-i> switch does not impede execution when no
 files are given on the command line. In this case, no backup is made
 (the original file cannot, of course, be determined) and processing
 proceeds from STDIN to STDOUT as might be expected.
@@ -426,13 +462,13 @@ searches /usr/include and /usr/lib/perl.

 =item B<-l>[I<octnum>]

-enables automatic line-ending processing. It has two effects: first,
-it automatically chomps "C<$/>" (the input record separator) when used
-with B<-n> or B<-p>, and second, it assigns "C<$\>"
-(the output record separator) to have the value of I<octnum> so that
-any print statements will have that separator added back on. If
-I<octnum> is omitted, sets "C<$\>" to the current value of "C<$/>". For
-instance, to trim lines to 80 columns:
+enables automatic line-ending processing. It has two separate
+effects. First, it automatically chomps C<$/> (the input record
+separator) when used with B<-n> or B<-p>. Second, it assigns C<$\>
+(the output record separator) to have the value of I<octnum> so
+that any print statements will have that separator added back on.
+If I<octnum> is omitted, sets C<$\> to the current value of
+C<$/>. For instance, to trim lines to 80 columns:

     perl -lpe 'substr($_, 80) = ""'

@@ -452,55 +488,59 @@ This sets C<$\> to newline and then sets C<$/> to the null character.

 =item B<-[mM]>[B<->]I<module=arg[,arg]...>

-C<-m>I<module> executes C<use> I<module> C<();> before executing your
-script.
+B<-m>I<module> executes C<use> I<module> C<();> before executing your
+program.

-C<-M>I<module> executes C<use> I<module> C<;> before executing your
-script. You can use quotes to add extra code after the module name,
-e.g., C<-M'module qw(foo bar)'>.
+B<-M>I<module> executes C<use> I<module> C<;> before executing your
+program. You can use quotes to add extra code after the module name,
+e.g., C<'-Mmodule qw(foo bar)'>.

-If the first character after the C<-M> or C<-m> is a dash (C<->)
+If the first character after the B<-M> or B<-m> is a dash (C<->)
 then the 'use' is replaced with 'no'.

 A little builtin syntactic sugar means you can also say
-C<-mmodule=foo,bar> or C<-Mmodule=foo,bar> as a shortcut for
-C<-M'module qw(foo bar)'>. This avoids the need to use quotes when
-importing symbols. The actual code generated by C<-Mmodule=foo,bar> is
+B<-mmodule=foo,bar> or B<-Mmodule=foo,bar> as a shortcut for
+C<'-Mmodule qw(foo bar)'>. This avoids the need to use quotes when
+importing symbols. The actual code generated by B<-Mmodule=foo,bar> is
 C<use module split(/,/,q{foo,bar})>. Note that the C<=> form
-removes the distinction between C<-m> and C<-M>.
+removes the distinction between B<-m> and B<-M>.
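The -m/-M expansion rules can be made concrete with a rough sketch that turns a switch argument into the statement text described above. switch_to_code() is hypothetical and mirrors only the documented rules: the leading dash, the =foo,bar sugar, and the () that -m adds. It is not how perl itself parses its options.

```perl
use strict;
use warnings;

# Hypothetical helper: $switch is 'm' or 'M'; $arg is the text after it.
sub switch_to_code {
    my ($switch, $arg) = @_;
    my $verb = 'use';
    $verb = 'no' if $arg =~ s/^-//;           # leading dash: use -> no
    if ($arg =~ /^([^=]+)=(.*)$/s) {           # =foo,bar sugar, same for m and M
        return "$verb $1 split(/,/,q{$2});";
    }
    return $switch eq 'm' ? "$verb $arg ();"  # -m: import nothing
                          : "$verb $arg;";    # -M: default import list
}

print switch_to_code('m', 'POSIX'), "\n";           # use POSIX ();
print switch_to_code('M', '-strict'), "\n";         # no strict;
print switch_to_code('M', 'module=foo,bar'), "\n";  # use module split(/,/,q{foo,bar});

# The generated split() call yields the import list ('foo', 'bar').
my @imports = split(/,/, q{foo,bar});
```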

 =item B<-n>

-causes Perl to assume the following loop around your script, which
+causes Perl to assume the following loop around your program, which
 makes it iterate over filename arguments somewhat like B<sed -n> or
 B<awk>:

+  LINE:
     while (<>) {
-        ...             # your script goes here
+        ...             # your program goes here
     }

 Note that the lines are not printed by default. See B<-p> to have
 lines printed. If a file named by an argument cannot be opened for
-some reason, Perl warns you about it, and moves on to the next file.
+some reason, Perl warns you about it and moves on to the next file.

 Here is an efficient way to delete all files older than a week:

-    find . -mtime +7 -print | perl -nle 'unlink;'
+    find . -mtime +7 -print | perl -nle unlink

-This is faster than using the C<-exec> switch of B<find> because you don't
-have to start a process on every filename found.
+This is faster than using the B<-exec> switch of B<find> because you don't
+have to start a process on every filename found. It does suffer from
+the bug of mishandling newlines in pathnames, which you can fix if
+you

 C<BEGIN> and C<END> blocks may be used to capture control before or after
-the implicit loop, just as in B<awk>.
+the implicit program loop, just as in B<awk>.

 =item B<-p>

-causes Perl to assume the following loop around your script, which
+causes Perl to assume the following loop around your program, which
 makes it iterate over filename arguments somewhat like B<sed>:


+  LINE:
     while (<>) {
-        ...             # your script goes here
+        ...             # your program goes here
     } continue {
         print or die "-p destination: $!\n";
     }
@@ -512,30 +552,31 @@ treated as fatal. To suppress printing use the B<-n> switch. A B<-p>
 overrides a B<-n> switch.

 C<BEGIN> and C<END> blocks may be used to capture control before or after
-the implicit loop, just as in awk.
+the implicit loop, just as in B<awk>.

 =item B<-P>

-causes your script to be run through the C preprocessor before
-compilation by Perl. (Because both comments and cpp directives begin
+causes your program to be run through the C preprocessor before
+compilation by Perl. (Because both comments and B<cpp> directives begin
 with the # character, you should avoid starting comments with any words
 recognized by the C preprocessor such as "if", "else", or "define".)

 =item B<-s>

-enables some rudimentary switch parsing for switches on the command
-line after the script name but before any filename arguments (or before
+enables rudimentary switch parsing for switches on the command
+line after the program name but before any filename arguments (or before
 a B<-->). Any switch found there is removed from @ARGV and sets the
-corresponding variable in the Perl script. The following script
-prints "true" if and only if the script is invoked with a B<-xyz> switch.
+corresponding variable in the Perl program. The following program
+prints "true" if and only if the program is invoked with a B<-xyz> switch.

     #!/usr/bin/perl -s
-    if ($xyz) { print "true\n"; }
+    if ($xyz) { print "true\n" }

 =item B<-S>

 makes Perl use the PATH environment variable to search for the
-script (unless the name of the script contains directory separators).
+program (unless the name of the program contains directory separators).
+
 On some platforms, this also makes Perl append suffixes to the
 filename while searching for it. For example, on Win32 platforms,
 the ".bat" and ".cmd" suffixes are appended if a lookup for the
@@ -543,16 +584,6 @@ original name fails, and if the name does not already end in one
 of those suffixes. If your Perl was compiled with DEBUGGING turned
 on, using the -Dp switch to Perl shows how the search progresses.

-If the filename supplied contains directory separators (i.e. it is an
-absolute or relative pathname), and if the file is not found,
-platforms that append file extensions will do so and try to look
-for the file with those extensions added, one by one.
-
-On DOS-like platforms, if the script does not contain directory
-separators, it will first be searched for in the current directory
-before being searched for on the PATH. On Unix platforms, the
-script will be searched for strictly on the PATH.
-
 Typically this is used to emulate #! startup on platforms that don't
 support #!. This example works on many platforms that
 have a shell compatible with Bourne shell:
@@ -561,94 +592,121 @@ have a shell compatible with Bourne shell:
     eval 'exec /usr/bin/perl -wS $0 ${1+"$@"}'
         if $running_under_some_shell;

-The system ignores the first line and feeds the script to /bin/sh,
-which proceeds to try to execute the Perl script as a shell script.
+The system ignores the first line and feeds the program to F</bin/sh>,
+which proceeds to try to execute the Perl program as a shell script.
 The shell executes the second line as a normal shell command, and thus
 starts up the Perl interpreter. On some systems $0 doesn't always
 contain the full pathname, so the B<-S> tells Perl to search for the
-script if necessary. After Perl locates the script, it parses the
+program if necessary. After Perl locates the program, it parses the
 lines and ignores them because the variable $running_under_some_shell
-is never true. If the script will be interpreted by csh, you will need
+is never true. If the program will be interpreted by csh, you will need
 to replace C<${1+"$@"}> with C<$*>, even though that doesn't understand
 embedded spaces (and such) in the argument list. To start up sh rather
 than csh, some systems may have to replace the #! line with a line
 containing just a colon, which will be politely ignored by Perl. Other
 systems can't control that, and need a totally devious construct that
-will work under any of csh, sh, or Perl, such as the following:
+will work under any of B<csh>, B<sh>, or Perl, such as the following:

-        eval '(exit $?0)' && eval 'exec /usr/bin/perl -wS $0 ${1+"$@"}'
+        eval '(exit $?0)' && eval 'exec perl -wS $0 ${1+"$@"}'
         & eval 'exec /usr/bin/perl -wS $0 $argv:q'
                 if $running_under_some_shell;

+If the filename supplied contains directory separators (i.e., is an
+absolute or relative pathname), and if that file is not found,
+platforms that append file extensions will do so and try to look
+for the file with those extensions added, one by one.
+
+On DOS-like platforms, if the program does not contain directory
+separators, it will first be searched for in the current directory
+before being searched for on the PATH. On Unix platforms, the
+program will be searched for strictly on the PATH.
+
 =item B<-T>

 forces "taint" checks to be turned on so you can test them. Ordinarily
-these checks are done only when running setuid or setgid. It's a good
-idea to turn them on explicitly for programs run on another's behalf,
-such as CGI programs. See L<perlsec>. Note that (for security reasons)
-this option must be seen by Perl quite early; usually this means it must
-appear early on the command line or in the #! line (for systems which
-support that).
+these checks are done only when running setuid or setgid. It's a
+good idea to turn them on explicitly for programs that run on behalf
+of someone else whom you might not necessarily trust, such as CGI
+programs or any internet servers you might write in Perl. See
+L<perlsec> for details. For security reasons, this option must be
+seen by Perl quite early; usually this means it must appear early
+on the command line or in the #! line for systems which support
+that construct.

 =item B<-u>

-causes Perl to dump core after compiling your script. You can then
-in theory take this core dump and turn it into an executable file by using the
-B<undump> program (not supplied). This speeds startup at the expense of
-some disk space (which you can minimize by stripping the executable).
-(Still, a "hello world" executable comes out to about 200K on my
-machine.) If you want to execute a portion of your script before dumping,
-use the dump() operator instead. Note: availability of B<undump> is
-platform specific and may not be available for a specific port of
-Perl. It has been superseded by the new perl-to-C compiler, which is more
-portable, even though it's still only considered beta.
+This obsolete switch causes Perl to dump core after compiling your
+program. You can then in theory take this core dump and turn it
+into an executable file by using the B<undump> program (not supplied).
+This speeds startup at the expense of some disk space (which you
+can minimize by stripping the executable). (Still, a "hello world"
+executable comes out to about 200K on my machine.) If you want to
+execute a portion of your program before dumping, use the dump()
+operator instead. Note: availability of B<undump> is platform
+specific and may not be available for a specific port of Perl.
+
+This switch has been superseded in favor of the new Perl code
+generator backends to the compiler. See L<B> and L<B::Bytecode>
+for details.

 =item B<-U>

 allows Perl to do unsafe operations. Currently the only "unsafe"
 operations are the unlinking of directories while running as superuser,
 and running setuid programs with fatal taint checks turned into
-warnings. Note that the B<-w> switch (or the C<$^W> variable) must
-be used along with this option to actually B<generate> the
+warnings. Note that the B<-w> switch (or the C<$^W> variable) must
+be used along with this option to actually I<generate> the
 taint-check warnings.

 =item B<-v>

-prints the version and patchlevel of your Perl executable.
+prints the version and patchlevel of your perl executable.

 =item B<-V>

 prints summary of the major perl configuration values and the current
-value of @INC.
+values of @INC.

 =item B<-V:>I<name>

 Prints to STDOUT the value of the named configuration variable.
+For example,

-=item B<-w>
+    $ perl -V:man.dir
+
+will provide strong clues about what your MANPATH variable should
+be set to in order to access the Perl documentation.

-prints warnings about variable names that are mentioned only once, and
-scalar variables that are used before being set. Also warns about
-redefined subroutines, and references to undefined filehandles or
-filehandles opened read-only that you are attempting to write on. Also
-warns you if you use values as a number that doesn't look like numbers,
-using an array as though it were a scalar, if your subroutines recurse
-more than 100 deep, and innumerable other things.
+=item B<-w>

-You can disable specific warnings using C<__WARN__> hooks, as described
-in L<perlvar> and L<perlfunc/warn>. See also L<perldiag> and L<perltrap>.
+prints warnings about dubious constructs, such as variable names
+that are mentioned only once and scalar variables that are used
+before being set, redefined subroutines, references to undefined
+filehandles or filehandles opened read-only that you are attempting
+to write on, values used as numbers that don't look like numbers,
+using an array as though it were a scalar, if your subroutines
+recurse more than 100 deep, and innumerable other things.
+
+This switch really just enables the internal C<$^W> variable. You
+can disable or promote into fatal errors specific warnings using
+C<__WARN__> hooks, as described in L<perlvar> and L<perlfunc/warn>.
+See also L<perldiag> and L<perltrap>. A new, fine-grained warning
+facility is also available if you want to manipulate entire classes
+of warnings; see L<warning> (or better yet, its source code) about
+that.
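Here is a minimal sketch of the __WARN__ technique mentioned above. It assumes code that is not under lexical "use warnings", so $^W (the variable -w sets) governs run-time warnings. Matching /uninitialized/ to decide which warnings become fatal is an arbitrary choice for illustration.

```perl
use strict;    # deliberately no "use warnings": $^W controls warnings here

$^W = 1;       # -w sets this flag at startup; here it is set at run time

my @kept;
$SIG{__WARN__} = sub {
    my ($msg) = @_;
    die "fatal: $msg" if $msg =~ /uninitialized/;   # promote to an error
    push @kept, $msg;                               # silence and keep others
};

sub add_one { my ($x) = @_; return $x + 1 }

my $ok = eval { add_one(undef); 1 };   # the warning turns into a die here
print $ok ? "no error\n" : "caught: $@";

warn "just a notice\n";                # recorded in @kept, not printed
```

Dying inside the handler is what promotes a warning to a fatal error. Returning normally without printing anything is what disables it.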

 =item B<-x> I<directory>

-tells Perl that the script is embedded in a message. Leading
-garbage will be discarded until the first line that starts with #! and
-contains the string "perl". Any meaningful switches on that line will
-be applied. If a directory name is specified, Perl will switch to
-that directory before running the script. The B<-x> switch controls
-only the disposal of leading garbage. The script must be
-terminated with C<__END__> if there is trailing garbage to be ignored (the
-script can process any or all of the trailing garbage via the DATA
-filehandle if desired).
+tells Perl that the program is embedded in a larger chunk of unrelated
+ASCII text, such as in a mail message. Leading garbage will be
+discarded until the first line that starts with #! and contains the
+string "perl". Any meaningful switches on that line will be applied.
+If a directory name is specified, Perl will switch to that directory
+before running the program. The B<-x> switch controls only the
+disposal of leading garbage. The program must be terminated with
+C<__END__> if there is trailing garbage to be ignored (the program
+can process any or all of the trailing garbage via the DATA filehandle
+if desired).

 =back

@@ -666,7 +724,7 @@ Used if chdir has no argument and HOME is not set.

 =item PATH

-Used in executing subprocesses, and in finding the script if B<-S> is
+Used in executing subprocesses, and in finding the program if B<-S> is
 used.

 =item PERL5LIB
@@ -674,8 +732,8 @@ used.
 A colon-separated list of directories in which to look for Perl library
 files before looking in the standard library and the current
 directory. If PERL5LIB is not defined, PERLLIB is used. When running
-taint checks (because the script was running setuid or setgid, or the
-B<-T> switch was used), neither variable is used. The script should
+taint checks (because the program was running setuid or setgid, or the
+B<-T> switch was used), neither variable is used. The program should
 instead say

     use lib "/my/directory";
@@ -684,7 +742,7 @@ instead say

 Command-line options (switches). Switches in this variable are taken
 as if they were on every Perl command line. Only the B<-[DIMUdmw]>
-switches are allowed. When running taint checks (because the script
+switches are allowed. When running taint checks (because the program
 was running setuid or setgid, or the B<-T> switch was used), this
 variable is ignored. If PERL5OPT begins with B<-T>, tainting will be
 enabled, and any subsequent options ignored.
@@ -701,12 +759,12 @@ The command used to load the debugger code. The default is:

     BEGIN { require 'perl5db.pl' }

-=item PERL5SHELL (specific to WIN32 port)
+=item PERL5SHELL (specific to the Win32 port)

 May be set to an alternative shell that perl must use internally for
 executing "backtick" commands or system(). Default is C<cmd.exe /x/c>
 on WindowsNT and C<command.com /c> on Windows95. The value is considered
-to be space delimited. Precede any character that needs to be protected
+to be space-separated. Precede any character that needs to be protected
 (like a space or backslash) with a backslash.

 Note that Perl doesn't use COMSPEC for this purpose because
@@ -736,12 +794,11 @@ Perl also has environment variables that control how Perl handles data
 specific to particular natural languages. See L<perllocale>.

 Apart from these, Perl uses no other environment variables, except
-to make them available to the script being executed, and to child
-processes. However, scripts running setuid would do well to execute
+to make them available to the program being executed, and to child
+processes. However, programs running setuid would do well to execute
 the following lines before doing anything else, just to keep people
 honest:

-    $ENV{PATH} = '/bin:/usr/bin';    # or whatever you need
+    $ENV{PATH}  = '/bin:/usr/bin';    # or whatever you need
     $ENV{SHELL} = '/bin/sh' if exists $ENV{SHELL};
     delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
-
diff --git a/pod/perlsec.pod b/pod/perlsec.pod
index 0b22acd9cd..212879af93 100644
--- a/pod/perlsec.pod
+++ b/pod/perlsec.pod
@@ -139,7 +139,7 @@ metacharacters, nor are dot, dash, or at going to mean something special
 to the shell. Use of C</.+/> would have been insecure in theory because
 it lets everything through, but Perl doesn't check for that. The lesson
 is that when untainting, you must be exceedingly careful with your patterns.
-Laundering data using regular expression is the I<ONLY> mechanism for
+Laundering data using regular expression is the I<only> mechanism for
 untainting dirty data, unless you use the strategy detailed below to fork
 a child of lesser privilege.

diff --git a/pod/perlsub.pod b/pod/perlsub.pod
index bfab0fe81e..2bd1cfd1ee 100644
--- a/pod/perlsub.pod
+++ b/pod/perlsub.pod
@@ -19,22 +19,23 @@ To define an anonymous subroutine at runtime:

 To import subroutines:

-    use PACKAGE qw(NAME1 NAME2 NAME3);
+    use MODULE qw(NAME1 NAME2 NAME3);

 To call subroutines:

     NAME(LIST);    # & is optional with parentheses.
     NAME LIST;     # Parentheses optional if predeclared/imported.
+    &NAME(LIST);   # Circumvent prototypes.
     &NAME;         # Makes current @_ visible to called subroutine.

 =head1 DESCRIPTION

-Like many languages, Perl provides for user-defined subroutines. These
-may be located anywhere in the main program, loaded in from other files
-via the C<do>, C<require>, or C<use> keywords, or even generated on the
-fly using C<eval> or anonymous subroutines (closures). You can even call
-a function indirectly using a variable containing its name or a CODE reference
-to it.
+Like many languages, Perl provides for user-defined subroutines.
+These may be located anywhere in the main program, loaded in from
+other files via the C<do>, C<require>, or C<use> keywords, or
+generated on the fly using C<eval> or anonymous subroutines (closures).
+You can even call a function indirectly using a variable containing
+its name or a CODE reference.

 The Perl model for function call and return values is simple: all
 functions are passed as parameters one single flat list of scalars, and
@@ -44,37 +45,38 @@ collapse, losing their identities--but you may always use
 pass-by-reference instead to avoid this. Both call and return lists may
 contain as many or as few scalar elements as you'd like. (Often a
 function without an explicit return statement is called a subroutine, but
-there's really no difference from the language's perspective.)
-
-Any arguments passed to the routine come in as the array C<@_>. Thus if you
-called a function with two arguments, those would be stored in C<$_[0]>
-and C<$_[1]>. The array C<@_> is a local array, but its elements are
-aliases for the actual scalar parameters. In particular, if an element
-C<$_[0]> is updated, the corresponding argument is updated (or an error
-occurs if it is not updatable). If an argument is an array or hash
-element which did not exist when the function was called, that element is
-created only when (and if) it is modified or if a reference to it is
-taken. (Some earlier versions of Perl created the element whether or not
-it was assigned to.) Note that assigning to the whole array C<@_> removes
-the aliasing, and does not update any arguments.
-
-The return value of the subroutine is the value of the last expression
-evaluated. Alternatively, a C<return> statement may be used to exit the
+there's really no difference from Perl's perspective.)
+
+Any arguments passed in show up in the array C<@_>. Therefore, if
+you called a function with two arguments, those would be stored in
+C<$_[0]> and C<$_[1]>. The array C<@_> is a local array, but its
+elements are aliases for the actual scalar parameters. In particular,
+if an element C<$_[0]> is updated, the corresponding argument is
+updated (or an error occurs if it is not updatable). If an argument
+is an array or hash element which did not exist when the function
+was called, that element is created only when (and if) it is modified
+or a reference to it is taken. (Some earlier versions of Perl
+created the element whether or not the element was assigned to.)
+Assigning to the whole array C<@_> removes that aliasing, and does
+not update any arguments.
+
+The return value of a subroutine is the value of the last expression
+evaluated. More explicitly, a C<return> statement may be used to exit the
 subroutine, optionally specifying the returned value, which will be
 evaluated in the appropriate context (list, scalar, or void) depending
 on the context of the subroutine call. If you specify no return value,
-the subroutine will return an empty list in a list context, an undefined
-value in a scalar context, or nothing in a void context. If you return
-one or more arrays and/or hashes, these will be flattened together into
-one large indistinguishable list.
-
-Perl does not have named formal parameters, but in practice all you do is
-assign to a C<my()> list of these. Any variables you use in the function
-that aren't declared private are global variables. For the gory details
-on creating private variables, see
-L<"Private Variables via my()"> and L<"Temporary Values via local()">.
-To create protected environments for a set of functions in a separate
-package (and probably a separate file), see L<perlmod/"Packages">.
+the subroutine returns an empty list in list context, the undefined
+value in scalar context, or nothing in void context. If you return
+one or more aggregates (arrays and hashes), these will be flattened
+together into one large indistinguishable list.
+
+Perl does not have named formal parameters. In practice all you
+do is assign to a C<my()> list of these. Variables that aren't
+declared to be private are global variables. For gory details
+on creating private variables, see L<"Private Variables via my()">
+and L<"Temporary Values via local()">. To create protected
+environments for a set of functions in a separate package (and
+probably a separate file), see L<perlmod/"Packages">.

 Example:

@@ -93,7 +95,7 @@ Example:
     # that start with whitespace

     sub get_line {
-        $thisline = $lookahead;  # GLOBAL VARIABLES!!
+        $thisline = $lookahead;  # global variables!
         LINE: while (defined($lookahead = <STDIN>)) {
             if ($lookahead =~ /^[ \t]/) {
                 $thisline .= $lookahead;
@@ -102,24 +104,25 @@ Example:
                 last LINE;
             }
         }
-        $thisline;
+        return $thisline;
     }

     $lookahead = <STDIN>;       # get first line
-    while ($_ = get_line()) {
+    while (defined($line = get_line())) {
         ...
     }

-Use array assignment to a local list to name your formal arguments:
+Assigning to a list of private variables to name your arguments:

     sub maybeset {
         my($key, $value) = @_;
         $Foo{$key} = $value unless $Foo{$key};
     }

-This also has the effect of turning call-by-reference into call-by-value,
-because the assignment copies the values. Otherwise a function is free to
-do in-place modifications of C<@_> and change its caller's values.
+Because the assignment copies the values, this also has the effect
+of turning call-by-reference into call-by-value. Otherwise a
+function is free to do in-place modifications of C<@_> and change
+its caller's values.
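The @_ aliasing rules described above can be checked directly. The subroutine names peek, touch, and rebind are invented for this sketch. It shows three behaviors: reading a missing hash element passed as an argument does not create it, writing through $_[0] does create it, and assigning to the whole of @_ breaks the alias before any later write.

```perl
use strict;
use warnings;

my %h;

sub peek   { my ($v) = @_; return defined $v }      # copies, never writes
sub touch  { $_[0] = 'set'; return }                # writes through the alias
sub rebind { @_ = ('other'); $_[0] = 'x'; return }  # alias gone before write

peek($h{a});     # $h{a} is read but never modified, so it is not created
touch($h{b});    # modifying $_[0] creates $h{b}
my $s = 'orig';
rebind($s);      # $s keeps 'orig': @_ was replaced before the assignment
```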
upcase_in($v1, $v2); # this changes $v1 and $v2 sub upcase_in { @@ -136,7 +139,7 @@ It would be much safer if the C<upcase_in()> function were written to return a copy of its parameters instead of changing them in place: - ($v3, $v4) = upcase($v1, $v2); # this doesn't + ($v3, $v4) = upcase($v1, $v2); # this doesn't change $v1 and $v2 sub upcase { return unless defined wantarray; # void context, do nothing my @parms = @_; @@ -144,12 +147,12 @@ of changing them in place: return wantarray ? @parms : $parms[0]; } -Notice how this (unprototyped) function doesn't care whether it was passed -real scalars or arrays. Perl will see everything as one big long flat C<@_> -parameter list. This is one of the ways where Perl's simple -argument-passing style shines. The C<upcase()> function would work perfectly -well without changing the C<upcase()> definition even if we fed it things -like this: +Notice how this (unprototyped) function doesn't care whether it was +passed real scalars or arrays. Perl sees all arguments as one big, +long, flat parameter list in C<@_>. This is one area where +Perl's simple argument-passing style shines. The C<upcase()> +function would work perfectly well without changing the C<upcase()> +definition even if we fed it things like this: @newlist = upcase(@list1, @list2); @newlist = upcase( split /:/, $var ); @@ -158,24 +161,26 @@ Do not, however, be tempted to do this: (@a, @b) = upcase(@list1, @list2); -Because like its flat incoming parameter list, the return list is also -flat. So all you have managed to do here is stored everything in C<@a> and -made C<@b> an empty list. See L<Pass by Reference> for alternatives. - -A subroutine may be called using the "C<&>" prefix. The "C<&>" is optional -in modern Perls, and so are the parentheses if the subroutine has been -predeclared. (Note, however, that the "C<&>" is I<NOT> optional when -you're just naming the subroutine, such as when it's used as an -argument to C<defined()> or C<undef()>.
Nor is it optional when you want to -do an indirect subroutine call with a subroutine name or reference -using the C<&$subref()> or C<&{$subref}()> constructs. See L<perlref> -for more on that.) - -Subroutines may be called recursively. If a subroutine is called using -the "C<&>" form, the argument list is optional, and if omitted, no C<@_> array is -set up for the subroutine: the C<@_> array at the time of the call is -visible to subroutine instead. This is an efficiency mechanism that -new users may wish to avoid. +Like the flattened incoming parameter list, the return list is also +flattened on return. So all you have managed to do here is store +everything in C<@a> and make C<@b> an empty list. See L<Pass by +Reference> for alternatives. + +A subroutine may be called using an explicit C<&> prefix. The +C<&> is optional in modern Perl, as are parentheses if the +subroutine has been predeclared. The C<&> is I<not> optional +when just naming the subroutine, such as when it's used as +an argument to defined() or undef(). Nor is it optional when you +want to do an indirect subroutine call with a subroutine name or +reference using the C<&$subref()> or C<&{$subref}()> constructs, +although the C<$subref-E<gt>()> notation solves that problem. +See L<perlref> for more about all that. + +Subroutines may be called recursively. If a subroutine is called +using the C<&> form, the argument list is optional, and if omitted, +no C<@_> array is set up for the subroutine: the C<@_> array at the +time of the call is visible to the subroutine instead. This is an +efficiency mechanism that new users may wish to avoid. &foo(1,2,3); # pass three arguments foo(1,2,3); # the same @@ -186,18 +191,19 @@ new users may wish to avoid. &foo; # foo() get current args, like foo(@_) !! foo; # like foo() IFF sub foo predeclared, else "foo" -Not only does the "C<&>" form make the argument list optional, but it also -disables any prototype checking on the arguments you do provide.
This +Not only does the C<&> form make the argument list optional, it also +disables any prototype checking on arguments you do provide. This is partly for historical reasons, and partly for having a convenient way -to cheat if you know what you're doing. See the section on Prototypes below. +to cheat if you know what you're doing. See L<Prototypes> below. -Function whose names are in all upper case are reserved to the Perl core, -just as are modules whose names are in all lower case. A function in -all capitals is a loosely-held convention meaning it will be called -indirectly by the run-time system itself. Functions that do special, -pre-defined things are C<BEGIN>, C<END>, C<AUTOLOAD>, and C<DESTROY>--plus all the -functions mentioned in L<perltie>. The 5.005 release adds C<INIT> -to this list. +Function whose names are in all upper case are reserved to the Perl +core, as are modules whose names are in all lower case. A +function in all capitals is a loosely-held convention meaning it +will be called indirectly by the run-time system itself, usually +due to a triggered event. Functions that do special, pre-defined +things include C<BEGIN>, C<END>, C<AUTOLOAD>, and C<DESTROY>--plus +all functions mentioned in L<perltie>. The 5.005 release adds +C<INIT> to this list. =head2 Private Variables via my() @@ -208,36 +214,38 @@ Synopsis: my $foo = "flurp"; # declare $foo lexical, and init it my @oof = @bar; # declare @oof lexical, and init it -A "C<my>" declares the listed variables to be confined (lexically) to the -enclosing block, conditional (C<if/unless/elsif/else>), loop -(C<for/foreach/while/until/continue>), subroutine, C<eval>, or -C<do/require/use>'d file. If more than one value is listed, the list -must be placed in parentheses. All listed elements must be legal lvalues. -Only alphanumeric identifiers may be lexically scoped--magical -builtins like C<$/> must currently be C<local>ize with "C<local>" instead. 
- -Unlike dynamic variables created by the "C<local>" operator, lexical -variables declared with "C<my>" are totally hidden from the outside world, -including any called subroutines (even if it's the same subroutine called -from itself or elsewhere--every call gets its own copy). - -This doesn't mean that a C<my()> variable declared in a statically -I<enclosing> lexical scope would be invisible. Only the dynamic scopes -are cut off. For example, the C<bumpx()> function below has access to the -lexical C<$x> variable because both the my and the sub occurred at the same -scope, presumably the file scope. +The C<my> operator declares the listed variables to be lexically +confined to the enclosing block, conditional (C<if/unless/elsif/else>), +loop (C<for/foreach/while/until/continue>), subroutine, C<eval>, +or C<do/require/use>'d file. If more than one value is listed, the +list must be placed in parentheses. All listed elements must be +legal lvalues. Only alphanumeric identifiers may be lexically +scoped--magical built-ins like C<$/> must currently be C<local>ize +with C<local> instead. + +Unlike dynamic variables created by the C<local> operator, lexical +variables declared with C<my> are totally hidden from the outside +world, including any called subroutines. This is true even if it's the +same subroutine called from itself or elsewhere--every call gets +its own copy. + +This doesn't mean that a C<my> variable declared in a statically +enclosing lexical scope would be invisible. Only dynamic scopes +are cut off. For example, the C<bumpx()> function below has access +to the lexical $x variable because both the C<my> and the C<sub> +occurred at the same scope, presumably file scope. my $x = 10; sub bumpx { $x++ } -(An C<eval()>, however, can see the lexical variables of the scope it is -being evaluated in so long as the names aren't hidden by declarations within -the C<eval()> itself. See L<perlref>.)
+An C<eval()>, however, can see lexical variables of the scope it is +being evaluated in, so long as the names aren't hidden by declarations within +the C<eval()> itself. See L<perlref>. -The parameter list to C<my()> may be assigned to if desired, which allows you +The parameter list to my() may be assigned to if desired, which allows you to initialize your variables. (If no initializer is given for a particular variable, it is created with the undefined value.) Commonly -this is used to name the parameters to a subroutine. Examples: +this is used to name input parameters to a subroutine. Examples: $arg = "fred"; # "global" variable $n = cube_root(27); @@ -250,8 +258,8 @@ this is used to name the parameters to a subroutine. Examples: return $arg; } -The "C<my>" is simply a modifier on something you might assign to. So when -you do assign to the variables in its argument list, the "C<my>" doesn't +The C<my> is simply a modifier on something you might assign to. So when +you do assign to variables in its argument list, C<my> doesn't change whether those variables are viewed as a scalar or an array. So my ($foo) = <STDIN>; # WRONG? @@ -275,24 +283,24 @@ the current statement. Thus, my $x = $x; -can be used to initialize the new $x with the value of the old C<$x>, and +can be used to initialize a new $x with the value of the old $x, and the expression my $x = 123 and $x == 123 -is false unless the old C<$x> happened to have the value C<123>. +is false unless the old $x happened to have the value C<123>. Lexical scopes of control structures are not bounded precisely by the braces that delimit their controlled blocks; control expressions are -part of the scope, too. Thus in the loop +part of that scope, too. 
Thus in the loop - while (defined(my $line = <>)) { + while (my $line = <>) { $line = lc $line; } continue { print $line; } -the scope of C<$line> extends from its declaration throughout the rest of +the scope of $line extends from its declaration throughout the rest of the loop construct (including the C<continue> clause), but not beyond it. Similarly, in the conditional @@ -305,44 +313,48 @@ it. Similarly, in the conditional die "'$answer' is neither 'yes' nor 'no'"; } -the scope of C<$answer> extends from its declaration throughout the rest -of the conditional (including C<elsif> and C<else> clauses, if any), +the scope of $answer extends from its declaration through the rest +of that conditional, including any C<elsif> and C<else> clauses, but not beyond it. -(None of the foregoing applies to C<if/unless> or C<while/until> +None of the foregoing text applies to C<if/unless> or C<while/until> modifiers appended to simple statements. Such modifiers are not -control structures and have no effect on scoping.) +control structures and have no effect on scoping. The C<foreach> loop defaults to scoping its index variable dynamically -(in the manner of C<local>; see below). However, if the index -variable is prefixed with the keyword "C<my>", then it is lexically -scoped instead. Thus in the loop +in the manner of C<local>. However, if the index variable is +prefixed with the keyword C<my>, or if there is already a lexical +by that name in scope, then a new lexical is created instead. Thus +in the loop for my $i (1, 2, 3) { some_function(); } -the scope of C<$i> extends to the end of the loop, but not beyond it, and -so the value of C<$i> is unavailable in C<some_function()>. +the scope of $i extends to the end of the loop, but not beyond it, +rendering the value of $i inaccessible within C<some_function()>. Some users may wish to encourage the use of lexically scoped variables. 
-As an aid to catching implicit references to package variables, -if you say +As an aid to catching implicit uses of package variables, +which are always global, if you say use strict 'vars'; -then any variable reference from there to the end of the enclosing -block must either refer to a lexical variable, or must be fully -qualified with the package name. A compilation error results -otherwise. An inner block may countermand this with S<"C<no strict 'vars'>">. - -A C<my()> has both a compile-time and a run-time effect. At compile time, -the compiler takes notice of it; the principle usefulness of this is to -quiet S<"C<use strict 'vars'>">. The actual initialization is delayed until -run time, so it gets executed appropriately; every time through a loop, -for example. - -Variables declared with "C<my>" are not part of any package and are therefore +then any variable mentioned from there to the end of the enclosing +block must either refer to a lexical variable, be predeclared via +C<use vars>, or else must be fully qualified with the package name. +A compilation error results otherwise. An inner block may countermand +this with C<no strict 'vars'>. + +A C<my> has both a compile-time and a run-time effect. At compile +time, the compiler takes notice of it. The principal usefulness +of this is to quiet C<use strict 'vars'>, but it is also essential +for generation of closures as detailed in L<perlref>. Actual +initialization is delayed until run time, though, so it gets executed +at the appropriate time, such as each time through a loop, for +example. + +Variables declared with C<my> are not part of any package and are therefore never fully qualified with the package name. In particular, you're not allowed to try to make a package variable (or other global) lexical: @@ -360,13 +372,14 @@ lexical of the same name is also visible: That will print out C<20> and C<10>.
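The three ways of satisfying C<use strict 'vars'> described above (a lexical, a C<use vars> predeclaration, or a fully qualified name) can be sketched as follows. This is a minimal illustration only: the package C<Counter>, the sub C<bump()>, and the variables C<$count>, C<$total>, and C<$limit> are hypothetical names invented for the example.

```perl
use strict;
use warnings;

package Counter;          # hypothetical package, for illustration only
use vars qw($total);      # predeclared package variable: accepted by strict 'vars'

my $count = 0;            # file-scoped lexical: accepted by strict 'vars'
$Counter::limit = 5;      # fully qualified package variable: always accepted

sub bump {                # hypothetical helper
    $count++;
    $total += 2;          # means $Counter::total, thanks to "use vars"
    return $count < $Counter::limit;
}

package main;

Counter::bump() for 1 .. 3;
print "count=$count total=$Counter::total\n";    # prints count=3 total=6

# An undeclared, unqualified name is rejected when the code is compiled.
my $ok = eval q{ use strict 'vars'; $undeclared = 1; 1 };
print $ok ? "compiled\n" : "rejected: $@";
```

The string C<eval> is used only so the compile-time failure can be caught and shown without aborting the whole program.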
-You may declare "C<my>" variables at the outermost scope of a file to hide -any such identifiers totally from the outside world. This is similar -to C's static variables at the file level. To do this with a subroutine -requires the use of a closure (anonymous function with lexical access). -If a block (such as an C<eval()>, function, or C<package>) wants to create -a private subroutine that cannot be called from outside that block, -it can declare a lexical variable containing an anonymous sub reference: +You may declare C<my> variables at the outermost scope of a file +to hide any such identifiers from the world outside that file. This +is similar in spirit to C's static variables when they are used at +the file level. To do this with a subroutine requires the use of +a closure (an anonymous function that accesses enclosing lexicals). +If you want to create a private subroutine that cannot be called +from outside its enclosing block, you can declare a lexical variable containing +an anonymous sub reference: my $secret_version = '1.001-beta'; my $secret_sub = sub { print $secret_version }; @@ -375,11 +388,13 @@ it can declare a lexical variable containing an anonymous sub reference: As long as the reference is never returned by any function within the module, no outside module can see the subroutine, because its name is not in any package's symbol table. Remember that it's not I<REALLY> called -C<$some_pack::secret_version> or anything; it's just C<$secret_version>, +C<$some_pack::secret_version> or anything; it's just $secret_version, unqualified and unqualifiable. -This does not work with object methods, however; all object methods have -to be in the symbol table of some package to be found. +This does not work with object methods, however; all object methods +have to be in the symbol table of some package to be found. See +L<perlref/"Function Templates"> for something of a work-around to +this.
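Here is a minimal sketch of the file-scoped private subroutine technique described above, assuming all the code lives in one file. The package C<Secretive> and the sub C<public_report()> are hypothetical names for illustration. Code later in the same file can reach C<$secret_sub>, but because the code reference is stored in a lexical rather than in a symbol table, no C<Secretive::secret_sub> name exists for outside code to call.

```perl
use strict;
use warnings;

package Secretive;        # hypothetical package, for illustration only

my $secret_version = '1.001-beta';
my $secret_sub = sub { return "version $secret_version" };   # closure with no name

sub public_report {       # hypothetical public entry point
    # Named subs later in the same file can still call the lexical code ref.
    return $secret_sub->();
}

package main;

my $report = Secretive::public_report();
print "$report\n";                          # prints version 1.001-beta
print defined(&Secretive::secret_sub)
    ? "found in symbol table\n"
    : "not in any symbol table\n";          # prints not in any symbol table
```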
=head2 Persistent Private Variables @@ -415,7 +430,7 @@ and put the static variable outside the function but in the block. If this function is being sourced in from a separate file via C<require> or C<use>, then this is probably just fine. If it's -all in the main program, you'll need to arrange for the C<my()> +all in the main program, you'll need to arrange for the C<my> to be executed early, either by putting the whole block above your main program, or more likely, placing merely a C<BEGIN> sub around it to make sure it gets executed before your program @@ -428,20 +443,21 @@ starts to run: } } -See L<perlmod/"Package Constructors and Destructors"> about the C<BEGIN> function. +See L<perlmod/"Package Constructors and Destructors"> about the +special triggered functions, C<BEGIN> and C<INIT>. -If declared at the outermost scope, the file scope, then lexicals work -someone like C's file statics. They are available to all functions in -that same file declared below them, but are inaccessible from outside of -the file. This is sometimes used in modules to create private variables -for the whole module. +If declared at the outermost scope (the file scope), then lexicals +work somewhat like C's file statics. They are available to all +functions in that same file declared below them, but are inaccessible +from outside that file. This strategy is sometimes used in modules +to create private variables that the whole module can see. =head2 Temporary Values via local() -B<NOTE>: In general, you should be using "C<my>" instead of "C<local>", because +B<WARNING>: In general, you should be using C<my> instead of C<local>, because it's faster and safer. Exceptions to this include the global punctuation variables, filehandles and formats, and direct manipulation of the Perl -symbol table itself. Format variables often use "C<local>" though, as do +symbol table itself. 
Format variables often use C<local> though, as do other variables whose current value must be visible to called subroutines. @@ -458,14 +474,14 @@ Synopsis: local *merlyn = 'randal'; # SAME THING: promote 'randal' to *randal local *merlyn = \$randal; # just alias $merlyn, not @merlyn etc -A C<local()> modifies its listed variables to be "local" to the enclosing -block, C<eval>, or C<do FILE>--and to I<any subroutine called from within that block>. -A C<local()> just gives temporary values to global (meaning package) -variables. It does B<not> create a local variable. This is known as -dynamic scoping. Lexical scoping is done with "C<my>", which works more -like C's auto declarations. +A C<local> modifies its listed variables to be "local" to the +enclosing block, C<eval>, or C<do FILE>--and to I<any subroutine +called from within that block>. A C<local> just gives temporary +values to global (meaning package) variables. It does I<not> create +a local variable. This is known as dynamic scoping. Lexical scoping +is done with C<my>, which works more like C's auto declarations. -If more than one variable is given to C<local()>, they must be placed in +If more than one variable is given to C<local>, they must be placed in parentheses. All listed elements must be legal lvalues. This operator works by saving the current values of those variables in its argument list on a hidden stack and restoring them upon exiting the block, subroutine, or @@ -490,7 +506,7 @@ subroutine. Examples: } # old %digits restored here -Because C<local()> is a run-time command, it gets executed every time +Because C<local> is a run-time operator, it gets executed each time through a loop. In releases of Perl previous to 5.0, this used more stack storage each time until the loop was exited. Perl now reclaims the space each time through, but it's still more efficient to declare your variables @@ -581,34 +597,15 @@ Perl will print This is a test only a test. 
The array has 6 elements: 0, 1, 2, undef, undef, 5 -Note also that when you C<local>ize a member of a composite type that -B<does not exist previously>, the value is treated as though it were -in an lvalue context, i.e., it is first created and then C<local>ized. -The consequence of this is that the hash or array is in fact permanently -modified. For instance, if you say - - %hash = ( 'This' => 'is', 'a' => 'test' ); - @ary = ( 0..5 ); - { - local($ary[8]) = 0; - local($hash{'b'}) = 'whatever'; - } - printf "%%hash has now %d keys, \@ary %d elements.\n", - scalar(keys(%hash)), scalar(@ary); - -Perl will print - - %hash has now 3 keys, @ary 9 elements. - -The above behavior of local() on non-existent members of composite +The behavior of local() on non-existent members of composite types is subject to change in future. =head2 Passing Symbol Table Entries (typeglobs) -[Note: The mechanism described in this section was originally the only -way to simulate pass-by-reference in older versions of Perl. While it -still works fine in modern versions, the new reference mechanism is -generally easier to work with. See below.] +B<WARNING>: The mechanism described in this section was originally +the only way to simulate pass-by-reference in older versions of +Perl. While it still works fine in modern versions, the new reference +mechanism is generally easier to work with. See below. Sometimes you don't want to pass the value of an array to a subroutine but rather the name of it, so that the subroutine can modify the global @@ -621,7 +618,7 @@ funny prefix characters on variables and subroutines and such. When evaluated, the typeglob produces a scalar value that represents all the objects of that name, including any filehandle, format, or subroutine. When assigned to, it causes the name mentioned to refer to -whatever "C<*>" value was assigned to it. Example: +whatever C<*> value was assigned to it. 
Example: sub doubleary { local(*someary) = @_; @@ -632,7 +629,7 @@ whatever "C<*>" value was assigned to it. Example: doubleary(*foo); doubleary(*bar); -Note that scalars are already passed by reference, so you can modify +Scalars are already passed by reference, so you can modify scalar arguments without using this mechanism by referring explicitly to C<$_[0]> etc. You can modify all the elements of an array by passing all the elements as scalars, but you have to use the C<*> mechanism (or @@ -647,13 +644,13 @@ L<perldata/"Typeglobs and Filehandles">. =head2 When to Still Use local() -Despite the existence of C<my()>, there are still three places where the -C<local()> operator still shines. In fact, in these three places, you +Despite the existence of C<my>, there are still three places where the +C<local> operator shines. In fact, in these three places, you I<must> use C<local> instead of C<my>. =over -=item 1. You need to give a global variable a temporary value, especially C<$_>. +=item 1. You need to give a global variable a temporary value, especially $_. The global variables, like C<@ARGV> or the punctuation variables, must be C<local>ized with C<local()>. This block reads in F</etc/motd>, and splits @@ -667,7 +664,7 @@ in C<@Fields>. @Fields = split /^\s*=+\s*$/; } -It particular, it's important to C<local>ize C<$_> in any routine that assigns +In particular, it's important to C<local>ize $_ in any routine that assigns to it. Look out for implicit assignments in C<while> conditionals. =item 2. You need to create a local file or directory handle or a local function. @@ -724,9 +721,9 @@ you're going to have to use an explicit pass-by-reference. Before you do that, you need to understand references as detailed in L<perlref>. This section may not make much sense to you otherwise. -Here are a few simple examples.
First, let's pass in several -arrays to a function and have it C<pop> all of then, return a new -list of all their former last elements: +Here are a few simple examples. First, let's pass in several arrays +to a function and have it C<pop> all of them, returning a new list +of all their former last elements: @tailings = popmany ( \@a, \@b, \@c, \@d ); @@ -765,9 +762,10 @@ Where people get into trouble is here: or (%a, %b) = func(%c, %d); -That syntax simply won't work. It sets just C<@a> or C<%a> and clears the C<@b> or -C<%b>. Plus the function didn't get passed into two separate arrays or -hashes: it got one long list in C<@_>, as always. +That syntax simply won't work. It sets just C<@a> or C<%a> and +clears the C<@b> or C<%b>. Plus the function didn't get passed +into two separate arrays or hashes: it got one long list in C<@_>, +as always. If you can arrange for everyone to deal with this through references, it's cleaner code, although not so nice to look at. Here's a function that @@ -799,12 +797,13 @@ It turns out that you can actually do this also: } Here we're using the typeglobs to do symbol table aliasing. It's -a tad subtle, though, and also won't work if you're using C<my()> -variables, because only globals (well, and C<local()>s) are in the symbol table. +a tad subtle, though, and also won't work if you're using C<my> +variables, because only globals (even if disguised as C<local>s) +are in the symbol table. If you're passing around filehandles, you could usually just use the bare -typeglob, like C<*STDOUT>, but typeglobs references would be better because -they'll still work properly under S<C<use strict 'refs'>>. For example: +typeglob, like C<*STDOUT>, but typeglob references work, too. +For example: splutter(\*STDOUT); sub splutter { @@ -818,45 +817,41 @@ they'll still work properly under S<C<use strict 'refs'>>. For example: return scalar <$fh>; } -Another way to do this is using C<*HANDLE{IO}>, see L<perlref> for usage -and caveats.
- -If you're planning on generating new filehandles, you could do this: +If you're planning on generating new filehandles, you could do this. +Notice that we pass back just the bare *FH, not a reference to it. sub openit { - my $name = shift; + my $path = shift; local *FH; return open (FH, $path) ? *FH : undef; } -Although that will actually produce a small memory leak. See the bottom -of L<perlfunc/open()> for a somewhat cleaner way using the C<IO::Handle> -package. - =head2 Prototypes -As of the 5.002 release of perl, if you declare +Perl supports a very limited kind of compile-time argument checking +using function prototyping. If you declare sub mypush (\@@) -then C<mypush()> takes arguments exactly like C<push()> does. The declaration -of the function to be called must be visible at compile time. The prototype -affects only the interpretation of new-style calls to the function, where -new-style is defined as not using the C<&> character. In other words, -if you call it like a builtin function, then it behaves like a builtin -function. If you call it like an old-fashioned subroutine, then it -behaves like an old-fashioned subroutine. It naturally falls out from -this rule that prototypes have no influence on subroutine references -like C<\&foo> or on indirect subroutine calls like C<&{$subref}> or -C<$subref-E<gt>()>. +then C<mypush()> takes arguments exactly like C<push()> does. The +function declaration must be visible at compile time. The prototype +affects only interpretation of new-style calls to the function, +where new-style is defined as not using the C<&> character. In +other words, if you call it like a built-in function, then it behaves +like a built-in function. If you call it like an old-fashioned +subroutine, then it behaves like an old-fashioned subroutine. It +naturally falls out from this rule that prototypes have no influence +on subroutine references like C<\&foo> or on indirect subroutine +calls like C<&{$subref}> or C<$subref-E<gt>()>.
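As a minimal sketch of the C<mypush> declaration above, the body below is one plausible implementation; the original text shows only the prototype, so the body is an assumption. Because the prototype is compiled before the calls, C<mypush @stack, 3, 4> passes C<\@stack> automatically, while the C<&> form skips the prototype and needs an explicit reference.

```perl
use strict;
use warnings;

# Declared (and defined) before any call, so the prototype is seen at compile time.
# The body is an assumed implementation for illustration.
sub mypush (\@@) {
    my ($aref, @items) = @_;    # the \@ slot arrives as an array reference
    push @$aref, @items;
    return scalar @$aref;       # new number of elements, like push()
}

my @stack = (1, 2);
mypush @stack, 3, 4;            # parsed like push(@stack, 3, 4)
print "@stack\n";               # prints 1 2 3 4

&mypush(\@stack, 5);            # & form: no prototype, so pass the reference yourself
print "@stack\n";               # prints 1 2 3 4 5
```

Calling C<&mypush(@stack, 5)> instead would flatten C<@stack> into C<@_>, so the first element (not a reference) would land in C<$aref> and the call would die under C<use strict 'refs'>.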
Method calls are not influenced by prototypes either, because the -function to be called is indeterminate at compile time, because it depends -on inheritance. +function to be called is indeterminate at compile time, since +the exact code called depends on inheritance. -Because the intent is primarily to let you define subroutines that work -like builtin commands, here are the prototypes for some other functions -that parse almost exactly like the corresponding builtins. +Because the intent of this feature is primarily to let you define +subroutines that work like built-in functions, here are prototypes +for some other functions that parse almost exactly like the +corresponding built-ins. Declared as Called as @@ -877,35 +872,35 @@ that parse almost exactly like the corresponding builtins. Any backslashed prototype character represents an actual argument that absolutely must start with that character. The value passed -to the subroutine (as part of C<@_>) will be a reference to the -actual argument given in the subroutine call, obtained by applying -C<\> to that argument. +as part of C<@_> will be a reference to the actual argument given +in the subroutine call, obtained by applying C<\> to that argument. Unbackslashed prototype characters have special meanings. Any -unbackslashed C<@> or C<%> eats all the rest of the arguments, and forces +unbackslashed C<@> or C<%> eats all remaining arguments, and forces list context. An argument represented by C<$> forces scalar context. An C<&> requires an anonymous subroutine, which, if passed as the first -argument, does not require the "C<sub>" keyword or a subsequent comma. A +argument, does not require the C<sub> keyword or a subsequent comma. A C<*> allows the subroutine to accept a bareword, constant, scalar expression, typeglob, or a reference to a typeglob in that slot. The value will be available to the subroutine either as a simple scalar, or (in the latter two cases) as a reference to the typeglob.
A semicolon separates mandatory arguments from optional arguments. -(It is redundant before C<@> or C<%>.) +It is redundant before C<@> or C<%>, which gobble up everything else. -Note how the last three examples above are treated specially by the parser. -C<mygrep()> is parsed as a true list operator, C<myrand()> is parsed as a -true unary operator with unary precedence the same as C<rand()>, and -C<mytime()> is truly without arguments, just like C<time()>. That is, if you -say +Note how the last three examples in the table above are treated +specially by the parser. C<mygrep()> is parsed as a true list +operator, C<myrand()> is parsed as a true unary operator with unary +precedence the same as C<rand()>, and C<mytime()> is truly without +arguments, just like C<time()>. That is, if you say mytime +2; you'll get C<mytime() + 2>, not C<mytime(2)>, which is how it would be parsed -without the prototype. +without a prototype. -The interesting thing about C<&> is that you can generate new syntax with it: +The interesting thing about C<&> is that you can generate new syntax with it, +provided it's in the initial position: sub try (&@) { my($try,$catch) = @_; @@ -924,12 +919,12 @@ The interesting thing about C<&> is that you can generate new syntax with it: }; That prints C<"unphooey">. (Yes, there are still unresolved -issues having to do with the visibility of C<@_>. I'm ignoring that +issues having to do with visibility of C<@_>. I'm ignoring that question for the moment. (But note that if we make C<@_> lexically scoped, those anonymous subroutines can act like closures... (Gee, is this sounding a little Lispish? 
(Never mind.)))) -And here's a reimplementation of C<grep>: +And here's a reimplementation of the Perl C<grep> operator: sub mygrep (&@) { my $code = shift; @@ -965,12 +960,12 @@ returning a list: func(@foo); func( split /:/ ); -Then you've just supplied an automatic C<scalar()> in front of their +Then you've just supplied an automatic C<scalar> in front of their argument, which can be more than a bit surprising. The old C<@foo> which used to hold one thing doesn't get passed in. Instead, -the C<func()> now gets passed in C<1>, that is, the number of elements -in C<@foo>. And the C<split()> gets called in a scalar context and -starts scribbling on your C<@_> parameter list. +C<func()> now gets passed in a C<1>; that is, the number of elements +in C<@foo>. And the C<split> gets called in scalar context so it +starts scribbling on your C<@_> parameter list. Ouch! This is all very powerful, of course, and should be used only in moderation to make the world a better place. @@ -978,12 +973,11 @@ to make the world a better place. =head2 Constant Functions Functions with a prototype of C<()> are potential candidates for -inlining. If the result after optimization and constant folding is -either a constant or a lexically-scoped scalar which has no other +inlining. If the result after optimization and constant folding +is either a constant or a lexically-scoped scalar which has no other references, then it will be used in place of function calls made -without C<&> or C<do>. Calls made using C<&> or C<do> are never -inlined. (See F<constant.pm> for an easy way to declare most -constants.) +without C<&>. Calls made using C<&> are never inlined. (See +F<constant.pm> for an easy way to declare most constants.) The following functions would all be inlined: @@ -1019,55 +1013,57 @@ a mandatory warning. (You can use this warning to tell whether or not a particular subroutine is considered constant.) 
The warning is considered severe enough not to be optional because previously compiled invocations of the function will still be using the old value of the -function. If you need to be able to redefine the subroutine you need to +function. If you need to be able to redefine the subroutine, you need to ensure that it isn't inlined, either by dropping the C<()> prototype -(which changes the calling semantics, so beware) or by thwarting the +(which changes calling semantics, so beware) or by thwarting the inlining mechanism in some other way, such as sub not_inlined () { 23 if $]; } -=head2 Overriding Builtin Functions +=head2 Overriding Built-in Functions -Many builtin functions may be overridden, though this should be tried +Many built-in functions may be overridden, though this should be tried only occasionally and for good reason. Typically this might be -done by a package attempting to emulate missing builtin functionality +done by a package attempting to emulate missing built-in functionality on a non-Unix system. Overriding may be done only by importing the name from a module--ordinary predeclaration isn't good enough. However, the -C<subs> pragma (compiler directive) lets you, in effect, predeclare subs -via the import syntax, and these names may then override the builtin ones: +C<use subs> pragma lets you, in effect, predeclare subs +via the import syntax, and these names may then override built-in ones: use subs 'chdir', 'chroot', 'chmod', 'chown'; chdir $somewhere; sub chdir { ... } -To unambiguously refer to the builtin form, one may precede the -builtin name with the special package qualifier C<CORE::>. For example, -saying C<CORE::open()> will always refer to the builtin C<open()>, even +To unambiguously refer to the built-in form, precede the +built-in name with the special package qualifier C<CORE::>. 
For example, +saying C<CORE::open()> always refers to the built-in C<open()>, even if the current package has imported some other subroutine called -C<&open()> from elsewhere. +C<&open()> from elsewhere. Even though it looks like a regular +function calls, it isn't: you can't take a reference to it, such as +the incorrect C<\&CORE::open> might appear to produce. -Library modules should not in general export builtin names like "C<open>" -or "C<chdir>" as part of their default C<@EXPORT> list, because these may +Library modules should not in general export built-in names like C<open> +or C<chdir> as part of their default C<@EXPORT> list, because these may sneak into someone else's namespace and change the semantics unexpectedly. -Instead, if the module adds the name to the C<@EXPORT_OK> list, then it's +Instead, if the module adds that name to C<@EXPORT_OK>, then it's possible for a user to import the name explicitly, but not implicitly. That is, they could say use Module 'open'; -and it would import the C<open> override, but if they said +and it would import the C<open> override. But if they said use Module; -they would get the default imports without the overrides. +they would get the default imports without overrides. -The foregoing mechanism for overriding builtins is restricted, quite +The foregoing mechanism for overriding built-in is restricted, quite deliberately, to the package that requests the import. There is a second -method that is sometimes applicable when you wish to override a builtin +method that is sometimes applicable when you wish to override a built-in everywhere, without regard to namespace boundaries. This is achieved by importing a sub into the special namespace C<CORE::GLOBAL::>. Here is an example that quite brazenly replaces the C<glob> operator with something @@ -1089,9 +1085,12 @@ that understands regular expressions. 
sub glob { my $pat = shift; my @got; - local(*D); - if (opendir D, '.') { @got = grep /$pat/, readdir D; closedir D; } - @got; + local *D; + if (opendir D, '.') { + @got = grep /$pat/, readdir D; + closedir D; + } + return @got; } 1; @@ -1102,44 +1101,45 @@ And here's how it could be (ab)used: use REGlob 'glob'; # override glob() in Foo:: only print for <^[a-z_]+\.pm\$>; # show all pragmatic modules -Note that the initial comment shows a contrived, even dangerous example. +The initial comment shows a contrived, even dangerous example. By overriding C<glob> globally, you would be forcing the new (and -subversive) behavior for the C<glob> operator for B<every> namespace, +subversive) behavior for the C<glob> operator for I<every> namespace, without the complete cognizance or cooperation of the modules that own those namespaces. Naturally, this should be done with extreme caution--if it must be done at all. The C<REGlob> example above does not implement all the support needed to -cleanly override perl's C<glob> operator. The builtin C<glob> has +cleanly override perl's C<glob> operator. The built-in C<glob> has different behaviors depending on whether it appears in a scalar or list -context, but our C<REGlob> doesn't. Indeed, many perl builtins have such +context, but our C<REGlob> doesn't. Indeed, many perl built-in have such context sensitive behaviors, and these must be adequately supported by a properly written override. For a fully functional example of overriding C<glob>, study the implementation of C<File::DosGlob> in the standard library. - =head2 Autoloading -If you call a subroutine that is undefined, you would ordinarily get an -immediate fatal error complaining that the subroutine doesn't exist. -(Likewise for subroutines being used as methods, when the method -doesn't exist in any base class of the class package.) 
If, -however, there is an C<AUTOLOAD> subroutine defined in the package or -packages that were searched for the original subroutine, then that -C<AUTOLOAD> subroutine is called with the arguments that would have been -passed to the original subroutine. The fully qualified name of the -original subroutine magically appears in the C<$AUTOLOAD> variable in the -same package as the C<AUTOLOAD> routine. The name is not passed as an -ordinary argument because, er, well, just because, that's why... - -Most C<AUTOLOAD> routines will load in a definition for the subroutine in -question using eval, and then execute that subroutine using a special -form of "goto" that erases the stack frame of the C<AUTOLOAD> routine -without a trace. (See the standard C<AutoLoader> module, for example.) -But an C<AUTOLOAD> routine can also just emulate the routine and never -define it. For example, let's pretend that a function that wasn't defined -should just call C<system()> with those arguments. All you'd do is this: +If you call a subroutine that is undefined, you would ordinarily +get an immediate, fatal error complaining that the subroutine doesn't +exist. (Likewise for subroutines being used as methods, when the +method doesn't exist in any base class of the class's package.) +However, if an C<AUTOLOAD> subroutine is defined in the package or +packages used to locate the original subroutine, then that +C<AUTOLOAD> subroutine is called with the arguments that would have +been passed to the original subroutine. The fully qualified name +of the original subroutine magically appears in the global $AUTOLOAD +variable of the same package as the C<AUTOLOAD> routine. The name +is not passed as an ordinary argument because, er, well, just +because, that's why... + +Many C<AUTOLOAD> routines load in a definition for the requested +subroutine using eval(), then execute that subroutine using a special +form of goto() that erases the stack frame of the C<AUTOLOAD> routine +without a trace. 
(See the source to the standard module documented +in L<AutoLoader>, for example.) But an C<AUTOLOAD> routine can +also just emulate the routine and never define it. For example, +let's pretend that a function that wasn't defined should just invoke +C<system> with those arguments. All you'd do is: sub AUTOLOAD { my $program = $AUTOLOAD; @@ -1150,8 +1150,8 @@ should just call C<system()> with those arguments. All you'd do is this: who('am', 'i'); ls('-l'); -In fact, if you predeclare the functions you want to call that way, you don't -even need the parentheses: +In fact, if you predeclare functions you want to call that way, you don't +even need parentheses: use subs qw(date who ls); date; @@ -1159,16 +1159,19 @@ even need the parentheses: ls -l; A more complete example of this is the standard Shell module, which -can treat undefined subroutine calls as calls to Unix programs. +can treat undefined subroutine calls as calls to external programs. -Mechanisms are available for modules writers to help split the modules -up into autoloadable files. See the standard AutoLoader module +Mechanisms are available to help modules writers split their modules +into autoloadable files. See the standard AutoLoader module described in L<AutoLoader> and in L<AutoSplit>, the standard SelfLoader modules in L<SelfLoader>, and the document on adding C -functions to perl code in L<perlxs>. +functions to Perl code in L<perlxs>. =head1 SEE ALSO -See L<perlref> for more about references and closures. See L<perlxs> if -you'd like to learn about calling C subroutines from perl. See L<perlmod> -to learn about bundling up your functions in separate files. +See L<perlref/"Function Templates"> for more about references and closures. +See L<perlxs> if you'd like to learn about calling C subroutines from Perl. +See L<perlembed> if you'd like to learn about calling PErl subroutines from C. +See L<perlmod> to learn about bundling up your functions in separate files. 
+See L<perlmodlib> to learn what library modules come standard on your system. +See L<perltoot> to learn how to make object method calls. diff --git a/pod/perlsyn.pod b/pod/perlsyn.pod index a3bc5ab547..ee668e1187 100644 --- a/pod/perlsyn.pod +++ b/pod/perlsyn.pod @@ -44,7 +44,7 @@ subroutine without defining it by saying C<sub name>, thus: sub myname; $me = myname $0 or die "can't get myname"; -Note that it functions as a list operator, not as a unary operator; so +Note that my() functions as a list operator, not as a unary operator; so be careful to use C<or> instead of C<||> in this case. However, if you were to declare the subroutine as C<sub myname ($)>, then C<myname> would function as a unary operator, so either C<or> or @@ -86,7 +86,7 @@ presuming you're a speaker of English. The C<foreach> modifier is an iterator: For each value in EXPR, it aliases C<$_> to the value and executes the statement. The C<while> and C<until> modifiers have the usual "C<while> loop" semantics (conditional evaluated first), except -when applied to a C<do>-BLOCK (or to the now-deprecated C<do>-SUBROUTINE +when applied to a C<do>-BLOCK (or to the deprecated C<do>-SUBROUTINE statement), in which case the block executes once before the conditional is evaluated. This is so that you can write loops like: @@ -289,9 +289,7 @@ is therefore visible only within the loop. Otherwise, the variable is implicitly local to the loop and regains its former value upon exiting the loop. If the variable was previously declared with C<my>, it uses that variable instead of the global one, but it's still localized to -the loop. (Note that a lexically scoped variable can cause problems -if you have subroutine or format declarations within the loop which -refer to it.) +the loop. The C<foreach> keyword is actually a synonym for the C<for> keyword, so you can use C<foreach> for readability or C<for> for brevity. (Or because @@ -490,15 +488,15 @@ C<HTTP_USER_AGENT> envariable. 
That kind of switch statement only works when you know the C<&&> clauses will be true. If you don't, the previous C<?:> example should be used. -You might also consider writing a hash instead of synthesizing a C<switch> -statement. +You might also consider writing a hash of subroutine references +instead of synthesizing a C<switch> statement. =head2 Goto -Although not for the faint of heart, Perl does support a C<goto> statement. -A loop's LABEL is not actually a valid target for a C<goto>; -it's just the name of the loop. There are three forms: C<goto>-LABEL, -C<goto>-EXPR, and C<goto>-&NAME. +Although not for the faint of heart, Perl does support a C<goto> +statement. There are three forms: C<goto>-LABEL, C<goto>-EXPR, and +C<goto>-&NAME. A loop's LABEL is not actually a valid target for +a C<goto>; it's just the name of the loop. The C<goto>-LABEL form finds the statement labeled with LABEL and resumes execution there. It may not be used to go into any construct that diff --git a/pod/perlthrtut.pod b/pod/perlthrtut.pod index f2ca3bda64..fc88561da7 100644 --- a/pod/perlthrtut.pod +++ b/pod/perlthrtut.pod @@ -5,7 +5,7 @@ perlthrtut - tutorial on threads in Perl =head1 DESCRIPTION One of the most prominent new features of Perl 5.005 is the inclusion -of threads. Threads make a number of things a lot easier, and are a +of threads. Threads make a number of things a lot easier, and are a very useful addition to your bag of programming tricks. =head1 What Is A Thread Anyway? @@ -14,44 +14,44 @@ A thread is a flow of control through a program with a single execution point. Sounds an awful lot like a process, doesn't it? Well, it should. -Threads are one of the pieces of a process. Every process has at least +Threads are one of the pieces of a process. Every process has at least one thread and, up until now, every process running Perl had only one -thread. With 5.005, though, you can create extra threads. We're going +thread. 
With 5.005, though, you can create extra threads. We're going to show you how, when, and why. =head1 Threaded Program Models There are three basic ways that you can structure a threaded -program. Which model you choose depends on what you need your program -to do. For many non-trivial threaded programs you'll need to choose +program. Which model you choose depends on what you need your program +to do. For many non-trivial threaded programs you'll need to choose different models for different pieces of your program. =head2 Boss/Worker The boss/worker model usually has one `boss' thread and one or more -`worker' threads. The boss thread gathers or generates tasks that need +`worker' threads. The boss thread gathers or generates tasks that need to be done, then parcels those tasks out to the appropriate worker thread. This model is common in GUI and server programs, where a main thread waits for some event and then passes that event to the appropriate -worker threads for processing. Once the event has been passed on, the +worker threads for processing. Once the event has been passed on, the boss thread goes back to waiting for another event. -The boss thread does relatively little work. While tasks aren't +The boss thread does relatively little work. While tasks aren't necessarily performed faster than with any other method, it tends to have the best user-response times. =head2 Work Crew In the work crew model, several threads are created that do -essentially the same thing to different pieces of data. It closely +essentially the same thing to different pieces of data. It closely mirrors classical parallel processing and vector processors, where a large array of processors do the exact same thing to many pieces of data. This model is particularly useful if the system running the program -will distribute multiple threads across different processors. It can +will distribute multiple threads across different processors. 
It can also be useful in ray tracing or rendering engines, where the individual threads can pass on interim results to give the user visual feedback. @@ -60,29 +60,29 @@ feedback. The pipeline model divides up a task into a series of steps, and passes the results of one step on to the thread processing the -next. Each thread does one thing to each piece of data and passes the +next. Each thread does one thing to each piece of data and passes the results to the next thread in line. This model makes the most sense if you have multiple processors so two or more threads will be executing in parallel, though it can often -make sense in other contexts as well. It tends to keep the individual +make sense in other contexts as well. It tends to keep the individual tasks small and simple, as well as allowing some parts of the pipeline to block (on I/O or system calls, for example) while other parts keep -going. If you're running different parts of the pipeline on different +going. If you're running different parts of the pipeline on different processors you may also take advantage of the caches on each processor. This model is also handy for a form of recursive programming where, rather than having a subroutine call itself, it instead creates -another thread. Prime and Fibonacci generators both map well to this +another thread. Prime and Fibonacci generators both map well to this form of the pipeline model. (A version of a prime number generator is presented later on.) =head1 Native threads -There are several different ways to implement threads on a system. How +There are several different ways to implement threads on a system. How threads are implemented depends both on the vendor and, in some cases, -the version of the operating system. Often the first implementation +the version of the operating system. Often the first implementation will be relatively simple, but later versions of the OS will be more sophisticated. 
@@ -93,42 +93,42 @@ There are three basic categories of threads-user-mode threads, kernel threads, and multiprocessor kernel threads. User-mode threads are threads that live entirely within a program and -its libraries. In this model, the OS knows nothing about threads. As +its libraries. In this model, the OS knows nothing about threads. As far as it's concerned, your process is just a process. This is the easiest way to implement threads, and the way most OSes -start. The big disadvantage is that, since the OS knows nothing about -threads, if one thread blocks they all do. Typical blocking activities +start. The big disadvantage is that, since the OS knows nothing about +threads, if one thread blocks they all do. Typical blocking activities include most system calls, most I/O, and things like sleep(). -Kernel threads are the next step in thread evolution. The OS knows +Kernel threads are the next step in thread evolution. The OS knows about kernel threads, and makes allowances for them. The main difference between a kernel thread and a user-mode thread is -blocking. With kernel threads, things that block a single thread don't -block other threads. This is not the case with user-mode threads, +blocking. With kernel threads, things that block a single thread don't +block other threads. This is not the case with user-mode threads, where the kernel blocks at the process level and not the thread level. This is a big step forward, and can give a threaded program quite a performance boost over non-threaded programs. Threads that block performing I/O, for example, won't block threads that are doing other -things. Each process still has only one thread running at once, +things. Each process still has only one thread running at once, though, regardless of how many CPUs a system might have. Since kernel threading can interrupt a thread at any time, they will uncover some of the implicit locking assumptions you may make in your -program. 
For example, something as simple as C<$a = $a + 2> can behave -unpredictably with kernel threads if C<$a> is visible to other -threads, as another thread may have changed C<$a> between the time it +program. For example, something as simple as C<$a = $a + 2> can behave +unpredictably with kernel threads if $a is visible to other +threads, as another thread may have changed $a between the time it was fetched on the right hand side and the time the new value is stored. Multiprocessor Kernel Threads are the final step in thread -support. With multiprocessor kernel threads on a machine with multiple +support. With multiprocessor kernel threads on a machine with multiple CPUs, the OS may schedule two or more threads to run simultaneously on different CPUs. This can give a serious performance boost to your threaded program, -since more than one thread will be executing at the same time. As a +since more than one thread will be executing at the same time. As a tradeoff, though, any of those nagging synchronization issues that might not have shown with basic kernel threads will appear with a vengeance. @@ -138,14 +138,14 @@ different OSes (and different thread implementations for a particular OS) allocate CPU cycles to threads in different ways. Cooperative multitasking systems have running threads give up control -if one of two things happen. If a thread calls a yield function, it -gives up control. It also gives up control if the thread does -something that would cause it to block, such as perform I/O. In a +if one of two things happen. If a thread calls a yield function, it +gives up control. It also gives up control if the thread does +something that would cause it to block, such as perform I/O. In a cooperative multitasking implementation, one thread can starve all the others for CPU time if it so chooses. Preemptive multitasking systems interrupt threads at regular intervals -while the system decides which thread should run next. 
In a preemptive +while the system decides which thread should run next. In a preemptive multitasking system, one thread usually won't monopolize the CPU. On some systems, there can be cooperative and preemptive threads @@ -156,18 +156,18 @@ normal priorities behave preemptively.) =head1 What kind of threads are perl threads? If you have experience with other thread implementations, you might -find that things aren't quite what you expect. It's very important to +find that things aren't quite what you expect. It's very important to remember when dealing with Perl threads that Perl Threads Are Not X Threads, for all values of X. They aren't POSIX threads, or -DecThreads, or Java's Green threads, or Win32 threads. There are +DecThreads, or Java's Green threads, or Win32 threads. There are similarities, and the broad concepts are the same, but if you start looking for implementation details you're going to be either -disappointed or confused. Possibly both. +disappointed or confused. Possibly both. This is not to say that Perl threads are completely different from -everything that's ever come before--they're not. Perl's threading -model owes a lot to other thread models, especially POSIX. Just as -Perl is not C, though, Perl threads are not POSIX threads. So if you +everything that's ever come before--they're not. Perl's threading +model owes a lot to other thread models, especially POSIX. Just as +Perl is not C, though, Perl threads are not POSIX threads. So if you find yourself looking for mutexes, or thread priorities, it's time to step back a bit and think about what you want to do and how Perl can do it. @@ -175,28 +175,28 @@ do it. =head1 Threadsafe Modules The addition of threads has changed Perl's internals -substantially. There are implications for people who write -modules--especially modules with XS code or external libraries. While +substantially. There are implications for people who write +modules--especially modules with XS code or external libraries. 
While most modules won't encounter any problems, modules that aren't explicitly tagged as thread-safe should be tested before being used in production code. Not all modules that you might use are thread-safe, and you should always assume a module is unsafe unless the documentation says -otherwise. This includes modules that are distributed as part of the -core. Threads are a beta feature, and even some of the standard +otherwise. This includes modules that are distributed as part of the +core. Threads are a beta feature, and even some of the standard modules aren't thread-safe. If you're using a module that's not thread-safe for some reason, you can protect yourself by using semaphores and lots of programming -discipline to control access to the module. Semaphores are covered +discipline to control access to the module. Semaphores are covered later in the article. Perl Threads Are Different =head1 Thread Basics The core Thread module provides the basic functions you need to write -threaded programs. In the following sections we'll cover the basics, -showing you what you need to do to create a threaded program. After +threaded programs. In the following sections we'll cover the basics, +showing you what you need to do to create a threaded program. After that, we'll go over some of the features of the Thread module that make threaded programming easier. @@ -208,7 +208,7 @@ your programs are compiled. If your Perl wasn't compiled with thread support enabled, then any attempt to use threads will fail. Remember that the threading support in 5.005 is in beta release, and -should be treated as such. You should expect that it may not function +should be treated as such. You should expect that it may not function entirely properly, and the thread interface may well change some before it is a fully supported, production release. The beta version shouldn't be used for mission-critical projects. 
Having said that, @@ -237,13 +237,13 @@ have code like this: Since code that runs both with and without threads is usually pretty messy, it's best to isolate the thread-specific code in its own -module. In our example above, that's what MyMod_threaded is, and it's +module. In our example above, that's what MyMod_threaded is, and it's only imported if we're running on a threaded Perl. =head2 Creating Threads The Thread package provides the tools you need to create new -threads. Like any other module, you need to tell Perl you want to use +threads. Like any other module, you need to tell Perl you want to use it; use Thread imports all the pieces you need to create basic threads. @@ -258,11 +258,11 @@ The simplest, straightforward way to create a thread is with new(): } The new() method takes a reference to a subroutine and creates a new -thread, which starts executing in the referenced subroutine. Control +thread, which starts executing in the referenced subroutine. Control then passes both to the subroutine and the caller. If you need to, your program can pass parameters to the subroutine as -part of the thread startup. Just include the list of parameters as +part of the thread startup. Just include the list of parameters as part of the C<Thread::new> call, like this: use Thread; @@ -281,8 +281,8 @@ part of the C<Thread::new> call, like this: The subroutine runs like a normal Perl subroutine, and the call to new Thread returns whatever the subroutine returns. -The last example illustrates another feature of threads. You can spawn -off several threads using the same subroutine. Each thread executes +The last example illustrates another feature of threads. You can spawn +off several threads using the same subroutine. Each thread executes the same subroutine, but in a separate thread with a separate environment and potentially separate arguments. 
@@ -305,22 +305,22 @@ spin off a chunk of code like eval(), but into its own thread: You'll notice we did a use Thread qw(async) in that example. async is not exported by default, so if you want it, you'll either need to import it before you use it or fully qualify it as -Thread::async. You'll also note that there's a semicolon after the -closing brace. That's because async() treats the following block as an +Thread::async. You'll also note that there's a semicolon after the +closing brace. That's because async() treats the following block as an anonymous subroutine, so the semicolon is necessary. Like eval(), the code executes in the same context as it would if it -weren't spun off. Since both the code inside and after the async start -executing, you need to be careful with any shared resources. Locking +weren't spun off. Since both the code inside and after the async start +executing, you need to be careful with any shared resources. Locking and other synchronization techniques are covered later. =head2 Giving up control There are times when you may find it useful to have a thread -explicitly give up the CPU to another thread. Your threading package +explicitly give up the CPU to another thread. Your threading package might not support preemptive multitasking for threads, for example, or you may be doing something compute-intensive and want to make sure -that the user-interface thread gets called frequently. Regardless, +that the user-interface thread gets called frequently. Regardless, there are times that you might want a thread to give up the processor. Perl's threading package provides the yield() function that does @@ -344,7 +344,7 @@ this. yield() is pretty straightforward, and works like this: =head2 Waiting For A Thread To Exit -Since threads are also subroutines, they can return values. To wait +Since threads are also subroutines, they can return values. To wait for a thread to exit and extract any scalars it might return, you can use the join() method. 
@@ -357,11 +357,11 @@ use the join() method. sub sub1 { return "Fifty-six", "foo", 2; } In the example above, the join() method returns as soon as the thread -ends. In addition to waiting for a thread to finish and gathering up +ends. In addition to waiting for a thread to finish and gathering up any values that the thread might have returned, join() also performs any OS cleanup necessary for the thread. That cleanup might be important, especially for long-running programs that spawn lots of -threads. If you don't want the return values and don't want to wait +threads. If you don't want the return values and don't want to wait for the thread to finish, you should call the detach() method instead. detach() is covered later in the article. @@ -369,7 +369,7 @@ instead. detach() is covered later in the article. So what happens when an error occurs in a thread? Any errors that could be caught with eval() are postponed until the thread is -joined. If your program never joins, the errors appear when your +joined. If your program never joins, the errors appear when your program exits. Errors deferred until a join() can be caught with eval(): @@ -390,12 +390,12 @@ to get them. =head2 Ignoring A Thread join() does three things:it waits for a thread to exit, cleans up -after it, and returns any data the thread may have produced. But what +after it, and returns any data the thread may have produced. But what if you're not interested in the thread's return values, and you don't really care when the thread finishes? All you want is for the thread to get cleaned up after when it's done. -In this case, you use the detach() method. Once a thread is detached, +In this case, you use the detach() method. Once a thread is detached, it'll run until it's finished, then Perl will clean up after it automatically. @@ -421,29 +421,29 @@ lost. =head1 Threads And Data Now that we've covered the basics of threads, it's time for our next -topic: data. 
Threading introduces a couple of complications to data +topic: data. Threading introduces a couple of complications to data access that non-threaded programs never need to worry about. =head2 Shared And Unshared Data The single most important thing to remember when using threads is that all threads potentially have access to all the data anywhere in your -program. While this is true with a nonthreaded Perl program as well, +program. While this is true with a nonthreaded Perl program as well, it's especially important to remember with a threaded program, since more than one thread can be accessing this data at once. Perl's scoping rules don't change because you're using threads. If a subroutine (or block, in the case of async()) could see a variable if -you weren't running with threads, it can see it if you are. This is +you weren't running with threads, it can see it if you are. This is especially important for the subroutines that create, and makes my -variables even more important. Remember--if your variables aren't +variables even more important. Remember--if your variables aren't lexically scoped (declared with C<my>) you're probably sharing it between threads. =head2 Thread Pitfall: Races While threads bring a new set of useful tools, they also bring a -number of pitfalls. One pitfall is the race condition: +number of pitfalls. One pitfall is the race condition: use Thread; $a = 1; @@ -458,14 +458,14 @@ number of pitfalls. One pitfall is the race condition: What do you think $a will be? The answer, unfortunately, is "it depends." Both sub1() and sub2() access the global variable $a, once -to read and once to write. Depending on factors ranging from your +to read and once to write. Depending on factors ranging from your thread implementation's scheduling algorithm to the phase of the moon, $a can be 2 or 3. Race conditions are caused by unsynchronized access to shared -data. Without explicit synchronization, there's no way to be sure that +data. 
Without explicit synchronization, there's no way to be sure that nothing has happened to the shared data between the time you access it -and the time you update it. Even this simple code fragment has the +and the time you update it. Even this simple code fragment has the possibility of error: use Thread qw(async); @@ -473,8 +473,8 @@ possibility of error: async{ $b = $a; $a = $b + 1; }; async{ $c = $a; $a = $c + 1; }; -Two threads both access $a. Each thread can potentially be interrupted -at any point, or be executed in any order. At the end, $a could be 3 +Two threads both access $a. Each thread can potentially be interrupted +at any point, or be executed in any order. At the end, $a could be 3 or 4, and both $b and $c could be 2 or 3. Whenever your program accesses data or resources that can be accessed @@ -484,9 +484,9 @@ data corruption and race conditions. =head2 Controlling access: lock() The lock() function takes a variable (or subroutine, but we'll get to -that later) and puts a lock on it. No other thread may lock the +that later) and puts a lock on it. No other thread may lock the variable until the locking thread exits the innermost block containing -the lock. Using lock() is straightforward: +the lock. Using lock() is straightforward: use Thread qw(async); $a = 4; @@ -513,29 +513,29 @@ the lock. Using lock() is straightforward: print "\$a is $a\n"; lock() blocks the thread until the variable being locked is -available. When lock() returns, your thread can be sure that no other +available. When lock() returns, your thread can be sure that no other thread can lock that variable until the innermost block containing the lock exits. It's important to note that locks don't prevent access to the variable -in question, only lock attempts. This is in keeping with Perl's +in question, only lock attempts. This is in keeping with Perl's longstanding tradition of courteous programming, and the advisory file -locking that flock() gives you. 
Locked subroutines behave differently, -however. We'll cover that later in the article. +locking that flock() gives you. Locked subroutines behave differently, +however. We'll cover that later in the article. -You may lock arrays and hashes as well as scalars. Locking an array, +You may lock arrays and hashes as well as scalars. Locking an array, though, will not block subsequent locks on array elements, just lock attempts on the array itself. Finally, locks are recursive, which means it's okay for a thread to -lock a variable more than once. The lock will last until the outermost +lock a variable more than once. The lock will last until the outermost lock() on the variable goes out of scope. =head2 Thread Pitfall: Deadlocks -Locks are a handy tool to synchronize access to data. Using them -properly is the key to safe shared data. Unfortunately, locks aren't -without their dangers. Consider the following code: +Locks are a handy tool to synchronize access to data. Using them +properly is the key to safe shared data. Unfortunately, locks aren't +without their dangers. Consider the following code: use Thread qw(async yield); $a = 4; @@ -553,34 +553,34 @@ without their dangers. Consider the following code: lock ($a); }; -This program will probably hang until you kill it. The only way it +This program will probably hang until you kill it. The only way it won't hang is if one of the two async() routines acquires both locks -first. A guaranteed-to-hang version is more complicated, but the +first. A guaranteed-to-hang version is more complicated, but the principle is the same. The first thread spawned by async() will grab a lock on $a then, a -second or two later, try to grab a lock on $b. Meanwhile, the second -thread grabs a lock on $b, then later tries to grab a lock on $a. The +second or two later, try to grab a lock on $b. Meanwhile, the second +thread grabs a lock on $b, then later tries to grab a lock on $a. 
The second lock attempt for both threads will block, each waiting for the other to release its lock. This condition is called a deadlock, and it occurs whenever two or more threads are trying to get locks on resources that the others -own. Each thread will block, waiting for the other to release a lock -on a resource. That never happens, though, since the thread with the +own. Each thread will block, waiting for the other to release a lock +on a resource. That never happens, though, since the thread with the resource is itself waiting for a lock to be released. -There are a number of ways to handle this sort of problem. The best +There are a number of ways to handle this sort of problem. The best way is to always have all threads acquire locks in the exact same -order. If, for example, you lock variables $a, $b, and $c, always lock -$a before $b, and $b before $c. It's also best to hold on to locks for +order. If, for example, you lock variables $a, $b, and $c, always lock +$a before $b, and $b before $c. It's also best to hold on to locks for as short a period of time to minimize the risks of deadlock. =head2 Queues: Passing Data Around A queue is a special thread-safe object that lets you put data in one end and take it out the other without having to worry about -synchronization issues. They're pretty straightforward, and look like +synchronization issues. They're pretty straightforward, and look like this: use Thread qw(async); @@ -599,13 +599,13 @@ this: sleep 10; $DataQueue->enqueue(undef); -You create the queue with new Thread::Queue. Then you can add lists of +You create the queue with new Thread::Queue. Then you can add lists of scalars onto the end with enqueue(), and pop scalars off the front of -it with dequeue(). A queue has no fixed size, and can grow as needed +it with dequeue(). A queue has no fixed size, and can grow as needed to hold everything pushed on to it. If a queue is empty, dequeue() blocks until another thread enqueues -something. 
This makes queues ideal for event loops and other +something. This makes queues ideal for event loops and other communications between threads. =head1 Threads And Code @@ -617,10 +617,10 @@ entire subroutines. =head2 Semaphores: Synchronizing Data Access -Semaphores are a kind of generic locking mechanism. Unlike lock, which +Semaphores are a kind of generic locking mechanism. Unlike lock, which gets a lock on a particular scalar, Perl doesn't associate any particular thing with a semaphore so you can use them to control -access to anything you like. In addition, semaphores can allow more +access to anything you like. In addition, semaphores can allow more than one thread to access a resource at once, though by default semaphores only allow one thread access at a time. @@ -630,7 +630,7 @@ semaphores only allow one thread access at a time. Semaphores have two methods, down and up. down decrements the resource count, while up increments it. down calls will block if the -semaphore's current count would decrement below zero. This program +semaphore's current count would decrement below zero. This program gives a quick demonstration: use Thread qw(yield); @@ -659,20 +659,20 @@ gives a quick demonstration: } } -The three invocations of the subroutine all operate in sync. The +The three invocations of the subroutine all operate in sync. The semaphore, though, makes sure that only one thread is accessing the global variable at once. =item Advanced Semaphores By default, semaphores behave like locks, letting only one thread -down() them at a time. However, there are other uses for semaphores. +down() them at a time. However, there are other uses for semaphores. Each semaphore has a counter attached to it. down() decrements the -counter and up() increments the counter. By default, semaphores are +counter and up() increments the counter. By default, semaphores are created with the counter set to one, down() decrements by one, and -up() increments by one. 
If down() attempts to decrement the counter -below zero, it blocks until the counter is large enough. Note that +up() increments by one. If down() attempts to decrement the counter +below zero, it blocks until the counter is large enough. Note that while a semaphore can be created with a starting count of zero, any up() or down() always changes the counter by at least one. $semaphore->down(0) is the same as $semaphore->down(1). @@ -680,21 +680,21 @@ one. $semaphore->down(0) is the same as $semaphore->down(1). The question, of course, is why would you do something like this? Why create a semaphore with a starting count that's not one, or why decrement/increment it by more than one? The answer is resource -availability. Many resources that you want to manage access for can be +availability. Many resources that you want to manage access for can be safely used by more than one thread at once. -For example, let's take a GUI driven program. It has a semaphore that +For example, let's take a GUI driven program. It has a semaphore that it uses to synchronize access to the display, so only one thread is -ever drawing at once. Handy, but of course you don't want any thread -to start drawing until things are properly set up. In this case, you +ever drawing at once. Handy, but of course you don't want any thread +to start drawing until things are properly set up. In this case, you can create a semaphore with a counter set to zero, and up it when things are ready for drawing. Semaphores with counters greater than one are also useful for -establishing quotas. Say, for example, that you have a number of -threads that can do I/O at once. You don't want all the threads +establishing quotas. Say, for example, that you have a number of +threads that can do I/O at once. You don't want all the threads reading or writing at once though, since that can potentially swamp -your I/O channels, or deplete your process' quota of filehandles. 
You +your I/O channels, or deplete your process' quota of filehandles. You can use a semaphore initialized to the number of concurrent I/O requests (or open files) that you want at any one time, and have your threads quietly block and unblock themselves. @@ -707,14 +707,14 @@ thread needs to check out or return a number of resources at once. =head2 Attributes: Restricting Access To Subroutines In addition to synchronizing access to data or resources, you might -find it useful to synchronize access to subroutines. You may be +find it useful to synchronize access to subroutines. You may be accessing a singular machine resource (perhaps a vector processor), or find it easier to serialize calls to a particular subroutine than to have a set of locks and sempahores. -One of the additions to Perl 5.005 is subroutine attributes. The +One of the additions to Perl 5.005 is subroutine attributes. The Thread package uses these to provide several flavors of -serialization. It's important to remember that these attributes are +serialization. It's important to remember that these attributes are used in the compilation phase of your program so you can't change a subroutine's behavior while your program is actually running. @@ -727,9 +727,9 @@ The basic subroutine lock looks like this: } This ensures that only one thread will be executing this subroutine at -any one time. Once a thread calls this subroutine, any other thread +any one time. Once a thread calls this subroutine, any other thread that calls it will block until the thread in the subroutine exits -it. A more elaborate example looks like this: +it. A more elaborate example looks like this: use Thread qw(yield); @@ -760,10 +760,10 @@ can see that only one thread is in it at any one time. =head2 Methods Locking an entire subroutine can sometimes be overkill, especially -when dealing with Perl objects. When calling a method for an object, +when dealing with Perl objects. 
When calling a method for an object, for example, you want to serialize calls to a method, so that only one thread will be in the subroutine for a particular object, but threads -calling that subroutine for a different object aren't blocked. The +calling that subroutine for a different object aren't blocked. The method attribute indicates whether the subroutine is really a method. use Thread; @@ -817,25 +817,25 @@ thread is ever in one_at_a_time() at once. =head2 Locking A Subroutine -You can lock a subroutine as you would lock a variable. Subroutine +You can lock a subroutine as you would lock a variable. Subroutine locks work the same as a C<use attrs qw(locked)> in the subroutine, and block all access to the subroutine for other threads until the -lock goes out of scope. When the subroutine isn't locked, any number +lock goes out of scope. When the subroutine isn't locked, any number of threads can be in it at once, and getting a lock on a subroutine -doesn't affect threads already in the subroutine. Getting a lock on a +doesn't affect threads already in the subroutine. Getting a lock on a subroutine looks like this: lock(\&sub_to_lock); -Simple enough. Unlike use attrs, which is a compile time option, +Simple enough. Unlike use attrs, which is a compile time option, locking and unlocking a subroutine can be done at runtime at your -discretion. There is some runtime penalty to using lock(\&sub) instead +discretion. There is some runtime penalty to using lock(\&sub) instead of use attrs qw(locked), so make sure you're choosing the proper method to do the locking. You'd choose lock(\&sub) when writing modules and code to run on both threaded and unthreaded Perl, especially for code that will run on -5.004 or earlier Perls. In that case, it's useful to have subroutines +5.004 or earlier Perls. 
In that case, it's useful to have subroutines that should be serialized lock themselves if they're running threaded, like so: @@ -855,20 +855,20 @@ version of Perl you're running. We've covered the workhorse parts of Perl's threading package, and with these tools you should be well on your way to writing threaded -code and packages. There are a few useful little pieces that didn't +code and packages. There are a few useful little pieces that didn't really fit in anyplace else. =head2 What Thread Am I In? The Thread->self method provides your program with a way to get an -object representing the thread it's currently in. You can use this +object representing the thread it's currently in. You can use this object in the same way as the ones returned from the thread creation. =head2 Thread IDs tid() is a thread object method that returns the thread ID of the -thread the object represents. Thread IDs are integers, with the main -thread in a program being 0. Currently Perl assigns a unique tid to +thread the object represents. Thread IDs are integers, with the main +thread in a program being 0. Currently Perl assigns a unique tid to every thread ever created in your program, assigning the first thread to be created a tid of 1, and increasing the tid by 1 for each new thread that's created. @@ -881,7 +881,7 @@ if the objects represent the same thread, and false if they don't. =head2 What Threads Are Running? Thread->list returns a list of thread objects, one for each thread -that's currently running. Handy for a number of things, including +that's currently running. Handy for a number of things, including cleaning up at the end of your program: # Loop through all the threads @@ -892,14 +892,14 @@ cleaning up at the end of your program: } } -The example above is just for illustration. It isn't strictly +The example above is just for illustration. It isn't strictly necessary to join all the threads you create, since Perl detaches all the threads before it exits. 
=head1 A Complete Example Confused yet? It's time for an example program to show some of the -things we've covered. This program finds prime numbers using threads. +things we've covered. This program finds prime numbers using threads. 1 #!/usr/bin/perl -w 2 # prime-pthread, courtesy of Tom Christiansen @@ -936,12 +936,12 @@ things we've covered. This program finds prime numbers using threads. 33 $kid->join() if $kid; 34 } -This program uses the pipeline model to generate prime numbers. Each +This program uses the pipeline model to generate prime numbers. Each thread in the pipeline has an input queue that feeds numbers to be checked, a prime number that it's responsible for, and an output queue -that it funnels numbers that have failed the check into. If the thread +that it funnels numbers that have failed the check into. If the thread has a number that's failed its check and there's no child thread, then -the thread must have found a new prime number. In that case, a new +the thread must have found a new prime number. In that case, a new child thread is created for that prime and stuck on the end of the pipeline. @@ -952,20 +952,20 @@ number is, it's a number that's only evenly divisible by itself and 1) The bulk of the work is done by the check_num() subroutine, which takes a reference to its input queue and a prime number that it's -responsible for. After pulling in the input queue and the prime that +responsible for. After pulling in the input queue and the prime that the subroutine's checking (line 20), we create a new queue (line 22) and reserve a scalar for the thread that we're likely to create later (line 21). The while loop from lines 23 to line 31 grabs a scalar off the input queue and checks against the prime this thread is responsible -for. Line 24 checks to see if there's a remainder when we modulo the -number to be checked against our prime. If there is one, the number +for. 
Line 24 checks to see if there's a remainder when we modulo the +number to be checked against our prime. If there is one, the number must not be evenly divisible by our prime, so we need to either pass it on to the next thread if we've created one (line 26) or create a new thread if we haven't. -The new thread creation is line 29. We pass on to it a reference to +The new thread creation is line 29. We pass on to it a reference to the queue we've created, and the prime number we've found. Finally, once the loop terminates (because we got a 0 or undef in the @@ -975,18 +975,18 @@ child and wait for it to exit if we've created a child (Lines 32 and Meanwhile, back in the main thread, we create a queue (line 9) and the initial child thread (line 10), and pre-seed it with the first prime: -2. Then we queue all the numbers from 3 to 1000 for checking (lines +2. Then we queue all the numbers from 3 to 1000 for checking (lines 12-14), then queue a die notice (line 16) and wait for the first child -thread to terminate (line 17). Because a child won't die until its +thread to terminate (line 17). Because a child won't die until its child has died, we know that we're done once we return from the join. -That's how it works. It's pretty simple; as with many Perl programs, +That's how it works. It's pretty simple; as with many Perl programs, the explanation is much longer than the program. =head1 Conclusion A complete thread tutorial could fill a book (and has, many times), -but this should get you well on your way. The final authority on how +but this should get you well on your way. The final authority on how Perl's threads behave is the documention bundled with the Perl distribution, but with what we've covered in this article, you should be well on your way to becoming a threaded Perl expert. @@ -1046,7 +1046,7 @@ France, September 1992, Yves Bekkers and Jacques Cohen, eds. 
Springer, Thanks (in no particular order) to Chaim Frenkel, Steve Fink, Gurusamy Sarathy, Ilya Zakharevich, Benjamin Sugars, Jürgen Christoffel, Joshua Pritikin, and Alan Burlison, for their help in reality-checking and -polishing this article. Big thanks to Tom Christiansen for his rewrite +polishing this article. Big thanks to Tom Christiansen for his rewrite of the prime number generator. =head1 AUTHOR diff --git a/pod/perltie.pod b/pod/perltie.pod index 581b4abd65..5611174669 100644 --- a/pod/perltie.pod +++ b/pod/perltie.pod @@ -834,7 +834,7 @@ destructor (DESTROY) is called, which is normal for objects that have no more valid references; and thus the file is closed. In the second example, however, we have stored another reference to -the tied object in C<$x>. That means that when untie() gets called +the tied object in $x. That means that when untie() gets called there will still be a valid reference to the object in existence, so the destructor is not called at that time, and thus the file is not closed. The reason there is no output is because the file buffers diff --git a/pod/perltoc.pod b/pod/perltoc.pod index 9dc0b36d91..b17a889959 100644 --- a/pod/perltoc.pod +++ b/pod/perltoc.pod @@ -17,12 +17,17 @@ through to locate the proper section you're looking for. 
=item DESCRIPTION -Many usability enhancements, Simplified grammar, Lexical scoping, -Arbitrarily nested data structures, Modularity and reusability, -Object-oriented programming, Embeddable and Extensible, POSIX compliant, -Package constructors and destructors, Multiple simultaneous DBM -implementations, Subroutine definitions may now be autoloaded, Regular -expression enhancements, Innumerable Unbundled Modules, Compilability +modularity and reusability using innumerable modules, embeddable and +extensible, roll-your-own magic variables (including multiple simultaneous +DBM implementations), subroutines can now be overridden, autoloaded, and +prototyped, arbitrarily nested data structures and anonymous functions, +object-oriented programming, compilability into C code or Perl bytecode, +support for light-weight processes (threads), support for +internationalization, localization, and Unicode, lexical scoping, regular +expression enhancements, enhanced debugger and interactive Perl +environment, with integrated editor support, POSIX 1003.1 compliant library + +=item AVAILABILITY =item ENVIRONMENT @@ -38,16 +43,227 @@ expression enhancements, Innumerable Unbundled Modules, Compilability =item NOTES -=head2 perlfaq - frequently asked questions about Perl ($Date: 1998/07/20 -23:12:17 $) +=head2 perlfaq - frequently asked questions about Perl ($Date: 1999/05/23 +20:38:02 $) =item DESCRIPTION perlfaq: Structural overview of the FAQ, L<perlfaq1>: General Questions -About Perl, L<perlfaq2>: Obtaining and Learning about Perl, L<perlfaq3>: -Programming Tools, L<perlfaq4>: Data Manipulation, L<perlfaq5>: Files and -Formats, L<perlfaq6>: Regexps, L<perlfaq7>: General Perl Language Issues, -L<perlfaq8>: System Interaction, L<perlfaq9>: Networking +About Perl, What is Perl?, Who supports Perl? Who develops it? 
Why is it +free?, Which version of Perl should I use?, What are perl4 and perl5?, What +is perl6?, How stable is Perl?, Is Perl difficult to learn?, How does Perl +compare with other languages like Java, Python, REXX, Scheme, or Tcl?, Can +I do [task] in Perl?, When shouldn't I program in Perl?, What's the +difference between "perl" and "Perl"?, Is it a Perl program or a Perl +script?, What is a JAPH?, Where can I get a list of Larry Wall witticisms?, +How can I convince my sysadmin/supervisor/employees to use version +(5/5.005/Perl instead of some other language)?, L<perlfaq2>: Obtaining and +Learning about Perl, What machines support Perl? Where do I get it?, How +can I get a binary version of Perl?, I don't have a C compiler on my +system. How can I compile perl?, I copied the Perl binary from one machine +to another, but scripts don't work, I grabbed the sources and tried to +compile but gdbm/dynamic loading/malloc/linking/... failed. How do I make +it work?, What modules and extensions are available for Perl? What is +CPAN? What does CPAN/src/... mean?, Is there an ISO or ANSI certified +version of Perl?, Where can I get information on Perl?, What are the Perl +newsgroups on USENET? 
Where do I post questions?, Where should I post +source code?, Perl Books, Perl in Magazines, Perl on the Net: FTP and WWW +Access, What mailing lists are there for perl?, Archives of +comp.lang.perl.misc, Where can I buy a commercial version of Perl?, Where +do I send bug reports?, What is perl.com?, L<perlfaq3>: Programming Tools, +How do I do (anything)?, How can I use Perl interactively?, Is there a Perl +shell?, How do I debug my Perl programs?, How do I profile my Perl +programs?, How do I cross-reference my Perl programs?, Is there a +pretty-printer (formatter) for Perl?, Is there a ctags for Perl?, Is there +an IDE or Windows Perl Editor?, Where can I get Perl macros for vi?, Where +can I get perl-mode for emacs?, How can I use curses with Perl?, How can I +use X or Tk with Perl?, How can I generate simple menus without using CGI +or Tk?, What is undump?, How can I make my Perl program run faster?, How +can I make my Perl program take less memory?, Is it unsafe to return a +pointer to local data?, How can I free an array or hash so my program +shrinks?, How can I make my CGI script more efficient?, How can I hide the +source for my Perl program?, How can I compile my Perl program into byte +code or C?, How can I compile Perl into Java?, How can I get C<#!perl> to +work on [MS-DOS,NT,...]?, Can I write useful perl programs on the command +line?, Why don't perl one-liners work on my DOS/Mac/VMS system?, Where can +I learn about CGI or Web programming in Perl?, Where can I learn about +object-oriented Perl programming?, Where can I learn about linking C with +Perl? [h2xs, xsubpp], I've read perlembed, perlguts, etc., but I can't +embed perl inmy C program, what am I doing wrong?, When I tried to run my +script, I got this message. 
What does itmean?, What's MakeMaker?, +L<perlfaq4>: Data Manipulation, Why am I getting long decimals (eg, +19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?, +Why isn't my octal data interpreted correctly?, Does Perl have a round() +function? What about ceil() and floor()? Trig functions?, How do I +convert bits into ints?, Why doesn't & work the way I want it to?, How do I +multiply matrices?, How do I perform an operation on a series of integers?, +How can I output Roman numerals?, Why aren't my random numbers random?, How +do I find the week-of-the-year/day-of-the-year?, How do I find the current +century or millennium?, How can I compare two dates and find the +difference?, How can I take a string and turn it into epoch seconds?, How +can I find the Julian Day?, How do I find yesterday's date?, Does Perl have +a year 2000 problem? Is Perl Y2K compliant?, How do I validate input?, How +do I unescape a string?, How do I remove consecutive pairs of characters?, +How do I expand function calls in a string?, How do I find matching/nesting +anything?, How do I reverse a string?, How do I expand tabs in a string?, +How do I reformat a paragraph?, How can I access/change the first N letters +of a string?, How do I change the Nth occurrence of something?, How can I +count the number of occurrences of a substring within a string?, How do I +capitalize all the words on one line?, How can I split a [character] +delimited string except when inside[character]? 
(Comma-separated files), +How do I strip blank space from the beginning/end of a string?, How do I +pad a string with blanks or pad a number with zeroes?, How do I extract +selected columns from a string?, How do I find the soundex value of a +string?, How can I expand variables in text strings?, What's wrong with +always quoting "$vars"?, Why don't my E<lt>E<lt>HERE documents work?, What +is the difference between a list and an array?, What is the difference +between $array[1] and @array[1]?, How can I remove duplicate elements from +a list or array?, How can I tell whether a list or array contains a certain +element?, How do I compute the difference of two arrays? How do I compute +the intersection of two arrays?, How do I test whether two arrays or hashes +are equal?, How do I find the first array element for which a condition is +true?, How do I handle linked lists?, How do I handle circular lists?, How +do I shuffle an array randomly?, How do I process/modify each element of an +array?, How do I select a random element from an array?, How do I permute N +elements of a list?, How do I sort an array by (anything)?, How do I +manipulate arrays of bits?, Why does defined() return true on empty arrays +and hashes?, How do I process an entire hash?, What happens if I add or +remove keys from a hash while iterating over it?, How do I look up a hash +element by value?, How can I know how many entries are in a hash?, How do I +sort a hash (optionally by value instead of key)?, How can I always keep my +hash sorted?, What's the difference between "delete" and "undef" with +hashes?, Why don't my tied hashes make the defined/exists distinction?, How +do I reset an each() operation part-way through?, How can I get the unique +keys from two hashes?, How can I store a multidimensional array in a DBM +file?, How can I make my hash remember the order I put elements into it?, +Why does passing a subroutine an undefined element in a hash create it?, +How can I make the Perl 
equivalent of a C structure/C++ class/hash or array +of hashes or arrays?, How can I use a reference as a hash key?, How do I +handle binary data correctly?, How do I determine whether a scalar is a +number/whole/integer/float?, How do I keep persistent data across program +calls?, How do I print out or copy a recursive data structure?, How do I +define methods for every class/object?, How do I verify a credit card +checksum?, How do I pack arrays of doubles or floats for XS code?, +L<perlfaq5>: Files and Formats, How do I flush/unbuffer an output +filehandle? Why must I do this?, How do I change one line in a file/delete +a line in a file/insert a line in the middle of a file/append to the +beginning of a file?, How do I count the number of lines in a file?, How do +I make a temporary file name?, How can I manipulate fixed-record-length +files?, How can I make a filehandle local to a subroutine? How do I pass +filehandles between subroutines? How do I make an array of filehandles?, +How can I use a filehandle indirectly?, How can I set up a footer format to +be used with write()?, How can I write() into a string?, How can I output +my numbers with commas added?, How can I translate tildes (~) in a +filename?, How come when I open a file read-write it wipes it out?, Why do +I sometimes get an "Argument list too long" when I use E<lt>*E<gt>?, Is +there a leak/bug in glob()?, How can I open a file with a leading "E<gt>" +or trailing blanks?, How can I reliably rename a file?, How can I lock a +file?, Why can't I just open(FH, ">file.lock")?, I still don't get locking. + I just want to increment the number in the file. How can I do this?, How +do I randomly update a binary file?, How do I get a file's timestamp in +perl?, How do I set a file's timestamp in perl?, How do I print to more +than one file at once?, How can I read in an entire file all at once?, How +can I read in a file by paragraphs?, How can I read a single character from +a file? 
From the keyboard?, How can I tell whether there's a character +waiting on a filehandle?, How do I do a C<tail -f> in perl?, How do I dup() +a filehandle in Perl?, How do I close a file descriptor by number?, Why +can't I use "C:\temp\foo" in DOS paths? What doesn't `C:\temp\foo.exe` +work?, Why doesn't glob("*.*") get all the files?, Why does Perl let me +delete read-only files? Why does C<-i> clobber protected files? Isn't +this a bug in Perl?, How do I select a random line from a file?, Why do I +get weird spaces when I print an array of lines?, L<perlfaq6>: Regexps, How +can I hope to use regular expressions without creating illegible and +unmaintainable code?, I'm having trouble matching over more than one line. +What's wrong?, How can I pull out lines between two patterns that are +themselves on different lines?, I put a regular expression into $/ but it +didn't work. What's wrong?, How do I substitute case insensitively on the +LHS, but preserving case on the RHS?, How can I make C<\w> match national +character sets?, How can I match a locale-smart version of C</[a-zA-Z]/>?, +How can I quote a variable to use in a regex?, What is C</o> really for?, +How do I use a regular expression to strip C style comments from a file?, +Can I use Perl regular expressions to match balanced text?, What does it +mean that regexes are greedy? How can I get around it?, How do I process +each word on each line?, How can I print out a word-frequency or +line-frequency summary?, How can I do approximate matching?, How do I +efficiently match many regular expressions at once?, Why don't +word-boundary searches with C<\b> work for me?, Why does using $&, $`, or +$' slow my program down?, What good is C<\G> in a regular expression?, Are +Perl regexes DFAs or NFAs? 
Are they POSIX compliant?, What's wrong with +using grep or map in a void context?, How can I match strings with +multibyte characters?, How do I match a pattern that is supplied by the +user?, L<perlfaq7>: General Perl Language Issues, Can I get a BNF/yacc/RE +for the Perl language?, What are all these $@%&* punctuation signs, and how +do I know when to use them?, Do I always/never have to quote my strings or +use semicolons and commas?, How do I skip some return values?, How do I +temporarily block warnings?, What's an extension?, Why do Perl operators +have different precedence than C operators?, How do I declare/create a +structure?, How do I create a module?, How do I create a class?, How can I +tell if a variable is tainted?, What's a closure?, What is variable suicide +and how can I prevent it?, How can I pass/return a {Function, FileHandle, +Array, Hash, Method, Regex}?, How do I create a static variable?, What's +the difference between dynamic and lexical (static) scoping? Between +local() and my()?, How can I access a dynamic variable while a similarly +named lexical is in scope?, What's the difference between deep and shallow +binding?, Why doesn't "my($foo) = E<lt>FILEE<gt>;" work right?, How do I +redefine a builtin function, operator, or method?, What's the difference +between calling a function as &foo and foo()?, How do I create a switch or +case statement?, How can I catch accesses to undefined +variables/functions/methods?, Why can't a method included in this same file +be found?, How can I find out my current package?, How can I comment out a +large block of perl code?, How do I clear a package?, How can I use a +variable as a variable name?, L<perlfaq8>: System Interaction, How do I +find out which operating system I'm running under?, How come exec() doesn't +return?, How do I do fancy stuff with the keyboard/screen/mouse?, How do I +print something out in color?, How do I read just one key without waiting +for a return key?, How do I check 
whether input is ready on the keyboard?, +How do I clear the screen?, How do I get the screen size?, How do I ask the +user for a password?, How do I read and write the serial port?, How do I +decode encrypted password files?, How do I start a process in the +background?, How do I trap control characters/signals?, How do I modify the +shadow password file on a Unix system?, How do I set the time and date?, +How can I sleep() or alarm() for under a second?, How can I measure time +under a second?, How can I do an atexit() or setjmp()/longjmp()? (Exception +handling), Why doesn't my sockets program work under System V (Solaris)? +What does the error message "Protocol not supported" mean?, How can I call +my system's unique C functions from Perl?, Where do I get the include files +to do ioctl() or syscall()?, Why do setuid perl scripts complain about +kernel problems?, How can I open a pipe both to and from a command?, Why +can't I get the output of a command with system()?, How can I capture +STDERR from an external command?, Why doesn't open() return an error when a +pipe open fails?, What's wrong with using backticks in a void context?, How +can I call backticks without shell processing?, Why can't my script read +from STDIN after I gave it EOF (^D on Unix, ^Z on MS-DOS)?, How can I +convert my shell script to perl?, Can I use perl to run a telnet or ftp +session?, How can I write expect in Perl?, Is there a way to hide perl's +command line from programs such as "ps"?, I {changed directory, modified my +environment} in a perl script. How come the change disappeared when I +exited the script? 
How do I get my changes to be visible?, How do I close +a process's filehandle without waiting for it to complete?, How do I fork a +daemon process?, How do I make my program run with sh and csh?, How do I +find out if I'm running interactively or not?, How do I timeout a slow +event?, How do I set CPU limits?, How do I avoid zombies on a Unix system?, +How do I use an SQL database?, How do I make a system() exit on control-C?, +How do I open a file without blocking?, How do I install a module from +CPAN?, What's the difference between require and use?, How do I keep my own +module/library directory?, How do I add the directory my program lives in +to the module/library search path?, How do I add a directory to my include +path at runtime?, What is socket.ph and where do I get it?, L<perlfaq9>: +Networking, My CGI script runs from the command line but not the browser. +(500 Server Error), How can I get better error messages from a CGI +program?, How do I remove HTML from a string?, How do I extract URLs?, How +do I download a file from the user's machine? How do I open a file on +another machine?, How do I make a pop-up menu in HTML?, How do I fetch an +HTML file?, How do I automate an HTML form submission?, How do I decode or +create those %-encodings on the web?, How do I redirect to another page?, +How do I put a password on my web pages?, How do I edit my .htpasswd and +.htgroup files with Perl?, How do I make sure users can't enter values into +a form that cause my CGI script to do bad things?, How do I parse a mail +header?, How do I decode a CGI form?, How do I check a valid mail address?, +How do I decode a MIME/BASE64 string?, How do I return the user's mail +address?, How do I send mail?, How do I read mail?, How do I find out my +hostname/domainname/IP address?, How do I fetch a news article or the +active newsgroups?, How do I fetch/put an FTP file?, How can I do RPC in +Perl? 
=over @@ -74,11 +290,11 @@ authors =item Changes -24/April/97, 23/April/97, 25/March/97, 18/March/97, 17/March/97 Version, -Initial Release: 11/March/97 +23/May/99, 13/April/99, 7/January/99, 22/June/98, 24/April/97, 23/April/97, +25/March/97, 18/March/97, 17/March/97 Version, Initial Release: 11/March/97 -=head2 perlfaq1 - General Questions About Perl ($Revision: 1.14 $, $Date: -1998/06/14 22:15:25 $) +=head2 perlfaq1 - General Questions About Perl ($Revision: 1.23 $, $Date: +1999/05/23 16:08:30 $) =item DESCRIPTION @@ -92,6 +308,8 @@ Initial Release: 11/March/97 =item What are perl4 and perl5? +=item What is perl6? + =item How stable is Perl? =item Is Perl difficult to learn? @@ -112,14 +330,14 @@ Scheme, or Tcl? =item Where can I get a list of Larry Wall witticisms? =item How can I convince my sysadmin/supervisor/employees to use version -(5/5.004/Perl instead of some other language)? +(5/5.005/Perl instead of some other language)? =back =item AUTHOR AND COPYRIGHT -=head2 perlfaq2 - Obtaining and Learning about Perl ($Revision: 1.24 $, -$Date: 1998/07/20 23:40:28 $) +=head2 perlfaq2 - Obtaining and Learning about Perl ($Revision: 1.31 $, +$Date: 1999/04/14 03:46:19 $) =item DESCRIPTION @@ -137,7 +355,7 @@ don't work. =item I grabbed the sources and tried to compile but gdbm/dynamic loading/malloc/linking/... failed. How do I make it work? -=item What modules and extensions are available for Perl? What is CPAN? +=item What modules and extensions are available for Perl? What is CPAN? What does CPAN/src/... mean? =item Is there an ISO or ANSI certified version of Perl? @@ -150,9 +368,10 @@ What does CPAN/src/... mean? =item Perl Books -References, Tutorials -*Learning Perl [2nd edition] -by Randal L. Schwartz and Tom Christiansen, Task-Oriented, Special Topics +References, Tutorials + *Learning Perl [2nd edition] + by Randal L. 
Schwartz and Tom Christiansen + with foreword by Larry Wall, Task-Oriented, Special Topics =item Perl in Magazines @@ -160,24 +379,20 @@ by Randal L. Schwartz and Tom Christiansen, Task-Oriented, Special Topics =item What mailing lists are there for perl? -MacPerl, Perl5-Porters, NTPerl, Perl-Packrats - =item Archives of comp.lang.perl.misc =item Where can I buy a commercial version of Perl? =item Where do I send bug reports? -=item What is perl.com? perl.org? The Perl Institute? - -=item How do I learn about object-oriented Perl programming? +=item What is perl.com? =back =item AUTHOR AND COPYRIGHT -=head2 perlfaq3 - Programming Tools ($Revision: 1.28 $, $Date: 1998/07/16 -22:08:49 $) +=head2 perlfaq3 - Programming Tools ($Revision: 1.38 $, $Date: 1999/05/23 +16:08:30 $) =item DESCRIPTION @@ -199,6 +414,8 @@ MacPerl, Perl5-Porters, NTPerl, Perl-Packrats =item Is there a ctags for Perl? +=item Is there an IDE or Windows Perl Editor? + =item Where can I get Perl macros for vi? =item Where can I get perl-mode for emacs? @@ -225,6 +442,8 @@ MacPerl, Perl5-Porters, NTPerl, Perl-Packrats =item How can I compile my Perl program into byte code or C? +=item How can I compile Perl into Java? + =item How can I get C<#!perl> to work on [MS-DOS,NT,...]? =item Can I write useful perl programs on the command line? @@ -249,8 +468,8 @@ mean? =item AUTHOR AND COPYRIGHT -=head2 perlfaq4 - Data Manipulation ($Revision: 1.25 $, $Date: 1998/07/16 -22:49:55 $) +=head2 perlfaq4 - Data Manipulation ($Revision: 1.49 $, $Date: 1999/05/23 +20:37:49 $) =item DESCRIPTION @@ -263,11 +482,13 @@ numbers I should be getting (eg, 19.95)? =item Why isn't my octal data interpreted correctly? -=item Does perl have a round function? What about ceil() and floor()? +=item Does Perl have a round() function? What about ceil() and floor()? Trig functions? =item How do I convert bits into ints? +=item Why doesn't & work the way I want it to? + =item How do I multiply matrices? 
=item How do I perform an operation on a series of integers? @@ -284,12 +505,16 @@ Trig functions? =item How do I find the week-of-the-year/day-of-the-year? +=item How do I find the current century or millennium? + =item How can I compare two dates and find the difference? =item How can I take a string and turn it into epoch seconds? =item How can I find the Julian Day? +=item How do I find yesterday's date? + =item Does Perl have a year 2000 problem? Is Perl Y2K compliant? =back @@ -328,6 +553,8 @@ string? =item How do I strip blank space from the beginning/end of a string? +=item How do I pad a string with blanks or pad a number with zeroes? + =item How do I extract selected columns from a string? =item How do I find the soundex value of a string? @@ -336,7 +563,7 @@ string? =item What's wrong with always quoting "$vars"? -=item Why don't my <<HERE documents work? +=item Why don't my E<lt>E<lt>HERE documents work? 1. There must be no space after the << part, 2. There (probably) should be a semicolon at the end, 3. You can't (easily) have any space in front of @@ -348,9 +575,11 @@ the tag =over +=item What is the difference between a list and an array? + =item What is the difference between $array[1] and @array[1]? -=item How can I extract just the unique elements of an array? +=item How can I remove duplicate elements from a list or array? a) If @in is sorted, and you want @out to be sorted:(this assumes all true values in the array), b) If you don't know whether @in is sorted:, c) Like @@ -363,6 +592,8 @@ integers: =item How do I compute the difference of two arrays? How do I compute the intersection of two arrays? +=item How do I test whether two arrays or hashes are equal? + =item How do I find the first array element for which a condition is true? =item How do I handle linked lists? @@ -440,12 +671,14 @@ array of hashes or arrays? =item How do I verify a credit card checksum? +=item How do I pack arrays of doubles or floats for XS code? 
+ =back =item AUTHOR AND COPYRIGHT -=head2 perlfaq5 - Files and Formats ($Revision: 1.24 $, $Date: 1998/07/05 -15:07:20 $) +=head2 perlfaq5 - Files and Formats ($Revision: 1.38 $, $Date: 1999/05/23 +16:08:30 $) =item DESCRIPTION @@ -477,7 +710,8 @@ filehandles between subroutines? How do I make an array of filehandles? =item How come when I open a file read-write it wipes it out? -=item Why do I sometimes get an "Argument list too long" when I use <*>? +=item Why do I sometimes get an "Argument list too long" when I use +E<lt>*E<gt>? =item Is there a leak/bug in glob()? @@ -487,7 +721,7 @@ filehandles between subroutines? How do I make an array of filehandles? =item How can I lock a file? -=item What can't I just open(FH, ">file.lock")? +=item Why can't I just open(FH, ">file.lock")? =item I still don't get locking. I just want to increment the number in the file. How can I do this? @@ -500,11 +734,13 @@ the file. How can I do this? =item How do I print to more than one file at once? +=item How can I read in an entire file all at once? + =item How can I read in a file by paragraphs? =item How can I read a single character from a file? From the keyboard? -=item How can I tell if there's a character waiting on a filehandle? +=item How can I tell whether there's a character waiting on a filehandle? =item How do I do a C<tail -f> in perl? @@ -522,11 +758,13 @@ protected files? Isn't this a bug in Perl? =item How do I select a random line from a file? +=item Why do I get weird spaces when I print an array of lines? + =back =item AUTHOR AND COPYRIGHT -=head2 perlfaq6 - Regexps ($Revision: 1.22 $, $Date: 1998/07/16 14:01:07 $) +=head2 perlfaq6 - Regexes ($Revision: 1.27 $, $Date: 1999/05/23 16:08:30 $) =item DESCRIPTION @@ -535,8 +773,7 @@ protected files? Isn't this a bug in Perl? =item How can I hope to use regular expressions without creating illegible and unmaintainable code? 
-Comments Outside the Regexp, Comments Inside the Regexp, Different -Delimiters +Comments Outside the Regex, Comments Inside the Regex, Different Delimiters =item I'm having trouble matching over more than one line. What's wrong? @@ -552,7 +789,7 @@ case on the RHS? =item How can I match a locale-smart version of C</[a-zA-Z]/>? -=item How can I quote a variable to use in a regexp? +=item How can I quote a variable to use in a regex? =item What is C</o> really for? @@ -561,7 +798,7 @@ file? =item Can I use Perl regular expressions to match balanced text? -=item What does it mean that regexps are greedy? How can I get around it? +=item What does it mean that regexes are greedy? How can I get around it? =item How do I process each word on each line? @@ -577,18 +814,20 @@ file? =item What good is C<\G> in a regular expression? -=item Are Perl regexps DFAs or NFAs? Are they POSIX compliant? +=item Are Perl regexes DFAs or NFAs? Are they POSIX compliant? =item What's wrong with using grep or map in a void context? =item How can I match strings with multibyte characters? +=item How do I match a pattern that is supplied by the user? + =back =item AUTHOR AND COPYRIGHT -=head2 perlfaq7 - Perl Language Issues ($Revision: 1.21 $, $Date: -1998/06/22 15:20:07 $) +=head2 perlfaq7 - Perl Language Issues ($Revision: 1.28 $, $Date: +1999/05/23 20:36:18 $) =item DESCRIPTION @@ -596,7 +835,7 @@ file? =item Can I get a BNF/yacc/RE for the Perl language? -=item What are all these $@%* punctuation signs, and how do I know when to +=item What are all these $@%&* punctuation signs, and how do I know when to use them? =item Do I always/never have to quote my strings or use semicolons and @@ -623,14 +862,14 @@ commas? =item What is variable suicide and how can I prevent it? =item How can I pass/return a {Function, FileHandle, Array, Hash, Method, -Regexp}? +Regex}? 
-Passing Variables and Functions, Passing Filehandles, Passing Regexps, +Passing Variables and Functions, Passing Filehandles, Passing Regexes, Passing Methods =item How do I create a static variable? -=item What's the difference between dynamic and lexical (static) scoping? +=item What's the difference between dynamic and lexical (static) scoping? Between local() and my()? =item How can I access a dynamic variable while a similarly named lexical @@ -638,7 +877,7 @@ is in scope? =item What's the difference between deep and shallow binding? -=item Why doesn't "my($foo) = <FILE>;" work right? +=item Why doesn't "my($foo) = E<lt>FILEE<gt>;" work right? =item How do I redefine a builtin function, operator, or method? @@ -654,12 +893,16 @@ is in scope? =item How can I comment out a large block of perl code? +=item How do I clear a package? + +=item How can I use a variable as a variable name? + =back =item AUTHOR AND COPYRIGHT -=head2 perlfaq8 - System Interaction ($Revision: 1.25 $, $Date: 1998/07/05 -15:07:20 $) +=head2 perlfaq8 - System Interaction ($Revision: 1.39 $, $Date: 1999/05/23 +18:37:57 $) =item DESCRIPTION @@ -767,7 +1010,7 @@ complete? =item How do I open a file without blocking? -=item How do I install a CPAN module? +=item How do I install a module from CPAN? =item What's the difference between require and use? @@ -778,11 +1021,13 @@ search path? =item How do I add a directory to my include path at runtime? +=item What is socket.ph and where do I get it? + =back =item AUTHOR AND COPYRIGHT -=head2 perlfaq9 - Networking ($Revision: 1.20 $, $Date: 1998/06/22 18:31:09 +=head2 perlfaq9 - Networking ($Revision: 1.26 $, $Date: 1999/05/23 16:08:30 $) =item DESCRIPTION @@ -844,35 +1089,25 @@ CGI script to do bad things? 
=item AUTHOR AND COPYRIGHT -=head2 perldelta - what's new for perl5.005 +=head2 perldelta - what's new for perl5.006 (as of 5.005_56) =item DESCRIPTION -=item About the new versioning system - =item Incompatible Changes =over -=item WARNING: This version is not binary compatible with Perl 5.004. - -=item Default installation structure has changed +=item Perl Source Incompatibilities -=item Perl Source Compatibility +=item C Source Incompatibilities -=item C Source Compatibility +C<PERL_POLLUTE>, C<PERL_POLLUTE_MALLOC>, C<PL_na> and C<dTHR> Issues -Core sources now require ANSI C compiler, All Perl global variables must -now be referenced with an explicit prefix, Enabling threads has source -compatibility issues +=item Compatible C Source API Changes -=item Binary Compatibility +C<PATCHLEVEL> is now C<PERL_VERSION> -=item Security fixes may affect compatibility - -=item Relaxed new mandatory warnings introduced in 5.004 - -=item Licensing +=item Binary Incompatibilities =back @@ -880,105 +1115,55 @@ compatibility issues =over -=item Threads - -=item Compiler - -=item Regular Expressions - -Many new and improved optimizations, Many bug fixes, New regular expression -constructs, New operator for precompiled regular expressions, Other -improvements, Incompatible changes - -=item Improved malloc() +=item Unicode and UTF-8 support -=item Quicksort is internally implemented +=item Lexically scoped warning categories -=item Reliable signals +=item Binary numbers supported -=item Reliable stack pointers +=item syswrite() ease-of-use -=item More generous treatment of carriage returns +=item 64-bit support -=item Memory leaks +=item Better syntax checks on parenthesized unary operators -=item Better support for multiple interpreters +=item Improved C<qw//> operator -=item Behavior of local() on array and hash elements is now well-defined +=item pack() format 'Z' supported -=item C<%!> is transparently tied to the L<Errno> module +=item pack() format modifier '!' 
supported -=item Pseudo-hashes are supported +=item $^X variables may now have names longer than one character -=item C<EXPR foreach EXPR> is supported - -=item Keywords can be globally overridden - -=item C<$^E> is meaningful on Win32 - -=item C<foreach (1..1000000)> optimized - -=item C<Foo::> can be used as implicitly quoted package name - -=item C<exists $Foo::{Bar::}> tests existence of a package - -=item Better locale support - -=item Experimental support for 64-bit platforms - -=item prototype() returns useful results on builtins - -=item Extended support for exception handling - -=item Re-blessing in DESTROY() supported for chaining DESTROY() methods - -=item All C<printf> format conversions are handled internally - -=item New C<INIT> keyword - -=item New C<lock> keyword - -=item New C<qr//> operator - -=item C<our> is now a reserved word - -=item Tied arrays are now fully supported +=back -=item Tied handles support is better +=item Significant bug fixes -=item 4th argument to substr +=over -=item Negative LENGTH argument to splice +=item E<lt>HANDLEE<gt> on empty files -=item Magic lvalues are now more magical +=item C<eval '...'> improvements -=item E<lt>E<gt> now reads in records +=item Automatic flushing of output buffers =back =item Supported Platforms -=over - -=item New Platforms - -=item Changes in existing support - -=back +=item New tests =item Modules and Pragmata =over -=item New Modules - -B, Data::Dumper, Errno, File::Spec, ExtUtils::Installed, -ExtUtils::Packlist, Fatal, IPC::SysV, Test, Tie::Array, Tie::Handle, -Thread, attrs, fields, re +=item Modules -=item Changes in existing modules +Dumpvalue, Benchmark, Devel::Peek, Fcntl, File::Spec, +File::Spec::Functions, Math::BigInt, Math::Complex, Math::Trig, SDBM_File, +Time::Local, Win32, DBM Filters -CGI, POSIX, DB_File, MakeMaker, CPAN, Cwd, Benchmark +=item Pragmata =back @@ -986,30 +1171,16 @@ CGI, POSIX, DB_File, MakeMaker, CPAN, Cwd, Benchmark =item Documentation Changes 
+perlopentut.pod, perlreftut.pod, perltootc.pod + =item New Diagnostics -Ambiguous call resolved as CORE::%s(), qualify as such or use &, Bad index -while coercing array into hash, Bareword "%s" refers to nonexistent -package, Can't call method "%s" on an undefined value, Can't coerce array -into hash, Can't goto subroutine from an eval-string, Can't localize -pseudo-hash element, Can't use %%! because Errno.pm is not available, -Cannot find an opnumber for "%s", Character class syntax [. .] is reserved -for future extensions, Character class syntax [: :] is reserved for future -extensions, Character class syntax [= =] is reserved for future extensions, -%s: Eval-group in insecure regular expression, %s: Eval-group not allowed, -use re 'eval', %s: Eval-group not allowed at run time, Explicit blessing to -'' (assuming package main), Illegal hex digit ignored, No such array field, -No such field "%s" in variable %s of type %s, Out of memory during -ridiculously large request, Range iterator outside integer range, Recursive -inheritance detected while looking for method '%s' in package '%s', -Reference found where even-sized list expected, Undefined value assigned to -typeglob, Use of reserved word "%s" is deprecated, perl: warning: Setting -locale failed +/%s/: Unrecognized escape \\%c passed through, Unrecognized escape \\%c +passed through, Missing command in piped open =item Obsolete Diagnostics -Can't mktemp(), Can't write to temp file for B<-e>: %s, Cannot open -temporary file +=item Configuration Changes =item BUGS @@ -1033,10 +1204,14 @@ temporary file =item List value constructors +=item Slices + =item Typeglobs and Filehandles =back +=item SEE ALSO + =head2 perlsyn - Perl syntax =item DESCRIPTION @@ -1131,8 +1306,8 @@ unary &, unary *, (TYPE) ?PATTERN?, m/PATTERN/cgimosx, /PATTERN/cgimosx, q/STRING/, C<'STRING'>, qq/STRING/, "STRING", qr/STRING/imosx, qx/STRING/, `STRING`, qw/STRING/, -s/PATTERN/REPLACEMENT/egimosx, tr/SEARCHLIST/REPLACEMENTLIST/cds, 
-y/SEARCHLIST/REPLACEMENTLIST/cds +s/PATTERN/REPLACEMENT/egimosx, tr/SEARCHLIST/REPLACEMENTLIST/cdsUC, +y/SEARCHLIST/REPLACEMENTLIST/cdsUC =item Gory details of parsing quoted constructs @@ -1166,25 +1341,29 @@ i, m, s, x =item Regular Expressions -C<(?#text)>, C<(?:pattern)>, C<(?imsx-imsx:pattern)>, C<(?=pattern)>, -C<(?!pattern)>, C<(?E<lt>=pattern)>, C<(?<!pattern)>, C<(?{ code })>, -C<(?E<gt>pattern)>, C<(?(condition)yes-pattern|no-pattern)>, -C<(?(condition)yes-pattern)>, C<(?imsx-imsx)> +=item Extended Patterns + +C<(?#text)>, C<(?imsx-imsx)>, C<(?:pattern)>, C<(?imsx-imsx:pattern)>, +C<(?=pattern)>, C<(?!pattern)>, C<(?E<lt>=pattern)>, C<(?<!pattern)>, C<(?{ +code })>, C<(?p{ code })>, C<(?E<gt>pattern)>, +C<(?(condition)yes-pattern|no-pattern)>, C<(?(condition)yes-pattern)> =item Backtracking =item Version 8 Regular Expressions -=item WARNING on \1 vs $1 +=item Warning on \1 vs $1 =item Repeated patterns matching zero-length substring =item Creating custom RE engines -=item SEE ALSO - =back +=item BUGS + +=item SEE ALSO + =head2 perlrun - how to execute the Perl interpreter =item SYNOPSIS @@ -1195,11 +1374,11 @@ C<(?(condition)yes-pattern)>, C<(?imsx-imsx)> =item #! 
and quoting on non-Unix systems -OS/2, MS-DOS, Win95/NT, Macintosh +OS/2, MS-DOS, Win95/NT, Macintosh, VMS =item Location of Perl -=item Switches +=item Command Switches B<-0>[I<digits>], B<-a>, B<-c>, B<-d>, B<-d:>I<foo>, B<-D>I<letters>, B<-D>I<number>, B<-e> I<commandline>, B<-F>I<pattern>, B<-h>, @@ -1213,7 +1392,7 @@ B<-T>, B<-u>, B<-U>, B<-v>, B<-V>, B<-V:>I<name>, B<-w>, B<-x> I<directory> =item ENVIRONMENT HOME, LOGDIR, PATH, PERL5LIB, PERL5OPT, PERLLIB, PERL5DB, PERL5SHELL -(specific to WIN32 port), PERL_DEBUG_MSTATS, PERL_DESTRUCT_LEVEL +(specific to the Win32 port), PERL_DEBUG_MSTATS, PERL_DESTRUCT_LEVEL =head2 perlfunc - Perl builtin functions @@ -1235,6 +1414,8 @@ communication functions, Fetching user and group info, Fetching network info, Time-related functions, Functions new in perl5, Functions obsoleted in perl5 +=item Portability + =item Alphabetical Listing of Perl Functions I<-X> FILEHANDLE, I<-X> EXPR, I<-X>, abs VALUE, abs, accept @@ -1244,10 +1425,10 @@ chdir EXPR, chmod LIST, chomp VARIABLE, chomp LIST, chomp, chop VARIABLE, chop LIST, chop, chown LIST, chr NUMBER, chr, chroot FILENAME, chroot, close FILEHANDLE, close, closedir DIRHANDLE, connect SOCKET,NAME, continue BLOCK, cos EXPR, crypt PLAINTEXT,SALT, dbmclose HASH, dbmopen -HASH,DBNAME,MODE, defined EXPR, defined, delete EXPR, die LIST, do BLOCK, -do SUBROUTINE(LIST), do EXPR, dump LABEL, each HASH, eof FILEHANDLE, eof -(), eof, eval EXPR, eval BLOCK, exec LIST, exec PROGRAM LIST, exists EXPR, -exit EXPR, exp EXPR, exp, fcntl FILEHANDLE,FUNCTION,SCALAR, fileno +HASH,DBNAME,MASK, defined EXPR, defined, delete EXPR, die LIST, do BLOCK, +do SUBROUTINE(LIST), do EXPR, dump LABEL, dump, each HASH, eof FILEHANDLE, +eof (), eof, eval EXPR, eval BLOCK, exec LIST, exec PROGRAM LIST, exists +EXPR, exit EXPR, exp EXPR, exp, fcntl FILEHANDLE,FUNCTION,SCALAR, fileno FILEHANDLE, flock FILEHANDLE,OPERATION, fork, format, formline PICTURE,LIST, getc FILEHANDLE, getc, getlogin, getpeername SOCKET, 
getpgrp PID, getppid, getpriority WHICH,WHO, getpwnam NAME, getgrnam NAME, @@ -1263,19 +1444,19 @@ goto &NAME, grep BLOCK LIST, grep EXPR,LIST, hex EXPR, hex, import, index STR,SUBSTR,POSITION, index STR,SUBSTR, int EXPR, int, ioctl FILEHANDLE,FUNCTION,SCALAR, join EXPR,LIST, keys HASH, kill LIST, last LABEL, last, lc EXPR, lc, lcfirst EXPR, lcfirst, length EXPR, length, link -OLDFILE,NEWFILE, listen SOCKET,QUEUESIZE, local EXPR, localtime EXPR, log -EXPR, log, lstat FILEHANDLE, lstat EXPR, lstat, m//, map BLOCK LIST, map -EXPR,LIST, mkdir FILENAME,MODE, msgctl ID,CMD,ARG, msgget KEY,FLAGS, msgsnd -ID,MSG,FLAGS, msgrcv ID,VAR,SIZE,TYPE,FLAGS, my EXPR, next LABEL, next, no -Module LIST, oct EXPR, oct, open FILEHANDLE,EXPR, open FILEHANDLE, opendir -DIRHANDLE,EXPR, ord EXPR, ord, pack TEMPLATE,LIST, package, package +OLDFILE,NEWFILE, listen SOCKET,QUEUESIZE, local EXPR, localtime EXPR, lock, +log EXPR, log, lstat FILEHANDLE, lstat EXPR, lstat, m//, map BLOCK LIST, +map EXPR,LIST, mkdir FILENAME,MASK, msgctl ID,CMD,ARG, msgget KEY,FLAGS, +msgsnd ID,MSG,FLAGS, msgrcv ID,VAR,SIZE,TYPE,FLAGS, my EXPR, next LABEL, +next, no Module LIST, oct EXPR, oct, open FILEHANDLE,EXPR, open FILEHANDLE, +opendir DIRHANDLE,EXPR, ord EXPR, ord, pack TEMPLATE,LIST, package, package NAMESPACE, pipe READHANDLE,WRITEHANDLE, pop ARRAY, pop, pos SCALAR, pos, print FILEHANDLE LIST, print LIST, print, printf FILEHANDLE FORMAT, LIST, printf FORMAT, LIST, prototype FUNCTION, push ARRAY,LIST, q/STRING/, qq/STRING/, qr/STRING/, qx/STRING/, qw/STRING/, quotemeta EXPR, quotemeta, rand EXPR, rand, read FILEHANDLE,SCALAR,LENGTH,OFFSET, read FILEHANDLE,SCALAR,LENGTH, readdir DIRHANDLE, readline EXPR, readlink EXPR, -readlink, readpipe EXPR, recv SOCKET,SCALAR,LEN,FLAGS, redo LABEL, redo, +readlink, readpipe EXPR, recv SOCKET,SCALAR,LENGTH,FLAGS, redo LABEL, redo, ref EXPR, ref, rename OLDNAME,NEWNAME, require EXPR, require, reset EXPR, reset, return EXPR, return, reverse LIST, rewinddir DIRHANDLE, rindex 
STR,SUBSTR,POSITION, rindex STR,SUBSTR, rmdir FILENAME, rmdir, s///, scalar @@ -1298,14 +1479,14 @@ sysopen FILEHANDLE,FILENAME,MODE, sysopen FILEHANDLE,FILENAME,MODE,PERMS, sysread FILEHANDLE,SCALAR,LENGTH,OFFSET, sysread FILEHANDLE,SCALAR,LENGTH, sysseek FILEHANDLE,POSITION,WHENCE, system LIST, system PROGRAM LIST, syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET, syswrite -FILEHANDLE,SCALAR,LENGTH, tell FILEHANDLE, tell, telldir DIRHANDLE, tie -VARIABLE,CLASSNAME,LIST, tied VARIABLE, time, times, tr///, truncate -FILEHANDLE,LENGTH, truncate EXPR,LENGTH, uc EXPR, uc, ucfirst EXPR, -ucfirst, umask EXPR, umask, undef EXPR, undef, unlink LIST, unlink, unpack -TEMPLATE,EXPR, untie VARIABLE, unshift ARRAY,LIST, use Module LIST, use -Module, use Module VERSION LIST, use VERSION, utime LIST, values HASH, vec -EXPR,OFFSET,BITS, wait, waitpid PID,FLAGS, wantarray, warn LIST, write -FILEHANDLE, write EXPR, write, y/// +FILEHANDLE,SCALAR,LENGTH, syswrite FILEHANDLE,SCALAR, tell FILEHANDLE, +tell, telldir DIRHANDLE, tie VARIABLE,CLASSNAME,LIST, tied VARIABLE, time, +times, tr///, truncate FILEHANDLE,LENGTH, truncate EXPR,LENGTH, uc EXPR, +uc, ucfirst EXPR, ucfirst, umask EXPR, umask, undef EXPR, undef, unlink +LIST, unlink, unpack TEMPLATE,EXPR, untie VARIABLE, unshift ARRAY,LIST, use +Module LIST, use Module, use Module VERSION LIST, use VERSION, utime LIST, +values HASH, vec EXPR,OFFSET,BITS, wait, waitpid PID,FLAGS, wantarray, warn +LIST, write FILEHANDLE, write EXPR, write, y/// =back @@ -1318,31 +1499,36 @@ FILEHANDLE, write EXPR, write, y/// =item Predefined Names $ARG, $_, $E<lt>I<digits>E<gt>, $MATCH, $&, $PREMATCH, $`, $POSTMATCH, $', -$LAST_PAREN_MATCH, $+, $MULTILINE_MATCHING, $*, input_line_number HANDLE -EXPR, $INPUT_LINE_NUMBER, $NR, $, input_record_separator HANDLE EXPR, -$INPUT_RECORD_SEPARATOR, $RS, $/, autoflush HANDLE EXPR, $OUTPUT_AUTOFLUSH, -$|, output_field_separator HANDLE EXPR, $OUTPUT_FIELD_SEPARATOR, $OFS, $,, -output_record_separator HANDLE EXPR, 
$OUTPUT_RECORD_SEPARATOR, $ORS, $\, -$LIST_SEPARATOR, $", $SUBSCRIPT_SEPARATOR, $SUBSEP, $;, $OFMT, $#, -format_page_number HANDLE EXPR, $FORMAT_PAGE_NUMBER, $%, -format_lines_per_page HANDLE EXPR, $FORMAT_LINES_PER_PAGE, $=, -format_lines_left HANDLE EXPR, $FORMAT_LINES_LEFT, $-, format_name HANDLE -EXPR, $FORMAT_NAME, $~, format_top_name HANDLE EXPR, $FORMAT_TOP_NAME, $^, +$LAST_PAREN_MATCH, $+, @+, $MULTILINE_MATCHING, $*, input_line_number +HANDLE EXPR, $INPUT_LINE_NUMBER, $NR, $, input_record_separator HANDLE +EXPR, $INPUT_RECORD_SEPARATOR, $RS, $/, autoflush HANDLE EXPR, +$OUTPUT_AUTOFLUSH, $|, output_field_separator HANDLE EXPR, +$OUTPUT_FIELD_SEPARATOR, $OFS, $,, output_record_separator HANDLE EXPR, +$OUTPUT_RECORD_SEPARATOR, $ORS, $\, $LIST_SEPARATOR, $", +$SUBSCRIPT_SEPARATOR, $SUBSEP, $;, $OFMT, $#, format_page_number HANDLE +EXPR, $FORMAT_PAGE_NUMBER, $%, format_lines_per_page HANDLE EXPR, +$FORMAT_LINES_PER_PAGE, $=, format_lines_left HANDLE EXPR, +$FORMAT_LINES_LEFT, $-, @-, format_name HANDLE EXPR, $FORMAT_NAME, $~, +format_top_name HANDLE EXPR, $FORMAT_TOP_NAME, $^, format_line_break_characters HANDLE EXPR, $FORMAT_LINE_BREAK_CHARACTERS, $:, format_formfeed HANDLE EXPR, $FORMAT_FORMFEED, $^L, $ACCUMULATOR, $^A, $CHILD_ERROR, $?, $OS_ERROR, $ERRNO, $!, $EXTENDED_OS_ERROR, $^E, $EVAL_ERROR, $@, $PROCESS_ID, $PID, $$, $REAL_USER_ID, $UID, $<, $EFFECTIVE_USER_ID, $EUID, $>, $REAL_GROUP_ID, $GID, $(, $EFFECTIVE_GROUP_ID, $EGID, $), $PROGRAM_NAME, $0, $[, $PERL_VERSION, $], -$DEBUGGING, $^D, $SYSTEM_FD_MAX, $^F, $^H, $INPLACE_EDIT, $^I, $^M, -$OSNAME, $^O, $PERLDB, $^P, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, $^R, $^S, -$BASETIME, $^T, $WARNING, $^W, $EXECUTABLE_NAME, $^X, $ARGV, @ARGV, @INC, -@_, %INC, %ENV $ENV{expr}, %SIG $SIG{expr} +$COMPILING, $^C, $DEBUGGING, $^D, $SYSTEM_FD_MAX, $^F, $^H, $INPLACE_EDIT, +$^I, $^M, $OSNAME, $^O, $PERLDB, $^P, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, +$^R, $^S, $BASETIME, $^T, $WARNING, $^W, $EXECUTABLE_NAME, $^X, $ARGV, 
+@ARGV, @INC, @_, %INC, %ENV, $ENV{expr}, %SIG, $SIG{expr} =item Error Indicators +=item Technical Note on the Syntax of Variable Names + =back +=item BUGS + =head2 perlsub - Perl subroutines =item SYNOPSIS @@ -1351,7 +1537,7 @@ $BASETIME, $^T, $WARNING, $^W, $EXECUTABLE_NAME, $^X, $ARGV, @ARGV, @INC, =over -=item Private Variables via C<my()> +=item Private Variables via my() =item Persistent Private Variables @@ -1361,9 +1547,9 @@ $BASETIME, $^T, $WARNING, $^W, $EXECUTABLE_NAME, $^X, $ARGV, @ARGV, @INC, =item When to Still Use local() -1. You need to give a global variable a temporary value, especially C<$_>, -2. You need to create a local file or directory handle or a local function, -3. You want to temporarily change just one element of an array or hash +1. You need to give a global variable a temporary value, especially $_, 2. +You need to create a local file or directory handle or a local function, 3. +You want to temporarily change just one element of an array or hash =item Pass by Reference @@ -1371,7 +1557,7 @@ $BASETIME, $^T, $WARNING, $^W, $EXECUTABLE_NAME, $^X, $ARGV, @ARGV, @INC, =item Constant Functions -=item Overriding Builtin Functions +=item Overriding Built-in Functions =item Autoloading @@ -1409,29 +1595,42 @@ $BASETIME, $^T, $WARNING, $^W, $EXECUTABLE_NAME, $^X, $ARGV, @ARGV, @INC, =item Pragmatic Modules -use autouse MODULE => qw(sub1 sub2 sub3), blib, diagnostics, integer, less, -lib, locale, ops, overload, re, sigtrap, strict, subs, vmsish, vars +attrs, autouse, base, blib, constant, diagnostics, fields, filetest, +integer, less, lib, locale, ops, overload, re, sigtrap, strict, subs, utf8, +vars, vmsish, warning =item Standard Modules -AnyDBM_File, AutoLoader, AutoSplit, Benchmark, CPAN, CPAN::FirstTime, -CPAN::Nox, Carp, Class::Struct, Config, Cwd, DB_File, Devel::SelfStubber, -DirHandle, DynaLoader, English, Env, Exporter, ExtUtils::Embed, -ExtUtils::Install, ExtUtils::Liblist, ExtUtils::MM_OS2, ExtUtils::MM_Unix, -ExtUtils::MM_VMS, 
ExtUtils::MakeMaker, ExtUtils::Manifest, -ExtUtils::Mkbootstrap, ExtUtils::Mksymlists, ExtUtils::testlib, Fatal, -Fcntl, File::Basename, File::CheckTree, File::Compare, File::Copy, -File::Find, File::Path, File::stat, FileCache, FileHandle, FindBin, -GDBM_File, Getopt::Long, Getopt::Std, I18N::Collate, IO, IO::File, -IO::Handle, IO::Pipe, IO::Seekable, IO::Select, IO::Socket, IPC::Open2, -IPC::Open3, Math::BigFloat, Math::BigInt, Math::Complex, Math::Trig, -NDBM_File, Net::Ping, Net::hostent, Net::netent, Net::protoent, -Net::servent, Opcode, Pod::Text, POSIX, SDBM_File, Safe, Search::Dict, -SelectSaver, SelfLoader, Shell, Socket, Symbol, Sys::Hostname, Sys::Syslog, -Term::Cap, Term::Complete, Term::ReadLine, Test::Harness, Text::Abbrev, -Text::ParseWords, Text::Soundex, Text::Tabs, Text::Wrap, Tie::Hash, -Tie::RefHash, Tie::Scalar, Tie::SubstrHash, Time::Local, Time::gmtime, -Time::localtime, Time::tm, UNIVERSAL, User::grent, User::pwent +AnyDBM_File, AutoLoader, AutoSplit, B, B::Asmdata, B::Assembler, B::Bblock, +B::Bytecode, B::C, B::CC, B::Debug, B::Deparse, B::Disassembler, B::Lint, +B::Showlex, B::Stackobj, B::Terse, B::Xref, Benchmark, CGI, CGI::Apache, +CGI::Carp, CGI::Cookie, CGI::Fast, CGI::Push, CGI::Switch, CPAN, +CPAN::FirstTime, CPAN::Nox, Carp, Class::Struct, Config, Cwd, DB, DB_File, +Data::Dumper, Devel::Peek, Devel::SelfStubber, DirHandle, Dumpvalue, +DynaLoader, English, Env, Errno, Exporter, ExtUtils::Command, +ExtUtils::Embed, ExtUtils::Install, ExtUtils::Installed, ExtUtils::Liblist, +ExtUtils::MM_OS2, ExtUtils::MM_Unix, ExtUtils::MM_VMS, ExtUtils::MM_Win32, +ExtUtils::MakeMaker, ExtUtils::Manifest, ExtUtils::Miniperl, +ExtUtils::Mkbootstrap, ExtUtils::Mksymlists, ExtUtils::Packlist, +ExtUtils::testlib, Fatal, Fcntl, File::Basename, File::Compare, File::Copy, +File::DosGlob, File::Find, File::Path, File::Spec, File::Spec::Functions, +File::Spec::Mac, File::Spec::OS2, File::Spec::Unix, File::Spec::VMS, +File::Spec::Win32, File::stat, FileCache, 
FileHandle, FindBin, GDBM_File, +Getopt::Long, Getopt::Std, I18N::Collate, IO, IO::Dir, IO::File, +IO::Handle, IO::Pipe, IO::Poll, IO::Seekable, IO::Select, IO::Socket, +IO::Socket::INET, IO::Socket::UNIX, IPC::Msg, IPC::Open2, IPC::Open3, +IPC::Semaphore, IPC::SysV, Math::BigFloat, Math::BigInt, Math::Complex, +Math::Trig, NDBM_File, Net::Ping, Net::hostent, Net::netent, Net::protoent, +Net::servent, O, Opcode, POSIX, Pod::Html, Pod::Text, SDBM_File, Safe, +Search::Dict, SelectSaver, SelfLoader, Shell, Socket, Symbol, +Sys::Hostname, Sys::Syslog, Term::Cap, Term::Complete, Term::ReadLine, +Test, Test::Harness, Text::Abbrev, Text::ParseWords, Text::Soundex, +Text::Tabs -- expand and unexpand tabs per the unix expand(1) and +unexpand(1), Text::Wrap, Thread, Thread::Queue, Thread::Semaphore, +Thread::Signal, Thread::Specific, Tie::Array, Tie::Handle, Tie::Hash, +Tie::StdHash, Tie::RefHash, Tie::Scalar, Tie::StdScalar, Tie::SubstrHash, +Time::Local, Time::gmtime, Time::localtime, Time::tm, UNIVERSAL, +User::grent, User::pwent =item Extension Modules @@ -1557,7 +1756,7 @@ localization) =item Permanently fixing your locale configuration -=item Permanently fixing system locale configuration +=item Fixing system locale configuration =item The localeconv function @@ -1596,8 +1795,8 @@ isxdigit()): =item ENVIRONMENT -PERL_BADLANG, LC_ALL, LC_CTYPE, LC_COLLATE, LC_MONETARY, LC_NUMERIC, -LC_TIME, LANG +PERL_BADLANG, LC_ALL, LANGUAGE, LC_CTYPE, LC_COLLATE, LC_MONETARY, +LC_NUMERIC, LC_TIME, LANG =item NOTES @@ -1633,6 +1832,8 @@ LC_TIME, LANG =head2 perlref - Perl references and nested data structures +=item NOTE + =item DESCRIPTION =over @@ -1655,6 +1856,42 @@ LC_TIME, LANG =item SEE ALSO +=head2 perlreftut - Mark's very short tutorial about references + +=item DESCRIPTION + +=item Who Needs Complicated Data Structures? 
+ +=item The Solution + +=item Syntax + +=over + +=item Making References + +=item Using References + +=back + +=item An Example + +=item Arrow Rule + +=item Solution + +=item The Rest + +=item Summary + +=item Credits + +=over + +=item Distribution Conditions + +=back + =head2 perldsc - Perl Data Structures Cookbook =item DESCRIPTION @@ -1674,39 +1911,39 @@ more elaborate constructs =item CODE EXAMPLES -=item LISTS OF LISTS +=item ARRAYS OF ARRAYS =over -=item Declaration of a LIST OF LISTS +=item Declaration of a ARRAY OF ARRAYS -=item Generation of a LIST OF LISTS +=item Generation of a ARRAY OF ARRAYS -=item Access and Printing of a LIST OF LISTS +=item Access and Printing of a ARRAY OF ARRAYS =back -=item HASHES OF LISTS +=item HASHES OF ARRAYS =over -=item Declaration of a HASH OF LISTS +=item Declaration of a HASH OF ARRAYS -=item Generation of a HASH OF LISTS +=item Generation of a HASH OF ARRAYS -=item Access and Printing of a HASH OF LISTS +=item Access and Printing of a HASH OF ARRAYS =back -=item LISTS OF HASHES +=item ARRAYS OF HASHES =over -=item Declaration of a LIST OF HASHES +=item Declaration of a ARRAY OF HASHES -=item Generation of a LIST OF HASHES +=item Generation of a ARRAY OF HASHES -=item Access and Printing of a LIST OF HASHES +=item Access and Printing of a ARRAY OF HASHES =back @@ -1740,11 +1977,11 @@ more elaborate constructs =item AUTHOR -=head2 perllol, perlLoL - Manipulating Lists of Lists in Perl +=head2 perllol - Manipulating Arrays of Arrays in Perl =item DESCRIPTION -=item Declaration and Access of Lists of Lists +=item Declaration and Access of Arrays of Arrays =item Growing Your Own @@ -1852,6 +2089,54 @@ more elaborate constructs =back +=head2 perltootc - Tom's OO Tutorial for Class Data in Perl + +=item DESCRIPTION + +=item Class Data as Package Variables + +=over + +=item Putting All Your Eggs in One Basket + +=item Inheritance Concerns + +=item The Eponymous Meta-Object + +=item Indirect References to Class Data + +=item 
Monadic Classes + +=item Translucent Attributes + +=back + +=item Class Data as Lexical Variables + +=over + +=item Privacy and Responsibility + +=item File-Scoped Lexicals + +=item More Inheritance Concerns + +=item Locking the Door and Throwing Away the Key + +=item Translucency Revisited + +=back + +=item NOTES + +=item SEE ALSO + +=item AUTHOR AND COPYRIGHT + +=item ACKNOWLEDGEMENTS + +=item HISTORY + =head2 perlobj - Perl objects =item DESCRIPTION @@ -1866,14 +2151,14 @@ more elaborate constructs =item Method Invocation +=item WARNING + =item Default UNIVERSAL methods isa(CLASS), can(METHOD), VERSION( [NEED] ) =item Destructors -=item WARNING - =item Summary =item Two-Phased Garbage Collection @@ -2021,6 +2306,29 @@ Proto, LocalPort, Listen, Reuse =item SEE ALSO +=head2 perldbmfilter - Perl DBM Filters + +=item SYNOPSIS + +=item DESCRIPTION + +B<filter_store_key>, B<filter_store_value>, B<filter_fetch_key>, +B<filter_fetch_value> + +=over + +=item The Filter + +=item An Example -- the NULL termination problem. + +=item Another Example -- Key is a C int. 
+ +=back + +=item SEE ALSO + +=item AUTHOR + =head2 perldebug - Perl debugging =item DESCRIPTION @@ -2219,7 +2527,7 @@ LIMIT specified =item DESCRIPTION -Not all Perl programs have to be portable, The vast majority of Perl B<is> +Not all Perl programs have to be portable, The vast majority of Perl I<is> portable =item ISSUES @@ -2228,7 +2536,9 @@ portable =item Newlines -=item File Paths +=item Numbers endianness and Width + +=item Files and Filesystems =item System Interaction @@ -2240,6 +2550,10 @@ portable =item Time and Date +=item Character sets and character encoding + +=item Internationalisation + =item System Resources =item Security @@ -2248,10 +2562,10 @@ portable =back -=item CPAN TESTERS +=item CPAN Testers Mailing list: cpan-testers@perl.org, Testing results: -C<http://www.connect.net/gbarr/cpan-test/> +C<http://www.perl.org/cpan-testers/> =item PLATFORMS @@ -2263,29 +2577,38 @@ C<http://www.connect.net/gbarr/cpan-test/> The djgpp environment for DOS, C<http://www.delorie.com/djgpp/>, The EMX environment for DOS, OS/2, etc. -C<emx@iaehv.nl>,C<http://www.leo.org/pub/comp/os/os2/leo/gnu/emx+gcc/index.html>, -C<ftp://hobbes.nmsu.edu/pub/os2/dev/emx>. Build instructions -for Win32, L<perlwin32>, The ActiveState Pages, -C<http://www.activestate.com/> +C<emx@iaehv.nl>,C<http://www.leo.org/pub/comp/os/os2/leo/gnu/emx+gcc/index. 
+html> or +C<ftp://hobbes.nmsu.edu/pub/os2/dev/emx>, Build instructions for Win32, +L<perlwin32>, The ActiveState Pages, C<http://www.activestate.com/> -=item MacPerl +=item S<Mac OS> -The MacPerl Pages, C<http://www.ptf.com/macperl/>, The MacPerl mailing -list, C<mac-perl-request@iis.ee.ethz.ch> +The MacPerl Pages, C<http://www.macperl.com/>, The MacPerl mailing lists, +C<http://www.macperl.org/>, MacPerl Module Porters, +C<http://pudge.net/mmp/> =item VMS -L<perlvms.pod>, vmsperl list, C<vmsperl-request@newman.upenn.edu>, vmsperl -on the web, C<http://www.sidhe.org/vmsperl/index.html> +L<perlvms.pod>, vmsperl list, C<majordomo@perl.org>, vmsperl on the web, +C<http://www.sidhe.org/vmsperl/index.html> + +=item VOS + +L<README.vos>, VOS mailing list, VOS Perl on the web at +C<http://ftp.stratus.com/pub/vos/vos.html> =item EBCDIC Platforms -perl-mvs list, AS/400 Perl information at C<http://as400.rochester.ibm.com> +perl-mvs list, AS/400 Perl information at +C<http://as400.rochester.ibm.com/> + +=item Acorn RISC OS =item Other perls Atari, Guido Flohr's page C<http://stud.uni-sb.de/~gufl0000/>, HP 300 -MPE/iX C<http://www.cccd.edu/~markb/perlix.html>, Novell Netware +MPE/iX C<http://www.cccd.edu/~markb/perlix.html>, Novell Netware =back @@ -2315,12 +2638,22 @@ KEY,NSEMS,FLAGS, semop KEY,OPSTRING, setpgrp PID,PGRP, setpriority WHICH,WHO,PRIORITY, setsockopt SOCKET,LEVEL,OPTNAME,OPTVAL, shmctl ID,CMD,ARG, shmget KEY,SIZE,FLAGS, shmread ID,VAR,POS,SIZE, shmwrite ID,STRING,POS,SIZE, socketpair SOCKET1,SOCKET2,DOMAIN,TYPE,PROTOCOL, stat -FILEHANDLE, stat EXPR, stat, symlink OLDFILE,NEWFILE, syscall LIST, system -LIST, times, truncate FILEHANDLE,LENGTH, truncate EXPR,LENGTH, umask EXPR, -umask, utime LIST, wait, waitpid PID,FLAGS +FILEHANDLE, stat EXPR, stat, symlink OLDFILE,NEWFILE, syscall LIST, sysopen +FILEHANDLE,FILENAME,MODE,PERMS, system LIST, times, truncate +FILEHANDLE,LENGTH, truncate EXPR,LENGTH, umask EXPR, umask, utime LIST, +wait, waitpid PID,FLAGS =back 
+=item CHANGES + +v1.42, 22 May 1999Added notes about tests, sprintf/printf, and epoch +offsets. +=item v1.41, 19 May 1999, v1.40, 11 April 1999, v1.39, 11 February 1999, +v1.38, 31 December 1998, v1.37, 19 December 1998, v1.36, 9 September 1998, +v1.35, 13 August 1998, v1.33, 06 August 1998, v1.32, 05 August 1998, v1.30, +03 August 1998, v1.23, 10 July 1998 + =item AUTHORS / CONTRIBUTORS =item VERSION @@ -2675,17 +3008,16 @@ dXSARGS, dXSI32, do_binmode, ENTER, EXTEND, fbm_compile, fbm_instr, FREETMPS, G_ARRAY, G_DISCARD, G_EVAL, GIMME, GIMME_V, G_NOARGS, G_SCALAR, gv_fetchmeth, gv_fetchmethod, gv_fetchmethod_autoload, G_VOID, gv_stashpv, gv_stashsv, GvSV, HEf_SVKEY, HeHASH, HeKEY, HeKLEN, HePV, HeSVKEY, -HeSVKEY_force, HeSVKEY_set, HeVAL, hv_clear, hv_delayfree_ent, hv_delete, -hv_delete_ent, hv_exists, hv_exists_ent, hv_fetch, hv_fetch_ent, -hv_free_ent, hv_iterinit, hv_iterkey, hv_iterkeysv, hv_iternext, -hv_iternextsv, hv_iterval, hv_magic, HvNAME, hv_store, hv_store_ent, -hv_undef, isALNUM, isALPHA, isDIGIT, isLOWER, isSPACE, isUPPER, items, ix, -LEAVE, looks_like_number, MARK, mg_clear, mg_copy, mg_find, mg_free, -mg_get, mg_len, mg_magical, mg_set, Move, PL_na, New, newAV, Newc, -newCONSTSUB, newHV, newRV_inc, newRV_noinc, NEWSV, newSViv, newSVnv, -newSVpv, newSVpvf, newSVpvn, newSVrv, newSVsv, newXS, newXSproto, Newz, -Nullav, Nullch, Nullcv, Nullhv, Nullsv, ORIGMARK, perl_alloc, -perl_call_argv, perl_call_method, perl_call_pv, perl_call_sv, +HeSVKEY_force, HeSVKEY_set, HeVAL, hv_clear, hv_delete, hv_delete_ent, +hv_exists, hv_exists_ent, hv_fetch, hv_fetch_ent, hv_iterinit, hv_iterkey, +hv_iterkeysv, hv_iternext, hv_iternextsv, hv_iterval, hv_magic, HvNAME, +hv_store, hv_store_ent, hv_undef, isALNUM, isALPHA, isDIGIT, isLOWER, +isSPACE, isUPPER, items, ix, LEAVE, looks_like_number, MARK, mg_clear, +mg_copy, mg_find, mg_free, mg_get, mg_len, mg_magical, mg_set, modglobal, +Move, PL_na, New, newAV, Newc, newCONSTSUB, newHV, newRV_inc, newRV_noinc, +NEWSV, 
newSViv, newSVnv, newSVpv, newSVpvf, newSVpvn, newSVrv, newSVsv, +newXS, newXSproto, Newz, Nullav, Nullch, Nullcv, Nullhv, Nullsv, ORIGMARK, +perl_alloc, perl_call_argv, perl_call_method, perl_call_pv, perl_call_sv, perl_construct, perl_destruct, perl_eval_sv, perl_eval_pv, perl_free, perl_get_av, perl_get_cv, perl_get_hv, perl_get_sv, perl_parse, perl_require_pv, perl_run, POPi, POPl, POPp, POPn, POPs, PUSHMARK, PUSHi, @@ -2694,15 +3026,15 @@ safemalloc, saferealloc, savepv, savepvn, SAVETMPS, SP, SPAGAIN, ST, strEQ, strGE, strGT, strLE, strLT, strNE, strnEQ, strnNE, sv_2mortal, sv_bless, sv_catpv, sv_catpv_mg, sv_catpvn, sv_catpvn_mg, sv_catpvf, sv_catpvf_mg, sv_catsv, sv_catsv_mg, sv_chop, sv_cmp, SvCUR, SvCUR_set, sv_dec, -sv_derived_from, sv_derived_from, SvEND, sv_eq, SvGETMAGIC, SvGROW, -sv_grow, sv_inc, sv_insert, SvIOK, SvIOK_off, SvIOK_on, SvIOK_only, SvIOKp, -sv_isa, sv_isobject, SvIV, SvIVX, SvLEN, sv_len, sv_magic, sv_mortalcopy, +sv_derived_from, SvEND, sv_eq, SvGETMAGIC, SvGROW, sv_grow, sv_inc, +sv_insert, SvIOK, SvIOK_off, SvIOK_on, SvIOK_only, SvIOKp, sv_isa, +sv_isobject, SvIV, SvIVX, SvLEN, sv_len, sv_magic, sv_mortalcopy, sv_newmortal, SvNIOK, SvNIOK_off, SvNIOKp, PL_sv_no, SvNOK, SvNOK_off, SvNOK_on, SvNOK_only, SvNOKp, SvNV, SvNVX, SvOK, SvOOK, SvPOK, SvPOK_off, -SvPOK_on, SvPOK_only, SvPOKp, SvPV, SvPV_force, SvPVX, SvREFCNT, -SvREFCNT_dec, SvREFCNT_inc, SvROK, SvROK_off, SvROK_on, SvRV, SvSETMAGIC, -sv_setiv, sv_setiv_mg, sv_setnv, sv_setnv_mg, sv_setpv, sv_setpv_mg, -sv_setpviv, sv_setpviv_mg, sv_setpvn, sv_setpvn_mg, sv_setpvf, +SvPOK_on, SvPOK_only, SvPOKp, SvPV, SvPV_force, SvPV_nolen, SvPVX, +SvREFCNT, SvREFCNT_dec, SvREFCNT_inc, SvROK, SvROK_off, SvROK_on, SvRV, +SvSETMAGIC, sv_setiv, sv_setiv_mg, sv_setnv, sv_setnv_mg, sv_setpv, +sv_setpv_mg, sv_setpviv, sv_setpviv_mg, sv_setpvn, sv_setpvn_mg, sv_setpvf, sv_setpvf_mg, sv_setref_iv, sv_setref_nv, sv_setref_pv, sv_setref_pvn, SvSetSV, SvSetSV_nosteal, sv_setsv, sv_setsv_mg, sv_setuv, 
sv_setuv_mg, SvSTASH, SvTAINT, SvTAINTED, SvTAINTED_off, SvTAINTED_on, SVt_IV, SVt_PV, @@ -2726,7 +3058,7 @@ An Error Handler, An Event Driven Program =item THE PERL_CALL FUNCTIONS -B<perl_call_sv>, B<perl_call_pv>, B<perl_call_method>, B<perl_call_argv> +perl_call_sv, perl_call_pv, perl_call_method, perl_call_argv =item FLAG VALUES @@ -2842,6 +3174,14 @@ method, locked =item DESCRIPTION +=head2 attrs - set/get attributes of a subroutine + +=item SYNOPSIS + +=item DESCRIPTION + +method, locked + =head2 autouse - postpone load of modules until a function is used =item SYNOPSIS @@ -2860,6 +3200,8 @@ method, locked =item DESCRIPTION +=item HISTORY + =item SEE ALSO =head2 blib - Use MakeMaker's uninstalled version of a package @@ -2919,6 +3261,25 @@ diagnostics =item SEE ALSO +=head2 filetest - Perl pragma to control the filetest permission operators + +=item SYNOPSIS + + $can_perhaps_read = -r "file"; # use the mode bits + { + use filetest 'access'; # intuit harder + $can_really_read = -r "file"; + } + $can_perhaps_read = -r "file"; # use the mode bits again + +=item DESCRIPTION + +=over + +=item subpragma access + +=back + =head2 integer - Perl pragma to compute arithmetic in integer instead of double @@ -2959,12 +3320,18 @@ operations =item DESCRIPTION +=head2 ops - Perl pragma to restrict unsafe operations when compiling + +=item SYNOPSIS + +=item DESCRIPTION + +=item SEE ALSO + =head2 overload - Package for overloading perl operations =item SYNOPSIS -=item CAVEAT SCRIPTOR - =item DESCRIPTION =over @@ -2977,11 +3344,15 @@ FALSE, TRUE, C<undef> =item Calling Conventions for Unary Operations +=item Calling Conventions for Mutators + +C<++> and C<-->, C<x=> and other assignment versions + =item Overloadable Operations I<Arithmetic operations>, I<Comparison operations>, I<Bit operations>, I<Increment and decrement>, I<Transcendental functions>, I<Boolean, string -and numeric conversion>, I<Special> +and numeric conversion>, I<Iteration>, I<Dereferencing>, I<Special> 
=item Inheritance and overloading @@ -3010,9 +3381,10 @@ B<Example> I<Assignment forms of arithmetic operations>, I<Conversion operations>, I<Increment and decrement>, C<abs($a)>, I<Unary minus>, I<Negation>, -I<Concatenation>, I<Comparison operations>, I<Copy operator> +I<Concatenation>, I<Comparison operations>, I<Iterator>, I<Dereferencing>, +I<Copy operator> -=item WARNING +=item Losing overloading =item Run-time Overloading @@ -3026,12 +3398,34 @@ integer, float, binary, q, qr =item IMPLEMENTATION +=item Metaphor clash + +=item Cookbook + +=over + +=item Two-face scalars + +=item Two-face references + +=item Symbolic calculator + +=item I<Really> symbolic calculator + +=back + =item AUTHOR =item DIAGNOSTICS =item BUGS +=head2 re - Perl pragma to alter regular expression behaviour + +=item SYNOPSIS + +=item DESCRIPTION + =head2 sigtrap - Perl pragma to enable simple signal handling =item SYNOPSIS @@ -3072,12 +3466,28 @@ C<strict refs>, C<strict vars>, C<strict subs> =item DESCRIPTION +=head2 utf8 - Perl pragma to turn on UTF-8 and Unicode support + +=item SYNOPSIS + +=item DESCRIPTION + +=item CAVEATS + =head2 vars - Perl pragma to predeclare global variable names =item SYNOPSIS =item DESCRIPTION +=head2 warning - Perl pragma to control optional warnings + +=item SYNOPSIS + +=item DESCRIPTION + +C<warning deprecated> + =head1 MODULE DOCUMENTATION =head2 AnyDBM_File - provide framework for multiple DBMs @@ -3200,7 +3610,8 @@ FILL, MAX, OFF, ARRAY, AvFLAGS =item B::CV METHODS -STASH, START, ROOT, GV, FILEGV, DEPTH, PADLIST, OUTSIDE, XSUB, XSUBANY +STASH, START, ROOT, GV, FILEGV, DEPTH, PADLIST, OUTSIDE, XSUB, XSUBANY, +CvFLAGS =item B::HV METHODS @@ -3260,10 +3671,11 @@ label, stash, filegv, cop_seq, arybase, line =item FUNCTIONS EXPORTED BY C<B> -main_cv, main_root, main_start, comppadlist, sv_undef, sv_yes, sv_no, -walkoptree(OP, METHOD), walkoptree_debug(DEBUG), walksymtable(SYMREF, -METHOD, RECURSE), svref_2object(SV), ppname(OPNUM), hash(STR), cast_I32(I), 
-minus_c, cstring(STR), class(OBJ), threadsv_names, byteload_fh(FILEHANDLE) +main_cv, init_av, main_root, main_start, comppadlist, sv_undef, sv_yes, +sv_no, amagic_generation, walkoptree(OP, METHOD), walkoptree_debug(DEBUG), +walksymtable(SYMREF, METHOD, RECURSE), svref_2object(SV), ppname(OPNUM), +hash(STR), cast_I32(I), minus_c, cstring(STR), class(OBJ), threadsv_names, +byteload_fh(FILEHANDLE) =item AUTHOR @@ -3304,6 +3716,8 @@ B<-ofilename>, B<-->, B<-f>, B<-fcompress-nullops>, B<-fomit-sequence-numbers>, B<-fbypass-nullops>, B<-fstrip-syntax-tree>, B<-On>, B<-D>, B<-Do>, B<-Db>, B<-Da>, B<-DC>, B<-S>, B<-m> +=item EXAMPLES + =item BUGS =item AUTHOR @@ -3373,7 +3787,7 @@ B<-ffreetmps-each-bblock>, B<-ffreetmps-each-loop>, B<-fomit-taint>, B<-On> =item OPTIONS -B<-p>, B<-u>I<PACKAGE>, B<-l>, B<-s>I<LETTERS>, B<C> +B<-l>, B<-p>, B<-q>, B<-u>I<PACKAGE>, B<-s>I<LETTERS>, B<C> =item BUGS @@ -3456,6 +3870,14 @@ C<-oFILENAME>, C<-r>, C<-D[tO]> =item AUTHOR +=head2 Bblock, B::Bblock - Walk basic blocks + +=item SYNOPSIS + +=item DESCRIPTION + +=item AUTHOR + =head2 Benchmark - benchmark running times of code =item SYNOPSIS @@ -3471,8 +3893,8 @@ new, debug =item Standard Exports timeit(COUNT, CODE), timethis ( COUNT, CODE, [ TITLE, [ STYLE ]] ), -timethese ( COUNT, CODEHASHREF, [ STYLE ] ), timediff ( T1, T2 ), timestr ( -TIMEDIFF, [ STYLE, [ FORMAT ] ] ) +timethese ( COUNT, CODEHASHREF, [ STYLE ] ), timediff ( T1, T2 ), timesum ( +T1, T2 ), timestr ( TIMEDIFF, [ STYLE, [ FORMAT ] ] ) =item Optional Exports @@ -3490,6 +3912,34 @@ clearcache ( COUNT ), clearallcache ( ), disablecache ( ), enablecache ( ) =item MODIFICATION HISTORY +=head2 ByteLoader - load byte compiled perl code + +=item SYNOPSIS + +=item DESCRIPTION + +=item AUTHOR + +=item SEE ALSO + +=head2 Bytecode, B::Bytecode - Perl compiler's bytecode backend + +=item SYNOPSIS + +=item DESCRIPTION + +=item OPTIONS + +B<-ofilename>, B<-->, B<-f>, B<-fcompress-nullops>, +B<-fomit-sequence-numbers>, 
B<-fbypass-nullops>, B<-fstrip-syntax-tree>, +B<-On>, B<-D>, B<-Do>, B<-Db>, B<-Da>, B<-DC>, B<-S>, B<-m> + +=item EXAMPLES + +=item BUGS + +=item AUTHOR + =head2 CGI - Simple Common Gateway Interface Class =item SYNOPSIS @@ -3539,7 +3989,14 @@ B<:standard>, B<:all> =item PRAGMAS --any, -compile, -nph, -autoload, -no_debug, -private_tempfiles +-any, -compile, -nph, -newstyle_urls, -autoload, -no_debug, +-private_tempfiles + +=item SPECIAL FORMS FOR IMPORTING HTML-TAG FUNCTIONS + +1. start_table() (generates a <TABLE> tag), 2. end_table() (generates a +</TABLE> tag), 3. start_ul() (generates a <UL> tag), 4. end_ul() (generates +a </UL> tag) =back @@ -3564,6 +4021,8 @@ B<Parameters:>, 4, 5, 6.. B<-absolute>, B<-relative>, B<-full>, B<-path> (B<-path_info>), B<-query> (B<-query_string>) +=item MIXING POST AND URL PARAMETERS + =back =item CREATING STANDARD HTML ELEMENTS: @@ -3578,6 +4037,8 @@ B<-absolute>, B<-relative>, B<-full>, B<-path> (B<-path_info>), B<-query> =item NON-STANDARD HTML SHORTCUTS +=item PRETTY-PRINTING HTML + =back =item CREATING FILL-OUT FORMS: @@ -3642,12 +4103,12 @@ TOP, BOTTOM or MIDDLE =back -=item NETSCAPE COOKIES +=item HTTP COOKIES 1. an expiration time, 2. a domain, 3. a path, 4. a "secure" flag, B<-name>, B<-value>, B<-path>, B<-domain>, B<-expires>, B<-secure> -=item WORKING WITH NETSCAPE FRAMES +=item WORKING WITH FRAMES 1. Create a <Frameset> document, 2. Specify the destination for the document in the HTTP header, 3. 
Specify the destination for the document in @@ -3665,7 +4126,7 @@ the <FORM> tag =item FETCHING ENVIRONMENT VARIABLES -B<accept()>, B<raw_cookie()>, B<user_agent()>, B<path_info()>, +B<Accept()>, B<raw_cookie()>, B<user_agent()>, B<path_info()>, B<path_translated()>, B<remote_host()>, B<script_name()>Return the script name as a partial URL, for self-refering scripts, B<referer()>, B<auth_type ()>, B<server_name ()>, B<virtual_host @@ -3680,7 +4141,7 @@ parameters in the B<header()> and B<redirect()> statements: =item Server Push multipart_init() -multipart_init(-boundary=>$boundary);, multipart_start(), multipart_end() + multipart_init(-boundary=>$boundary);, multipart_start(), multipart_end() =item Avoiding Denial of Service Attacks @@ -3696,15 +4157,15 @@ basis>, B<2. Globally for all scripts> Matt Heffron (heffron@falstaff.css.beckman.com), James Taylor (james.taylor@srs.gov), Scott Anguish <sanguish@digifix.com>, Mike Jewell (mlj3u@virginia.edu), Timothy Shimmin (tes@kbs.citri.edu.au), Joergen Haegg -(jh@axis.se), Laurent Delfosse (delfosse@csgrad1.cs.wvu.edu), Richard -Resnick (applepi1@aol.com), Craig Bishop (csb@barwonwater.vic.gov.au), Tony -Curtis (tc@vcpc.univie.ac.at), Tim Bunce (Tim.Bunce@ig.co.uk), Tom -Christiansen (tchrist@convex.com), Andreas Koenig -(k@franz.ww.TU-Berlin.DE), Tim MacKenzie (Tim.MacKenzie@fulcrum.com.au), -Kevin B. Hendricks (kbhend@dogwood.tyler.wm.edu), Stephen Dahmen -(joyfire@inxpress.net), Ed Jordan (ed@fidalgo.net), David Alan Pisoni -(david@cnation.com), Doug MacEachern (dougm@opengroup.org), Robin Houston -(robin@oneworld.org), ...and many many more.. +(jh@axis.se), Laurent Delfosse (delfosse@delfosse.com), Richard Resnick +(applepi1@aol.com), Craig Bishop (csb@barwonwater.vic.gov.au), Tony Curtis +(tc@vcpc.univie.ac.at), Tim Bunce (Tim.Bunce@ig.co.uk), Tom Christiansen +(tchrist@convex.com), Andreas Koenig (k@franz.ww.TU-Berlin.DE), Tim +MacKenzie (Tim.MacKenzie@fulcrum.com.au), Kevin B. 
Hendricks +(kbhend@dogwood.tyler.wm.edu), Stephen Dahmen (joyfire@inxpress.net), Ed +Jordan (ed@fidalgo.net), David Alan Pisoni (david@cnation.com), Doug +MacEachern (dougm@opengroup.org), Robin Houston (robin@oneworld.org), +...and many many more.. =item A COMPLETE EXAMPLE OF A SIMPLE FORM-BASED SCRIPT @@ -3821,8 +4282,6 @@ B<name()>, B<value()>, B<domain()>, B<path()>, B<expires()> =item INSTALLING CGI::Push SCRIPTS -=item CAVEATS - =item AUTHOR INFORMATION =item BUGS @@ -3866,7 +4325,7 @@ distribution, Signals expand($type,@things), Programming Examples -=item Methods in the four +=item Methods in the four Classes =item Cache Manager @@ -3878,7 +4337,7 @@ expand($type,@things), Programming Examples =item Debugging -=item Floppy, Zip, and all that Jazz +=item Floppy, Zip, Offline Mode =back @@ -3891,7 +4350,9 @@ E<lt>listE<gt> =over -=item CD-ROM support +=item Note on urllist parameter's format + +=item urllist parameter has CD-ROM support =back @@ -3899,6 +4360,12 @@ E<lt>listE<gt> =item EXPORT +=item POPULATE AN INSTALLATION WITH LOTS OF MODULES + +=item WORKING WITH CPAN.pm BEHIND FIREWALLS + +http firewall, ftp firewall, One way visibility, SOCKS, IP Masquerade + =item BUGS =item AUTHOR @@ -3932,6 +4399,8 @@ module =back +=item BUGS + =head2 Class::Struct - declare struct-like datatypes as Perl classes =item SYNOPSIS @@ -3955,12 +4424,273 @@ Example 1, Example 2 =item Author and Modification History +=head2 Config - access Perl configuration information + +=item SYNOPSIS + +=item DESCRIPTION + +myconfig(), config_sh(), config_vars(@names) + +=item EXAMPLE + +=item WARNING + +=item GLOSSARY + +=over + +=item _ + +C<_a>, C<_exe>, C<_o> + +=item a + +C<afs>, C<alignbytes>, C<ansi2knr>, C<aphostname>, C<apiversion>, C<ar>, +C<archlib>, C<archlibexp>, C<archname64>, C<archname>, C<archobjs>, C<awk> + +=item b + +C<baserev>, C<bash>, C<bin>, C<binexp>, C<bison>, C<byacc>, C<byteorder> + +=item c + +C<c>, C<castflags>, C<cat>, C<cc>, C<cccdlflags>, C<ccdlflags>, 
C<ccflags>, +C<ccsymbols>, C<cf_by>, C<cf_email>, C<cf_time>, C<chgrp>, C<chmod>, +C<chown>, C<clocktype>, C<comm>, C<compress>, C<contains>, C<cp>, C<cpio>, +C<cpp>, C<cpp_stuff>, C<cppccsymbols>, C<cppflags>, C<cpplast>, +C<cppminus>, C<cpprun>, C<cppstdin>, C<cppsymbols>, C<crosscompile>, +C<cryptlib>, C<csh> + +=item d + +C<d_access>, C<d_accessx>, C<d_alarm>, C<d_archlib>, C<d_attribut>, +C<d_bcmp>, C<d_bcopy>, C<d_bsd>, C<d_bsdgetpgrp>, C<d_bsdsetpgrp>, +C<d_bzero>, C<d_casti32>, C<d_castneg>, C<d_charvspr>, C<d_chown>, +C<d_chroot>, C<d_chsize>, C<d_closedir>, C<d_cmsghdr_s>, C<d_const>, +C<d_crypt>, C<d_csh>, C<d_cuserid>, C<d_dbl_dig>, C<d_dbmclose64>, +C<d_dbminit64>, C<d_delete64>, C<d_difftime>, C<d_dirent64_s>, +C<d_dirnamlen>, C<d_dlerror>, C<d_dlopen>, C<d_dlsymun>, C<d_dosuid>, +C<d_drand48proto>, C<d_dup2>, C<d_eaccess>, C<d_endgrent>, C<d_endhent>, +C<d_endnent>, C<d_endpent>, C<d_endpwent>, C<d_endsent>, C<d_eofnblk>, +C<d_eunice>, C<d_fchmod>, C<d_fchown>, C<d_fcntl>, C<d_fd_macros>, +C<d_fd_set>, C<d_fds_bits>, C<d_fetch64>, C<d_fgetpos64>, C<d_fgetpos>, +C<d_firstkey64>, C<d_flexfnam>, C<d_flock64_s>, C<d_flock>, C<d_fopen64>, +C<d_fork>, C<d_fpathconf>, C<d_freopen64>, C<d_fseek64>, C<d_fseeko64>, +C<d_fseeko>, C<d_fsetpos64>, C<d_fsetpos>, C<d_fstat64>, C<d_fstatfs>, +C<d_fstatvfs>, C<d_ftell64>, C<d_ftello64>, C<d_ftello>, C<d_ftime>, +C<d_ftruncate64>, C<d_Gconvert>, C<d_getgrent>, C<d_getgrps>, +C<d_gethbyaddr>, C<d_gethbyname>, C<d_gethent>, C<d_gethname>, +C<d_gethostprotos>, C<d_getlogin>, C<d_getmntent>, C<d_getnbyaddr>, +C<d_getnbyname>, C<d_getnent>, C<d_getnetprotos>, C<d_getpbyname>, +C<d_getpbynumber>, C<d_getpent>, C<d_getpgid>, C<d_getpgrp2>, C<d_getpgrp>, +C<d_getppid>, C<d_getprior>, C<d_getprotoprotos>, C<d_getpwent>, +C<d_getsbyname>, C<d_getsbyport>, C<d_getsent>, C<d_getservprotos>, +C<d_gettimeod>, C<d_gnulibc>, C<d_grpasswd>, C<d_hasmntopt>, C<d_htonl>, +C<d_index>, C<d_inetaton>, C<d_ino64_t>, C<d_int64t>, 
C<d_iovec_s>, +C<d_isascii>, C<d_killpg>, C<d_lchown>, C<d_link>, C<d_llseek>, +C<d_locconv>, C<d_lockf64>, C<d_lockf>, C<d_longdbl>, C<d_longlong>, +C<d_lseek64>, C<d_lstat64>, C<d_lstat>, C<d_madvise>, C<d_mblen>, +C<d_mbstowcs>, C<d_mbtowc>, C<d_memchr>, C<d_memcmp>, C<d_memcpy>, +C<d_memmove>, C<d_memset>, C<d_mkdir>, C<d_mkfifo>, C<d_mktime>, C<d_mmap>, +C<d_mprotect>, C<d_msg>, C<d_msg_ctrunc>, C<d_msg_dontroute>, C<d_msg_oob>, +C<d_msg_peek>, C<d_msg_proxy>, C<d_msgctl>, C<d_msgget>, C<d_msghdr_s>, +C<d_msgrcv>, C<d_msgsnd>, C<d_msync>, C<d_munmap>, C<d_mymalloc>, +C<d_nextkey64>, C<d_nice>, C<d_off64_t>, C<d_offset_t>, +C<d_old_pthread_create_joinable>, C<d_oldpthreads>, C<d_oldsock>, +C<d_open3>, C<d_open64>, C<d_opendir64>, C<d_pathconf>, C<d_pause>, +C<d_phostname>, C<d_pipe>, C<d_poll>, C<d_portable>, C<d_pthread_yield>, +C<d_pwage>, C<d_pwchange>, C<d_pwclass>, C<d_pwcomment>, C<d_pwexpire>, +C<d_pwgecos>, C<d_pwpasswd>, C<d_pwquota>, C<d_readdir64>, C<d_readdir>, +C<d_readlink>, C<d_readv>, C<d_recvmsg>, C<d_rename>, C<d_rewinddir>, +C<d_rmdir>, C<d_safebcpy>, C<d_safemcpy>, C<d_sanemcmp>, C<d_sched_yield>, +C<d_scm_rights>, C<d_seekdir64>, C<d_seekdir>, C<d_select>, C<d_sem>, +C<d_semctl>, C<d_semctl_semid_ds>, C<d_semctl_semun>, C<d_semget>, +C<d_semop>, C<d_sendmsg>, C<d_setegid>, C<d_seteuid>, C<d_setgrent>, +C<d_setgrps>, C<d_sethent>, C<d_setlinebuf>, C<d_setlocale>, C<d_setnent>, +C<d_setpent>, C<d_setpgid>, C<d_setpgrp2>, C<d_setpgrp>, C<d_setprior>, +C<d_setpwent>, C<d_setregid>, C<d_setresgid>, C<d_setresuid>, +C<d_setreuid>, C<d_setrgid>, C<d_setruid>, C<d_setsent>, C<d_setsid>, +C<d_setvbuf>, C<d_sfio>, C<d_shm>, C<d_shmat>, C<d_shmatprototype>, +C<d_shmctl>, C<d_shmdt>, C<d_shmget>, C<d_sigaction>, C<d_sigsetjmp>, +C<d_socket>, C<d_sockpair>, C<d_stat64>, C<d_statblks>, C<d_statfs>, +C<d_statfsflags>, C<d_statvfs>, C<d_stdio_cnt_lval>, C<d_stdio_ptr_lval>, +C<d_stdio_stream_array>, C<d_stdiobase>, C<d_stdstdio>, C<d_store64>, 
+C<d_strchr>, C<d_strcoll>, C<d_strctcpy>, C<d_strerrm>, C<d_strerror>, +C<d_strtod>, C<d_strtol>, C<d_strtoul>, C<d_strxfrm>, C<d_suidsafe>, +C<d_symlink>, C<d_syscall>, C<d_sysconf>, C<d_sysernlst>, C<d_syserrlst>, +C<d_system>, C<d_tcgetpgrp>, C<d_tcsetpgrp>, C<d_telldir64>, C<d_telldir>, +C<d_telldirproto>, C<d_time>, C<d_times>, C<d_tmpfile64>, C<d_truncate64>, +C<d_truncate>, C<d_tzname>, C<d_umask>, C<d_uname>, C<d_union_semun>, +C<d_vfork>, C<d_void_closedir>, C<d_voidsig>, C<d_voidtty>, C<d_volatile>, +C<d_vprintf>, C<d_wait4>, C<d_waitpid>, C<d_wcstombs>, C<d_wctomb>, +C<d_writev>, C<d_xenix>, C<date>, C<db_hashtype>, C<db_prefixtype>, +C<defvoidused>, C<direntrytype>, C<dlext>, C<dlsrc>, C<doublesize>, +C<drand01>, C<dynamic_ext> + +=item e + +C<eagain>, C<ebcdic>, C<echo>, C<egrep>, C<emacs>, C<eunicefix>, +C<exe_ext>, C<expr>, C<extensions> + +=item f + +C<fflushall>, C<fflushNULL>, C<find>, C<firstmakefile>, C<flex>, +C<fpostype>, C<freetype>, C<full_ar>, C<full_csh>, C<full_sed> + +=item g + +C<gccversion>, C<gidtype>, C<glibpth>, C<grep>, C<groupcat>, C<groupstype>, +C<gzip> + +=item h + +C<h_fcntl>, C<h_sysfile>, C<hint>, C<hostcat>, C<huge> + +=item i + +C<i_arpainet>, C<i_bsdioctl>, C<i_db>, C<i_dbm>, C<i_dirent>, C<i_dld>, +C<i_dlfcn>, C<i_fcntl>, C<i_float>, C<i_gdbm>, C<i_grp>, C<i_inttypes>, +C<i_limits>, C<i_locale>, C<i_machcthr>, C<i_malloc>, C<i_math>, +C<i_memory>, C<i_mntent>, C<i_ndbm>, C<i_netdb>, C<i_neterrno>, +C<i_netinettcp>, C<i_niin>, C<i_poll>, C<i_pthread>, C<i_pwd>, +C<i_rpcsvcdbm>, C<i_sfio>, C<i_sgtty>, C<i_stdarg>, C<i_stddef>, +C<i_stdlib>, C<i_string>, C<i_sysaccess>, C<i_sysdir>, C<i_sysfile>, +C<i_sysfilio>, C<i_sysin>, C<i_sysioctl>, C<i_sysmman>, C<i_sysmount>, +C<i_sysndir>, C<i_sysparam>, C<i_sysresrc>, C<i_syssecrt>, C<i_sysselct>, +C<i_syssockio>, C<i_sysstat>, C<i_sysstatvfs>, C<i_systime>, C<i_systimek>, +C<i_systimes>, C<i_systypes>, C<i_sysuio>, C<i_sysun>, C<i_syswait>, +C<i_termio>, C<i_termios>, C<i_time>, 
C<i_unistd>, C<i_utime>, C<i_values>, +C<i_varargs>, C<i_varhdr>, C<i_vfork>, C<ignore_versioned_solibs>, +C<incpath>, C<inews>, C<installarchlib>, C<installbin>, C<installman1dir>, +C<installman3dir>, C<installprivlib>, C<installscript>, C<installsitearch>, +C<installsitelib>, C<installusrbinperl>, C<intsize> + +=item k + +C<known_extensions>, C<ksh> + +=item l + +C<large>, C<ld>, C<lddlflags>, C<ldflags>, C<less>, C<lib_ext>, C<libc>, +C<libperl>, C<libpth>, C<libs>, C<libswanted>, C<line>, C<lint>, +C<lkflags>, C<ln>, C<lns>, C<locincpth>, C<loclibpth>, C<longdblsize>, +C<longlongsize>, C<longsize>, C<lp>, C<lpr>, C<ls>, C<lseeksize>, +C<lseektype> + +=item m + +C<mail>, C<mailx>, C<make>, C<make_set_make>, C<mallocobj>, C<mallocsrc>, +C<malloctype>, C<man1dir>, C<man1direxp>, C<man1ext>, C<man3dir>, +C<man3direxp>, C<man3ext> + +=item M + +C<Mcc>, C<medium>, C<mips_type>, C<mkdir>, C<mmaptype>, C<models>, +C<modetype>, C<more>, C<multiarch>, C<mv>, C<myarchname>, C<mydomain>, +C<myhostname>, C<myuname> + +=item n + +C<n>, C<netdb_hlen_type>, C<netdb_host_type>, C<netdb_name_type>, +C<netdb_net_type>, C<nm>, C<nm_opt>, C<nm_so_opt>, C<nonxs_ext>, C<nroff> + +=item o + +C<o_nonblock>, C<obj_ext>, C<old_pthread_create_joinable>, C<optimize>, +C<orderlib>, C<osname>, C<osvers> + +=item p + +C<package>, C<pager>, C<passcat>, C<patchlevel>, C<path_sep>, C<perl>, +C<perladmin>, C<perlpath>, C<pg>, C<phostname>, C<pidtype>, C<plibpth>, +C<pmake>, C<pr>, C<prefix>, C<prefixexp>, C<privlib>, C<privlibexp>, +C<prototype>, C<ptrsize> + +=item r + +C<randbits>, C<randfunc>, C<randseedtype>, C<ranlib>, C<rd_nodata>, C<rm>, +C<rmail>, C<runnm> + +=item s + +C<sched_yield>, C<scriptdir>, C<scriptdirexp>, C<sed>, C<seedfunc>, +C<selectminbits>, C<selecttype>, C<sendmail>, C<sh>, C<shar>, C<sharpbang>, +C<shmattype>, C<shortsize>, C<shrpenv>, C<shsharp>, C<sig_count>, +C<sig_name>, C<sig_name_init>, C<sig_num>, C<sig_num_init>, C<signal_t>, +C<sitearch>, C<sitearchexp>, 
C<sitelib>, C<sitelibexp>, C<sizetype>, +C<sleep>, C<smail>, C<small>, C<so>, C<sockethdr>, C<socketlib>, C<sort>, +C<spackage>, C<spitshell>, C<split>, C<src>, C<ssizetype>, C<startperl>, +C<startsh>, C<static_ext>, C<stdchar>, C<stdio_base>, C<stdio_bufsiz>, +C<stdio_cnt>, C<stdio_filbuf>, C<stdio_ptr>, C<stdio_stream_array>, +C<strings>, C<submit>, C<subversion>, C<sysman> + +=item t + +C<tail>, C<tar>, C<tbl>, C<tee>, C<test>, C<timeincl>, C<timetype>, +C<touch>, C<tr>, C<trnl>, C<troff> + +=item u + +C<uidtype>, C<uname>, C<uniq>, C<use64bits>, C<usedl>, C<usemultiplicity>, +C<usemymalloc>, C<usenm>, C<useopcode>, C<useperlio>, C<useposix>, +C<usesfio>, C<useshrplib>, C<usethreads>, C<usevfork>, C<usrinc>, C<uuname> + +=item v + +C<version>, C<vi>, C<voidflags> + +=item x + +C<xlibpth> + +=item z + +C<zcat>, C<zip> + +=back + +=item NOTE + =head2 Cwd, getcwd - get pathname of current working directory =item SYNOPSIS =item DESCRIPTION +=head2 DB - programmatic interface to the Perl debugging API (draft, +subject to +change) + +=item SYNOPSIS + +=item DESCRIPTION + +=over + +=item Global Variables + + $DB::sub, %DB::sub, $DB::single, $DB::signal, $DB::trace, @DB::args, +@DB::dbline, %DB::dbline, $DB::package, $DB::filename, $DB::subname, +$DB::lineno + +=item API Methods + +CLIENT->register(), CLIENT->evalcode(STRING), CLIENT->skippkg('D::hide'), +CLIENT->run(), CLIENT->step(), CLIENT->next(), CLIENT->done() + +=item Client Callback Methods + +CLIENT->init(), CLIENT->prestop([STRING]), CLIENT->stop(), CLIENT->idle(), +CLIENT->poststop([STRING]), CLIENT->evalcode(STRING), CLIENT->cleanup(), +CLIENT->output(LIST) + +=back + +=item BUGS + +=item AUTHOR + =head2 DB_File - Perl5 access to Berkeley DB version 1.x =item SYNOPSIS @@ -4001,6 +4731,10 @@ B<DB_HASH>, B<DB_BTREE>, B<DB_RECNO> =item The get_dup() Method +=item The find_dup() Method + +=item The del_dup() Method + =item Matching Partial Keys =back @@ -4013,7 +4747,7 @@ B<DB_HASH>, B<DB_BTREE>, B<DB_RECNO> 
=item A Simple Example -=item Extra Methods +=item Extra RECNO Methods B<$X-E<gt>push(list) ;>, B<$value = $X-E<gt>pop ;>, B<$X-E<gt>shift>, B<$X-E<gt>unshift(list) ;>, B<$X-E<gt>length> @@ -4078,8 +4812,8 @@ printing and C<eval> =item Methods -I<PACKAGE>->new(I<ARRAYREF [>, I<ARRAYREF]>), I<$OBJ>->Dump I<or> -I<PACKAGE>->Dump(I<ARRAYREF [>, I<ARRAYREF]>), I<$OBJ>->Dumpxs I<or> +I<PACKAGE>->new(I<ARRAYREF [>, I<ARRAYREF]>), I<$OBJ>->Dump I<or> +I<PACKAGE>->Dump(I<ARRAYREF [>, I<ARRAYREF]>), I<$OBJ>->Dumpxs I<or> I<PACKAGE>->Dumpxs(I<ARRAYREF [>, I<ARRAYREF]>), I<$OBJ>->Seen(I<[HASHREF]>), I<$OBJ>->Values(I<[ARRAYREF]>), I<$OBJ>->Names(I<[ARRAYREF]>), I<$OBJ>->Reset @@ -4118,6 +4852,44 @@ Dumper =item SEE ALSO +=head2 Devel::Peek - A data debugging tool for the XS programmer + +=item SYNOPSIS + +=item DESCRIPTION + +=item EXAMPLES + +=over + +=item A simple scalar string + +=item A simple scalar number + +=item A simple scalar with an extra reference + +=item A reference to a simple scalar + +=item A reference to an array + +=item A reference to a hash + +=item Dumping a large array or hash + +=item A reference to an SV which holds a C pointer + +=item A reference to a subroutine + +=back + +=item EXPORTS + +=item BUGS + +=item AUTHOR + +=item SEE ALSO + =head2 Devel::SelfStubber - generate stubs for a SelfLoading module =item SYNOPSIS @@ -4130,6 +4902,42 @@ Dumper =item DESCRIPTION +=head2 Dumpvalue - provides screen dump of Perl data. 
+ +=item SYNOPSYS + +=item DESCRIPTION + +=over + +=item Creation + +C<arrayDepth>, C<hashDepth>, C<compactDump>, C<veryCompact>, C<globPrint>, +C<DumpDBFiles>, C<DumpPackages>, C<DumpReused>, C<tick>, C<HighBit>, +C<printUndef>, C<UsageOnly>, unctrl, subdump, bareStringify, quoteHighBit, +stopDbSignal + +=item Methods + +dumpValue, dumpValues, dumpvars, set_quote, set_unctrl, compactDump, +veryCompact, set, get + +=back + +=head2 DynaLoader - Dynamically load C libraries into Perl code + +=item SYNOPSIS + +=item DESCRIPTION + +@dl_library_path, @dl_resolve_using, @dl_require_symbols, @dl_librefs, +@dl_modules, dl_error(), $dl_debug, dl_findfile(), dl_expandspec(), +dl_load_file(), dl_loadflags(), dl_find_symbol(), +dl_find_symbol_anywhere(), dl_undef_symbols(), dl_install_xsub(), +bootstrap() + +=item AUTHOR + =head2 English - use nice English (or awk) names for ugly punctuation variables @@ -4145,6 +4953,16 @@ variables =item AUTHOR +=head2 Errno - System errno constants + +=item SYNOPSIS + +=item DESCRIPTION + +=item AUTHOR + +=item COPYRIGHT + =head2 Exporter - Implements default import method for modules =item SYNOPSIS @@ -4256,6 +5074,15 @@ For static extensions, For dynamic extensions, For dynamic extensions =item SEE ALSO +=head2 ExtUtils::MM_Cygwin - methods to override UN*X behaviour in +ExtUtils::MakeMaker + +=item SYNOPSIS + +=item DESCRIPTION + +canonpath, cflags, manifypods, perl_archive + =head2 ExtUtils::MM_OS2 - methods to override UN*X behaviour in ExtUtils::MakeMaker @@ -4297,7 +5124,7 @@ perm_rw (o), perm_rwx (o), pm_to_blib, post_constants (o), post_initialize replace_manpage_separator, static (o), static_lib (o), staticmake (o), subdir_x (o), subdirs (o), test (o), test_via_harness (o), test_via_script (o), tool_autosplit (o), tools_other (o), tool_xsubpp (o), top_targets (o), -writedoc, xs_c (o), xs_o (o), perl_archive, export_list +writedoc, xs_c (o), xs_cpp (o), xs_o (o), perl_archive, export_list =back @@ -4379,21 +5206,23 @@ dist_ci (o), 
dist_core (o), pasthru (o) =item Using Attributes and Parameters -C, CCFLAGS, CONFIG, CONFIGURE, DEFINE, DIR, DISTNAME, DL_FUNCS, DL_VARS, -EXCLUDE_EXT, EXE_FILES, NO_VC, FIRST_MAKEFILE, FULLPERL, H, IMPORTS, INC, +AUTHOR, ABSTRACT, ABSTRACT_FROM, BINARY_LOCATION, C, CAPI, CCFLAGS, CONFIG, +CONFIGURE, DEFINE, DIR, DISTNAME, DL_FUNCS, DL_VARS, EXCLUDE_EXT, +EXE_FILES, FIRST_MAKEFILE, FULLPERL, FUNCLIST, H, IMPORTS, INC, INCLUDE_EXT, INSTALLARCHLIB, INSTALLBIN, INSTALLDIRS, INSTALLMAN1DIR, -INSTALLMAN3DIR, INSTALLPRIVLIB, INSTALLSCRIPT, INSTALLSITELIB, -INSTALLSITEARCH, INST_ARCHLIB, INST_BIN, INST_EXE, INST_LIB, INST_MAN1DIR, -INST_MAN3DIR, INST_SCRIPT, LDFROM, LIBPERL_A, LIB, LIBS, LINKTYPE, +INSTALLMAN3DIR, INSTALLPRIVLIB, INSTALLSCRIPT, INSTALLSITEARCH, +INSTALLSITELIB, INST_ARCHLIB, INST_BIN, INST_EXE, INST_LIB, INST_MAN1DIR, +INST_MAN3DIR, INST_SCRIPT, LDFROM, LIB, LIBPERL_A, LIBS, LINKTYPE, MAKEAPERL, MAKEFILE, MAN1PODS, MAN3PODS, MAP_TARGET, MYEXTLIB, NAME, -NEEDS_LINKING, NOECHO, NORECURS, OBJECT, OPTIMIZE, PERL, PERLMAINCC, +NEEDS_LINKING, NOECHO, NORECURS, NO_VC, OBJECT, OPTIMIZE, PERL, PERLMAINCC, PERL_ARCHLIB, PERL_LIB, PERL_SRC, PERM_RW, PERM_RWX, PL_FILES, PM, -PMLIBDIRS, PREFIX, PREREQ_PM, SKIP, TYPEMAPS, VERSION, VERSION_FROM, XS, -XSOPT, XSPROTOARG, XS_VERSION +PMLIBDIRS, POLLUTE, PPM_INSTALL_EXEC, PPM_INSTALL_SCRIPT, PREFIX, +PREREQ_PM, SKIP, TYPEMAPS, VERSION, VERSION_FROM, XS, XSOPT, XSPROTOARG, +XS_VERSION =item Additional lowercase attributes -clean, depend, dist, dynamic_lib, installpm, linkext, macro, realclean, +clean, depend, dist, dynamic_lib, linkext, macro, realclean, test, tool_autosplit =item Overriding MakeMaker Methods @@ -4402,14 +5231,18 @@ tool_autosplit =item Distribution Support -make distcheck, make skipcheck, make distclean, make manifest, -make distdir, make tardist, make dist, make uutardist, make + make distcheck, make skipcheck, make distclean, make manifest, + make distdir, make tardist, make dist, make uutardist, make 
shdist, make zipdist, make ci =item Disabling an extension =back +=item ENVIRONMENT + +PERL_MM_OPT + =item SEE ALSO =item AUTHORS @@ -4435,6 +5268,14 @@ C<Added to MANIFEST:> I<file> =item AUTHOR +=head2 ExtUtils::Miniperl, writemain - write the C code for perlmain.c + +=item SYNOPSIS + +=item DESCRIPTION + +=item SEE ALSO + =head2 ExtUtils::Mkbootstrap - make a bootstrap file for use by DynaLoader =item SYNOPSIS @@ -4448,7 +5289,7 @@ extension =item DESCRIPTION -NAME, DL_FUNCS, DL_VARS, FILE, FUNCLIST, DLBASE +DLBASE, DL_FUNCS, DL_VARS, FILE, FUNCLIST, IMPORTS, NAME =item AUTHOR @@ -4530,7 +5371,7 @@ C<basename>, C<dirname> =over -=item Special behavior if C<syscopy> is defined (VMS and OS/2) +=item Special behaviour if C<syscopy> is defined (OS/2, VMS and Win32) rmscopy($from,$to[,$date_flag]) @@ -4572,8 +5413,6 @@ rmscopy($from,$to[,$date_flag]) =item AUTHORS -=item REVISION - =head2 File::Spec - portably perform operations on file names =item SYNOPSIS @@ -4584,6 +5423,20 @@ rmscopy($from,$to[,$date_flag]) =item AUTHORS +=head2 File::Spec::Functions - portably perform operations on file names + +=item SYNOPSIS + +=item DESCRIPTION + +=over + +=item Exports + +=back + +=item SEE ALSO + =head2 File::Spec::Mac - File::Spec for MacOS =item SYNOPSIS @@ -4592,8 +5445,8 @@ rmscopy($from,$to[,$date_flag]) =item METHODS -canonpath, catdir, catfile, curdir, rootdir, updir, file_name_is_absolute, -path +canonpath, catdir, catfile, curdir, devnull, rootdir, tmpdir, updir, +file_name_is_absolute, path =item SEE ALSO @@ -4611,8 +5464,9 @@ path =item METHODS -canonpath, catdir, catfile, curdir, rootdir, updir, no_upwards, -file_name_is_absolute, path, join, nativename +canonpath, catdir, catfile, curdir, devnull, rootdir, tmpdir, updir, +no_upwards, file_name_is_absolute, path, join, splitpath, splitdir, +catpath, abs2rel, rel2abs =item SEE ALSO @@ -4626,18 +5480,24 @@ file_name_is_absolute, path, join, nativename =item Methods always loaded -catdir, catfile, curdir 
(override), rootdir (override), updir (override), -path (override), file_name_is_absolute (override) +catdir, catfile, curdir (override), devnull (override), rootdir (override), +tmpdir (override), updir (override), path (override), file_name_is_absolute +(override) =back +=item SEE ALSO + =head2 File::Spec::Win32 - methods for Win32 file specs =item SYNOPSIS =item DESCRIPTION -catfile, canonpath +devnull, tmpdir, catfile, canonpath, splitpath, splitdir, catpath, abs2rel, +rel2abs + +=item SEE ALSO =head2 File::stat - by-name interface to Perl's built-in stat() functions @@ -4681,8 +5541,6 @@ $fh->print, $fh->printf, $fh->getline, $fh->getlines =item COPYRIGHT -=item REVISION - =head2 GDBM_File - Perl5 access to the gdbm library. =item SYNOPSIS @@ -4725,7 +5583,7 @@ options =item CONFIGURATION OPTIONS default, auto_abbrev, getopt_compat, require_order, permute, bundling -(default: reset), bundling_override (default: reset), ignore_case +(default: reset), bundling_override (default: reset), ignore_case (default: set), ignore_case_always (default: reset), pass_through (default: reset), prefix, prefix_pattern, debug (default: reset) @@ -4757,6 +5615,215 @@ locale =item DESCRIPTION +=head2 IO::Dir - supply object methods for directory handles + +=item SYNOPSIS + +=item DESCRIPTION + +new ( [ DIRNAME ] ), open ( DIRNAME ), read (), seek ( POS ), tell (), +rewind (), close (), tie %hash, IO::Dir, DIRNAME [, OPTIONS ] + +=item SEE ALSO + +=item AUTHOR + +=item COPYRIGHT + +=head2 IO::File - supply object methods for filehandles + +=item SYNOPSIS + +=item DESCRIPTION + +=item CONSTRUCTOR + +new ( FILENAME [,MODE [,PERMS]] ), new_tmpfile + +=item METHODS + +open( FILENAME [,MODE [,PERMS]] ) + +=item SEE ALSO + +=item HISTORY + +=head2 IO::Handle - supply object methods for I/O handles + +=item SYNOPSIS + +=item DESCRIPTION + +=item CONSTRUCTOR + +new (), new_from_fd ( FD, MODE ) + +=item METHODS + +$io->fdopen ( FD, MODE ), $io->opened, $io->getline, $io->getlines, 
+$io->ungetc ( ORD ), $io->write ( BUF, LEN [, OFFSET ] ), $io->error, +$io->clearerr, $io->sync, $io->flush, $io->printflush ( ARGS ), +$io->blocking ( [ BOOL ] ), $io->untaint + +=item NOTE + +=item SEE ALSO + +=item BUGS + +=item HISTORY + +=head2 IO::Pipe - supply object methods for pipes + +=item SYNOPSIS + +=item DESCRIPTION + +=item CONSTRUCTOR + +new ( [READER, WRITER] ) + +=item METHODS + +reader ([ARGS]), writer ([ARGS]), handles () + +=item SEE ALSO + +=item AUTHOR + +=item COPYRIGHT + +=head2 IO::Poll - Object interface to system poll call + +=item SYNOPSIS + +=item DESCRIPTION + +=item METHODS + +mask ( IO [, EVENT_MASK ] ), poll ( [ TIMEOUT ] ), events ( IO ), remove ( +IO ), handles( [ EVENT_MASK ] ) + +=item SEE ALSO + +=item AUTHOR + +=item COPYRIGHT + +=head2 IO::Seekable - supply seek based methods for I/O objects + +=item SYNOPSIS + +=item DESCRIPTION + +=item SEE ALSO + +=item HISTORY + +=head2 IO::Select - OO interface to the select system call + +=item SYNOPSIS + +=item DESCRIPTION + +=item CONSTRUCTOR + +new ( [ HANDLES ] ) + +=item METHODS + +add ( HANDLES ), remove ( HANDLES ), exists ( HANDLE ), handles, can_read ( +[ TIMEOUT ] ), can_write ( [ TIMEOUT ] ), has_exception ( [ TIMEOUT ] ), +count (), bits(), select ( READ, WRITE, ERROR [, TIMEOUT ] ) + +=item EXAMPLE + +=item AUTHOR + +=item COPYRIGHT + +=head2 IO::Socket - Object interface to socket communications + +=item SYNOPSIS + +=item DESCRIPTION + +=item CONSTRUCTOR + +new ( [ARGS] ) + +=item METHODS + +accept([PKG]), timeout([VAL]), sockopt(OPT [, VAL]), sockdomain, socktype, +protocol, connected + +=item SEE ALSO + +=item AUTHOR + +=item COPYRIGHT + +=head2 IO::Socket::INET - Object interface for AF_INET domain sockets + +=item SYNOPSIS + +=item DESCRIPTION + +=item CONSTRUCTOR + +new ( [ARGS] ) + +=over + +=item METHODS + +sockaddr (), sockport (), sockhost (), peeraddr (), peerport (), peerhost +() + +=back + +=item SEE ALSO + +=item AUTHOR + +=item COPYRIGHT + +=head2 
IO::Socket::UNIX - Object interface for AF_UNIX domain sockets + +=item SYNOPSIS + +=item DESCRIPTION + +=item CONSTRUCTOR + +new ( [ARGS] ) + +=item METHODS + +hostpath(), peerpath() + +=item SEE ALSO + +=item AUTHOR + +=item COPYRIGHT + +=head2 IO::lib::IO::Dir, IO::Dir - supply object methods for directory +handles + +=item SYNOPSIS + +=item DESCRIPTION + +new ( [ DIRNAME ] ), open ( DIRNAME ), read (), seek ( POS ), tell (), +rewind (), close (), tie %hash, IO::Dir, DIRNAME [, OPTIONS ] + +=item SEE ALSO + +=item AUTHOR + +=item COPYRIGHT + =head2 IO::lib::IO::File, IO::File - supply object methods for filehandles =item SYNOPSIS @@ -4765,7 +5832,7 @@ locale =item CONSTRUCTOR -new ([ ARGS ] ), new_tmpfile +new ( FILENAME [,MODE [,PERMS]] ), new_tmpfile =item METHODS @@ -4788,9 +5855,10 @@ new (), new_from_fd ( FD, MODE ) =item METHODS -$fh->fdopen ( FD, MODE ), $fh->opened, $fh->getline, $fh->getlines, -$fh->ungetc ( ORD ), $fh->write ( BUF, LEN [, OFFSET }\] ), $fh->flush, -$fh->error, $fh->clearerr, $fh->untaint +$io->fdopen ( FD, MODE ), $io->opened, $io->getline, $io->getlines, +$io->ungetc ( ORD ), $io->write ( BUF, LEN [, OFFSET ] ), $io->error, +$io->clearerr, $io->sync, $io->flush, $io->printflush ( ARGS ), +$io->blocking ( [ BOOL ] ), $io->untaint =item NOTE @@ -4800,13 +5868,13 @@ $fh->error, $fh->clearerr, $fh->untaint =item HISTORY -=head2 IO::lib::IO::Pipe, IO::pipe - supply object methods for pipes +=head2 IO::lib::IO::Pipe, IO::Pipe - supply object methods for pipes =item SYNOPSIS =item DESCRIPTION -=item CONSTRCUTOR +=item CONSTRUCTOR new ( [READER, WRITER] ) @@ -4820,6 +5888,23 @@ reader ([ARGS]), writer ([ARGS]), handles () =item COPYRIGHT +=head2 IO::lib::IO::Poll, IO::Poll - Object interface to system poll call + +=item SYNOPSIS + +=item DESCRIPTION + +=item METHODS + +mask ( IO [, EVENT_MASK ] ), poll ( [ TIMEOUT ] ), events ( IO ), remove ( +IO ), handles( [ EVENT_MASK ] ) + +=item SEE ALSO + +=item AUTHOR + +=item COPYRIGHT + =head2 
IO::lib::IO::Seekable, IO::Seekable - supply seek based methods for I/O objects @@ -4845,8 +5930,8 @@ new ( [ HANDLES ] ) =item METHODS add ( HANDLES ), remove ( HANDLES ), exists ( HANDLE ), handles, can_read ( -[ TIMEOUT ] ), can_write ( [ TIMEOUT ] ), has_error ( [ TIMEOUT ] ), count -(), bits(), bits(), select ( READ, WRITE, ERROR [, TIMEOUT ] ) +[ TIMEOUT ] ), can_write ( [ TIMEOUT ] ), has_exception ( [ TIMEOUT ] ), +count (), bits(), select ( READ, WRITE, ERROR [, TIMEOUT ] ) =item EXAMPLE @@ -4868,26 +5953,72 @@ new ( [ARGS] ) =item METHODS accept([PKG]), timeout([VAL]), sockopt(OPT [, VAL]), sockdomain, socktype, -protocol +protocol, connected -=item SUB-CLASSES +=item SEE ALSO -=over +=item AUTHOR + +=item COPYRIGHT + +=head2 IO::lib::IO::Socket::INET, IO::Socket::INET - Object interface for +AF_INET domain sockets -=item IO::Socket::INET +=item SYNOPSIS + +=item DESCRIPTION + +=item CONSTRUCTOR + +new ( [ARGS] ) + +=over =item METHODS sockaddr (), sockport (), sockhost (), peeraddr (), peerport (), peerhost () -=item IO::Socket::UNIX +=back + +=item SEE ALSO + +=item AUTHOR + +=item COPYRIGHT + +=head2 IO::lib::IO::Socket::UNIX, IO::Socket::UNIX - Object interface for +AF_UNIX domain sockets + +=item SYNOPSIS + +=item DESCRIPTION + +=item CONSTRUCTOR + +new ( [ARGS] ) =item METHODS hostpath(), peerpath() -=back +=item SEE ALSO + +=item AUTHOR + +=item COPYRIGHT + +=head2 IPC::Msg - SysV Msg IPC object class + +=item SYNOPSIS + +=item DESCRIPTION + +=item METHODS + +new ( KEY , FLAGS ), id, rcv ( BUF, LEN [, TYPE [, FLAGS ]] ), remove, set +( STAT ), set ( NAME => VALUE [, NAME => VALUE ...] 
), snd ( TYPE, MSG [, +FLAGS ] ), stat =item SEE ALSO @@ -4914,6 +6045,25 @@ handling =item WARNING +=head2 IPC::Semaphore - SysV Semaphore IPC object class + +=item SYNOPSIS + +=item DESCRIPTION + +=item METHODS + +new ( KEY , NSEMS , FLAGS ), getall, getncnt ( SEM ), getpid ( SEM ), +getval ( SEM ), getzcnt ( SEM ), id, op ( OPLIST ), remove, set ( STAT ), +set ( NAME => VALUE [, NAME => VALUE ...] ), setall ( VALUES ), setval ( N +, VALUE ), stat + +=item SEE ALSO + +=item AUTHOR + +=item COPYRIGHT + =head2 IPC::SysV - SysV IPC constants =item SYNOPSIS @@ -4972,7 +6122,8 @@ set ( NAME => VALUE [, NAME => VALUE ...] ), setall ( VALUES ), setval ( N =item DESCRIPTION -number format, Error returns 'NaN', Division is computed to +number format, Error returns 'NaN', Division is computed to, Rounding is +performed =item BUGS @@ -5133,6 +6284,18 @@ functions =item AUTHOR +=head2 O - Generic interface to Perl Compiler backends + +=item SYNOPSIS + +=item DESCRIPTION + +=item CONVENTIONS + +=item IMPLEMENTATION + +=item AUTHOR + =head2 ODBM_File - Tied access to odbm files =item SYNOPSIS @@ -5209,7 +6372,7 @@ Memory, CPU, Snooping, Signals, State Changes =head2 Opcode::ops, ops - Perl pragma to restrict unsafe operations when compiling -=item SYNOPSIS +=item SYNOPSIS =item DESCRIPTION @@ -5342,6 +6505,18 @@ Constants, Macros =item CREATION +=head2 Pod::Checker, podchecker() - check pod documents for syntax errors + +=item SYNOPSIS + +=item OPTIONS/ARGUMENTS + +=item DESCRIPTION + +=item EXAMPLES + +=item AUTHOR + =head2 Pod::Html - module to convert pod files to HTML =item SYNOPSIS @@ -5350,8 +6525,8 @@ Constants, Macros =item ARGUMENTS -help, htmlroot, infile, outfile, podroot, podpath, libpods, netscape, -nonetscape, index, noindex, recurse, norecurse, title, verbose +help, htmldir, htmlroot, infile, outfile, podroot, podpath, libpods, +netscape, nonetscape, index, noindex, recurse, norecurse, title, verbose =item EXAMPLE @@ -5363,6 +6538,261 @@ nonetscape, index, 
noindex, recurse, norecurse, title, verbose =item COPYRIGHT +=head2 Pod::InputObjects - objects representing POD input paragraphs, +commands, etc. + +=item SYNOPSIS + +=item REQUIRES + +=item EXPORTS + +=item DESCRIPTION + +B<Pod::InputSource>, B<Pod::Paragraph>, B<Pod::InteriorSequence>, +B<Pod::ParseTree> + +=item B<Pod::InputSource> + +=over + +=item B<new()> + +=item B<name()> + +=item B<handle()> + +=item B<was_cutting()> + +=back + +=item B<Pod::Paragraph> + +=over + +=item B<new()> + +=item B<cmd_name()> + +=item B<text()> + +=item B<raw_text()> + +=item B<cmd_prefix()> + +=item B<cmd_separator()> + +=item B<parse_tree()> + +=item B<file_line()> + +=back + +=item B<Pod::InteriorSequence> + +=over + +=item B<new()> + +=item B<cmd_name()> + +=item B<prepend()> + +=item B<append()> + +=item B<nested()> + +=item B<raw_text()> + +=item B<left_delimiter()> + +=item B<right_delimiter()> + +=item B<parse_tree()> + +=item B<file_line()> + +=item B<DESTROY()> + +=back + +=item B<Pod::ParseTree> + +=over + +=item B<new()> + +=item B<top()> + +=item B<children()> + +=item B<prepend()> + +=item B<append()> + +=item B<raw_text()> + +=item B<DESTROY()> + +=back + +=item SEE ALSO + +=item AUTHOR + +=head2 Pod::Parser - base class for creating POD filters and translators + +=item SYNOPSIS + +=item REQUIRES + +=item EXPORTS + +=item DESCRIPTION + +=item QUICK OVERVIEW + +=item RECOMMENDED SUBROUTINE/METHOD OVERRIDES + +=item B<command()> + +C<$cmd>, C<$text>, C<$line_num>, C<$pod_para> + +=item B<verbatim()> + +C<$text>, C<$line_num>, C<$pod_para> + +=item B<textblock()> + +C<$text>, C<$line_num>, C<$pod_para> + +=item B<interior_sequence()> + +=item OPTIONAL SUBROUTINE/METHOD OVERRIDES + +=item B<new()> + +=item B<initialize()> + +=item B<begin_pod()> + +=item B<begin_input()> + +=item B<end_input()> + +=item B<end_pod()> + +=item B<preprocess_line()> + +=item B<preprocess_paragraph()> + +=item METHODS FOR PARSING AND PROCESSING + +=item B<parse_text()> + +B<-expand_seq> 
=E<gt> I<code-ref>|I<method-name>, B<-expand_ptree> =E<gt> +I<code-ref>|I<method-name> + +=item B<interpolate()> + +=item B<parse_paragraph()> + +=item B<parse_from_filehandle()> + +=item B<parse_from_file()> + +=item ACCESSOR METHODS + +=item B<cutting()> + +=item B<output_file()> + +=item B<output_handle()> + +=item B<input_file()> + +=item B<input_handle()> + +=item B<input_streams()> + +=item B<top_stream()> + +=item PRIVATE METHODS AND DATA + +=item B<_push_input_stream()> + +=item B<_pop_input_stream()> + +=item SEE ALSO + +=item AUTHOR + +=head2 Pod::PlainText, pod2plaintext - function to convert POD data to +formatted ASCII text + +=item SYNOPSIS + +=item REQUIRES + +=item EXPORTS + +=item DESCRIPTION + +=item SEE ALSO + +=item AUTHOR + +=head2 Pod::Select, podselect() - extract selected sections of POD from +input + +=item SYNOPSIS + +=item REQUIRES + +=item EXPORTS + +=item DESCRIPTION + +=item SECTION SPECIFICATIONS + +=item RANGE SPECIFICATIONS + +=item OBJECT METHODS + +=item B<curr_headings()> + +=item B<select()> + +=item B<add_selection()> + +=item B<clear_selections()> + +=item B<match_section()> + +=item B<is_selected()> + +=item EXPORTED FUNCTIONS + +=item B<podselect()> + +B<-output>, B<-sections>, B<-ranges> + +=item PRIVATE METHODS AND DATA + +=item B<_compile_section_spec()> + +=over + +=item $self->{_SECTION_HEADINGS} + +=item $self->{_SELECTED_SECTIONS} + +=back + +=item SEE ALSO + +=item AUTHOR + =head2 Pod::Text - convert POD data to formatted ASCII text =item SYNOPSIS @@ -5373,12 +6803,67 @@ nonetscape, index, noindex, recurse, norecurse, title, verbose =item TODO +=head2 Pod::Usage, pod2usage() - print a usage message from embedded pod +documentation + +=item SYNOPSIS + +=item ARGUMENTS + +C<-message>, C<-msg>, C<-exitval>, C<-verbose>, C<-output>, C<-input>, +C<-pathlist> + +=item DESCRIPTION + +=item EXAMPLES + +=over + +=item Recommended Use + +=back + +=item CAVEATS + +=item AUTHOR + +=item ACKNOWLEDGEMENTS + =head2 SDBM_File - Tied 
access to sdbm files =item SYNOPSIS =item DESCRIPTION +=head2 Safe - Compile and execute code in restricted compartments + +=item SYNOPSIS + +=item DESCRIPTION + +a new namespace, an operator mask + +=item WARNING + +=over + +=item RECENT CHANGES + +=item Methods in class Safe + +permit (OP, ...), permit_only (OP, ...), deny (OP, ...), deny_only (OP, +...), trap (OP, ...), untrap (OP, ...), share (NAME, ...), share_from +(PACKAGE, ARRAYREF), varglob (VARNAME), reval (STRING), rdo (FILENAME), +root (NAMESPACE), mask (MASK) + +=item Some Safety Issues + +Memory, CPU, Snooping, Signals, State Changes + +=item AUTHOR + +=back + =head2 Search::Dict, look - search for key in dictionary file =item SYNOPSIS @@ -5511,7 +6996,7 @@ C<tkRunning>, C<ornaments>, C<newTTY> =item ENVIRONMENT -=head2 Test - provides a simple framework for writing test scripts +=head2 Test - provides a simple framework for writing test scripts =item SYNOPSIS @@ -5521,6 +7006,8 @@ C<tkRunning>, C<ornaments>, C<newTTY> NORMAL TESTS, SKIPPED TESTS, TODO TESTS +=item RETURN VALUE + =item ONFAIL =item SEE ALSO @@ -5613,8 +7100,6 @@ unexpand(1) =item EXAMPLE -=item BUGS - =item AUTHOR =head2 Thread - multithreading @@ -5627,11 +7112,11 @@ unexpand(1) new \&start_sub, new \&start_sub, LIST, lock VARIABLE, async BLOCK;, Thread->self, Thread->list, cond_wait VARIABLE, cond_signal VARIABLE, -cond_broadcast VARIABLE +cond_broadcast VARIABLE, yield =item METHODS -join, eval, tid +join, eval, detach, equal, tid =item LIMITATIONS @@ -5671,11 +7156,13 @@ new, new NUMBER, down, down NUMBER, up, up NUMBER =item SYNOPSIS +=item DESCRIPTION + =head2 Tie::Array - base class for tied arrays -=item SYNOPSIS +=item SYNOPSIS -=item DESCRIPTION +=item DESCRIPTION TIEARRAY classname, LIST, STORE this, index, value, FETCH this, index, FETCHSIZE this, STORESIZE this, count, EXTEND this, count, CLEAR this, @@ -5686,7 +7173,8 @@ SPLICE this, offset, length, LIST =item AUTHOR -=head2 Tie::Handle - base class definitions for tied 
handles +=head2 Tie::Handle, Tie::StdHandle - base class definitions for tied +handles =item SYNOPSIS @@ -5694,7 +7182,8 @@ SPLICE this, offset, length, LIST TIEHANDLE classname, LIST, WRITE this, scalar, length, offset, PRINT this, LIST, PRINTF this, format, LIST, READ this, scalar, length, offset, -READLINE this, GETC this, DESTROY this +READLINE this, GETC this, CLOSE this, OPEN this, filename, BINMODE this, +EOF this, TELL this, SEEK this, offset, whence, DESTROY this =item MORE INFORMATION @@ -5750,6 +7239,10 @@ TIESCALAR classname, LIST, FETCH this, STORE this, value, DESTROY this =item DESCRIPTION +=item IMPLEMENTATION + +=item BUGS + =head2 Time::gmtime - by-name interface to Perl's built-in gmtime() function diff --git a/pod/perltootc.pod b/pod/perltootc.pod new file mode 100644 index 0000000000..f7157e83aa --- /dev/null +++ b/pod/perltootc.pod @@ -0,0 +1,1337 @@ +=head1 NAME + +perltootc - Tom's OO Tutorial for Class Data in Perl + +=head1 DESCRIPTION + +When designing an object class, you are sometimes faced with the situation +of wanting common state shared by all objects of that class. +Such I<class attributes> act somewhat like global variables for the entire +class, but unlike program-wide globals, class attributes have meaning only to +the class itself. + +Here are a few examples where class attributes might come in handy: + +=over + +=item * + +to keep a count of the objects you've created, or how many are +still extant. + +=item * + +to extract the name or file descriptor for a logfile used by a debugging +method. + +=item * + +to access collective data, like the total amount of cash dispensed by +all ATMs in a network in a given day. + +=item * + +to access the last object created by a class, or the most accessed object, +or to retrieve a list of all objects. + +=back + +Unlike a true global, class attributes should not be accessed directly. 
+Instead, their state should be inspected, and perhaps altered, only +through the mediated access of I<class methods>. These class attribute +accessor methods are similar in spirit and function to accessors used +to manipulate the state of instance attributes on an object. They provide a +clear firewall between interface and implementation. + +You should allow access to class attributes through either the class +name or any object of that class. If we assume that $an_object is of +type Some_Class, and the &Some_Class::population_count method accesses +class attributes, then these two invocations should both be possible, +and almost certainly equivalent. + + Some_Class->population_count() + $an_object->population_count() + +The question is, where do you store the state which that method accesses? +Unlike more restrictive languages like C++, where these are called +static data members, Perl provides no syntactic mechanism to declare +class attributes, any more than it provides a syntactic mechanism to +declare instance attributes. Perl provides the developer with a broad +set of powerful but flexible features that can be uniquely crafted to +the particular demands of the situation. + +A class in Perl is typically implemented in a module. A module consists +of two complementary feature sets: a package for interfacing with the +outside world, and a lexical file scope for privacy. Either of these +two mechanisms can be used to implement class attributes. That means you +get to decide whether to put your class attributes in package variables +or to put them in lexical variables. + +And those aren't the only decisions to make. If you choose to use package +variables, you can make your class attribute accessor methods either ignorant +of inheritance or sensitive to it. 
If you choose lexical variables, +you can elect to permit access to them from anywhere in the entire file +scope, or you can limit direct data access exclusively to the methods +implementing those attributes. + +=head1 Class Data as Package Variables + +Because a class in Perl is really just a package, using package variables +to hold class attributes is the most natural choice. This makes it simple +for each class to have its own class attributes. Let's say you have a class +called Some_Class that needs a couple of different attributes that you'd +like to be global to the entire class. The simplest thing to do is to +use package variables like $Some_Class::CData1 and $Some_Class::CData2 +to hold these attributes. But we certainly don't want to encourage +outsiders to touch those data directly, so we provide methods +to mediate access. + +In the accessor methods below, we'll for now just ignore the first +argument--that part to the left of the arrow on method invocation, which +is either a class name or an object reference. + + package Some_Class; + sub CData1 { + shift; # XXX: ignore calling class/object + $Some_Class::CData1 = shift if @_; + return $Some_Class::CData1; + } + sub CData2 { + shift; # XXX: ignore calling class/object + $Some_Class::CData2 = shift if @_; + return $Some_Class::CData2; + } + +This technique is highly legible and should be completely straightforward +to even the novice Perl programmer. By fully qualifying the package +variables, they stand out clearly when reading the code. Unfortunately, +if you misspell one of these, you've introduced an error that's hard +to catch. It's also somewhat disconcerting to see the class name itself +hard-coded in so many places. + +Both these problems can be easily fixed. Just add the C<use strict> +pragma, then pre-declare your package variables. (The C<our> operator +will be new in 5.006, and will work for package globals just like C<my> +works for scoped lexicals.) 
+ + package Some_Class; + use strict; + our($CData1, $CData2); # our() is new to perl5.006 + sub CData1 { + shift; # XXX: ignore calling class/object + $CData1 = shift if @_; + return $CData1; + } + sub CData2 { + shift; # XXX: ignore calling class/object + $CData2 = shift if @_; + return $CData2; + } + + +As with any other global variable, some programmers prefer to start their +package variables with capital letters. This helps clarity somewhat, but +because the package variables are no longer fully qualified, their significance +can be lost when reading the code. You can fix this easily enough by +choosing better names than were used here. + +=head2 Putting All Your Eggs in One Basket + +Just as the mindless enumeration of accessor methods for instance attributes +grows tedious after the first few (see L<perltoot>), so too does the +repetition begin to grate when listing out accessor methods for class +data. Repetition runs counter to the primary virtue of a programmer: +Laziness, here manifesting as that innate urge every programmer feels +to factor out duplicate code whenever possible. + +Here's what to do. First, make just one hash to hold all class attributes. + + package Some_Class; + use strict; + our %ClassData = ( # our() is new to perl5.006 + CData1 => "", + CData2 => "", + ); + +Using closures (see L<perlref>) and direct access to the package symbol +table (see L<perlmod>), now clone an accessor method for each key in +the %ClassData hash. Each of these methods is used to fetch or store +values in the specific, named class attribute. + + for my $datum (keys %ClassData) { + no strict "refs"; # to register new methods in package + *$datum = sub { + shift; # XXX: ignore calling class/object + $ClassData{$datum} = shift if @_; + return $ClassData{$datum}; + }; + } + +It's true that you could work out a solution employing an &AUTOLOAD +method, but this approach is unlikely to prove satisfactory.
Your +function would have to distinguish between class attributes and object +attributes; it could interfere with inheritance; and it would have to +be careful about DESTROY. Such complexity is uncalled for in most cases, +and certainly in this one. + +You may wonder why we're rescinding strict refs for the loop. We're +manipulating the package's symbol table to introduce new function names +using symbolic references (indirect naming), which the strict pragma +would otherwise forbid. Normally, symbolic references are a dodgy +notion at best. This isn't just because they can be used accidentally +when you don't mean to. It's also because for most uses +to which beginning Perl programmers attempt to put symbolic references, +we have much better approaches, like nested hashes or hashes of arrays. +But there's nothing wrong with using symbolic references to manipulate +something that is meaningful only from the perspective of the package +symbol table, like method names or package variables. In other +words, when you want to refer to the symbol table, use symbolic references. + +Clustering all the class attributes in one place has several advantages. +They're easy to spot, initialize, and change. The aggregation also +makes them convenient to access externally, such as from a debugger +or a persistence package. The only possible problem is that we don't +automatically know the name of each class's class object, should it have +one. This issue is addressed below in L<"The Eponymous Meta-Object">. + +=head2 Inheritance Concerns + +Suppose you have an instance of a derived class, and you access class +data using an inherited method call. Should that end up referring +to the base class's attributes, or to those in the derived class? +How would it work in the earlier examples? The derived class inherits +all the base class's methods, including those that access class attributes. +But what package are the class attributes stored in?
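A small experiment makes the question concrete. The sketch below uses hypothetical Carnivore and Feline classes (not part of any real module) with the inheritance-ignorant accessor style from the first examples. Calling the inherited method through the derived class still updates the base class's package variable, and the derived package never gets a variable of its own.

```perl
use strict;

package Carnivore;                 # hypothetical base class
our $Population = 0;               # class attribute lives in package Carnivore
sub population {
    shift;                         # ignore calling class/object, as above
    $Population = shift if @_;     # compiled here, so always $Carnivore::Population
    return $Population;
}

package Feline;                    # hypothetical derived class
our @ISA = ('Carnivore');          # inherits &population

package main;

Feline->population(5);             # resolves to Carnivore::population
print "Carnivore: ", Carnivore->population(), "\n";      # Carnivore: 5
print "Feline has its own copy? ",
      (defined $Feline::Population ? "yes" : "no"), "\n"; # ... no
```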
+ +The answer is that, as written, class attributes are stored in the package into +which those methods were compiled. When you invoke the &CData1 method +on the name of the derived class or on one of that class's objects, the +version shown above is still run, so you'll access $Some_Class::CData1--or +in the method cloning version, C<$Some_Class::ClassData{CData1}>. + +Think of these class methods as executing in the context of their base +class, not in that of their derived class. Sometimes this is exactly +what you want. If Feline subclasses Carnivore, then the population of +Carnivores in the world should go up when a new Feline is born. +But what if you wanted to figure out how many Felines you have apart +from Carnivores? The current approach doesn't support that. + +You'll have to decide on a case-by-case basis whether it makes any sense +for class attributes to be package-relative. If you want it to be so, +then stop ignoring the first argument to the function. Either it will +be a package name if the method was invoked directly on a class name, +or else it will be an object reference if the method was invoked on an +object reference. In the latter case, the ref() function provides the +class of that object. + + package Some_Class; + sub CData1 { + my $obclass = shift; + my $class = ref($obclass) || $obclass; + my $varname = $class . "::CData1"; + no strict "refs"; # to access package data symbolically + $$varname = shift if @_; + return $$varname; + } + +And then do likewise for all other class attributes (such as CData2, +etc.) that you wish to access as package variables in the invoking package +instead of the compiling package as we had previously. + +Once again we temporarily disable the strict references ban, because +otherwise we couldn't use the fully-qualified symbolic name for +the package global. 
This is perfectly reasonable: since all package +variables by definition live in a package, there's nothing wrong with +accessing them via that package's symbol table. That's what it's there +for (well, somewhat). + +What about just using a single hash for everything and then cloning +methods? What would that look like? The only difference would be the +closure used to produce new method entries for the class's symbol table. + + no strict "refs"; + *$datum = sub { + my $obclass = shift; + my $class = ref($obclass) || $obclass; + my $varname = $class . "::ClassData"; + $varname->{$datum} = shift if @_; + return $varname->{$datum}; + } + +=head2 The Eponymous Meta-Object + +It could be argued that the %ClassData hash in the previous example is +neither the most imaginative nor the most intuitive of names. Is there +something else that might make more sense, be more useful, or both? + +As it happens, yes, there is. For the "class meta-object", we'll use +a package variable of the same name as the package itself. Within the +scope of a package Some_Class declaration, we'll use the eponymously +named hash %Some_Class as that class's meta-object. (Using an eponymously +named hash is somewhat reminiscent of classes that name their constructors +eponymously in the Python or C++ fashion. That is, class Some_Class would +use &Some_Class::Some_Class as a constructor, probably even exporting that +name as well. The StrNum class in Recipe 13.14 in I<The Perl Cookbook> +does this, if you're looking for an example.) + +This predictable approach has many benefits, including having a well-known +identifier to aid in debugging, transparent persistence, +or checkpointing. It's also the obvious name for monadic classes and +translucent attributes, discussed later. + +Here's an example of such a class. Notice how the name of the +hash storing the meta-object is the same as the name of the package +used to implement the class. 
+ + package Some_Class; + use strict; + + # create class meta-object using that most perfect of names + our %Some_Class = ( # our() is new to perl5.006 + CData1 => "", + CData2 => "", + ); + + # this accessor is calling-package-relative + sub CData1 { + my $obclass = shift; + my $class = ref($obclass) || $obclass; + no strict "refs"; # to access eponymous meta-object + $class->{CData1} = shift if @_; + return $class->{CData1}; + } + + # but this accessor is not + sub CData2 { + shift; # XXX: ignore calling class/object + no strict "refs"; # to access eponymous meta-object + __PACKAGE__ -> {CData2} = shift if @_; + return __PACKAGE__ -> {CData2}; + } + +In the second accessor method, the __PACKAGE__ notation was used for +two reasons. First, to avoid hardcoding the literal package name +in the code in case we later want to change that name. Second, to +clarify to the reader that what matters here is the package currently +being compiled into, not the package of the invoking object or class. +If the long sequence of non-alphabetic characters bothers you, you can +always put the __PACKAGE__ in a variable first. + + sub CData2 { + shift; # XXX: ignore calling class/object + no strict "refs"; # to access eponymous meta-object + my $class = __PACKAGE__; + $class->{CData2} = shift if @_; + return $class->{CData2}; + } + +Even though we're using symbolic references for good, not evil, some +folks tend to become unnerved when they see so many places with strict +ref checking disabled. Given a symbolic reference, you can always +produce a real reference (the reverse is not true, though). So we'll +create a subroutine that does this conversion for us. If invoked as a +function of no arguments, it returns a reference to the compiling class's +eponymous hash. Invoked as a class method, it returns a reference to +the eponymous hash of the class it was invoked on.
And when invoked as an object method, +this function returns a reference to the eponymous hash for whatever +class the object belongs to. + + package Some_Class; + use strict; + + our %Some_Class = ( # our() is new to perl5.006 + CData1 => "", + CData2 => "", + ); + + # tri-natured: function, class method, or object method + sub _classobj { + my $obclass = shift || __PACKAGE__; + my $class = ref($obclass) || $obclass; + no strict "refs"; # to convert sym ref to real one + return \%$class; + } + + for my $datum (keys %{ _classobj() } ) { + # turn off strict refs so that we can + # register a method in the symbol table + no strict "refs"; + *$datum = sub { + use strict "refs"; + my $self = shift->_classobj(); + $self->{$datum} = shift if @_; + return $self->{$datum}; + } + } + +=head2 Indirect References to Class Data + +A reasonably common strategy for handling class attributes is to store +a reference to each package variable on the object itself. This is +a strategy you've probably seen before, such as in L<perltoot> and +L<perlbot>, but there may be variations in the example below that you +haven't thought of before. + + package Some_Class; + our($CData1, $CData2); # our() is new to perl5.006 + + sub new { + my $obclass = shift; + return bless my $self = { + ObData1 => "", + ObData2 => "", + CData1 => \$CData1, + CData2 => \$CData2, + } => (ref $obclass || $obclass); + } + + sub ObData1 { + my $self = shift; + $self->{ObData1} = shift if @_; + return $self->{ObData1}; + } + + sub ObData2 { + my $self = shift; + $self->{ObData2} = shift if @_; + return $self->{ObData2}; + } + + sub CData1 { + my $self = shift; + my $dataref = ref $self + ? $self->{CData1} + : \$CData1; + $$dataref = shift if @_; + return $$dataref; + } + + sub CData2 { + my $self = shift; + my $dataref = ref $self + ? 
$self->{CData2} + : \$CData2; + $$dataref = shift if @_; + return $$dataref; + } + +As written above, a derived class will inherit these methods, which +will consequently access package variables in the base class's package. +This is not necessarily expected behavior in all circumstances. Here's an +example that uses a variable meta-object, taking care to access the +proper package's data. + + package Some_Class; + use strict; + + our %Some_Class = ( # our() is new to perl5.006 + CData1 => "", + CData2 => "", + ); + + sub _classobj { + my $self = shift; + my $class = ref($self) || $self; + no strict "refs"; + # get (hard) ref to eponymous meta-object + return \%$class; + } + + sub new { + my $obclass = shift; + my $classobj = $obclass->_classobj(); + bless my $self = { + ObData1 => "", + ObData2 => "", + CData1 => \$classobj->{CData1}, + CData2 => \$classobj->{CData2}, + } => (ref $obclass || $obclass); + return $self; + } + + sub ObData1 { + my $self = shift; + $self->{ObData1} = shift if @_; + return $self->{ObData1}; + } + + sub ObData2 { + my $self = shift; + $self->{ObData2} = shift if @_; + return $self->{ObData2}; + } + + sub CData1 { + my $self = shift; + # objects hold a ref; for a class, take a ref into its meta-object + my $dataref = ref $self + ? $self->{CData1} + : \$self->_classobj()->{CData1}; + $$dataref = shift if @_; + return $$dataref; + } + + sub CData2 { + my $self = shift; + # objects hold a ref; for a class, take a ref into its meta-object + my $dataref = ref $self + ? $self->{CData2} + : \$self->_classobj()->{CData2}; + $$dataref = shift if @_; + return $$dataref; + } + +Not only are we now strict-refs clean, but using an eponymous meta-object +also seems to make the code cleaner. Unlike the previous version, this one +does something interesting in the face of inheritance: it accesses the +class meta-object in the invoking class instead of the one into which +the method was initially compiled. + +Because the data in the class meta-object is so easy to get at, it's +simple to dump the complete class state using an external mechanism, such +as when debugging or implementing a persistent class.
This works because +the class meta-object is a package variable, has a well-known name, and +clusters all its data together. (Transparent persistence +is not always feasible, but it's certainly an appealing idea.) + +There's still no check that object accessor methods have not been +invoked on a class name. If strict ref checking is enabled, you'd +blow up. If not, then you get the eponymous meta-object. What you do +with--or about--this is up to you. The next two sections demonstrate +innovative uses for this powerful feature. + +=head2 Monadic Classes + +Some of the standard modules shipped with Perl provide class interfaces +without any attribute methods whatsoever. The most commonly used module +not numbered amongst the pragmata, the Exporter module, is a class with +neither constructors nor attributes. Its job is simply to provide a +standard interface for modules wishing to export part of their namespace +into that of their caller. Modules use the Exporter's &import method by +setting their inheritance list in their package's @ISA array to mention +"Exporter". But class Exporter provides no constructor, so you can't +have several instances of the class. In fact, you can't have any--it +just doesn't make any sense. All you get is its methods. Its interface +contains no statefulness, so state data is wholly superfluous. + +Another sort of class that pops up from time to time is one that supports +a unique instance. Such classes are called I<monadic classes>, or less +formally, I<singletons> or I<highlander classes>. + +If a class is monadic, where do you store its state, that is, +its attributes? How do you make sure that there's never more than +one instance? While you could merely use a slew of package variables, +it's a lot cleaner to use the eponymously named hash. 
Here's a complete +example of a monadic class: + + package Cosmos; + %Cosmos = (); + + # accessor method for "name" attribute + sub name { + my $self = shift; + $self->{name} = shift if @_; + return $self->{name}; + } + + # read-only accessor method for "birthday" attribute + sub birthday { + my $self = shift; + die "can't reset birthday" if @_; # XXX: croak() is better + return $self->{birthday}; + } + + # accessor method for "stars" attribute + sub stars { + my $self = shift; + $self->{stars} = shift if @_; + return $self->{stars}; + } + + # oh my - one of our stars just went out! + sub supernova { + my $self = shift; + my $count = $self->stars(); + $self->stars($count - 1) if $count > 0; + } + + # constructor/initializer method - fix by reboot + sub bigbang { + my $self = shift; + %$self = ( + name => "the world according to tchrist", + birthday => time(), + stars => 0, + ); + return $self; # yes, it's probably a class. SURPRISE! + } + + # After the class is compiled, but before any use or require + # returns, we start off the universe with a bang. + __PACKAGE__ -> bigbang(); + +Hold on, that doesn't look like anything special. Those attribute +accessors look no different than they would if this were a regular class +instead of a monadic one. The crux of the matter is there's nothing +that says that $self must hold a reference to a blessed object. It merely +has to be something you can invoke methods on. Here the package name +itself, Cosmos, works as an object. Look at the &supernova method. Is that +a class method or an object method? The answer is that static analysis +cannot reveal the answer. Perl doesn't care, and neither should you. +In the three attribute methods, C<%$self> is really accessing the %Cosmos +package variable. + +If like Stephen Hawking, you posit the existence of multiple, sequential, +and unrelated universes, then you can invoke the &bigbang method yourself +at any time to start everything all over again. 
You might think of +&bigbang as more of an initializer than a constructor, since the function +doesn't allocate new memory; it only initializes what's already there. +But like any other constructor, it does return a scalar value to use +for later method invocations. + +Imagine that someday in the future, you decide that one universe just +isn't enough. You could write a new class from scratch, but you already +have an existing class that does what you want--except that it's monadic, +and you want more than just one cosmos. + +That's what code reuse via subclassing is all about. Look how short +the new code is: + + package Multiverse; + use Cosmos; + @ISA = qw(Cosmos); + + sub new { + my $protoverse = shift; + my $class = ref($protoverse) || $protoverse; + my $self = {}; + return bless($self, $class)->bigbang(); + } + 1; + +Because we were careful to be good little creators when we designed our +Cosmos class, we can now reuse it without touching a single line of code +when it comes time to write our Multiverse class. The same code that +worked when invoked as a class method continues to work perfectly well +when invoked against separate instances of a derived class. + +The astonishing thing about the Cosmos class above is that the value +returned by the &bigbang "constructor" is not a reference to a blessed +object at all. It's just the class's own name. A class name is, for +virtually all intents and purposes, a perfectly acceptable object. +It has state, behavior, and identity, the three crucial components +of an object system. It even manifests inheritance, polymorphism, +and encapsulation. And what more can you ask of an object? + +To understand object orientation in Perl, it's important to recognize the +unification of what other programming languages might think of as class +methods and object methods into just plain methods.
"Class methods" +and "object methods" are distinct only in the compartmentalizing mind +of the Perl programmer, not in the Perl language itself. + +Along those same lines, a constructor is nothing special either, which +is one reason why Perl has no pre-ordained name for them. "Constructor" +is just an informal term loosely used to describe a method that returns +a scalar value that you can make further method calls against. So long +as it's either a class name or an object reference, that's good enough. +It doesn't even have to be a reference to a brand new object. + +You can have as many--or as few--constructors as you want, and you can +name them whatever you care to. Blindly and obediently using new() +for each and every constructor you ever write is to speak Perl with +such a severe C++ accent that you do a disservice to both languages. +There's no reason to insist that each class have but one constructor, +or that that constructor be named new(), or that that constructor be +used solely as a class method and not an object method. + +The next section shows how useful it can be to further distance ourselves +from any formal distinction between class method calls and object method +calls, both in constructors and in accessor methods. + +=head2 Translucent Attributes + +A package's eponymous hash can be used for more than just containing +per-class, global state data. It can also serve as a sort of template +containing default settings for object attributes. These default +settings can then be used in constructors for initialization of a +particular object. The class's eponymous hash can also be used to +implement I<translucent attributes>. A translucent attribute is one +that has a class-wide default. Each object can set its own value for the +attribute, in which case C<$object-E<gt>attribute()> returns that value. +But if no value has been set, then C<$object-E<gt>attribute()> returns +the class-wide default. 
+ +We'll apply something of a copy-on-write approach to these translucent +attributes. If you're just fetching values from them, you get +translucency. But if you store a new value in them, that new value is +set on the current object. On the other hand, if you use the class as +an object and store the attribute value directly on the class, then the +meta-object's value changes, and later fetch operations on objects with +uninitialized values for those attributes will retrieve the meta-object's +new values. Objects with their own initialized values, however, won't +see any change. + +Let's look at some concrete examples of using these properties before we +show how to implement them. Suppose that a class named Vermin +had a translucent data attribute called "color". First you set the color +in the meta-object, then you create three objects using a constructor +that happens to be named &spawn. + + use Vermin; + Vermin->color("vermilion"); + + $ob1 = Vermin->spawn(); # so that's where Jedi come from + $ob2 = Vermin->spawn(); + $ob3 = Vermin->spawn(); + + print $ob3->color(); # prints "vermilion" + +Each of these objects' colors is now "vermilion", because that's the +meta-object's value for that attribute, and these objects do not have +individual color values set. + +Changing the attribute on one object has no effect on other objects +previously created. + + $ob3->color("chartreuse"); + print $ob3->color(); # prints "chartreuse" + print $ob1->color(); # prints "vermilion", translucently + +If you now use $ob3 to spawn off another object, the new object will +take the color its parent held, which now happens to be "chartreuse". +That's because the constructor uses the invoking object as its template +for initializing attributes. When that invoking object is the +class name, the object used as a template is the eponymous meta-object. +When the invoking object is a reference to an instantiated object, the +&spawn constructor uses that existing object as a template.
+ + $ob4 = $ob3->spawn(); # $ob3 now template, not %Vermin + print $ob4->color(); # prints "chartreuse" + +Any actual values set on the template object will be copied to the +new object. But attributes undefined in the template object, being +translucent, will remain undefined and consequently translucent in the +new one as well. + +Now let's change the color attribute on the entire class: + + Vermin->color("azure"); + print $ob1->color(); # prints "azure" + print $ob2->color(); # prints "azure" + print $ob3->color(); # prints "chartreuse" + print $ob4->color(); # prints "chartreuse" + +That color change took effect only in the first pair of objects, which +were still translucently accessing the meta-object's values. The second +pair had per-object initialized colors, and so didn't change. + +One important question remains. Changes to the meta-object are reflected +in translucent attributes in the entire class, but what about +changes to discrete objects? If you change the color of $ob3, does +$ob4 see that change? Or, vice versa, if you change the color +of $ob4, does the value of $ob3 then shift? + + $ob3->color("amethyst"); + print $ob3->color(); # prints "amethyst" + print $ob4->color(); # hmm: "chartreuse" or "amethyst"? + +While one could argue that in certain rare cases it should, let's not +do that. Good taste aside, we want the answer to the question posed in +the comment above to be "chartreuse", not "amethyst". So we'll treat +these attributes similarly to the way process attributes like environment +variables, user and group IDs, or the current working directory are +treated across a fork(). You can change only yourself, but you will see +those changes reflected in your unspawned children. Changes to one object +will propagate neither up to the parent nor down to any existing child objects. +Those objects made later, however, will see the changes.
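The fork()-like behaviour comes from how the template is copied: the constructor takes a one-time, shallow copy of the parent's fields. The sketch below uses plain hash references in place of Vermin objects, a simplification with no class involved. It shows that later changes travel in neither direction.

```perl
use strict;

# plain hashes standing in for two Vermin objects (simplified sketch)
my $ob3 = { color => "chartreuse" };  # parent with its own color
my $ob4 = {};
%$ob4 = %$ob3;               # what &spawn does with an object template: copy once

$ob3->{color} = "amethyst";  # a later change to the parent ...
print $ob4->{color}, "\n";   # ... is not seen by the child: chartreuse

$ob4->{color} = "indigo";    # and a change to the child ...
print $ob3->{color}, "\n";   # ... does not reach the parent: amethyst
```

The copy is shallow. An attribute that holds a reference would therefore still be shared between parent and child, so copy such values explicitly if that matters.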
+ +If you have an object with an actual attribute value, and you want to +make that object's attribute value translucent again, what do you do? +Let's design the class so that when you invoke an accessor method with +C<undef> as its argument, that attribute returns to translucency. + + $ob4->color(undef); # back to "azure" + +Here's a complete implementation of Vermin as described above. + + package Vermin; + + # here's the class meta-object, eponymously named. + # it holds all class attributes, and also all instance attributes + # so the latter can be used for both initialization + # and translucency. + + our %Vermin = ( # our() is new to perl5.006 + PopCount => 0, # capital for class attributes + color => "beige", # small for instance attributes + ); + + # constructor method + # invoked as class method or object method + sub spawn { + my $obclass = shift; + my $class = ref($obclass) || $obclass; + my $self = {}; + bless($self, $class); + $class->{PopCount}++; + # init fields from invoking object, or omit if + # invoking object is the class to provide translucency + %$self = %$obclass if ref $obclass; + return $self; + } + + # translucent accessor for "color" attribute + # invoked as class method or object method + sub color { + my $self = shift; + my $class = ref($self) || $self; + + # handle class invocation + unless (ref $self) { + $class->{color} = shift if @_; + return $class->{color} + } + + # handle object invocation + $self->{color} = shift if @_; + if (defined $self->{color}) { # not exists! 
+ return $self->{color}; + } else { + return $class->{color}; + } + } + + # accessor for "PopCount" class attribute + # invoked as class method or object method + # but uses object solely to locate meta-object + sub population { + my $obclass = shift; + my $class = ref($obclass) || $obclass; + return $class->{PopCount}; + } + + # instance destructor + # invoked only as object method + sub DESTROY { + my $self = shift; + my $class = ref $self; + $class->{PopCount}--; + } + +Here are a couple of helper methods that might be convenient. They aren't +accessor methods at all. They're used to detect accessibility of data +attributes. The &is_translucent method determines whether a particular +object attribute is coming from the meta-object. The &has_attribute +method detects whether a class implements a particular property at all. +It could also be used to distinguish undefined properties from non-existent +ones. + + # detect whether an object attribute is translucent + # (typically?) invoked only as object method + sub is_translucent { + my($self, $attr) = @_; + return !defined $self->{$attr}; + } + + # test for presence of attribute in class + # invoked as class method or object method + sub has_attribute { + my($self, $attr) = @_; + my $class = ref($self) || $self; + return exists $class->{$attr}; + } + +If you prefer to install your accessors more generically, you can make +use of the upper-case versus lower-case convention to register into the +package appropriate methods cloned from generic closures. + + for my $datum (keys %{ +__PACKAGE__ }) { + *$datum = ($datum =~ /^[A-Z]/) + ? sub { # install class accessor + my $obclass = shift; + my $class = ref($obclass) || $obclass; + return $class->{$datum}; + } + : sub { # install translucent accessor + my $self = shift; + my $class = ref($self) || $self; + unless (ref $self) { + $class->{$datum} = shift if @_; + return $class->{$datum} + } + $self->{$datum} = shift if @_; + return defined $self->{$datum} + ?
$self -> {$datum} + : $class -> {$datum} + } + } + +Translations of this closure-based approach into C++, Java, and Python +have been left as exercises for the reader. Be sure to send us mail as +soon as you're done. + +=head1 Class Data as Lexical Variables + +=head2 Privacy and Responsibility + +Unlike conventions used by some Perl programmers, in the previous +examples, we didn't prefix the package variables used for class attributes +with an underscore, nor did we do so for the names of the hash keys used +for instance attributes. You don't need little markers on data names to +suggest nominal privacy on attribute variables or hash keys, because these +are B<already> notionally private! Outsiders have no business whatsoever +playing with anything within a class save through the mediated access of +its documented interface; in other words, through method invocations. +And not even through just any method, either. Methods that begin with +an underscore are traditionally considered off-limits outside the class. +If outsiders skip the documented method interface to poke around the +internals of your class and end up breaking something, that's not your +fault--it's theirs. + +Perl believes in individual responsibility rather than mandated control. +Perl respects you enough to let you choose your own preferred level of +pain, or of pleasure. Perl believes that you are creative, intelligent, +and capable of making your own decisions--and fully expects you to +take complete responsibility for your own actions. In a perfect world, +these admonitions alone would suffice, and everyone would be intelligent, +responsible, happy, and creative. And careful. One probably shouldn't +forget careful, and that's a good bit harder to expect. Even Einstein +would take wrong turns by accident and end up lost in the wrong part +of town. + +Some folks get the heebie-jeebies when they see package variables +hanging out there for anyone to reach over and alter them. 
Some folks +live in constant fear that someone somewhere might do something wicked. +The solution to that problem is simply to fire the wicked, of course. +But unfortunately, it's not as simple as all that. These cautious +types are also afraid that they or others will do something not so +much wicked as careless, whether by accident or out of desperation. +If we fire everyone who ever gets careless, pretty soon there won't be +anybody left to get any work done. + +Whether it's needless paranoia or sensible caution, this uneasiness can +be a problem for some people. We can take the edge off their discomfort +by providing the option of storing class attributes as lexical variables +instead of as package variables. The my() operator is the source of +all privacy in Perl, and it is a powerful form of privacy indeed. + +It is widely perceived, and indeed has often been written, that Perl +provides no data hiding, that it affords the class designer no privacy +or isolation, merely a rag-tag assortment of weak and unenforceable +social conventions instead. This perception is demonstrably false and +easily disproven. In the next section, we show how to implement forms +of privacy that are far stronger than those provided in nearly any +other object-oriented language. + +=head2 File-Scoped Lexicals + +A lexical variable is visible only through the end of its static scope. +That means that the only code able to access that variable is code +residing textually below the my() operator through the end of its block +if it has one, or through the end of the current file if it doesn't. + +Starting again with our simplest example given at the start of this +document, we replace our() variables with my() versions.
+ + package Some_Class; + my($CData1, $CData2); # file scope, not in any package + sub CData1 { + shift; # XXX: ignore calling class/object + $CData1 = shift if @_; + return $CData1; + } + sub CData2 { + shift; # XXX: ignore calling class/object + $CData2 = shift if @_; + return $CData2; + } + +So much for that old $Some_Class::CData1 package variable and its brethren! +Those are gone now, replaced with lexicals. No one outside the +scope can reach in and alter the class state without resorting to the +documented interface. Not even subclasses or superclasses of +this one have unmediated access to $CData1. They have to invoke the &CData1 +method against Some_Class or an instance thereof, just like anybody else. + +To be scrupulously honest, that last statement assumes you haven't packed +several classes together into the same file scope, nor strewn your class +implementation across several different files. Accessibility of those +variables is based uniquely on the static file scope. It has nothing to +do with the package. That means that code in a different file but +the same package (class) could not access those variables, yet code in the +same file but a different package (class) could. There are sound reasons +why we usually suggest a one-to-one mapping between files and packages +and modules and classes. You don't have to stick to this suggestion if +you really know what you're doing, but you're apt to confuse yourself +otherwise, especially at first. + +If you'd like to aggregate your class attributes into one lexically scoped, +composite structure, you're perfectly free to do so. 
+ + package Some_Class; + my %ClassData = ( + CData1 => "", + CData2 => "", + ); + sub CData1 { + shift; # XXX: ignore calling class/object + $ClassData{CData1} = shift if @_; + return $ClassData{CData1}; + } + sub CData2 { + shift; # XXX: ignore calling class/object + $ClassData{CData2} = shift if @_; + return $ClassData{CData2}; + } + +To make this more scalable as other class attributes are added, we can +again register closures into the package symbol table to create accessor +methods for them. + + package Some_Class; + my %ClassData = ( + CData1 => "", + CData2 => "", + ); + for my $datum (keys %ClassData) { + no strict "refs"; + *$datum = sub { + shift; # XXX: ignore calling class/object + $ClassData{$datum} = shift if @_; + return $ClassData{$datum}; + }; + } + +Requiring even your own class to use accessor methods like anybody else is +probably a good thing. But demanding and expecting that everyone else, +be they subclass or superclass, friend or foe, will all come to your +object through mediation is more than just a good idea. It's absolutely +critical to the model. Let there be in your mind no such thing as +"public" data, nor even "protected" data, which is a seductive but +ultimately destructive notion. Both will come back to bite at you. +That's because as soon as you take that first step out of the solid +position in which all state is considered completely private, save from the +perspective of its own accessor methods, you have violated the envelope. +And, having pierced that encapsulating envelope, you shall doubtless +someday pay the price when future changes in the implementation break +unrelated code. Considering that avoiding this infelicitous outcome was +precisely why you consented to suffer the slings and arrows of obsequious +abstraction by turning to object orientation in the first place, such +breakage seems unfortunate in the extreme. 
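A minimal, self-contained sketch of the closure-registered accessors above (assuming Perl 5.6 or later, for C<use warnings>). It also illustrates the file-scope caveat: Snoop is a hypothetical second package placed in the same file only for this demonstration, and it can still read %ClassData directly, while the lexical itself never appears in Some_Class's symbol table.

```perl
use strict;
use warnings;

package Some_Class;

# File-scoped lexical: visible to every line below it in this file,
# whatever the current package, but never entered in a symbol table.
my %ClassData = (
    CData1 => "",
    CData2 => "",
);

for my $datum (keys %ClassData) {
    no strict "refs";
    *$datum = sub {                 # installs Some_Class::CData1 and ::CData2
        shift;                      # ignore calling class/object
        $ClassData{$datum} = shift if @_;
        return $ClassData{$datum};
    };
}

# Hypothetical second package in the same file: lexical scope, not the
# package, decides visibility, so it can still read %ClassData directly.
package Snoop;
sub peek { return $ClassData{CData1} }

package main;

Some_Class->CData1("alpha");
my $via_method = Some_Class->CData1;
my $via_snoop  = Snoop::peek();
my $in_stash   = exists $Some_Class::{ClassData} ? "yes" : "no";

print "method: $via_method, snoop: $via_snoop, in symbol table: $in_stash\n";
```

Only the accessors appear as symbol-table entries; the hash that holds the data is reachable purely through lexical scope.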
+
+=head2 More Inheritance Concerns
+
+Suppose that Some_Class were used as a base class from which to derive
+Another_Class. If you invoke a &CData method on the derived class or
+on an object of that class, what do you get? Would the derived class
+have its own state, or would it piggyback on its base class's versions
+of the class attributes?
+
+The answer is that under the scheme outlined above, the derived class
+would B<not> have its own state data. As before, whether you consider
+this a good thing or a bad one depends on the semantics of the classes
+involved.
+
+The cleanest, sanest, simplest way to address per-class state in a
+lexical is for the derived class to override its base class's version
+of the method that accesses the class attributes. Since the actual method
+called is the one in the object's derived class if this exists, you
+automatically get per-class state this way. Any urge to provide an
+unadvertised method to sneak out a reference to the %ClassData hash
+should be strenuously resisted.
+
+As with any other overridden method, the implementation in the
+derived class always has the option of invoking its base class's
+version of the method in addition to its own. Here's an example:
+
+    package Another_Class;
+    @ISA = qw(Some_Class);
+
+    my %ClassData = (
+        CData1 => "",
+    );
+
+    sub CData1 {
+        my($self, $newvalue) = @_;
+        if (@_ > 1) {
+            # set locally first
+            $ClassData{CData1} = $newvalue;
+
+            # then pass the buck up to the first
+            # overridden version, if there is one
+            if ($self->can("SUPER::CData1")) {
+                $self->SUPER::CData1($newvalue);
+            }
+        }
+        return $ClassData{CData1};
+    }
+
+Those dabbling in multiple inheritance might be concerned
+about there being more than one override.
+
+    for my $parent (@ISA) {
+        my $methname = $parent . "::CData1";
+        if ($self->can($methname)) {
+            $self->$methname($newvalue);
+        }
+    }
+
+Because the &UNIVERSAL::can method returns a reference
+to the function itself, you can call through that reference
+directly and save a second method lookup:
+
+    for my $parent (@ISA) {
+        if (my $coderef = $self->can($parent . "::CData1")) {
+            $self->$coderef($newvalue);
+        }
+    }
+
+=head2 Locking the Door and Throwing Away the Key
+
+As currently implemented, any code within the same scope as the
+file-scoped lexical %ClassData can alter that hash directly. Is that
+ok? Is it acceptable or even desirable to allow other parts of the
+implementation of this class to access class attributes directly?
+
+That depends on how careful you want to be. Think back to the Cosmos
+class. If the &supernova method had directly altered $Cosmos::Stars or
+C<$Cosmos::Cosmos{stars}>, then we wouldn't have been able to reuse the
+class when it came to inventing a Multiverse. So letting even the class
+itself access its own class attributes without the mediating intervention of
+properly designed accessor methods is probably not a good idea after all.
+
+Restricting access to class attributes from the class itself is usually
+not enforceable even in strongly object-oriented languages. But in Perl,
+it is.
+
+Here's one way:
+
+    package Some_Class;
+
+    {  # scope for hiding $CData1
+        my $CData1;
+        sub CData1 {
+            shift;  # XXX: unused
+            $CData1 = shift if @_;
+            return $CData1;
+        }
+    }
+
+    {  # scope for hiding $CData2
+        my $CData2;
+        sub CData2 {
+            shift;  # XXX: unused
+            $CData2 = shift if @_;
+            return $CData2;
+        }
+    }
+
+No one--absolutely no one--is allowed to read or write the class
+attributes without the mediation of the managing accessor method, since
+only that method has access to the lexical variable it's managing.
+This use of mediated access to class attributes is a form of privacy far
+stronger than most OO languages provide.
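To check the claim that only the managing accessor can reach its variable, here is a small runnable sketch of the bare-block approach. The sneak() helper is hypothetical and exists only for the demonstration: it lives in the same package and file, yet a string eval of $CData1 inside it fails under C<use strict> because the lexical is no longer in scope there.

```perl
use strict;
use warnings;

package Some_Class;

{   # scope for hiding $CData1: only CData1() can see this variable
    my $CData1;
    sub CData1 {
        shift;                      # calling class or object, unused
        $CData1 = shift if @_;
        return $CData1;
    }
}

# Hypothetical helper in the same package and file, but outside the block.
# A direct reference to $CData1 here would not even compile under strict,
# so a string eval is used to observe the failure at run time instead.
sub sneak { return eval q{ $CData1 } }

package main;

Some_Class->CData1(42);
my $via_accessor = Some_Class->CData1;
my $via_sneak    = Some_Class::sneak();

print "accessor sees: $via_accessor\n";
print "sneak sees: ", (defined $via_sneak ? $via_sneak : "nothing (compile error)"), "\n";
```

When sneak() runs, $@ holds the strict-vars error, which confirms the variable was never visible outside its block.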
+
+The repetition of code used to create per-datum accessor methods chafes
+at our Laziness, so we'll again use closures to create similar
+methods.
+
+    package Some_Class;
+
+    {  # scope for ultra-private meta-object for class attributes
+        my %ClassData = (
+            CData1 => "",
+            CData2 => "",
+        );
+
+        for my $datum (keys %ClassData) {
+            no strict "refs";
+            *$datum = sub {
+                use strict "refs";
+                my ($self, $newvalue) = @_;
+                $ClassData{$datum} = $newvalue if @_ > 1;
+                return $ClassData{$datum};
+            };
+        }
+
+    }
+
+The closure above can be modified to take inheritance into account using
+the &UNIVERSAL::can method and SUPER as shown previously.
+
+=head2 Translucency Revisited
+
+The Vermin class used to demonstrate translucency used an eponymously
+named package variable, %Vermin, as its meta-object. If you prefer to
+use absolutely no package variables beyond those necessary to appease
+inheritance or possibly the Exporter, this strategy is closed to you.
+That's too bad, because translucent attributes are an appealing
+technique, so it would be valuable to devise an implementation using
+only lexicals.
+
+There's a second reason why you might wish to avoid the eponymous
+package hash. If you use class names with double-colons in them, you
+would end up poking around somewhere you might not have meant to poke.
+
+    package Vermin;
+    $class = "Vermin";
+    $class->{PopCount}++;
+    # accesses $Vermin::Vermin{PopCount}
+
+    package Vermin::Noxious;
+    $class = "Vermin::Noxious";
+    $class->{PopCount}++;
+    # accesses $Vermin::Noxious{PopCount}
+
+In the first case, because the class name had no double-colons, we got
+the hash in the current package. But in the second case, instead of
+getting some hash in the current package, we got the hash %Noxious in
+the Vermin package. (The noxious vermin just invaded another package and
+sprayed their data around it. :-) Perl doesn't support relative packages
+in its naming conventions, so any double-colons trigger a fully-qualified
+lookup instead of just looking in the current package.
+
+In practice, it is unlikely that the Vermin class had an existing
+package variable named %Noxious that you just blew away. If you're
+still mistrustful, you could always stake out your own territory
+where you know the rules, such as using Eponymous::Vermin::Noxious or
+Hieronymus::Vermin::Boschious or Leave_Me_Alone::Vermin::Noxious as class
+names instead. Sure, it's in theory possible that someone else has
+a class named Eponymous::Vermin with its own %Noxious hash, but this
+kind of thing is always true. There's no arbiter of package names.
+It's always the case that globals like @Cwd::ISA would collide if more
+than one class uses the same Cwd package.
+
+If this still leaves you with an uncomfortable twinge of paranoia,
+we have another solution for you. There's nothing that says that you
+have to have a package variable to hold a class meta-object, either for
+monadic classes or for translucent attributes. Just code up the methods
+so that they access a lexical instead.
+
+Here's another implementation of the Vermin class with semantics identical
+to those given previously, but this time using no package variables.
+
+    package Vermin;
+
+    # Here's the class meta-object, held in a lexical hash.
+    # It holds all class data, and also all instance data
+    # so the latter can be used for both initialization
+    # and translucency. It's a template.
+    my %ClassData = (
+        PopCount => 0,        # capital for class attributes
+        color    => "beige",  # small for instance attributes
+    );
+
+    # constructor method
+    # invoked as class method or object method
+    sub spawn {
+        my $obclass = shift;
+        my $class   = ref($obclass) || $obclass;
+        my $self = {};
+        bless($self, $class);
+        $ClassData{PopCount}++;
+        # init fields from invoking object, or omit if
+        # invoking object is the class to provide translucency
+        %$self = %$obclass if ref $obclass;
+        return $self;
+    }
+
+    # translucent accessor for "color" attribute
+    # invoked as class method or object method
+    sub color {
+        my $self = shift;
+
+        # handle class invocation
+        unless (ref $self) {
+            $ClassData{color} = shift if @_;
+            return $ClassData{color};
+        }
+
+        # handle object invocation
+        $self->{color} = shift if @_;
+        if (defined $self->{color}) {  # not exists!
+            return $self->{color};
+        } else {
+            return $ClassData{color};
+        }
+    }
+
+    # class attribute accessor for "PopCount" attribute
+    # invoked as class method or object method
+    sub population {
+        return $ClassData{PopCount};
+    }
+
+    # instance destructor; invoked only as object method
+    sub DESTROY {
+        $ClassData{PopCount}--;
+    }
+
+    # detect whether an object attribute is translucent
+    # (typically?) invoked only as object method
+    sub is_translucent {
+        my($self, $attr) = @_;
+        $self = \%ClassData if !ref $self;
+        return !defined $self->{$attr};
+    }
+
+    # test for presence of attribute in class
+    # invoked as class method or object method
+    sub has_attribute {
+        my($self, $attr) = @_;
+        return exists $ClassData{$attr};
+    }
+
+=head1 NOTES
+
+Inheritance is a powerful but subtle device, best used only after careful
+forethought and design. Aggregation instead of inheritance is often a
+better approach.
+
+We use the hypothetical our() syntax for package variables. It works
+like C<use vars>, but looks like my(). It should be in this summer's
+major release (5.006) of perl--we hope.
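Returning to the lexical-only Vermin class under Translucency Revisited, the following sketch trims it to spawn, color, population, and DESTROY (assuming that subset is enough to show the behavior) and exercises translucency: an object without its own color sees the class default, an object that sets one keeps it, and the population count falls when an object goes away. The variable names $rat and $mouse are illustrative only.

```perl
use strict;
use warnings;

package Vermin;

# Lexical class meta-object, trimmed from the Vermin class above.
my %ClassData = (
    PopCount => 0,          # class attribute
    color    => "beige",    # default for the translucent instance attribute
);

sub spawn {
    my $obclass = shift;
    my $class   = ref($obclass) || $obclass;
    my $self    = bless {}, $class;
    $ClassData{PopCount}++;
    %$self = %$obclass if ref $obclass;   # copy fields when cloning an object
    return $self;
}

sub color {
    my $self = shift;
    unless (ref $self) {                  # class invocation: the shared default
        $ClassData{color} = shift if @_;
        return $ClassData{color};
    }
    $self->{color} = shift if @_;         # object invocation: own value, if any
    return defined $self->{color} ? $self->{color} : $ClassData{color};
}

sub population { return $ClassData{PopCount} }

sub DESTROY { $ClassData{PopCount}-- }

package main;

my $rat   = Vermin->spawn;
my $mouse = Vermin->spawn;
$mouse->color("grey");                    # overrides the class default
Vermin->color("brown");                   # changes the default $rat sees

my $rat_color   = $rat->color;
my $mouse_color = $mouse->color;
my $before      = Vermin->population;
undef $rat;                               # last reference gone: DESTROY runs
my $after       = Vermin->population;

print "rat=$rat_color mouse=$mouse_color population from $before to $after\n";
```

Because DESTROY runs as soon as the last reference disappears, undef $rat is enough to drop the count; that timing relies on Perl's reference counting.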
+
+You can't use file-scoped lexicals in conjunction with the SelfLoader
+or the AutoLoader, because they alter the lexical scope in which the
+module's methods wind up getting compiled.
+
+The usual mealy-mouthed package-mungeing doubtless applies to setting
+up names of object attributes. For example, C<$self-E<gt>{ObData1}>
+should probably be C<$self-E<gt>{ __PACKAGE__ . "_ObData1" }>, but that
+would just confuse the examples.
+
+=head1 SEE ALSO
+
+L<perltoot>, L<perlobj>, L<perlmod>, and L<perlbot>.
+
+The Tie::SecureHash module from CPAN is worth checking out.
+
+=head1 AUTHOR AND COPYRIGHT
+
+Copyright (c) 1999 Tom Christiansen.
+All rights reserved.
+
+When included as part of the Standard Version of Perl, or as part of
+its complete documentation whether printed or otherwise, this work
+may be distributed only under the terms of Perl's Artistic License.
+Any distribution of this file or derivatives thereof I<outside>
+of that package requires that special arrangements be made with
+the copyright holder.
+
+Irrespective of its distribution, all code examples in this file
+are hereby placed into the public domain. You are permitted and
+encouraged to use this code in your own programs for fun
+or for profit as you see fit. A simple comment in the code giving
+credit would be courteous but is not required.
+
+=head1 ACKNOWLEDGEMENTS
+
+Russ Allbery, Jon Orwant, Randy Ray, Larry Rosler, Nat Torkington,
+and Stephen Warren all contributed suggestions and corrections to this
+piece. Thanks especially to Damian Conway for his ideas and feedback,
+and without whose indirect prodding I might never have taken the time
+to show others how much Perl has to offer in the way of objects once
+you start thinking outside the tiny little box that today's "popular"
+object-oriented languages enforce.
+
+=head1 HISTORY
+
+Last edit: Fri May 21 15:47:56 MDT 1999
diff --git a/pod/perltrap.pod b/pod/perltrap.pod
index 852d8e9826..321c86dd7f 100644
--- a/pod/perltrap.pod
+++ b/pod/perltrap.pod
@@ -22,7 +22,7 @@ The English module, loaded via
     use English;
 
 allows you to refer to special variables (like C<$/>) with names (like
-C<$RS>), as though they were in B<awk>; see L<perlvar> for details.
+$RS), as though they were in B<awk>; see L<perlvar> for details.
 
 =item *
 
@@ -160,7 +160,7 @@ You must use C<elsif> rather than C<else if>.
 
 The C<break> and C<continue> keywords from C become in Perl C<last>
 and C<next>, respectively.
-Unlike in C, these do I<NOT> work within a C<do { } while> construct.
+Unlike in C, these do I<not> work within a C<do { } while> construct.
 
 =item *
 
@@ -305,7 +305,7 @@ file read is the sole condition in a while loop:
 
 =item *
 
-Remember not to use "C<=>" when you need "C<=~>";
+Remember not to use C<=> when you need C<=~>;
 these two constructs are quite different:
 
     $x = /foo/;
@@ -1056,7 +1056,7 @@ All types of RE traps.
 =item * Regular Expression
 
 C<s'$lhs'$rhs'> now does no interpolation on either side. It used to
-interpolate C<$lhs> but not C<$rhs>. (And still does not match a literal
+interpolate $lhs but not $rhs. (And still does not match a literal
 '$' in string)
 
     $a=1;$b=2;
@@ -1095,7 +1095,7 @@ the very first time in any such closure. For instance, if you say
     }
 
 build_match() will always return a sub which matches the contents of
-C<$left> and C<$right> as they were the I<first> time that build_match()
+$left and $right as they were the I<first> time that build_match()
 was called, not as they are in the current call.
 
 This is probably a bug, and may change in future versions of Perl.
@@ -1327,7 +1327,7 @@ Note that you can C<use strict;> to ward off such trappiness under perl5.
 =item * Interpolation
 
 The construct "this is $$x" used to interpolate the pid at that
-point, but now apparently tries to dereference C<$x>. C<$$> by itself still
+point, but now apparently tries to dereference $x. C<$$> by itself still
 works fine, however.
 
     print "this is $$x\n";
diff --git a/pod/perlvar.pod b/pod/perlvar.pod
index 5c851d9c15..cb41c96aa0 100644
--- a/pod/perlvar.pod
+++ b/pod/perlvar.pod
@@ -7,9 +7,9 @@ perlvar - Perl predefined variables
 =head2 Predefined Names
 
 The following names have special meaning to Perl. Most
-punctuation names have reasonable mnemonics, or analogues in one of
-the shells. Nevertheless, if you wish to use long variable names,
-you just need to say
+punctuation names have reasonable mnemonics, or analogs in the
+shells. Nevertheless, if you wish to use long variable names,
+you need only say
 
     use English;
 
@@ -17,21 +17,12 @@ at the top of your program. This will alias all the short names
 to the long names in the current package. Some even have medium names,
 generally borrowed from B<awk>.
 
-Due to an unfortunate accident of Perl's implementation, "C<use English>"
-imposes a considerable performance penalty on all regular expression
-matches in a program, regardless of whether they occur in the scope of
-"C<use English>". For that reason, saying "C<use English>" in
-libraries is strongly discouraged. See the Devel::SawAmpersand module
-documentation from CPAN
-(http://www.perl.com/CPAN/modules/by-module/Devel/Devel-SawAmpersand-0.10.readme)
-for more information.
-
-To go a step further, those variables that depend on the currently
-selected filehandle may instead (and preferably) be set by calling an
-object method on the FileHandle object. (Summary lines below for this
-contain the word HANDLE.) First you must say
+If you don't mind the performance hit, variables that depend on the
+currently selected filehandle may instead be set by calling an
+appropriate object method on the IO::Handle object. (Summary lines
+below for this contain the word HANDLE.) First you must say
 
-    use FileHandle;
+    use IO::Handle;
 
 after which you may use either
 
@@ -41,11 +32,13 @@ or more safely,
 
     HANDLE->method(EXPR)
 
-Each of the methods returns the old value of the FileHandle attribute.
+Each method returns the old value of the IO::Handle attribute.
 The methods each take an optional EXPR, which if supplied specifies the
-new value for the FileHandle attribute in question. If not supplied,
-most of the methods do nothing to the current value, except for
+new value for the IO::Handle attribute in question. If not supplied,
+most methods do nothing to the current value--except for
 autoflush(), which will assume a 1 for you, just to be different.
+Because loading in the IO::Handle class is an expensive operation, you should
+learn how to use the regular built-in variables.
 
 A few of these variables are considered "read-only". This means that if
 you try to assign to this variable, either directly or indirectly through
@@ -53,10 +46,9 @@ a reference, you'll raise a run-time exception.
 
 The following list is ordered by scalar variables first, then the
 arrays, then the hashes (except $^M was added in the wrong place).
-This is somewhat obscured by the fact that %ENV and %SIG are listed as
+This is somewhat obscured because %ENV and %SIG are listed as
 $ENV{expr} and $SIG{expr}.
 
-
 =over 8
 
 =item $ARG
@@ -66,7 +58,7 @@ $ENV{expr} and $SIG{expr}.
 The default input and pattern-searching space. The following pairs are
 equivalent:
 
-    while (<>) {...}    # equivalent in only while!
+    while (<>) {...}    # equivalent only in while!
     while (defined($_ = <>)) {...}
 
     /^Subject:/
@@ -75,8 +67,8 @@ equivalent:
     tr/a-z/A-Z/
     $_ =~ tr/a-z/A-Z/
 
-    chop
-    chop($_)
+    chomp
+    chomp($_)
 
 Here are the places where Perl will assume $_ even if you
 don't use it:
@@ -111,7 +103,7 @@ The implicit iterator variable in the grep() and map() functions.
 
 The default place to put an input record when a C<E<lt>FHE<gt>>
 operation's result is tested by itself as the sole criterion of a C<while>
-test. Note that outside of a C<while> test, this will not happen.
+test. Outside a C<while> test, this will not happen.
 
 =back
 
@@ -123,10 +115,11 @@ test. Note that outside of a C<while> test, this will not happen.
 
 =item $E<lt>I<digits>E<gt>
 
-Contains the subpattern from the corresponding set of parentheses in
-the last pattern matched, not counting patterns matched in nested
-blocks that have been exited already. (Mnemonic: like \digits.)
-These variables are all read-only.
+Contains the subpattern from the corresponding set of capturing
+parentheses from the last pattern match, not counting patterns
+matched in nested blocks that have been exited already. (Mnemonic:
+like \digits.) These variables are all read-only and dynamically
+scoped to the current BLOCK.
 
 =item $MATCH
 
@@ -134,11 +127,11 @@ These variables are all read-only.
 
 The string matched by the last successful pattern match (not counting
 any matches hidden within a BLOCK or eval() enclosed by the current
-BLOCK). (Mnemonic: like & in some editors.) This variable is read-only.
+BLOCK). (Mnemonic: like & in some editors.) This variable is read-only
+and dynamically scoped to the current BLOCK.
 
 The use of this variable anywhere in a program imposes a considerable
-performance penalty on all regular expression matches. See the
-Devel::SawAmpersand module from CPAN for more information.
+performance penalty on all regular expression matches. See L<BUGS>.
 
 =item $PREMATCH
 
@@ -150,8 +143,7 @@ enclosed by the current BLOCK). (Mnemonic: C<`> often precedes a quoted
 string.) This variable is read-only.
 
 The use of this variable anywhere in a program imposes a considerable
-performance penalty on all regular expression matches. See the
-Devel::SawAmpersand module from CPAN for more information.
+performance penalty on all regular expression matches. See L<BUGS>.
 
 =item $POSTMATCH
 
@@ -166,28 +158,27 @@ string.) Example:
     /def/;
     print "$`:$&:$'\n";     # prints abc:def:ghi
 
-This variable is read-only.
+This variable is read-only and dynamically scoped to the current BLOCK.
 
 The use of this variable anywhere in a program imposes a considerable
-performance penalty on all regular expression matches. See the
-Devel::SawAmpersand module from CPAN for more information.
+performance penalty on all regular expression matches. See L<BUGS>.
 
 =item $LAST_PAREN_MATCH
 
 =item $+
 
 The last bracket matched by the last search pattern. This is useful if
-you don't know which of a set of alternative patterns matched. For
+you don't know which one of a set of alternative patterns matched. For
 example:
 
     /Version: (.*)|Revision: (.*)/ && ($rev = $+);
 
 (Mnemonic: be positive and forward looking.)
-This variable is read-only.
+This variable is read-only and dynamically scoped to the current BLOCK.
 
 =item @+
 
-$+[0] is the offset of the end of the last successfull match.
+$+[0] is the offset of the end of the last successful match.
 C<$+[>I<n>C<]> is the offset of the end of the substring matched by
 I<n>-th subpattern, or undef if the subpattern did not match.
 
@@ -195,8 +186,8 @@ Thus after a match against $_, $& coincides with C<substr $_, $-[0],
 $+[0] - $-[0]>. Similarly, C<$>I<n> coincides with C<substr $_, $-[>I<n>C<],
 $+[>I<n>C<] - $-[>I<n>C<]> if C<$-[>I<n>C<]> is defined, and $+ coincides with
 C<substr $_, $-[$#-], $+[$#-]>. One can use C<$#+> to find the number
-of subgroups in the last successful match. Note the difference with
-C<$#->, which is the last I<matched> subgroup. Compare with L<"@-">.
+of subgroups in the last successful match. Contrast with
+C<$#->, the last I<matched> subgroup. Compare with C<@->.
 
 =item $MULTILINE_MATCHING
 
@@ -205,12 +196,12 @@ C<$#->, which is the last I<matched> subgroup. Compare with L<"@-">.
 Set to 1 to do multi-line matching within a string, 0 to tell Perl
 that it can assume that strings contain a single line, for the purpose
 of optimizing pattern matches. Pattern matches on strings containing
-multiple newlines can produce confusing results when "C<$*>" is 0. Default
-is 0. (Mnemonic: * matches multiple things.) Note that this variable
-influences the interpretation of only "C<^>" and "C<$>". A literal newline can
+multiple newlines can produce confusing results when C<$*> is 0. Default
+is 0. (Mnemonic: * matches multiple things.) This variable
+influences the interpretation of only C<^> and C<$>. A literal newline can
 be searched for even when C<$* == 0>.
 
-Use of "C<$*>" is deprecated in modern Perls, supplanted by
+Use of C<$*> is deprecated in modern Perl, supplanted by
 the C</s> and C</m> modifiers on pattern matching.
 
 =item input_line_number HANDLE EXPR
@@ -221,17 +212,16 @@ the C</s> and C</m> modifiers on pattern matching.
 
 =item $.
 
-The current input line number for the last file handle from
-which you read (or performed a C<seek> or C<tell> on). The value
+The current input record number for the last file handle from which
+you just read() (or called a C<seek> or C<tell> on). The value
 may be different from the actual physical line number in the file,
-depending on what notion of "line" is in effect--see L<$/> on how
-to affect that. An
-explicit close on a filehandle resets the line number. Because
-"C<E<lt>E<gt>>" never does an explicit close, line numbers increase
-across ARGV files (but see examples under eof()). Localizing C<$.> has
-the effect of also localizing Perl's notion of "the last read
-filehandle". (Mnemonic: many programs use "." to mean the current line
-number.)
+depending on what notion of "line" is in effect--see C<$/> on how
+to change that. An explicit close on a filehandle resets the line
+number. Because C<E<lt>E<gt>> never does an explicit close, line
+numbers increase across ARGV files (but see examples in L<perlfunc/eof>).
+Consider this variable read-only: setting it does not reposition
+the seek pointer; you'll have to do that on your own. (Mnemonic:
+many programs use "." to mean the current line number.)
 
 =item input_record_separator HANDLE EXPR
 
@@ -241,48 +231,50 @@ number.)
 
 =item $/
 
-The input record separator, newline by default. This is used to
-influence Perl's idea of what a "line" is. Works like B<awk>'s RS
-variable, including treating empty lines as delimiters if set to the
-null string. (Note: An empty line cannot contain any spaces or tabs.)
-You may set it to a multi-character string to match a multi-character
-delimiter, or to C<undef> to read to end of file. Note that setting it
-to C<"\n\n"> means something slightly different than setting it to
-C<"">, if the file contains consecutive empty lines. Setting it to
-C<""> will treat two or more consecutive empty lines as a single empty
-line. Setting it to C<"\n\n"> will blindly assume that the next input
-character belongs to the next paragraph, even if it's a newline.
-(Mnemonic: / is used to delimit line boundaries when quoting poetry.)
+The input record separator, newline by default. This
+influences Perl's idea of what a "line" is. Works like B<awk>'s RS
+variable, including treating empty lines as a terminator if set to
+the null string. (An empty line cannot contain any spaces
+or tabs.) You may set it to a multi-character string to match a
+multi-character terminator, or to C<undef> to read through the end
+of file. Setting it to C<"\n\n"> means something slightly
+different than setting to C<"">, if the file contains consecutive
+empty lines. Setting to C<""> will treat two or more consecutive
+empty lines as a single empty line. Setting to C<"\n\n"> will
+blindly assume that the next input character belongs to the next
+paragraph, even if it's a newline. (Mnemonic: / delimits
+line boundaries when quoting poetry.)
 
     undef $/;           # enable "slurp" mode
     $_ = <FH>;          # whole file now here
     s/\n[ \t]+/ /g;
 
-Remember: the value of $/ is a string, not a regexp. AWK has to be
-better for something :-)
+Remember: the value of C<$/> is a string, not a regex. B<awk> has to be
+better for something. :-)
 
-Setting $/ to a reference to an integer, scalar containing an integer, or
-scalar that's convertable to an integer will attempt to read records
+Setting C<$/> to a reference to an integer, scalar containing an integer, or
+scalar that's convertible to an integer will attempt to read records
 instead of lines, with the maximum record size being the referenced
-integer. So this:
+integer. So this:
 
     $/ = \32768; # or \"32768", or \$var_containing_32768
     open(FILE, $myfile);
     $_ = <FILE>;
 
-will read a record of no more than 32768 bytes from FILE. If you're not
-reading from a record-oriented file (or your OS doesn't have
-record-oriented files), then you'll likely get a full chunk of data with
-every read. If a record is larger than the record size you've set, you'll
-get the record back in pieces.
+will read a record of no more than 32768 bytes from FILE. If you're
+not reading from a record-oriented file (or your OS doesn't have
+record-oriented files), then you'll likely get a full chunk of data
+with every read. If a record is larger than the record size you've
+set, you'll get the record back in pieces.
 
-On VMS, record reads are done with the equivalent of C<sysread>, so it's
-best not to mix record and non-record reads on the same file. (This is
-likely not a problem, as any file you'd want to read in record mode is
-probably usable in line mode) Non-VMS systems perform normal I/O, so
-it's safe to mix record and non-record reads of a file.
+On VMS, record reads are done with the equivalent of C<sysread>,
+so it's best not to mix record and non-record reads on the same
+file. (This is unlikely to be a problem, because any file you'd
+want to read in record mode is probably usable in line mode.)
+Non-VMS systems do normal I/O, so it's safe to mix record and
+non-record reads of a file.
 
-Also see L<$.>.
+See also L<perlport/"Newlines">. Also see C<$.>.
 
 =item autoflush HANDLE EXPR
 
@@ -290,16 +282,17 @@ Also see L<$.>.
 
 =item $|
 
-If set to nonzero, forces a flush right away and after every write or print on the
-currently selected output channel. Default is 0 (regardless of whether
-the channel is actually buffered by the system or not; C<$|> tells you
-only whether you've asked Perl explicitly to flush after each write).
-Note that STDOUT will typically be line buffered if output is to the
-terminal and block buffered otherwise. Setting this variable is useful
-primarily when you are outputting to a pipe, such as when you are running
-a Perl script under rsh and want to see the output as it's happening. This
-has no effect on input buffering.
-(Mnemonic: when you want your pipes to be piping hot.)
+If set to nonzero, forces a flush right away and after every write
+or print on the currently selected output channel. Default is 0
+(regardless of whether the channel is really buffered by the
+system or not; C<$|> tells you only whether you've asked Perl
+explicitly to flush after each write). STDOUT will
+typically be line buffered if output is to the terminal and block
+buffered otherwise. Setting this variable is useful primarily when
+you are outputting to a pipe or socket, such as when you are running
+a Perl program under B<rsh> and want to see the output as it's
+happening. This has no effect on input buffering. See L<perlfunc/getc>
+for that. (Mnemonic: when you want your pipes to be piping hot.)
 
 =item output_field_separator HANDLE EXPR
 
@@ -310,11 +303,11 @@ has no effect on input buffering.
 =item $,
 
 The output field separator for the print operator. Ordinarily the
-print operator simply prints out the comma-separated fields you
-specify. To get behavior more like B<awk>, set this variable
-as you would set B<awk>'s OFS variable to specify what is printed
-between fields. (Mnemonic: what is printed when there is a , in your
-print statement.)
+print operator simply prints out its arguments without further
+adornment. To get behavior more like B<awk>, set this variable as
+you would set B<awk>'s OFS variable to specify what is printed
+between fields. (Mnemonic: what is printed when there is a "," in
+your print statement.)
 
 =item output_record_separator HANDLE EXPR
 
@@ -325,21 +318,21 @@ print statement.)
 =item $\
 
 The output record separator for the print operator. Ordinarily the
-print operator simply prints out the comma-separated fields you
-specify, with no trailing newline or record separator assumed.
-To get behavior more like B<awk>, set this variable as you would
-set B<awk>'s ORS variable to specify what is printed at the end of the
-print. (Mnemonic: you set "C<$\>" instead of adding \n at the end of the
-print. Also, it's just like C<$/>, but it's what you get "back" from
-Perl.)
+print operator simply prints out its arguments as is, with no
+trailing newline or other end-of-record string added. To get
+behavior more like B<awk>, set this variable as you would set
+B<awk>'s ORS variable to specify what is printed at the end of the
+print. (Mnemonic: you set C<$\> instead of adding "\n" at the
+end of the print. Also, it's just like C<$/>, but it's what you
+get "back" from Perl.)
 
 =item $LIST_SEPARATOR
 
 =item $"
 
-This is like "C<$,>" except that it applies to array values interpolated
-into a double-quoted string (or similar interpreted string). Default
-is a space. (Mnemonic: obvious, I think.)
+This is like C<$,> except that it applies to array and slice values
+interpolated into a double-quoted string (or similar interpreted
+string). Default is a space. (Mnemonic: obvious, I think.)
 
=item $SUBSCRIPT_SEPARATOR @@ -364,13 +357,14 @@ which means ($foo{$a},$foo{$b},$foo{$c}) -Default is "\034", the same as SUBSEP in B<awk>. Note that if your -keys contain binary data there might not be any safe value for "C<$;>". +Default is "\034", the same as SUBSEP in B<awk>. If your +keys contain binary data there might not be any safe value for C<$;>. (Mnemonic: comma (the syntactic subscript separator) is a -semi-semicolon. Yeah, I know, it's pretty lame, but "C<$,>" is already +semi-semicolon. Yeah, I know, it's pretty lame, but C<$,> is already taken for something more important.) -Consider using "real" multidimensional arrays. +Consider using "real" multidimensional arrays as described +in L<perllol>. =item $OFMT @@ -378,13 +372,13 @@ Consider using "real" multidimensional arrays. The output format for printed numbers. This variable is a half-hearted attempt to emulate B<awk>'s OFMT variable. There are times, however, -when B<awk> and Perl have differing notions of what is in fact -numeric. The initial value is %.I<n>g, where I<n> is the value +when B<awk> and Perl have differing notions of what counts as +numeric. The initial value is "%.I<n>g", where I<n> is the value of the macro DBL_DIG from your system's F<float.h>. This is different from -B<awk>'s default OFMT setting of %.6g, so you need to set "C<$#>" +B<awk>'s default OFMT setting of "%.6g", so you need to set C<$#> explicitly to get B<awk>'s value. (Mnemonic: # is the number sign.) -Use of "C<$#>" is deprecated. +Use of C<$#> is deprecated. =item format_page_number HANDLE EXPR @@ -393,6 +387,7 @@ Use of "C<$#>" is deprecated. =item $% The current page number of the currently selected output channel. +Used with formats. (Mnemonic: % is page number in B<nroff>.) =item format_lines_per_page HANDLE EXPR @@ -402,7 +397,9 @@ The current page number of the currently selected output channel. =item $= The current page length (printable lines) of the currently selected -output channel. Default is 60. 
(Mnemonic: = has horizontal lines.) +output channel. Default is 60. +Used with formats. +(Mnemonic: = has horizontal lines.) =item format_lines_left HANDLE EXPR @@ -411,11 +408,13 @@ output channel. Default is 60. (Mnemonic: = has horizontal lines.) =item $- The number of lines left on the page of the currently selected output -channel. (Mnemonic: lines_on_page - lines_printed.) +channel. +Used with formats. +(Mnemonic: lines_on_page - lines_printed.) =item @- -$-[0] is the offset of the start of the last successfull match. +$-[0] is the offset of the start of the last successful match. C<$-[>I<n>C<]> is the offset of the start of the substring matched by I<n>-th subpattern, or undef if the subpattern did not match. @@ -423,9 +422,9 @@ Thus after a match against $_, $& coincides with C<substr $_, $-[0], $+[0] - $-[0]>. Similarly, C<$>I<n> coincides with C<substr $_, $-[>I<n>C<], $+[>I<n>C<] - $-[>I<n>C<]> if C<$-[>I<n>C<]> is defined, and $+ coincides with C<substr $_, $-[$#-], $+[$#-]>. One can use C<$#-> to find the last -matched subgroup in the last successful match. Note the difference with -C<$#+>, which is the number of subgroups in the regular expression. Compare -with L<"@+">. +matched subgroup in the last successful match. Contrast with +C<$#+>, the number of subgroups in the regular expression. Compare +with C<@+>. =item format_name HANDLE EXPR @@ -434,8 +433,8 @@ with L<"@+">. =item $~ The name of the current report format for the currently selected output -channel. Default is name of the filehandle. (Mnemonic: brother to -"C<$^>".) +channel. Default is the name of the filehandle. (Mnemonic: brother to +C<$^>.) =item format_top_name HANDLE EXPR @@ -444,7 +443,7 @@ channel. Default is name of the filehandle. (Mnemonic: brother to =item $^ The name of the current top-of-page format for the currently selected -output channel. Default is name of the filehandle with _TOP +output channel. Default is the name of the filehandle with _TOP appended. 
(Mnemonic: points to top of page.) =item format_line_break_characters HANDLE EXPR @@ -464,16 +463,16 @@ poetry is a part of a line.) =item $^L -What formats output to perform a form feed. Default is \f. +What formats output as a form feed. Default is \f. =item $ACCUMULATOR =item $^A The current value of the write() accumulator for format() lines. A format -contains formline() commands that put their result into C<$^A>. After +contains formline() calls that put their result into C<$^A>. After calling its format, write() prints out the contents of C<$^A> and empties. -So you never actually see the contents of C<$^A> unless you call +So you never really see the contents of C<$^A> unless you call formline() yourself and then look at it. See L<perlform> and L<perlfunc/formline()>. @@ -482,21 +481,27 @@ L<perlfunc/formline()>. =item $? The status returned by the last pipe close, backtick (C<``>) command, -or system() operator. Note that this is the status word returned by the -wait() system call (or else is made up to look like it). Thus, the exit -value of the subprocess is actually (C<$? E<gt>E<gt> 8>), and C<$? & 127> -gives which signal, if any, the process died from, and C<$? & 128> reports -whether there was a core dump. (Mnemonic: similar to B<sh> and B<ksh>.) +successful call to wait() or waitpid(), or from the system() +operator. This is just the 16-bit status word returned by the +wait() system call (or else is made up to look like it). Thus, the +exit value of the subprocess is really (C<$? E<gt>E<gt> 8>), and +C<$? & 127> gives which signal, if any, the process died from, and +C<$? & 128> reports whether there was a core dump. (Mnemonic: +similar to B<sh> and B<ksh>.) Additionally, if the C<h_errno> variable is supported in C, its value -is returned via $? if any of the C<gethost*()> functions fail. +is returned via $? if any C<gethost*()> function fails. 
-Note that if you have installed a signal handler for C<SIGCHLD>, the +If you have installed a signal handler for C<SIGCHLD>, the value of C<$?> will usually be wrong outside that handler. Inside an C<END> subroutine C<$?> contains the value that is going to be given to C<exit()>. You can modify C<$?> in an C<END> subroutine to -change the exit status of the script. +change the exit status of your program. For example: + + END { + $? = 1 if $? == 255; # die would make it 255 + } Under VMS, the pragma C<use vmsish 'status'> makes C<$?> reflect the actual VMS exit status, instead of the default emulation of POSIX @@ -510,14 +515,15 @@ Also see L<Error Indicators>. =item $! -If used in a numeric context, yields the current value of errno, with -all the usual caveats. (This means that you shouldn't depend on the -value of C<$!> to be anything in particular unless you've gotten a -specific error return indicating a system error.) If used in a string -context, yields the corresponding system error string. You can assign -to C<$!> to set I<errno> if, for instance, you want C<"$!"> to return the -string for error I<n>, or you want to set the exit value for the die() -operator. (Mnemonic: What just went bang?) +If used numerically, yields the current value of the C C<errno> +variable, with all the usual caveats. (This means that you shouldn't +depend on the value of C<$!> to be anything in particular unless +you've gotten a specific error return indicating a system error.) +If used an a string, yields the corresponding system error string. +You can assign a number to C<$!> to set I<errno> if, for instance, +you want C<"$!"> to return the string for error I<n>, or you want +to set the exit value for the die() operator. (Mnemonic: What just +went bang?) Also see L<Error Indicators>. @@ -541,7 +547,7 @@ OS/2 API either via CRT, or directly from perl. 
Under Win32, C<$^E> always returns the last error information reported by the Win32 call C<GetLastError()> which describes the last error from within the Win32 API. Most Win32-specific -code will report errors via C<$^E>. ANSI C and UNIX-like calls +code will report errors via C<$^E>. ANSI C and Unix-like calls set C<errno> and so most portable Perl code will report errors via C<$!>. @@ -554,12 +560,12 @@ Also see L<Error Indicators>. =item $@ -The Perl syntax error message from the last eval() command. If null, the +The Perl syntax error message from the last eval() operator. If null, the last eval() parsed and executed correctly (although the operations you invoked may have failed in the normal fashion). (Mnemonic: Where was the syntax error "at"?) -Note that warning messages are not collected in this variable. You can, +Warning messages are not collected in this variable. You can, however, set up a routine to process warnings by setting C<$SIG{__WARN__}> as described below. @@ -571,8 +577,9 @@ Also see L<Error Indicators>. =item $$ -The process number of the Perl running this script. (Mnemonic: same -as shells.) +The process number of the Perl running this script. You should +consider this variable read-only, although it will be altered +across fork() calls. (Mnemonic: same as shells.) =item $REAL_USER_ID @@ -580,7 +587,7 @@ as shells.) =item $< -The real uid of this process. (Mnemonic: it's the uid you came I<FROM>, +The real uid of this process. (Mnemonic: it's the uid you came I<from>, if you're running setuid.) =item $EFFECTIVE_USER_ID @@ -594,8 +601,8 @@ The effective uid of this process. Example: $< = $>; # set real to effective uid ($<,$>) = ($>,$<); # swap real and effective uid -(Mnemonic: it's the uid you went I<TO>, if you're running setuid.) -Note: "C<$E<lt>>" and "C<$E<gt>>" can be swapped only on machines +(Mnemonic: it's the uid you went I<to>, if you're running setuid.) 
+C<$E<lt>> and C<$E<gt>> can be swapped only on machines supporting setreuid(). =item $REAL_GROUP_ID @@ -610,12 +617,12 @@ list of groups you are in. The first number is the one returned by getgid(), and the subsequent ones by getgroups(), one of which may be the same as the first number. -However, a value assigned to "C<$(>" must be a single number used to -set the real gid. So the value given by "C<$(>" should I<not> be assigned -back to "C<$(>" without being forced numeric, such as by adding zero. +However, a value assigned to C<$(> must be a single number used to +set the real gid. So the value given by C<$(> should I<not> be assigned +back to C<$(> without being forced numeric, such as by adding zero. -(Mnemonic: parentheses are used to I<GROUP> things. The real gid is the -group you I<LEFT>, if you're running setgid.) +(Mnemonic: parentheses are used to I<group> things. The real gid is the +group you I<left>, if you're running setgid.) =item $EFFECTIVE_GROUP_ID @@ -629,42 +636,41 @@ separated list of groups you are in. The first number is the one returned by getegid(), and the subsequent ones by getgroups(), one of which may be the same as the first number. -Similarly, a value assigned to "C<$)>" must also be a space-separated -list of numbers. The first number is used to set the effective gid, and +Similarly, a value assigned to C<$)> must also be a space-separated +list of numbers. The first number sets the effective gid, and the rest (if any) are passed to setgroups(). To get the effect of an empty list for setgroups(), just repeat the new effective gid; that is, to force an effective gid of 5 and an effectively empty setgroups() list, say C< $) = "5 5" >. -(Mnemonic: parentheses are used to I<GROUP> things. The effective gid -is the group that's I<RIGHT> for you, if you're running setgid.) +(Mnemonic: parentheses are used to I<group> things. The effective gid +is the group that's I<right> for you, if you're running setgid.) 
-Note: "C<$E<lt>>", "C<$E<gt>>", "C<$(>" and "C<$)>" can be set only on -machines that support the corresponding I<set[re][ug]id()> routine. "C<$(>" -and "C<$)>" can be swapped only on machines supporting setregid(). +C<$E<lt>>, C<$E<gt>>, C<$(> and C<$)> can be set only on +machines that support the corresponding I<set[re][ug]id()> routine. C<$(> +and C<$)> can be swapped only on machines supporting setregid(). =item $PROGRAM_NAME =item $0 -Contains the name of the file containing the Perl script being -executed. On some operating systems -assigning to "C<$0>" modifies the argument area that the ps(1) -program sees. This is more useful as a way of indicating the -current program state than it is for hiding the program you're running. +Contains the name of the program being executed. On some operating +systems assigning to C<$0> modifies the argument area that the B<ps> +program sees. This is more useful as a way of indicating the current +program state than it is for hiding the program you're running. (Mnemonic: same as B<sh> and B<ksh>.) =item $[ The index of the first element in an array, and of the first character -in a substring. Default is 0, but you could set it to 1 to make -Perl behave more like B<awk> (or Fortran) when subscripting and when -evaluating the index() and substr() functions. (Mnemonic: [ begins -subscripts.) +in a substring. Default is 0, but you could theoretically set it +to 1 to make Perl behave more like B<awk> (or Fortran) when +subscripting and when evaluating the index() and substr() functions. +(Mnemonic: [ begins subscripts.) -As of Perl 5, assignment to "C<$[>" is treated as a compiler directive, -and cannot influence the behavior of any other file. Its use is -discouraged. +As of release 5 of Perl, assignment to C<$[> is treated as a compiler +directive, and cannot influence the behavior of any other file. +Its use is highly discouraged. =item $PERL_VERSION @@ -678,16 +684,17 @@ of perl in the right bracket?) 
Example: warn "No checksumming!\n" if $] < 3.019; See also the documentation of C<use VERSION> and C<require VERSION> -for a convenient way to fail if the Perl interpreter is too old. +for a convenient way to fail if the running Perl interpreter is too old. =item $COMPILING =item $^C -The current value of the flag associated with the B<-c> switch. Mainly -of use with B<-MO=...> to allow code to alter its behaviour when being compiled. -(For example to automatically AUTOLOADing at compile time rather than normal -deferred loading.) Setting C<$^C = 1> is similar to calling C<B::minus_c>. +The current value of the flag associated with the B<-c> switch. +Mainly of use with B<-MO=...> to allow code to alter its behavior +when being compiled, such as for example to AUTOLOAD at compile +time rather than normal, deferred loading. See L<perlcc>. Setting +C<$^C = 1> is similar to calling C<B::minus_c>. =item $DEBUGGING @@ -704,7 +711,7 @@ The maximum system file descriptor, ordinarily 2. System file descriptors are passed to exec()ed processes, while higher file descriptors are not. Also, during an open(), system file descriptors are preserved even if the open() fails. (Ordinary file descriptors are -closed before the open() is attempted.) Note that the close-on-exec +closed before the open() is attempted.) The close-on-exec status of a file descriptor will be decided according to the value of C<$^F> when the open() or pipe() was called, not the time of the exec(). @@ -722,17 +729,18 @@ inplace editing. (Mnemonic: value of B<-i> switch.) =item $^M -By default, running out of memory it is not trappable. However, if -compiled for this, Perl may use the contents of C<$^M> as an emergency -pool after die()ing with this message. Suppose that your Perl were -compiled with -DPERL_EMERGENCY_SBRK and used Perl's malloc. Then +By default, running out of memory is an untrappable, fatal error. 
+However, if suitably built, Perl can use the contents of C<$^M> +as an emergency memory pool after die()ing. Suppose that your Perl +were compiled with -DPERL_EMERGENCY_SBRK and used Perl's malloc. +Then - $^M = 'a' x (1<<16); + $^M = 'a' x (1 << 16); -would allocate a 64K buffer for use when in emergency. See the F<INSTALL> -file for information on how to enable this option. As a disincentive to -casual use of this advanced feature, there is no L<English> long name for -this variable. +would allocate a 64K buffer for use when in emergency. See the +F<INSTALL> file in the Perl distribution for information on how to +enable this option. To discourage casual use of this advanced +feature, there is no L<English> long name for this variable. =item $OSNAME @@ -740,14 +748,15 @@ this variable. The name of the operating system under which this copy of Perl was built, as determined during the configuration process. The value -is identical to C<$Config{'osname'}>. +is identical to C<$Config{'osname'}>. See also L<Config> and the +B<-V> command-line switch documented in L<perlrun>. =item $PERLDB =item $^P -The internal variable for debugging support. Different bits mean the -following (subject to change): +The internal variable for debugging support. The meanings of the +various bits are subject to change, but currently indicate: =over 6 @@ -777,42 +786,42 @@ Start with single-step on. =back -Note that some bits may be relevant at compile-time only, some at -run-time only. This is a new mechanism and the details may change. +Some bits may be relevant at compile-time only, some at +run-time only. This is a new mechanism and the details may change. =item $^R -The result of evaluation of the last successful L<perlre/C<(?{ code })>> -regular expression assertion. (Excluding those used as switches.) May -be written to. +The result of evaluation of the last successful C<(?{ code })> +regular expression assertion (see L<perlre>). May be written to. 
=item $^S Current state of the interpreter. Undefined if parsing of the current module/eval is not finished (may happen in $SIG{__DIE__} and -$SIG{__WARN__} handlers). True if inside an eval, otherwise false. +$SIG{__WARN__} handlers). True if inside an eval(), otherwise false. =item $BASETIME =item $^T -The time at which the script began running, in seconds since the +The time at which the program began running, in seconds since the epoch (beginning of 1970). The values returned by the B<-M>, B<-A>, -and B<-C> filetests are -based on this value. +and B<-C> filetests are based on this value. =item $WARNING =item $^W -The current value of the warning switch, either TRUE or FALSE. -(Mnemonic: related to the B<-w> switch.) +The current value of the warning switch, initially true if B<-w> +was used, false otherwise, but directly modifiable. (Mnemonic: +related to the B<-w> switch.) See also L<warning>. =item $EXECUTABLE_NAME =item $^X The name that the Perl binary itself was executed as, from C's C<argv[0]>. +This may not be a full pathname, nor even necessarily in your path. =item $ARGV @@ -820,20 +829,21 @@ contains the name of the current file when reading from E<lt>E<gt>. =item @ARGV -The array @ARGV contains the command line arguments intended for the -script. Note that C<$#ARGV> is the generally number of arguments minus -one, because C<$ARGV[0]> is the first argument, I<NOT> the command name. See -"C<$0>" for the command name. +The array @ARGV contains the command-line arguments intended for +the script. C<$#ARGV> is generally the number of arguments minus +one, because C<$ARGV[0]> is the first argument, I<not> the program's +command name itself. See C<$0> for the command name. =item @INC -The array @INC contains the list of places to look for Perl scripts to -be evaluated by the C<do EXPR>, C<require>, or C<use> constructs. 
It -initially consists of the arguments to any B<-I> command line switches, -followed by the default Perl library, probably F</usr/local/lib/perl>, -followed by ".", to represent the current directory. If you need to -modify this at runtime, you should use the C<use lib> pragma -to get the machine-dependent library properly loaded also: +The array @INC contains the list of places that the C<do EXPR>, +C<require>, or C<use> constructs look for their library files. It +initially consists of the arguments to any B<-I> command-line +switches, followed by the default Perl library, probably +F</usr/local/lib/perl>, followed by ".", to represent the current +directory. If you need to modify this at runtime, you should use +the C<use lib> pragma to get the machine-dependent library properly +loaded also: use lib '/mypath/libdir/'; use SomeMod; @@ -841,29 +851,30 @@ to get the machine-dependent library properly loaded also: =item @_ Within a subroutine the array @_ contains the parameters passed to that -subroutine. See L<perlsub>. +subroutine. See L<perlsub>. =item %INC -The hash %INC contains entries for each filename that has -been included via C<do> or C<require>. The key is the filename you -specified, and the value is the location of the file actually found. -The C<require> command uses this array to determine whether a given file -has already been included. +The hash %INC contains entries for each filename included via the +C<do>, C<require>, or C<use> operators. The key is the filename +you specified (with module names converted to pathnames), and the +value is the location of the file found. The C<require> +operator uses this array to determine whether a particular file has +already been included. =item %ENV =item $ENV{expr} The hash %ENV contains your current environment. Setting a -value in C<ENV> changes the environment for child processes. +value in C<ENV> changes the environment for any child processes +you subsequently fork() off. 
=item %SIG =item $SIG{expr} -The hash %SIG is used to set signal handlers for various -signals. Example: +The hash %SIG contains signal handlers for signals. For example: sub handler { # 1st argument is signal name my($sig) = @_; @@ -875,30 +886,27 @@ signals. Example: $SIG{'INT'} = \&handler; $SIG{'QUIT'} = \&handler; ... - $SIG{'INT'} = 'DEFAULT'; # restore default action + $SIG{'INT'} = 'DEFAULT'; # restore default action $SIG{'QUIT'} = 'IGNORE'; # ignore SIGQUIT Using a value of C<'IGNORE'> usually has the effect of ignoring the signal, except for the C<CHLD> signal. See L<perlipc> for more about this special case. -The %SIG array contains values for only the signals actually set within -the Perl script. Here are some other examples: +Here are some other examples: - $SIG{"PIPE"} = Plumber; # SCARY!! $SIG{"PIPE"} = "Plumber"; # assumes main::Plumber (not recommended) $SIG{"PIPE"} = \&Plumber; # just fine; assume current Plumber + $SIG{"PIPE"} = *Plumber; # somewhat esoteric $SIG{"PIPE"} = Plumber(); # oops, what did Plumber() return?? -The one marked scary is problematic because it's a bareword, which means -sometimes it's a string representing the function, and sometimes it's -going to call the subroutine call right then and there! Best to be sure -and quote it or take a reference to it. *Plumber works too. See L<perlsub>. +Be sure not to use a bareword as the name of a signal handler, +lest you inadvertently call it. If your system has the sigaction() function then signal handlers are installed using it. This means you get reliable signal handling. If your system has the SA_RESTART flag it is used when signals handlers are -installed. This means that system calls for which it is supported +installed. This means that system calls for which restarting is supported continue rather than returning when a signal arrives. 
If you want your system calls to be interrupted by signal delivery then do something like this: @@ -929,16 +937,20 @@ unless the hook routine itself exits via a C<goto>, a loop exit, or a die(). The C<__DIE__> handler is explicitly disabled during the call, so that you can die from a C<__DIE__> handler. Similarly for C<__WARN__>. -Note that the C<$SIG{__DIE__}> hook is called even inside eval()ed -blocks/strings. See L<perlfunc/die> and L<perlvar/$^S> for how to -circumvent this. - -Note that C<__DIE__>/C<__WARN__> handlers are very special in one -respect: they may be called to report (probable) errors found by the -parser. In such a case the parser may be in inconsistent state, so -any attempt to evaluate Perl code from such a handler will probably -result in a segfault. This means that calls which result/may-result -in parsing Perl should be used with extreme caution, like this: +Due to an implementation glitch, the C<$SIG{__DIE__}> hook is called +even inside an eval(). Do not use this to rewrite a pending exception +in C<$@>, or as a bizarre substitute for overriding CORE::GLOBAL::die(). +This strange action at a distance may be fixed in a future release +so that C<$SIG{__DIE__}> is only called if your program is about +to exit, as was the original intent. Any other use is deprecated. + +C<__DIE__>/C<__WARN__> handlers are very special in one respect: +they may be called to report (probable) errors found by the parser. +In such a case the parser may be in inconsistent state, so any +attempt to evaluate Perl code from such a handler will probably +result in a segfault. This means that warnings or errors that +result from parsing Perl should be used with extreme caution, like +this: require Carp if defined $^S; Carp::confess("Something wrong") if defined &Carp::confess; @@ -950,94 +962,94 @@ called the handler. The second line will print backtrace and die if Carp was available. The third line will be executed only if Carp was not available. 
-See L<perlfunc/die>, L<perlfunc/warn> and L<perlfunc/eval> for -additional info. +See L<perlfunc/die>, L<perlfunc/warn>, L<perlfunc/eval>, and +L<warning> for additional information. =back =head2 Error Indicators -The variables L<$@>, L<$!>, L<$^E>, and L<$?> contain information about -different types of error conditions that may appear during execution of -Perl script. The variables are shown ordered by the "distance" between -the subsystem which reported the error and the Perl process, and -correspond to errors detected by the Perl interpreter, C library, -operating system, or an external program, respectively. +The variables C<$@>, C<$!>, C<$^E>, and C<$?> contain information +about different types of error conditions that may appear during +execution of a Perl program. The variables are shown ordered by +the "distance" between the subsystem which reported the error and +the Perl process. They correspond to errors detected by the Perl +interpreter, C library, operating system, or an external program, +respectively. To illustrate the differences between these variables, consider the -following Perl expression: +following Perl expression, which uses a single-quoted string: - eval ' - open PIPE, "/cdrom/install |"; - @res = <PIPE>; - close PIPE or die "bad pipe: $?, $!"; - '; + eval q{ + open PIPE, "/cdrom/install |"; + @res = <PIPE>; + close PIPE or die "bad pipe: $?, $!"; + }; After execution of this statement all 4 variables may have been set. -$@ is set if the string to be C<eval>-ed did not compile (this may happen if -C<open> or C<close> were imported with bad prototypes), or if Perl -code executed during evaluation die()d (either implicitly, say, -if C<open> was imported from module L<Fatal>, or the C<die> after -C<close> was triggered). In these cases the value of $@ is the compile -error, or C<Fatal> error (which will interpolate C<$!>!), or the argument -to C<die> (which will interpolate C<$!> and C<$?>!). 
- -When the above expression is executed, open(), C<<PIPEE<gt>>, and C<close> -are translated to C run-time library calls. $! is set if one of these -calls fails. The value is a symbolic indicator chosen by the C run-time -library, say C<No such file or directory>. - -On some systems the above C library calls are further translated -to calls to the kernel. The kernel may have set more verbose error -indicator that one of the handful of standard C errors. In such cases $^E -contains this verbose error indicator, which may be, say, C<CDROM tray not -closed>. On systems where C library calls are identical to system calls -$^E is a duplicate of $!. - -Finally, $? may be set to non-C<0> value if the external program -C</cdrom/install> fails. Upper bits of the particular value may reflect -specific error conditions encountered by this program (this is -program-dependent), lower-bits reflect mode of failure (segfault, completion, -etc.). Note that in contrast to $@, $!, and $^E, which are set only -if error condition is detected, the variable $? is set on each C<wait> or -pipe C<close>, overwriting the old value. - -For more details, see the individual descriptions at L<$@>, L<$!>, L<$^E>, -and L<$?>. +C<$@> is set if the string to be C<eval>-ed did not compile (this +may happen if C<open> or C<close> were imported with bad prototypes), +or if Perl code executed during evaluation die()d . In these cases +the value of $@ is the compile error, or the argument to C<die> +(which will interpolate C<$!> and C<$?>!). (See also L<Fatal>, +though.) + +When the eval() expression above is executed, open(), C<<PIPEE<gt>>, +and C<close> are translated to calls in the C run-time library and +thence to the operating system kernel. C<$!> is set to the C library's +C<errno> if one of these calls fails. + +Under a few operating systems, C<$^E> may contain a more verbose +error indicator, such as in this case, "CDROM tray not closed." 
+Systems that do not support extended error messages leave C<$^E> +the same as C<$!>. + +Finally, C<$?> may be set to non-0 value if the external program +F</cdrom/install> fails. The upper eight bits reflect specific +error conditions encountered by the program (the program's exit() +value). The lower eight bits reflect mode of failure, like signal +death and core dump information See wait(2) for details. In +contrast to C<$!> and C<$^E>, which are set only if error condition +is detected, the variable C<$?> is set on each C<wait> or pipe +C<close>, overwriting the old value. This is more like C<$@>, which +on every eval() is always set on failure and cleared on success. +For more details, see the individual descriptions at C<$@>, C<$!>, C<$^E>, +and C<$?>. =head2 Technical Note on the Syntax of Variable Names -Variable names in Perl can have several formats. Usually, they must -begin with a letter or underscore, in which case they can be -arbitrarily long (up to an internal limit of 256 characters) and may -contain letters, digits, underscores, or the special sequence C<::>. -In this case the part before the last C<::> is taken to be a I<package -qualifier>; see L<perlmod>. +Variable names in Perl can have several formats. Usually, they +must begin with a letter or underscore, in which case they can be +arbitrarily long (up to an internal limit of 251 characters) and +may contain letters, digits, underscores, or the special sequence +C<::> or C<'>. In this case, the part before the last C<::> or +C<'> is taken to be a I<package qualifier>; see L<perlmod>. Perl variable names may also be a sequence of digits or a single punctuation or control character. These names are all reserved for -special uses by Perl; for example, the all-digits names are used to -hold backreferences after a regulare expression match. Perl has a -special syntax for the single-control-character names: It understands -C<^X> (caret C<X>) to mean the control-C<X> character. 
For example, -the notation C<$^W> (dollar-sign caret C<W>) is the scalar variable -whose name is the single character control-C<W>. This is better than -typing a literal control-C<W> into your program. +special uses by Perl; for example, the all-digits names are used +to hold data captured by backreferences after a regular expression +match. Perl has a special syntax for the single-control-character +names: It understands C<^X> (caret C<X>) to mean the control-C<X> +character. For example, the notation C<$^W> (dollar-sign caret +C<W>) is the scalar variable whose name is the single character +control-C<W>. This is better than typing a literal control-C<W> +into your program. Finally, new in Perl 5.006, Perl variable names may be alphanumeric -strings that begin with control characters. These variables must be -written in the form C<${^Foo}>; the braces are not optional. -C<${^Foo}> denotes the scalar variable whose name is a control-C<F> -followed by two C<o>'s. These variables are reserved for future -special uses by Perl, except for the ones that begin with C<^_> -(control-underscore). No control-character name that begins with -C<^_> will acquire a special meaning in any future version of Perl; -such names may therefore be used safely in programs. C<^_> itself, -however, I<is> reserved. - -All Perl variables that begin with digits, control characters, or +strings that begin with control characters (or better yet, a caret). +These variables must be written in the form C<${^Foo}>; the braces +are not optional. C<${^Foo}> denotes the scalar variable whose +name is a control-C<F> followed by two C<o>'s. These variables are +reserved for future special uses by Perl, except for the ones that +begin with C<^_> (control-underscore or caret-underscore). No +control-character name that begins with C<^_> will acquire a special +meaning in any future version of Perl; such names may therefore be +used safely in programs. C<$^_> itself, however, I<is> reserved. 
+ +Perl identifiers that begin with digits, control characters, or punctuation characters are exempt from the effects of the C<package> declaration and are always forced to be in package C<main>. A few other names are also exempt: @@ -1049,7 +1061,21 @@ other names are also exempt: SIG In particular, the new special C<${^_XYZ}> variables are always taken -to be in package C<main> regardless of any C<package> declarations +to be in package C<main>, regardless of any C<package> declarations presently in scope. +=head1 BUGS + +Due to an unfortunate accident of Perl's implementation, C<use +English> imposes a considerable performance penalty on all regular +expression matches in a program, regardless of whether they occur +in the scope of C<use English>. For that reason, saying C<use +English> in libraries is strongly discouraged. See the +Devel::SawAmpersand module documentation from CPAN +(http://www.perl.com/CPAN/modules/by-module/Devel/Devel-SawAmpersand-0.10.readme) +for more information. +Having to even think about the C<$^S> variable in your exception +handlers is simply wrong. C<$SIG{__DIE__}> as currently implemented +invites grievous and difficult to track down errors. Avoid it +and use an C<END{}> or CORE::GLOBAL::die override instead. diff --git a/pod/perlxs.pod b/pod/perlxs.pod index 98a983422f..ee582e0a55 100644 --- a/pod/perlxs.pod +++ b/pod/perlxs.pod @@ -367,8 +367,8 @@ The following code demonstrates how to supply initialization code for function parameters. The initialization code is eval'd within double quotes by the compiler before it is added to the output so anything which should be interpreted literally [mainly C<$>, C<@>, or C<\\>] -must be protected with backslashes. The variables C<$var>, C<$arg>, -and C<$type> can be used as in typemaps. +must be protected with backslashes. The variables $var, $arg, +and $type can be used as in typemaps. 
bool_t rpcb_gettime(host,timep) diff --git a/pod/pod2man.PL b/pod/pod2man.PL index a673ea127d..20610a84c3 100644 --- a/pod/pod2man.PL +++ b/pod/pod2man.PL @@ -785,7 +785,7 @@ while (<>) { } {I<$1>\\|$2}gx; # convert simple variable references - s/(\s+)([\$\@%][\w:]+)(?!\()/${1}C<$2>/g; + s/(\s+)([\$\@%&*][\w:]+)(?!\()/${1}C<$2>/g; if (m{ ( [\-\w]+ |