author Tom Christiansen <tchrist@perl.com> 1998-06-13 16:19:32 -0600
committer Gurusamy Sarathy <gsar@cpan.org> 1998-06-15 01:37:12 +0000
commit 5a964f204835a8014f4ba86fc91884cff958ac67
tree b1ad7153799ba133ce772012c9dc05ea615f1c6e /pod
parent ad973f306c11e119dc3a8448590409962bde25db
documentation update from tchrist
Message-Id: <199806140419.WAA20549@chthon.perl.com>
Subject: doc patches
p4raw-id: //depot/perl@1132
Diffstat (limited to 'pod')
-rw-r--r-- pod/perl.pod 4
-rw-r--r-- pod/perlbook.pod 25
-rw-r--r-- pod/perldata.pod 97
-rw-r--r-- pod/perldsc.pod 8
-rw-r--r-- pod/perlfaq.pod 2
-rw-r--r-- pod/perlfaq1.pod 27
-rw-r--r-- pod/perlfaq2.pod 211
-rw-r--r-- pod/perlfaq3.pod 111
-rw-r--r-- pod/perlfaq4.pod 361
-rw-r--r-- pod/perlfaq5.pod 479
-rw-r--r-- pod/perlfaq6.pod 39
-rw-r--r-- pod/perlfaq7.pod 17
-rw-r--r-- pod/perlfaq8.pod 17
-rw-r--r-- pod/perlfaq9.pod 40
-rw-r--r-- pod/perlform.pod 20
-rw-r--r-- pod/perlfunc.pod 716
-rw-r--r-- pod/perlipc.pod 251
-rw-r--r-- pod/perllocale.pod 431
-rw-r--r-- pod/perllol.pod 16
-rw-r--r-- pod/perlmod.pod 145
-rw-r--r-- pod/perlmodlib.pod 30
-rw-r--r-- pod/perlobj.pod 140
-rw-r--r-- pod/perlop.pod 335
-rw-r--r-- pod/perlre.pod 190
-rw-r--r-- pod/perlref.pod 180
-rw-r--r-- pod/perlrun.pod 5
-rw-r--r-- pod/perlsec.pod 20
-rw-r--r-- pod/perlsub.pod 175
-rw-r--r-- pod/perlsyn.pod 124
-rw-r--r-- pod/perltie.pod 2
-rw-r--r-- pod/perltoot.pod 38
-rw-r--r-- pod/perlvar.pod 26
32 files changed, 2842 insertions, 1440 deletions
diff --git a/pod/perl.pod b/pod/perl.pod
index a7e02f600e..4c9808feae 100644
--- a/pod/perl.pod
+++ b/pod/perl.pod
@@ -265,7 +265,9 @@ Perl developers, please write to <F<perl-thanks@perl.org>>.
The B<-w> switch produces some lovely diagnostics.
-See L<perldiag> for explanations of all Perl's diagnostics.
+See L<perldiag> for explanations of all Perl's diagnostics. The C<use
+diagnostics> pragma automatically turns Perl's normally terse warnings
+and errors into these longer forms.
Compilation errors will tell you the line number of the error, with an
indication of the next token or token type that was to be examined.
diff --git a/pod/perlbook.pod b/pod/perlbook.pod
index f5bf99dbba..0ff2cd5a05 100644
--- a/pod/perlbook.pod
+++ b/pod/perlbook.pod
@@ -11,15 +11,17 @@ web-connected, you can even mosey on over to http://www.ora.com/ for
an online order form.
I<Programming Perl, Second Edition> is a reference work that covers
-nearly all of Perl, while I<Learning Perl, Second Edition> is a
-tutorial that covers the most frequently used subset of the language,
-and I<Advanced Perl Programming> is an indepth study of complex topics
-including the internals of perl. You might also check out the very
-handy, inexpensive, and compact I<Perl 5 Desktop Reference>, especially
-when the thought of lugging the 676-page Camel around doesn't make much
-sense. I<Mastering Regular Expressions>, by Jeffrey Friedl, is a
-reference work that covers the art and implementation of regular
-expressions in various languages including Perl.
+nearly all of Perl; I<Learning Perl, Second Edition> is a tutorial that
+covers the most frequently used subset of the language; and I<Advanced
+Perl Programming> is an in-depth study of complex topics including the
+internals of perl. You might also check out the very handy, inexpensive,
+and compact I<Perl 5 Desktop Reference>, especially when the thought of
+lugging the 676-page Camel around doesn't make much sense. I<Mastering
+Regular Expressions>, by Jeffrey Friedl, is a reference work that covers
+the art and implementation of regular expressions in various languages
+including Perl. Currently published quarterly by Jon Orwant, I<The Perl
+Journal> is the first and only periodical devoted to All Things Perl.
+See http://www.tpj.com/ for information.
Programming Perl, Second Edition (the Camel Book):
ISBN 1-56592-149-6 (English)
@@ -27,7 +29,10 @@ expressions in various languages including Perl.
Learning Perl, Second Edition (the Llama Book):
ISBN 1-56592-284-0 (English)
- Advanced Perl Programming:
+ Learning Perl on Win32 Systems (the Gecko Book):
+ ISBN 1-56592-324-3 (English)
+
+ Advanced Perl Programming (the Panther Book):
ISBN 1-56592-220-4 (English)
Perl 5 Desktop Reference (the reference card):
diff --git a/pod/perldata.pod b/pod/perldata.pod
index dc2975a7d4..58c11234b4 100644
--- a/pod/perldata.pod
+++ b/pod/perldata.pod
@@ -22,15 +22,15 @@ that's deprecated); all but the last are interpreted as names of
packages, to locate the namespace in which to look
up the final identifier (see L<perlmod/Packages> for details).
It's possible to substitute for a simple identifier an expression
-which produces a reference to the value at runtime; this is
+that produces a reference to the value at runtime; this is
described in more detail below, and in L<perlref>.
There are also special variables whose names don't follow these
rules, so that they don't accidentally collide with one of your
-normal variables. Strings which match parenthesized parts of a
+normal variables. Strings that match parenthesized parts of a
regular expression are saved under names containing only digits after
the C<$> (see L<perlop> and L<perlre>). In addition, several special
-variables which provide windows into the inner working of Perl have names
+variables that provide windows into the inner working of Perl have names
containing punctuation characters (see L<perlvar>).
Scalar values are always named with '$', even when referring to a scalar
@@ -81,7 +81,7 @@ that returns a reference to an object of that type. For a description
of this, see L<perlref>.
Names that start with a digit may contain only more digits. Names
-which do not start with a letter, underscore, or digit are limited to
+that do not start with a letter, underscore, or digit are limited to
one character, e.g., C<$%> or C<$$>. (Most of these one character names
have a predefined significance to Perl. For instance, C<$$> is the
current process id.)
@@ -171,13 +171,15 @@ numbers count as 0, just as they do in B<awk>:
That's usually preferable because otherwise you won't treat IEEE notations
like C<NaN> or C<Infinity> properly. At other times you might prefer to
-use a regular expression to check whether data is numeric. See L<perlre>
-for details on regular expressions.
+use the POSIX::strtod function or a regular expression to check whether
+data is numeric. See L<perlre> for details on regular expressions.
warn "has nondigits" if /\D/;
- warn "not a whole number" unless /^\d+$/;
- warn "not an integer" unless /^[+-]?\d+$/
- warn "not a decimal number" unless /^[+-]?\d+\.?\d*$/
+ warn "not a natural number" unless /^\d+$/; # rejects -3
+ warn "not an integer" unless /^-?\d+$/; # rejects +3
+ warn "not an integer" unless /^[+-]?\d+$/;
+ warn "not a decimal number" unless /^-?\d+\.?\d*$/; # rejects .2
+ warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
warn "not a C float"
unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
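The numeric checks in the hunk above can be tried out with a short script. This is a minimal sketch and not part of the patch: the helper names is_decimal and is_number_strtod are hypothetical, and the second helper assumes that POSIX::strtod, called in list context, returns the parsed value followed by the number of unparsed characters.

```perl
use strict;
use warnings;
use POSIX qw(strtod);

# Hypothetical helper: true for decimal numbers such as 3, -3.5, or .2.
# Uses the stricter pattern from the patched text; a leading + is rejected.
sub is_decimal {
    my ($str) = @_;
    return $str =~ /^-?(?:\d+(?:\.\d*)?|\.\d+)$/ ? 1 : 0;
}

# Hypothetical helper: true if strtod() consumes the whole string.
sub is_number_strtod {
    my ($str) = @_;
    return 0 unless defined $str && $str =~ /\S/;
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    my ($num, $unparsed) = strtod($str);
    return $unparsed == 0 ? 1 : 0;
}

for my $candidate ('42', '-3.5', '.2', '+3', '12abc') {
    printf "%-6s decimal=%d strtod=%d\n",
        $candidate, is_decimal($candidate), is_number_strtod($candidate);
}
```

The regular expression is stricter about format (it rejects exponents and a leading plus sign), while strtod accepts whatever the C library treats as a number and is sensitive to the numeric locale.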
@@ -189,7 +191,7 @@ length of the array. Shortening an array by this method destroys
intervening values. Lengthening an array that was previously shortened
I<NO LONGER> recovers the values that were in those elements. (It used to
in Perl 4, but we had to break this to make sure destructors were
-called when expected.) You can also gain some measure of efficiency by
+called when expected.) You can also gain some minuscule measure of efficiency by
pre-extending an array that is going to get big. (You can also extend
an array by assigning to an element that is off the end of the array.)
You can truncate an array down to nothing by assigning the null list ()
@@ -200,7 +202,8 @@ to it. The following are equivalent:
If you evaluate a named array in a scalar context, it returns the length of
the array. (Note that this is not true of lists, which return the
-last value, like the C comma operator.) The following is always true:
+last value, like the C comma operator, nor of built-in functions, which return
+whatever they feel like returning.) The following is always true:
scalar(@whatever) == $#whatever - $[ + 1;
@@ -216,7 +219,7 @@ left to doubt:
$element_count = scalar(@whatever);
-If you evaluate a hash in a scalar context, it returns a value which is
+If you evaluate a hash in a scalar context, it returns a value that is
true if and only if the hash contains any key/value pairs. (If there
are any key/value pairs, the value returned is a string consisting of
the number of used buckets and the number of allocated buckets, separated
@@ -227,6 +230,11 @@ scalar context reveals "1/16", which means only one out of sixteen buckets
has been touched, and presumably contains all 10,000 of your items. This
isn't supposed to happen.)
+You can preallocate space for a hash by assigning to the keys() function.
+This rounds up the allocated buckets to the next power of two:
+
+ keys(%users) = 1000; # allocate 1024 buckets
+
=head2 Scalar value constructors
Numeric literals are specified in any of the customary floating point or
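The hash preallocation added in the hunk above can be sketched as follows. This is an illustrative example only; the %users hash and the "user$_" key names are assumptions made for the demonstration.

```perl
use strict;
use warnings;

my %users;
keys(%users) = 1000;    # request room for at least 1000 buckets up front

# Preallocation only reserves buckets; it does not create any keys.
my $count_before = scalar(keys %users);

# Fill the hash; the buckets are already in place, so no regrowth is needed.
$users{"user$_"} = $_ for 1 .. 1000;
my $count_after = scalar(keys %users);

print "keys before filling: $count_before, after: $count_after\n";
```

Preallocating changes only how much space the hash reserves, so it is worth doing only when the final size is known in advance and large.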
@@ -288,7 +296,7 @@ Three special literals are __FILE__, __LINE__, and __PACKAGE__, which
represent the current filename, line number, and package name at that
point in your program. They may be used only as separate tokens; they
will not be interpolated into strings. If there is no current package
-(due to a C<package;> directive), __PACKAGE__ is the undefined value.
+(due to an empty C<package;> directive), __PACKAGE__ is the undefined value.
The tokens __END__ and __DATA__ may be used to indicate the logical end
of the script before the actual end of file. Any following text is
@@ -421,14 +429,14 @@ list literal, so that you can say:
LISTs do automatic interpolation of sublists. That is, when a LIST is
evaluated, each element of the list is evaluated in a list context, and
the resulting list value is interpolated into LIST just as if each
-individual element were a member of LIST. Thus arrays lose their
+individual element were a member of LIST. Thus arrays and hashes lose their
identity in a LIST--the list
- (@foo,@bar,&SomeSub)
+ (@foo,@bar,&SomeSub,%glarch)
contains all the elements of @foo followed by all the elements of @bar,
-followed by all the elements returned by the subroutine named SomeSub when
-it's called in a list context.
+followed by all the elements returned by the subroutine named SomeSub
+called in a list context, followed by the key/value pairs of %glarch.
To make a list reference that does I<NOT> interpolate, see L<perlref>.
The null list is represented by (). Interpolating it in a list
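The list interpolation described in the hunk above can be illustrated with a small script. It is a sketch under stated assumptions: the variable contents are made up, and some_sub stands in for the SomeSub of the text.

```perl
use strict;
use warnings;

my @foo    = (1, 2);
my @bar    = (3);
my %glarch = (key => 'value');
sub some_sub { return (4, 5) }    # stand-in for SomeSub in the text

# Arrays, the sub's return list, and the hash all flatten into one list.
my @flat = (@foo, @bar, some_sub(), %glarch);

# Storing references instead keeps each aggregate intact.
my @refs = (\@foo, \@bar, \%glarch);

print scalar(@flat), " elements after flattening\n";
print scalar(@refs), " elements when references are stored\n";
```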
@@ -476,7 +484,7 @@ which when assigned produces a 0, which is interpreted as FALSE.
The final element may be an array or a hash:
($a, $b, @rest) = split;
- local($a, $b, %rest) = @_;
+ my($a, $b, %rest) = @_;
You can actually put an array or hash anywhere in the list, but the first one
in the list will soak up all the values, and anything after it will get
@@ -498,7 +506,7 @@ key/value pairs. That's why it's good to use references sometimes.
It is often more readable to use the C<=E<gt>> operator between key/value
pairs. The C<=E<gt>> operator is mostly just a more visually distinctive
synonym for a comma, but it also arranges for its left-hand operand to be
-interpreted as a string, if it's a bareword which would be a legal identifier.
+interpreted as a string--if it's a bareword that would be a legal identifier.
This makes it nice for initializing hashes:
%map = (
@@ -535,13 +543,28 @@ Perl uses an internal type called a I<typeglob> to hold an entire
symbol table entry. The type prefix of a typeglob is a C<*>, because
it represents all types. This used to be the preferred way to
pass arrays and hashes by reference into a function, but now that
-we have real references, this is seldom needed. It also used to be the
-preferred way to pass filehandles into a function, but now
-that we have the *foo{THING} notation it isn't often needed for that,
-either. It is still needed to pass new filehandles into functions
-(*HANDLE{IO} only works if HANDLE has already been used).
+we have real references, this is seldom needed.
+
+The main use of typeglobs in modern Perl is to create symbol table aliases.
+This assignment:
+
+ *this = *that;
+
+makes $this an alias for $that, @this an alias for @that, %this an alias
+for %that, &this an alias for &that, etc. Much safer is to use a reference.
+This:
-If you need to use a typeglob to save away a filehandle, do it this way:
+ local *Here::blue = \$There::green;
+
+temporarily makes $Here::blue an alias for $There::green, but doesn't
+make @Here::blue an alias for @There::green, or %Here::blue an alias for
+%There::green, etc. See L<perlmod/"Symbol Tables"> for more examples
+of this. Strange though this may seem, this is the basis for the whole
+module import/export system.
+
+Another use for typeglobs is to pass filehandles into a function or
+to create new filehandles. If you need to use a typeglob to save away
+a filehandle, do it this way:
$fh = *STDOUT;
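The two kinds of aliasing described in the hunk above can be compared side by side. This is a minimal sketch: the package variables are declared with our so the script runs under strict, and the unqualified names $blue and $green stand in for $Here::blue and $There::green.

```perl
use strict;
use warnings;

our ($this, @this, $that, @that, $blue, @blue, $green);

# Whole-glob assignment: every slot of *this now aliases *that.
$that = "original";
@that = (1, 2, 3);
*this = *that;
$this = "changed";              # also changes $that
push @this, 4;                  # also extends @that

# Reference assignment: only the scalar slot is aliased, and only
# until the enclosing block exits.
$green = "green";
@blue  = ('untouched');
{
    local *blue = \$green;
    $blue .= "!";               # modifies $green through the alias
}

print "that=$that, that array=@that\n";
print "green=$green, blue array=@blue\n";
```

Note that local *blue saves the whole glob for the duration of the block, so @blue is hidden inside the block and restored afterwards.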
@@ -549,18 +572,32 @@ or perhaps as a real reference, like this:
$fh = \*STDOUT;
-This is also a way to create a local filehandle. For example:
+See L<perlsub> for examples of using these as indirect filehandles
+in functions.
+
+Typeglobs are also a way to create a local filehandle using the local()
+operator. These last until their block is exited, but may be passed back.
+For example:
sub newopen {
my $path = shift;
local *FH; # not my!
- open (FH, $path) || return undef;
+ open (FH, $path) or return undef;
return *FH;
}
$fh = newopen('/etc/passwd');
-Another way to create local filehandles is with IO::Handle and its ilk,
-see the bottom of L<perlfunc/open()>.
+Now that we have the *foo{THING} notation, typeglobs aren't used as much
+for filehandle manipulations, although they're still needed to pass brand
+new file and directory handles into or out of functions. That's because
+*HANDLE{IO} only works if HANDLE has already been used as a handle.
+In other words, *FH can be used to create new symbol table entries,
+but *foo{THING} cannot.
+
+Another way to create anonymous filehandles is with the IO::Handle
+module and its ilk. These modules have the advantage of not hiding
+different types of the same name during the local(). See the bottom of
+L<perlfunc/open()> for an example.
See L<perlref>, L<perlsub>, and L<perlmod/"Symbol Tables"> for more
-discussion on typeglobs.
+discussion on typeglobs and the *foo{THING} syntax.
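The newopen() idiom from the perldata changes above can be exercised end to end. This is a sketch under stated assumptions: it reads a scratch file created with File::Temp rather than /etc/passwd so that it runs anywhere, and it uses the three-argument form of open, which is not what the patched text shows.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so the example does not depend on /etc/passwd.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "first line\n";
close($tmp_fh) or die "close failed: $!";

sub newopen {
    my $path = shift;
    local *FH;                              # not my!
    open(FH, '<', $path) or return undef;
    return *FH;                             # the glob copy carries the handle out
}

my $fh = newopen($tmp_path) or die "cannot open $tmp_path: $!";
my $line = <$fh>;
close($fh);
print "read: $line";

# A failed open yields undef rather than a handle.
my $missing = newopen("$tmp_path.does-not-exist");
print "missing file gives undef\n" unless defined $missing;
```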
diff --git a/pod/perldsc.pod b/pod/perldsc.pod
index cd689e37bc..d0cc335736 100644
--- a/pod/perldsc.pod
+++ b/pod/perldsc.pod
@@ -64,8 +64,8 @@ sections on each of the following:
=back
-But for now, let's look at some of the general issues common to all
-of these types of data structures.
+But for now, let's look at general issues common to all
+these types of data structures.
=head1 REFERENCES
@@ -461,7 +461,7 @@ types of data structures.
$a cmp $b
} keys %HoL )
{
- print "$family: ", join(", ", sort @{ $HoL{$family}), "\n";
+ print "$family: ", join(", ", sort @{ $HoL{$family} }), "\n";
}
=head1 LISTS OF HASHES
@@ -614,7 +614,7 @@ types of data structures.
# append new members to an existing family
%new_folks = (
wife => "wilma",
- pet => "dino";
+ pet => "dino",
);
for $what (keys %new_folks) {
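The two snippets corrected in the perldsc hunks above can be run as a complete script. This is a sketch only: the sample data and the @report array are made up, the sort block is an assumed ordering by member count, and the loop body is an assumption because the hunk ends before showing it.

```perl
use strict;
use warnings;

my %HoL = (
    flintstones => [ qw(fred barney) ],
    jetsons     => [ qw(george jane elroy) ],
);

# Families ordered by member count (largest first), then by name.
my @report;
for my $family ( sort { @{ $HoL{$b} } <=> @{ $HoL{$a} } || $a cmp $b } keys %HoL ) {
    push @report, "$family: " . join(", ", sort @{ $HoL{$family} });
}
print "$_\n" for @report;

# Append new members to an existing family in a hash of hashes.
my %HoH = ( flintstones => { husband => "fred" } );
my %new_folks = (
    wife => "wilma",
    pet  => "dino",
);
for my $what (keys %new_folks) {
    $HoH{flintstones}{$what} = $new_folks{$what};    # assumed loop body
}
```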
diff --git a/pod/perlfaq.pod b/pod/perlfaq.pod
index 2213a0f2f0..370d9cb267 100644
--- a/pod/perlfaq.pod
+++ b/pod/perlfaq.pod
@@ -141,7 +141,7 @@ and added mass commenting to L<perlfaq7>. Added Net::Telnet, fixed
backticks, added reader/writer pair to telnet question, added FindBin,
grouped module questions together in L<perlfaq8>. Expanded caveats
for the simple URL extractor, gave LWP example, added CGI security
-question, expanded on the email address answer in L<perlfaq9>.
+question, expanded on the mail address answer in L<perlfaq9>.
=item 25/March/97
diff --git a/pod/perlfaq1.pod b/pod/perlfaq1.pod
index a9a5fd4858..34caab8c29 100644
--- a/pod/perlfaq1.pod
+++ b/pod/perlfaq1.pod
@@ -77,6 +77,8 @@ To avoid the "what language is perl5?" confusion, some people prefer to
simply use "perl" to refer to the latest version of perl and avoid using
"perl5" altogether. It's not really that big a deal, though.
+See L<perlhist> for a history of Perl revisions.
+
=head2 How stable is Perl?
Production releases, which incorporate bug fixes and new functionality,
@@ -92,10 +94,10 @@ and the rare new keyword).
=head2 Is Perl difficult to learn?
-Perl is easy to start learning -- and easy to keep learning. It looks
-like most programming languages you're likely to have had experience
+No, Perl is easy to start learning -- and easy to keep learning. It looks
+like most programming languages you're likely to have experience
with, so if you've ever written an C program, an awk script, a shell
-script, or even an Excel macro, you're already part way there.
+script, or even a BASIC program, you're already part way there.
Most tasks only require a small subset of the Perl language. One of
the guiding mottos for Perl development is "there's more than one way
@@ -220,7 +222,7 @@ sometimes helpful to point out that delivery times may be reduced
using Perl, as compared to other languages.
If you have a project which has a bottleneck, especially in terms of
-translation, or testing, Perl almost certainly will provide a viable,
+translation or testing, Perl almost certainly will provide a viable,
and quick solution. In conjunction with any persuasion effort, you
should not fail to point out that Perl is used, quite extensively, and
with extremely reliable and valuable results, at many large computer
@@ -245,5 +247,18 @@ behind). Several important bugs were fixed from the 5.000 through
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
-All rights reserved. See L<perlfaq> for distribution information.
+Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington.
+All rights reserved.
+
+When included as part of the Standard Version of Perl, or as part of
+its complete documentation whether printed or otherwise, this work
+may be distributed only under the terms of Perl's Artistic License.
+Any distribution of this file or derivatives thereof I<outside>
+of that package requires that special arrangements be made with
+the copyright holder.
+
+Irrespective of its distribution, all code examples in this file
+are hereby placed into the public domain. You are permitted and
+encouraged to use this code in your own programs for fun
+or for profit as you see fit. A simple comment in the code giving
+credit would be courteous but is not required.
diff --git a/pod/perlfaq2.pod b/pod/perlfaq2.pod
index 0f73eea978..4d873c7561 100644
--- a/pod/perlfaq2.pod
+++ b/pod/perlfaq2.pod
@@ -19,7 +19,7 @@ as well as Windows NT, Plan 9, VMS, QNX, OS/2, and the Amiga.
Binary distributions for various platforms can be found
http://www.perl.com/CPAN/ports/ directory. Some of these ports (especially
-the ones that are not part of the standard sources) may behave differently
+the ones not part of the standard sources) may behave differently
than what is documented in the standard source documentation. These
differences can be either positive (e.g. extensions for the features of the
particular platform that are not supported in the source release of perl)
@@ -27,7 +27,7 @@ or negative (e.g. might be based upon a less current source release of perl).
A useful FAQ for Win32 Perl users is:
http://www.endcontsw.com/people/evangelo/Perl_for_Win32_FAQ.html
-[This FAQ is seriously outdated as of Jan 1998--it is only relevant to
+[This FAQ is seriously outdated as of May 1998--it is only relevant to
the perl that ActiveState distributes, especially where it describes
various inadequacies and differences with the standard perl extension
build support.]
@@ -119,13 +119,13 @@ Certainly not. Larry expects that he'll be certified before Perl is.
=head2 Where can I get information on Perl?
-The complete Perl documentation is available with the perl
-distribution. If you have perl installed locally, you probably have
-the documentation installed as well: type C<man perl> if you're on a
-system resembling Unix. This will lead you to other important man
-pages. If you're not on a Unix system, access to the documentation
-will be different; for example, it might be only in HTML format. But
-all proper perl installations have fully-accessible documentation.
+The complete Perl documentation is available with the perl distribution.
+If you have perl installed locally, you probably have the documentation
+installed as well: type C<man perl> if you're on a system resembling Unix.
+This will lead you to other important man pages, including how to set your
+$MANPATH. If you're not on a Unix system, access to the documentation
+will be different; for example, it might be only in HTML format. But all
+proper perl installations have fully-accessible documentation.
You might also try C<perldoc perl> in case your system doesn't
have a proper man command, or it's been misinstalled. If that doesn't
@@ -136,10 +136,6 @@ complete documentation in various formats, including native pod,
troff, html, and plain text. There's also a web page at
http://www.perl.com/perl/info/documentation.html that might help.
-It's also worth noting that there's a PDF version of the complete
-documentation for perl available in the CPAN/authors/id/BMIDD
-directory.
-
Many good books have been written about Perl -- see the section below
for more details.
@@ -150,14 +146,18 @@ following groups:
comp.lang.perl.announce Moderated announcement group
comp.lang.perl.misc Very busy group about Perl in general
+ comp.lang.perl.moderated Moderated discussion group
comp.lang.perl.modules Use and development of Perl modules
comp.lang.perl.tk Using Tk (and X) from Perl
comp.infosystems.www.authoring.cgi Writing CGI scripts for the Web.
+Actually, the moderated group hasn't passed yet, but we're
+keeping our fingers crossed.
+
There is also USENET gateway to the mailing list used by the crack
Perl development team (perl5-porters) at
-news://genetics.upenn.edu/perl.porters-gw/ .
+news://news.perl.com/perl.porters-gw/ .
=head2 Where should I post source code?
@@ -167,6 +167,10 @@ cross-post to alt.sources, please make sure it follows their posting
standards, including setting the Followup-To header line to NOT
include alt.sources; see their FAQ for details.
+If you're just looking for software, first use Alta Vista, Deja News, and
+search CPAN. This is faster and more productive than just posting
+a request.
+
=head2 Perl Books
A number of books on Perl and/or CGI programming are available. A few of
@@ -182,71 +186,104 @@ fourth printing.
Authors: Larry Wall, Tom Christiansen, and Randal Schwartz
ISBN 1-56592-149-6 (English)
ISBN 4-89052-384-7 (Japanese)
- (French and German translations in progress)
+ (French, German, and Italian translations also available)
Note that O'Reilly books are color-coded: turquoise (some would call
it teal) covers indicate perl5 coverage, while magenta (some would
call it pink) covers indicate perl4 only. Check the cover color
before you buy!
+If you're already a hard-core systems programmer, then the Camel Book
+might suffice for you to learn Perl from. But if you're not, check
+out I<Learning Perl> by Randal and Tom. The second edition of the "Llama
+Book" has a blue cover, and is updated for the 5.004 release of Perl.
+
+If you're not an accidental programmer, but a more serious and possibly
+even degreed computer scientist who doesn't need as much hand-holding as
+we try to provide in the Llama or its defurred cousin the Gecko, please
+check out the delightful book, I<Perl: The Programmer's Companion>,
+written by Nigel Chapman.
+
+You can order O'Reilly books directly from O'Reilly & Associates,
+1-800-998-9938. Local/overseas is 1-707-829-0515. If you can
+locate an O'Reilly order form, you can also fax to 1-707-829-0104.
+See http://www.ora.com/ on the Web.
+
What follows is a list of the books that the FAQ authors found personally
useful. Your mileage may (but, we hope, probably won't) vary.
-If you're already a hard-core systems programmer, then the Camel Book
-just might suffice for you to learn Perl from. But if you're not,
-check out the "Llama Book". It currently doesn't cover perl5, but the
-2nd edition is nearly done and should be out by summer 97:
+Recommended books on (or muchly on) Perl are the following.
+Those marked with a star may be ordered from O'Reilly.
- Learning Perl (the Llama Book):
- Author: Randal Schwartz, with intro by Larry Wall
- ISBN 1-56592-042-2 (English)
- ISBN 4-89502-678-1 (Japanese)
- ISBN 2-84177-005-2 (French)
- ISBN 3-930673-08-8 (German)
+=over
-Another stand-out book in the turquoise O'Reilly Perl line is the "Hip
-Owls" book. It covers regular expressions inside and out, with quite a
-bit devoted exclusively to Perl:
+=item References
- Mastering Regular Expressions (the Cute Owls Book):
- Author: Jeffrey Friedl
- ISBN 1-56592-257-3
+ *Programming Perl
+ by Larry Wall, Tom Christiansen, and Randal L. Schwartz
-You can order any of these books from O'Reilly & Associates,
-1-800-998-9938. Local/overseas is 1-707-829-0515. If you can locate
-an O'Reilly order form, you can also fax to 1-707-829-0104. See
-http://www.ora.com/ on the Web.
+ *Perl 5 Desktop Reference
+ by Johan Vromans
-Recommended Perl books that are not from O'Reilly are the following:
+=item Tutorials
+
+ *Learning Perl [2nd edition]
+ by Randal L. Schwartz and Tom Christiansen
- Cross-Platform Perl, (for Unix and Windows NT)
- Author: Eric F. Johnson
- ISBN: 1-55851-483-X
+ *Learning Perl on Win32 Systems
+ by Randal L. Schwartz, Erik Olson, and Tom Christiansen,
+ with foreword by Larry Wall
- How to Set up and Maintain a World Wide Web Site, (2nd edition)
- Author: Lincoln Stein, M.D., Ph.D.
- ISBN: 0-201-63462-7
+ Perl: The Programmer's Companion
+ by Nigel Chapman
- CGI Programming in C & Perl,
- Author: Thomas Boutell
- ISBN: 0-201-42219-0
+ Cross-Platform Perl
+ by Eric F. Johnson
-Note that some of these address specific application areas (e.g. the
-Web) and are not general-purpose programming books.
+ MacPerl: Power and Ease
+ by Vicki Brown and Chris Nandor, foreword by Matthias Neeracher
-=head2 Perl in Magazines
+=item Task-Oriented
+
+ *The Perl Cookbook
+ by Tom Christiansen and Nathan Torkington
+ with foreword by Larry Wall
+
+ Perl5 Interactive Course [2nd edition]
+ by Jon Orwant
+
+ *Advanced Perl Programming
+ by Sriram Srinivasan
-The Perl Journal is the first and only magazine dedicated to Perl.
-It is published (on paper, not online) quarterly by Jon Orwant
-(orwant@tpj.com), editor. Subscription information is at http://tpj.com
-or via email to subscriptions@tpj.com.
+ Effective Perl Programming
+ by Joseph Hall
-Beyond this, two other magazines that frequently carry high-quality
-articles on Perl are Web Techniques (see
-http://www.webtechniques.com/) and Unix Review
-(http://www.unixreview.com/). Randal Schwartz's Web Technique's
-columns are available on the web at
-http://www.stonehenge.com/merlyn/WebTechniques/ .
+=item Special Topics
+
+ *Mastering Regular Expressions
+ by Jeffrey Friedl
+
+ How to Set up and Maintain a World Wide Web Site [2nd edition]
+ by Lincoln Stein
+
+=back
+
+=head2 Perl in Magazines
+
+The first and only periodical devoted to All Things Perl, I<The
+Perl Journal> contains tutorials, demonstrations, case studies,
+announcements, contests, and much more. TPJ has columns on web
+development, databases, Win32 Perl, graphical programming, regular
+expressions, and networking, and sponsors the Obfuscated Perl Contest.
+It is published quarterly by Jon Orwant. See http://www.tpj.com/ or
+send mail to subscriptions@tpj.com.
+
+Beyond this, magazines that frequently carry high-quality articles
+on Perl are I<Web Techniques> (see http://www.webtechniques.com/),
+I<Performance Computing> at www.performance-computing.com, and Usenix's
+newsletter/magazine to its members, I<;login:>, at http://www.usenix.org/.
+Randal's I<Web Techniques> columns are available on the web at
+http://www.stonehenge.com/merlyn/WebTechniques/.
=head2 Perl on the Net: FTP and WWW Access
@@ -297,13 +334,13 @@ for information on subscribing.
=item NTPerl
This list is used to discuss issues involving Win32 Perl 5 (Windows NT
-and Win95). Subscribe by emailing ListManager@ActiveWare.com with the
+and Win95). Subscribe by mailing ListManager@ActiveWare.com with the
message body:
subscribe Perl-Win32-Users
The list software, also written in perl, will automatically determine
-your address, and subscribe you automatically. To unsubscribe, email
+your address, and subscribe you automatically. To unsubscribe, mail
the following in the message body to the same address like so:
unsubscribe Perl-Win32-Users
@@ -314,7 +351,7 @@ to join or leave this list.
=item Perl-Packrats
Discussion related to archiving of perl materials, particularly the
-Comprehensive Perl Archive Network (CPAN). Subscribe by emailing
+Comprehensive Perl Archive Network (CPAN). Subscribe by mailing
majordomo@cis.ufl.edu:
subscribe perl-packrats
@@ -348,13 +385,14 @@ let perlfaq-suggestions@perl.com know.
=head2 Perl Training
-While some large training companies offer their own courses on Perl,
-you may prefer to contact individuals near and dear to the heart of
-Perl development. Two well-known members of the Perl development team
-who offer such things are Tom Christiansen <perl-classes@perl.com>
-and Randal Schwartz <perl-training-info@stonehenge.com>, plus their
-respective minions, who offer a variety of professional tutorials
-and seminars on Perl. These courses include large public seminars,
+While some large training companies offer their own courses on
+Perl, you may prefer to contact individuals near and dear to the
+heart of Perl development. Two well-known members of the Perl
+development team who head companies that offer such things are
+Tom Christiansen <perl-classes@perl.com> and Randal Schwartz
+<perl-training-info@stonehenge.com>; they and their respective
+minions offer a variety of professional tutorials and
+seminars on Perl. These courses include large public seminars,
private corporate training, and fly-ins to Colorado and Oregon.
See http://www.perl.com/perl/info/training.html for more details.
@@ -406,8 +444,8 @@ For more information, contact the The Perl Clinic:
=head2 Where do I send bug reports?
If you are reporting a bug in the perl interpreter or the modules
-shipped with perl, use the perlbug program in the perl distribution or
-email your report to perlbug@perl.com.
+shipped with perl, use the I<perlbug> program in the perl distribution or
+mail your report to perlbug@perl.com.
If you are posting a bug with a non-standard port (see the answer to
"What platforms is Perl available for?"), a binary distribution, or a
@@ -415,10 +453,18 @@ non-standard module (such as Tk, CGI, etc), then please see the
documentation that came with it to determine the correct place to post
bugs.
-Read the perlbug man page (perl5.004 or later) for more information.
+Read the perlbug(1) man page (perl5.004 or later) for more information.
=head2 What is perl.com? perl.org? The Perl Institute?
+The perl.com domain is Tom Christiansen's domain. He created it as a
+public service long before perl.org came about. Despite the name, it's a
+pretty non-commercial site meant to be a clearinghouse for information
+about all things Perlian, accepting no paid advertisements, bouncy
+happy gifs, or silly java applets on its pages. The Perl Home Page at
+http://www.perl.com/ is currently hosted on a T3 line courtesy of Songline
+Systems, a software-oriented subsidiary of O'Reilly and Associates.
+
perl.org is the official vehicle for The Perl Institute. The motto of
TPI is "helping people help Perl help people" (or something like
that). It's a non-profit organization supporting development,
@@ -426,12 +472,6 @@ documentation, and dissemination of perl. Current directors of TPI
include Larry Wall, Tom Christiansen, and Randal Schwartz, whom you
may have heard of somewhere else around here.
-The perl.com domain is Tom Christiansen's domain. He created it as a
-public service long before perl.org came about. It's the original PBS
-of the Perl world, a clearinghouse for information about all things
-Perlian, accepting no paid advertisements, glossy gifs, or (gasp!)
-java applets on its pages.
-
=head2 How do I learn about object-oriented Perl programming?
L<perltoot> (distributed with 5.004 or later) is a good place to start.
@@ -440,5 +480,18 @@ while L<perlbot> has some excellent tips and tricks.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
-All rights reserved. See L<perlfaq> for distribution information.
+Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington.
+All rights reserved.
+
+When included as part of the Standard Version of Perl, or as part of
+its complete documentation whether printed or otherwise, this work
+may be distributed only under the terms of Perl's Artistic License.
+Any distribution of this file or derivatives thereof I<outside>
+of that package requires that special arrangements be made with
+the copyright holder.
+
+Irrespective of its distribution, all code examples in this file
+are hereby placed into the public domain. You are permitted and
+encouraged to use this code in your own programs for fun
+or for profit as you see fit. A simple comment in the code giving
+credit would be courteous but is not required.
diff --git a/pod/perlfaq3.pod b/pod/perlfaq3.pod
index 7a307594da..31e669fd5d 100644
--- a/pod/perlfaq3.pod
+++ b/pod/perlfaq3.pod
@@ -13,10 +13,13 @@ Have you looked at CPAN (see L<perlfaq2>)? The chances are that
someone has already written a module that can solve your problem.
Have you read the appropriate man pages? Here's a brief index:
+ Basics perldata, perlvar, perlsyn, perlop, perlsub
+ Execution perlrun, perldebug
+ Functions perlfunc
Objects perlref, perlmod, perlobj, perltie
Data Structures perlref, perllol, perldsc
Modules perlmod, perlmodlib, perlsub
- Regexps perlre, perlfunc, perlop
+ Regexps perlre, perlfunc, perlop, perllocale
Moving to perl5 perltrap, perl
Linking w/C perlxstut, perlxs, perlcall, perlguts, perlembed
Various http://www.perl.com/CPAN/doc/FMTEYEWTK/index.html
@@ -65,8 +68,8 @@ breakdowns of where your code spends its time.
=head2 How do I cross-reference my Perl programs?
The B::Xref module, shipped with the new, alpha-release Perl compiler
-(not the general distribution), can be used to generate
-cross-reference reports for Perl programs.
+(not the general distribution prior to the 5.005 release), can be used
+to generate cross-reference reports for Perl programs.
perl -MO=Xref[,OPTIONS] foo.pl
@@ -99,10 +102,10 @@ the trick.
=head2 Where can I get Perl macros for vi?
For a complete version of Tom Christiansen's vi configuration file,
-see ftp://ftp.perl.com/pub/vi/toms.exrc, the standard benchmark file
-for vi emulators. This runs best with nvi, the current version of vi
-out of Berkeley, which incidentally can be built with an embedded Perl
-interpreter -- see http://www.perl.com/CPAN/src/misc .
+see http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/toms.exrc,
+the standard benchmark file for vi emulators. This runs best with nvi,
+the current version of vi out of Berkeley, which incidentally can be built
+with an embedded Perl interpreter -- see http://www.perl.com/CPAN/src/misc.
=head2 Where can I get perl-mode for emacs?
@@ -121,25 +124,23 @@ should be using "main::foo", anyway.
=head2 How can I use curses with Perl?
The Curses module from CPAN provides a dynamically loadable object
-module interface to a curses library.
+module interface to a curses library. A small demo can be found at the
+directory http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/rep;
+this program repeats a command and updates the screen as needed, rendering
+B<rep ps axu> similar to B<top>.
=head2 How can I use X or Tk with Perl?
-Tk is a completely Perl-based, object-oriented interface to the Tk
-toolkit that doesn't force you to use Tcl just to get at Tk. Sx is an
-interface to the Athena Widget set. Both are available from CPAN.
+Tk is a completely Perl-based, object-oriented interface to the Tk toolkit
+that doesn't force you to use Tcl just to get at Tk. Sx is an interface
+to the Athena Widget set. Both are available from CPAN. See the
+directory http://www.perl.com/CPAN/modules/by-category/08_User_Interfaces/
=head2 How can I generate simple menus without using CGI or Tk?
The http://www.perl.com/CPAN/authors/id/SKUNZ/perlmenu.v4.0.tar.gz
module, which is curses-based, can help with this.
-=head2 Can I dynamically load C routines into Perl?
-
-If your system architecture supports it, then the standard perl
-on your system should also provide you with this via the
-DynaLoader module. Read L<perlxstut> for details.
-
=head2 What is undump?
See the next questions.
@@ -225,9 +226,9 @@ No, Perl's garbage collection system takes care of this.
=head2 How can I free an array or hash so my program shrinks?
-You can't. Memory the system allocates to a program will never be
-returned to the system. That's why long-running programs sometimes
-re-exec themselves.
+You can't. Memory the system allocates to a program will in practice
+never be returned to the system. That's why long-running programs
+sometimes re-exec themselves.
However, judicious use of my() on your variables will help make sure
that they go out of scope so that Perl can free up their storage for
@@ -266,6 +267,8 @@ Both of these solutions can have far-reaching effects on your system
and on the way you write your CGI scripts, so investigate them with
care.
+See http://www.perl.com/CPAN/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/.
+
=head2 How can I hide the source for my Perl program?
Delete it. :-) Seriously, there are a number of (mostly
@@ -319,9 +322,10 @@ save little more than compilation time, leaving execution no more than
(like several times faster), but this takes some tweaking of your
code.
-Malcolm will be in charge of the 5.005 release of Perl itself
-to try to unify and merge his compiler and multithreading work into
-the main release.
+The 5.005 release of Perl itself, whose main goal is merging the various
+non-Unix ports back into the one Perl source, will also have preliminary
+(strictly beta) support for Malcolm's compiler and his light-weight
+processes (sometimes called "threads").
You'll probably be astonished to learn that the current version of the
compiler generates a compiled form of your script whose executable is
@@ -334,6 +338,16 @@ you link your main perl binary with this, it will make it miniscule.
For example, on one author's system, /usr/bin/perl is only 11k in
size!
+In general, the compiler will do nothing to make a Perl program smaller,
+faster, more portable, or more secure. In fact, it will usually hurt
+all of those. The executable will be bigger, your VM system may take
+longer to load the whole thing, the binary is fragile and hard to fix,
+and compilation never stopped software piracy in the form of crackers,
+viruses, or bootleggers. The real advantage of the compiler is merely
+packaging, and once you see the size of what it makes (well, unless
+you use a shared I<libperl.so>), you'll probably want a complete
+Perl install anyway.
+
=head2 How can I get '#!perl' to work on [MS-DOS,NT,...]?
For OS/2 just use
@@ -365,12 +379,12 @@ Yes. Read L<perlrun> for more information. Some examples follow.
(These assume standard Unix shell quoting rules.)
# sum first and last fields
- perl -lane 'print $F[0] + $F[-1]'
+ perl -lane 'print $F[0] + $F[-1]' *
# identify text files
perl -le 'for(@ARGV) {print if -f && -T _}' *
- # remove comments from C program
+ # remove (most) comments from C program
perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c
# make file a month younger than today, defeating reaper daemons
@@ -433,21 +447,28 @@ books. For problems and questions related to the web, like "Why
do I get 500 Errors" or "Why doesn't it run from the browser right
when it runs fine on the command line", see these sources:
- The Idiot's Guide to Solving Perl/CGI Problems, by Tom Christiansen
- http://www.perl.com/perl/faq/idiots-guide.html
+ WWW Security FAQ
+ http://www.w3.org/Security/Faq/
- Frequently Asked Questions about CGI Programming, by Nick Kew
- ftp://rtfm.mit.edu/pub/usenet/news.answers/www/cgi-faq
- http://www3.pair.com/webthing/docs/cgi/faqs/cgifaq.shtml
+ Web FAQ
+ http://www.boutell.com/faq/
- Perl/CGI programming FAQ, by Shishir Gundavaram and Tom Christiansen
- http://www.perl.com/perl/faq/perl-cgi-faq.html
+ CGI FAQ
+ http://www.webthing.com/page.cgi/cgifaq
- The WWW Security FAQ, by Lincoln Stein
- http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html
+ HTTP Spec
+ http://www.w3.org/pub/WWW/Protocols/HTTP/
+
+ HTML Spec
+ http://www.w3.org/TR/REC-html40/
+ http://www.w3.org/pub/WWW/MarkUp/
+
+ CGI Spec
+ http://www.w3.org/CGI/
+
+ CGI Security FAQ
+ http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt
- World Wide Web FAQ, by Thomas Boutell
- http://www.boutell.com/faq/
=head2 Where can I learn about object-oriented Perl programming?
@@ -499,6 +520,18 @@ information, see L<ExtUtils::MakeMaker>.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
-All rights reserved. See L<perlfaq> for distribution information.
-
+Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington.
+All rights reserved.
+
+When included as part of the Standard Version of Perl, or as part of
+its complete documentation whether printed or otherwise, this work
+may be distributed only under the terms of Perl's Artistic License.
+Any distribution of this file or derivatives thereof I<outside>
+of that package requires that special arrangements be made with
+the copyright holder.
+
+Irrespective of its distribution, all code examples in this file
+are hereby placed into the public domain. You are permitted and
+encouraged to use this code in your own programs for fun
+or for profit as you see fit. A simple comment in the code giving
+credit would be courteous but is not required.
diff --git a/pod/perlfaq4.pod b/pod/perlfaq4.pod
index 4c38d906ba..8c57db3df0 100644
--- a/pod/perlfaq4.pod
+++ b/pod/perlfaq4.pod
@@ -12,6 +12,10 @@ data issues.
=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
+The infinite set that a mathematician thinks of as the real numbers can
+only be approximated on a computer, since the computer only has a finite
+number of bits to store an infinite number of, um, numbers.
+
Internally, your computer represents floating-point numbers in binary.
Floating-point numbers read in from a file, or appearing as literals
in your program, are converted from their decimal floating-point
@@ -37,6 +41,7 @@ are consequently slower.
To get rid of the superfluous digits, just use a format (eg,
C<printf("%.2f", 19.95)>) to get the required precision.
+See L<perlop/"Floating-point Arithmetic">.
=head2 Why isn't my octal data interpreted correctly?
@@ -54,11 +59,10 @@ umask(), or sysopen(), which all want permissions in octal.
chmod(644, $file); # WRONG -- perl -w catches this
chmod(0644, $file); # right
-=head2 Does perl have a round function? What about ceil() and floor()?
-Trig functions?
+=head2 Does perl have a round function? What about ceil() and floor()? Trig functions?
-For rounding to a certain number of digits, sprintf() or printf() is
-usually the easiest route.
+Remember that int() merely truncates toward 0. For rounding to a certain
+number of digits, sprintf() or printf() is usually the easiest route.
The POSIX module (part of the standard perl distribution) implements
ceil(), floor(), and a number of other mathematical and trigonometric
@@ -131,11 +135,13 @@ Get the http://www.perl.com/CPAN/modules/by-module/Roman module.
=head2 Why aren't my random numbers random?
+John von Neumann said, ``Anyone who attempts to generate random numbers by
+deterministic means is, of course, living in a state of sin.''
+
The short explanation is that you're getting pseudorandom numbers, not
-random ones, because that's how these things work. A longer
-explanation is available on
-http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy of Tom
-Phoenix.
+random ones, because that's how these things work. A longer explanation
+is available on http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy
+of Tom Phoenix.
You should also check out the Math::TrulyRandom module from CPAN.
@@ -177,27 +183,33 @@ Instead, there is an example of Julian date calculation in
http://www.perl.com/CPAN/authors/David_Muir_Sharnoff/modules/Time/JulianDay.pm.gz,
which should help.
-=head2 Does Perl have a year 2000 problem?
+=head2 Does Perl have a year 2000 problem? Is Perl Y2K compliant?
-Not unless you use Perl to create one. The date and time functions
-supplied with perl (gmtime and localtime) supply adequate information
-to determine the year well beyond 2000 (2038 is when trouble strikes).
-The year returned by these functions when used in an array context is
-the year minus 1900. For years between 1910 and 1999 this I<happens>
-to be a 2-digit decimal number. To avoid the year 2000 problem simply
-do not treat the year as a 2-digit number. It isn't.
+Perl is just as Y2K compliant as your pencil--no more, and no less.
+The date and time functions supplied with perl (gmtime and localtime)
+supply adequate information to determine the year well beyond 2000 (2038
+is when trouble strikes). The year returned by these functions when used
+in an array context is the year minus 1900. For years between 1910 and
+1999 this I<happens> to be a 2-digit decimal number. To avoid the year
+2000 problem simply do not treat the year as a 2-digit number. It isn't.
-When gmtime() and localtime() are used in a scalar context they return
+When gmtime() and localtime() are used in scalar context they return
a timestamp string that contains a fully-expanded year. For example,
C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2001". There's no year 2000 problem here.
+That doesn't mean that Perl can't be used to create non-Y2K compliant
+programs. It can. But so can your pencil. It's the fault of the user,
+not the language. At the risk of inflaming the NRA: ``Perl doesn't
+break Y2K, people do.'' See http://language.perl.com/news/y2k.html for
+a longer exposition.
+
=head1 Data: Strings
=head2 How do I validate input?
The answer to this question is usually a regular expression, perhaps
-with auxiliary logic. See the more specific questions (numbers, email
+with auxiliary logic. See the more specific questions (numbers, mail
addresses, etc.) for details.
=head2 How do I unescape a string?
@@ -220,7 +232,7 @@ To turn "abbcccd" into "abccd":
This is documented in L<perlref>. In general, this is fraught with
quoting and readability problems, but it is possible. To interpolate
-a subroutine call (in a list context) into a string:
+a subroutine call (in list context) into a string:
print "My sub returned @{[mysub(1,2,3)]} that time.\n";
@@ -241,16 +253,23 @@ multiple ones, then something more like C</alpha(.*?)omega/> would
be needed. But none of these deals with nested patterns, nor can they.
For that you'll have to write a parser.
+One destructive, inside-out approach that you might try is to pull
+out the smallest nesting parts one at a time:
+
+ while (s/BEGIN(.*?)END//gs) {
+ # do something with $1
+ }
+
=head2 How do I reverse a string?
-Use reverse() in a scalar context, as documented in
+Use reverse() in scalar context, as documented in
L<perlfunc/reverse>.
$reversed = reverse $string;
=head2 How do I expand tabs in a string?
-You can do it the old-fashioned way:
+You can do it yourself:
1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
@@ -299,6 +318,23 @@ into "whosoever" or "whomsoever", case insensitively.
: $1 # renege and leave it there
}igex;
+In the more general case, you can use the C</g> modifier in a C<while>
+loop, keeping count of matches.
+
+ $WANT = 3;
+ $count = 0;
+ while (/(\w+)\s+fish\b/gi) {
+ if (++$count == $WANT) {
+ print "The third fish is a $1 one.\n";
+ # Warning: don't `last' out of this loop
+ }
+ }
+
+That prints out: "The third fish is a red one." You can also use a
+repetition count and repeated pattern like this:
+
+ /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
+
=head2 How can I count the number of occurrences of a substring within a string?
There are a number of ways, with varying efficiency: If you want a
@@ -345,6 +381,10 @@ To force each word to be lower case, with the first letter upper case:
$line =~ s/(\w+)/\u\L$1/g;
+You can (and probably should) enable locale awareness of those
+characters by placing a C<use locale> pragma in your program.
+See L<perllocale> for endless details.
+
=head2 How can I split a [character] delimited string except when inside
[character]? (Comma-separated files)
@@ -382,11 +422,12 @@ distribution) lets you say:
=head2 How do I strip blank space from the beginning/end of a string?
-The simplest approach, albeit not the fastest, is probably like this:
+Although the simplest approach would seem to be:
$string =~ s/^\s*(.*?)\s*$/$1/;
-It would be faster to do this in two steps:
+This is unnecessarily slow, destructive, and fails with embedded newlines.
+It is much faster to do this in two steps:
$string =~ s/^\s+//;
$string =~ s/\s+$//;
@@ -398,9 +439,39 @@ Or more nicely written as:
s/\s+$//;
}
+This idiom takes advantage of the for(each) loop's aliasing
+behavior to factor out common code. You can do this
+on several strings at once, or arrays, or even the
+values of a hash if you use a slice:
+
+ # trim whitespace in the scalar, the array,
+ # and all the values in the hash
+ foreach ($scalar, @array, @hash{keys %hash}) {
+ s/^\s+//;
+ s/\s+$//;
+ }
+
=head2 How do I extract selected columns from a string?
Use substr() or unpack(), both documented in L<perlfunc>.
+If you prefer thinking in terms of columns instead of widths,
+you can use this kind of thing:
+
+ # determine the unpack format needed to split Linux ps output
+ # arguments are cut columns
+ my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
+
+ sub cut2fmt {
+ my(@positions) = @_;
+ my $template = '';
+ my $lastpos = 1;
+ for my $place (@positions) {
+ $template .= "A" . ($place - $lastpos) . " ";
+ $lastpos = $place;
+ }
+ $template .= "A*";
+ return $template;
+ }
=head2 How do I find the soundex value of a string?
@@ -411,15 +482,26 @@ Use the standard Text::Soundex module distributed with perl.
Let's assume that you have a string like:
$text = 'this has a $foo in it and a $bar';
+
+If those were both global variables, then this would
+suffice:
+
$text =~ s/\$(\w+)/${$1}/g;
-Before version 5 of perl, this had to be done with a double-eval
-substitution:
+But since they are probably lexicals, or at least, they could
+be, you'd have to do this:
$text =~ s/(\$\w+)/$1/eeg;
+ die if $@; # needed on /ee, not /e
-Which is bizarre enough that you'll probably actually need an EEG
-afterwards. :-)
+It's probably better in the general case to treat those
+variables as entries in some special hash. For example:
+
+ %user_defs = (
+ foo => 23,
+ bar => 19,
+ );
+ $text =~ s/\$(\w+)/$user_defs{$1}/g;
See also "How do I expand function calls in a string?" in this section
of the FAQ.
@@ -458,6 +540,12 @@ that actually do care about the difference between a string and a
number, such as the magical C<++> autoincrement operator or the
syscall() function.
+Stringification also destroys arrays.
+
+ @lines = `command`;
+ print "@lines"; # WRONG - extra blanks
+ print @lines; # right
+
=head2 Why don't my <<HERE documents work?
Check for these three things:
@@ -472,6 +560,72 @@ Check for these three things:
=back
+If you want to indent the text in the here document, you
+can do this:
+
+ # all in one
+ ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
+ your text
+ goes here
+ HERE_TARGET
+
+But the HERE_TARGET must still be flush against the margin.
+If you want that indented also, you'll have to quote
+in the indentation.
+
+ ($quote = <<' FINIS') =~ s/^\s+//gm;
+ ...we will have peace, when you and all your works have
+ perished--and the works of your dark master to whom you
+ would deliver us. You are a liar, Saruman, and a corrupter
+ of men's hearts. --Theoden in /usr/src/perl/taint.c
+ FINIS
+ $quote =~ s/\s*--/\n--/;
+
+A nice general-purpose fixer-upper function for indented here documents
+follows. It expects to be called with a here document as its argument.
+It looks to see whether each line begins with a common substring, and
+if so, strips that off. Otherwise, it takes the amount of leading
+white space found on the first line and removes that much off each
+subsequent line.
+
+ sub fix {
+ local $_ = shift;
+ my ($white, $leader); # common white space and common leading string
+ if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
+ ($white, $leader) = ($2, quotemeta($1));
+ } else {
+ ($white, $leader) = (/^(\s+)/, '');
+ }
+ s/^\s*?$leader(?:$white)?//gm;
+ return $_;
+ }
+
+This works with leading special strings, dynamically determined:
+
+ $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
+ @@@ int
+ @@@ runops() {
+ @@@ SAVEI32(runlevel);
+ @@@ runlevel++;
+ @@@ while ( op = (*op->op_ppaddr)() ) ;
+ @@@ TAINT_NOT;
+ @@@ return 0;
+ @@@ }
+ MAIN_INTERPRETER_LOOP
+
+Or with a fixed amount of leading white space, with remaining
+indentation correctly preserved:
+
+ $poem = fix<<EVER_ON_AND_ON;
+ Now far ahead the Road has gone,
+ And I must follow, if I can,
+ Pursuing it with eager feet,
+ Until it joins some larger way
+ Where many paths and errands meet.
+ And whither then? I cannot say.
+ --Bilbo in /usr/src/perl/pp_ctl.c
+ EVER_ON_AND_ON
+
=head1 Data: Arrays
=head2 What is the difference between $array[1] and @array[1]?
@@ -500,6 +654,7 @@ ordered and whether you wish to preserve the ordering.
=over 4
=item a) If @in is sorted, and you want @out to be sorted:
+(this assumes all true values in the array)
$prev = 'nonesuch';
@out = grep($_ ne $prev && ($prev = $_), @in);
@@ -531,11 +686,15 @@ duplicates.
=back
-=head2 How can I tell whether an array contains a certain element?
+=head2 How can I tell whether a list or array contains a certain element?
+
+Hearing the word "in" is an I<in>dication that you probably should have
+used a hash, not a list or array, to store your data. Hashes are
+designed to answer this question quickly and efficiently. Arrays aren't.
-There are several ways to approach this. If you are going to make
-this query many times and the values are arbitrary strings, the
-fastest way is probably to invert the original array and keep an
+That being said, there are several ways to approach this. If you
+are going to make this query many times over arbitrary string values,
+the fastest way is probably to invert the original array and keep an
associative array lying about whose keys are the first array's values.
@blues = qw/azure cerulean teal turquoise lapis-lazuli/;
@@ -605,8 +764,11 @@ Now C<$found_index> has what you want.
In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either end,
-or you can use splice to add and/or remove arbitrary number of elements
-at arbitrary points.
+or you can use splice to add and/or remove arbitrary number of elements at
+arbitrary points. Pop and shift are both O(1) operations on perl's
+dynamic arrays. In the absence of shifts and pops, push in general
+needs to reallocate on the order of every log(N) times, and unshift will
+need to copy pointers each time.
If you really, really wanted, you could use structures as described in
L<perldsc> or L<perltoot> and do just what the algorithm book tells you
@@ -622,7 +784,23 @@ lists, or you could just do something like this with an array:
=head2 How do I shuffle an array randomly?
-Here's a shuffling algorithm which works its way through the list,
+Use this:
+
+ # fisher_yates_shuffle( \@array ) :
+ # generate a random permutation of @array in place
+ sub fisher_yates_shuffle {
+ my $array = shift;
+ my $i;
+ for ($i = @$array; --$i; ) {
+ my $j = int rand ($i+1);
+ next if $i == $j;
+ @$array[$i,$j] = @$array[$j,$i];
+ }
+ }
+
+ fisher_yates_shuffle( \@array ); # permutes @array in place
+
+You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with:
srand;
@@ -632,65 +810,70 @@ randomly picking another element to swap the current element with:
push(@new, splice(@old, rand @old, 1));
}
-For large arrays, this avoids a lot of the reshuffling:
-
- srand;
- @new = ();
- @old = 1 .. 10000; # just a demo
- for( @old ){
- my $r = rand @new+1;
- push(@new,$new[$r]);
- $new[$r] = $_;
- }
+This is bad because splice is already O(N), and since you do it N times,
+you just invented a quadratic algorithm; that is, O(N**2). This does
+not scale, although Perl is so efficient that you probably won't notice
+this until you have rather largish arrays.
=head2 How do I process/modify each element of an array?
Use C<for>/C<foreach>:
for (@lines) {
- s/foo/bar/;
- tr[a-z][A-Z];
+ s/foo/bar/; # change that word
+ y/XZ/ZX/; # swap those letters
}
Here's another; let's compute spherical volumes:
- for (@radii) {
+ for (@volumes = @radii) { # @volumes has changed parts
$_ **= 3;
$_ *= (4/3) * 3.14159; # this will be constant folded
}
+If you want to do the same thing to modify the values of the hash,
+you may not use the C<values> function, oddly enough. You need a slice:
+
+ for $orbit ( @orbits{keys %orbits} ) {
+ ($orbit **= 3) *= (4/3) * 3.14159;
+ }
+
=head2 How do I select a random element from an array?
Use the rand() function (see L<perlfunc/rand>):
+ # at the top of the program:
srand; # not needed for 5.004 and later
+
+ # then later on
$index = rand @array;
$element = $array[$index];
+Make sure you I<only call srand once per program, if then>.
+If you are calling it more than once (such as before each
+call to rand), you're almost certainly doing something wrong.
+
=head2 How do I permute N elements of a list?
Here's a little program that generates all permutations
of all the words on each line of input. The algorithm embodied
-in the permut() function should work on any list:
+in the permute() function should work on any list:
#!/usr/bin/perl -n
- # permute - tchrist@perl.com
- permut([split], []);
- sub permut {
- my @head = @{ $_[0] };
- my @tail = @{ $_[1] };
- unless (@head) {
- # stop recursing when there are no elements in the head
- print "@tail\n";
+ # tsc-permute: permute each word of input
+ permute([split], []);
+ sub permute {
+ my @items = @{ $_[0] };
+ my @perms = @{ $_[1] };
+ unless (@items) {
+ print "@perms\n";
} else {
- # for all elements in @head, move one from @head to @tail
- # and call permut() on the new @head and @tail
- my(@newhead,@newtail,$i);
- foreach $i (0 .. $#head) {
- @newhead = @head;
- @newtail = @tail;
- unshift(@newtail, splice(@newhead, $i, 1));
- permut([@newhead], [@newtail]);
+ my(@newitems,@newperms,$i);
+ foreach $i (0 .. $#items) {
+ @newitems = @items;
+ @newperms = @perms;
+ unshift(@newperms, splice(@newitems, $i, 1));
+ permute([@newitems], [@newperms]);
}
}
}
@@ -796,7 +979,7 @@ See L<perlfunc/defined> in the 5.004 release or later of Perl.
Use the each() function (see L<perlfunc/each>) if you don't care
whether it's sorted:
- while (($key,$value) = each %hash) {
+ while ( ($key, $value) = each %hash) {
print "$key = $value\n";
}
@@ -862,6 +1045,7 @@ L<perllocale>).
You can look into using the DB_File module and tie() using the
$DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
+The Tie::IxHash module from CPAN might also be instructive.
=head2 What's the difference between "delete" and "undef" with hashes?
@@ -953,7 +1137,7 @@ they end up doing is not what they do with ordinary hashes.
=head2 How do I reset an each() operation part-way through?
-Using C<keys %hash> in a scalar context returns the number of keys in
+Using C<keys %hash> in scalar context returns the number of keys in
the hash I<and> resets the iterator associated with the hash. You may
need to do this if you use C<last> to exit a loop early so that when you
re-enter it, the hash iterator has been reset.
@@ -1055,14 +1239,37 @@ Assuming that you don't care about IEEE notations like "NaN" or
"Infinity", you probably just want to use a regular expression.
warn "has nondigits" if /\D/;
- warn "not a whole number" unless /^\d+$/;
- warn "not an integer" unless /^-?\d+$/; # reject +3
+ warn "not a natural number" unless /^\d+$/; # rejects -3
+ warn "not an integer" unless /^-?\d+$/; # rejects +3
warn "not an integer" unless /^[+-]?\d+$/;
warn "not a decimal number" unless /^-?\d+\.?\d*$/; # rejects .2
warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
warn "not a C float"
unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
+If you're on a POSIX system, Perl supports the C<POSIX::strtod>
+function. Its semantics are somewhat cumbersome, so here's a C<getnum>
+wrapper function for more convenient access. This function takes
+a string and returns the number it found, or C<undef> for input that
+isn't a C float. The C<is_numeric> function is a front end to C<getnum>
+if you just want to say, ``Is this a float?''
+
+ sub getnum {
+ use POSIX qw(strtod);
+ my $str = shift;
+ $str =~ s/^\s+//;
+ $str =~ s/\s+$//;
+ $! = 0;
+ my($num, $unparsed) = strtod($str);
+ if (($str eq '') || ($unparsed != 0) || $!) {
+ return undef;
+ } else {
+ return $num;
+ }
+ }
+
+ sub is_numeric { defined getnum($_[0]) }
+
Or you could check out
http://www.perl.com/CPAN/modules/by-module/String/String-Scanf-1.1.tar.gz
instead. The POSIX module (part of the standard Perl distribution)
@@ -1096,6 +1303,18 @@ Get the Business::CreditCard module from CPAN.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
-All rights reserved. See L<perlfaq> for distribution information.
-
+Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington.
+All rights reserved.
+
+When included as part of the Standard Version of Perl, or as part of
+its complete documentation whether printed or otherwise, this work
+may be distributed only under the terms of Perl's Artistic License.
+Any distribution of this file or derivatives thereof I<outside>
+of that package requires that special arrangements be made with
+the copyright holder.
+
+Irrespective of its distribution, all code examples in this file
+are hereby placed into the public domain. You are permitted and
+encouraged to use this code in your own programs for fun
+or for profit as you see fit. A simple comment in the code giving
+credit would be courteous but is not required.
diff --git a/pod/perlfaq5.pod b/pod/perlfaq5.pod
index 5d71f648de..41c46a3dae 100644
--- a/pod/perlfaq5.pod
+++ b/pod/perlfaq5.pod
@@ -7,7 +7,7 @@ perlfaq5 - Files and Formats ($Revision: 1.22 $, $Date: 1997/04/24 22:44:02 $)
This section deals with I/O and the "f" issues: filehandles, flushing,
formats, and footers.
-=head2 How do I flush/unbuffer a filehandle? Why must I do this?
+=head2 How do I flush/unbuffer an output filehandle? Why must I do this?
The C standard I/O library (stdio) normally buffers characters sent to
devices. This is done for efficiency reasons, so that there isn't a
@@ -15,7 +15,7 @@ system call for each byte. Any time you use print() or write() in
Perl, you go though this buffering. syswrite() circumvents stdio and
buffering.
-In most stdio implementations, the type of buffering and the size of
+In most stdio implementations, the type of output buffering and the size of
the buffer varies according to the type of device. Disk files are block
buffered, often with a buffer size of more than 2k. Pipes and sockets
are often buffered with a buffer size between 1/2 and 2k. Serial devices
@@ -29,10 +29,23 @@ command. This isn't as hard on your system as unbuffering, but does
get the output where you want it when you want it.
If you expect characters to get to your device when you print them there,
-you'll want to autoflush its handle, as in the older:
+you'll want to autoflush its handle.
+Use select() and the C<$|> variable to control autoflushing
+(see L<perlvar/$|> and L<perlfunc/select>):
+
+ $old_fh = select(OUTPUT_HANDLE);
+ $| = 1;
+ select($old_fh);
+
+Or using the traditional idiom:
+
+ select((select(OUTPUT_HANDLE), $| = 1)[0]);
+
+Or if you don't mind slowly loading several thousand lines of module code
+just because you're afraid of the C<$|> variable:
use FileHandle;
- open(DEV, "<+/dev/tty"); # ceci n'est pas une pipe
+ open(DEV, "+</dev/tty"); # ceci n'est pas une pipe
DEV->autoflush(1);
or the newer IO::* modules:
@@ -50,24 +63,18 @@ or even this:
die "$!" unless $sock;
$sock->autoflush();
- $sock->print("GET /\015\012");
- $document = join('', $sock->getlines());
+ print $sock "GET / HTTP/1.0" . "\015\012" x 2;
+ $document = join('', <$sock>);
print "DOC IS: $document\n";
-Note the hardcoded carriage return and newline in their octal
-equivalents. This is the ONLY way (currently) to assure a proper
-flush on all platforms, including Macintosh.
+Note the bizarrely hardcoded carriage return and newline in their octal
+equivalents. This is the ONLY way (currently) to assure a proper flush
+on all platforms, including Macintosh. That's the way things work in
+network programming: you really should specify the exact bit pattern
+of the network line terminator. In practice, C<"\n\n"> often works,
+but this is not portable.
-You can use select() and the C<$|> variable to control autoflushing
-(see L<perlvar/$|> and L<perlfunc/select>):
-
- $oldh = select(DEV);
- $| = 1;
- select($oldh);
-
-You'll also see code that does this without a temporary variable, as in
-
- select((select(DEV), $| = 1)[0]);
+See L<perlfaq9> for other examples of fetching URLs over the web.
=head2 How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to the beginning of a file?
@@ -78,13 +85,15 @@ bytes. In general, there's no direct way for Perl to seek to a
particular line of a file, insert text into a file, or remove text
from a file.
-(There are exceptions in special circumstances. Replacing a sequence
-of bytes with another sequence of the same length is one. Another is
-using the C<$DB_RECNO> array bindings as documented in L<DB_File>.
-Yet another is manipulating files with all lines the same length.)
+(There are exceptions in special circumstances. One is adding or removing
+data at the very end of the file. Another is replacing a sequence of bytes with
+another sequence of the same length. Another is using the C<$DB_RECNO>
+array bindings as documented in L<DB_File>. Yet another is manipulating
+files with all lines the same length.)
The general solution is to create a temporary copy of the text file with
-the changes you want, then copy that over the original.
+the changes you want, then copy that over the original. This assumes
+no locking.
$old = $file;
$new = "$file.tmp.$$";
@@ -157,45 +166,73 @@ proper text file, so this may report one fewer line than you expect.
}
close FILE;
+This assumes no funny games with newline translations.
+
=head2 How do I make a temporary file name?
-Use the process ID and/or the current time-value. If you need to have
-many temporary files in one process, use a counter:
+Use the C<new_tmpfile> class method from the IO::File module to get a
+filehandle opened for reading and writing. Use this if you don't
+need to know the file's name.
- BEGIN {
use IO::File;
+ $fh = IO::File->new_tmpfile()
+ or die "Unable to make new temporary file: $!";
+
+Or you can use the C<tmpnam> function from the POSIX module to get a
+filename that you then open yourself. Use this if you do need to know
+the file's name.
+
+ use Fcntl;
+ use POSIX qw(tmpnam);
+
+ # try new temporary filenames until we get one that didn't already
+ # exist; the check should be unnecessary, but you can't be too careful
+ do { $name = tmpnam() }
+ until sysopen(FH, $name, O_RDWR|O_CREAT|O_EXCL);
+
+ # install atexit-style handler so that when we exit or die,
+ # we automatically delete this temporary file
+ END { unlink($name) or die "Couldn't unlink $name : $!" }
+
+ # now go on to use the file ...
+
+If you're committed to doing this by hand, use the process ID and/or
+the current time-value. If you need to have many temporary files in
+one process, use a counter:
+
+ BEGIN {
use Fcntl;
my $temp_dir = -d '/tmp' ? '/tmp' : $ENV{TMP} || $ENV{TEMP};
my $base_name = sprintf("%s/%d-%d-0000", $temp_dir, $$, time());
sub temp_file {
- my $fh = undef;
+ local *FH;
my $count = 0;
- until (defined($fh) || $count > 100) {
+ until (defined(fileno(FH)) || $count++ > 100) {
$base_name =~ s/-(\d+)$/"-" . (1 + $1)/e;
- $fh = IO::File->new($base_name, O_WRONLY|O_EXCL|O_CREAT, 0644)
+ sysopen(FH, $base_name, O_WRONLY|O_EXCL|O_CREAT);
}
- if (defined($fh)) {
- return ($fh, $base_name);
+ if (defined(fileno(FH))) {
+ return (*FH, $base_name);
} else {
return ();
}
}
}
-Or you could simply use IO::Handle::new_tmpfile.
-
=head2 How can I manipulate fixed-record-length files?
-The most efficient way is using pack() and unpack(). This is faster
-than using substr(). Here is a sample chunk of code to break up and
-put back together again some fixed-format input lines, in this case
-from the output of a normal, Berkeley-style ps:
+The most efficient way is using pack() and unpack(). This is faster than
+using substr() when you process many, many strings. It is slower for just a few.
+
+Here is a sample chunk of code to break up and put back together again
+some fixed-format input lines, in this case from the output of a normal,
+Berkeley-style ps:
# sample input line:
# 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what
$PS_T = 'A6 A4 A7 A5 A*';
open(PS, "ps|");
- $_ = <PS>; print;
+ print scalar <PS>;
while (<PS>) {
($pid, $tt, $stat, $time, $command) = unpack($PS_T, $_);
for $var (qw!pid tt stat time command!) {
@@ -205,61 +242,174 @@ from the output of a normal, Berkeley-style ps:
"\n";
}
+We've used C<$$var> in a way that is forbidden by C<use strict 'refs'>.
+That is, we've promoted a string to a scalar variable reference using
+symbolic references. This is ok in small programs, but doesn't scale
+well. It also only works on global variables, not lexicals.
+
=head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles?
-You may have some success with typeglobs, as we always had to use
-in days of old:
+The fastest, simplest, and most direct way is to localize the typeglob
+of the filehandle in question:
- local(*FH);
+ local *TmpHandle;
-But while still supported, that isn't the best to go about getting
-local filehandles. Typeglobs have their drawbacks. You may well want
-to use the C<FileHandle> module, which creates new filehandles for you
-(see L<FileHandle>):
+Typeglobs are fast (especially compared with the alternatives) and
+reasonably easy to use, but they also have one subtle drawback. If you
+had, for example, a function named TmpHandle(), or a variable named
+%TmpHandle, you just hid it from yourself.
- use FileHandle;
sub findme {
- my $fh = FileHandle->new();
- open($fh, "</etc/hosts") or die "no /etc/hosts: $!";
- while (<$fh>) {
+ local *HostFile;
+ open(HostFile, "</etc/hosts") or die "no /etc/hosts: $!";
+ local $_; # <- VERY IMPORTANT
+ while (<HostFile>) {
print if /\b127\.(0\.0\.)?1\b/;
}
- # $fh automatically closes/disappears here
+ # *HostFile automatically closes/disappears here
+ }
+
+Here's how to use this in a loop to open and store a bunch of
+filehandles. We'll use as values of the hash an ordered
+pair to make it easy to sort the hash in insertion order.
+
+ @names = qw(motd termcap passwd hosts);
+ my $i = 0;
+ foreach $filename (@names) {
+ local *FH;
+ open(FH, "/etc/$filename") || die "$filename: $!";
+ $file{$filename} = [ $i++, *FH ];
}
-Internally, Perl believes filehandles to be of class IO::Handle. You
-may use that module directly if you'd like (see L<IO::Handle>), or
-one of its more specific derived classes.
+ # Using the filehandles stored in the hash
+ foreach $name (sort { $file{$a}[0] <=> $file{$b}[0] } keys %file) {
+ my $fh = $file{$name}[1];
+ my $line = <$fh>;
+ print "$name $. $line";
+ }
+
+If you want to create many anonymous handles, you should check out the
+Symbol, FileHandle, or IO::Handle (etc.) modules. Here's the equivalent
+code with Symbol::gensym, which is reasonably light-weight:
+
+ foreach $filename (@names) {
+ use Symbol;
+ my $fh = gensym();
+ open($fh, "/etc/$filename") || die "open /etc/$filename: $!";
+ $file{$filename} = [ $i++, $fh ];
+ }
-Once you have IO::File or FileHandle objects, you can pass them
-between subroutines or store them in hashes as you would any other
-scalar values:
+Or here using the semi-object-oriented FileHandle, which certainly isn't
+light-weight:
use FileHandle;
- # Storing filehandles in a hash and array
foreach $filename (@names) {
- my $fh = new FileHandle($filename) or die;
- $file{$filename} = $fh;
- push(@files, $fh);
+ my $fh = FileHandle->new("/etc/$filename") or die "$filename: $!";
+ $file{$filename} = [ $i++, $fh ];
}
- # Using the filehandles in the array
- foreach $file (@files) {
- print $file "Testing\n";
+Please understand that whether the filehandle happens to be a (probably
+localized) typeglob or an anonymous handle from one of the modules
+in no way affects the bizarre rules for managing indirect handles.
+See the next question.
+
+=head2 How can I use a filehandle indirectly?
+
+Using an indirect filehandle means using something other than a symbol
+in a place where a filehandle is expected. Here are ways
+to get one:
+
+ $fh = SOME_FH; # bareword is strict-subs hostile
+ $fh = "SOME_FH"; # strict-refs hostile; same package only
+ $fh = *SOME_FH; # typeglob
+ $fh = \*SOME_FH; # ref to typeglob (bless-able)
+ $fh = *SOME_FH{IO}; # blessed IO::Handle from *SOME_FH typeglob
+
+Or you can use the C<new> method from the FileHandle or IO modules to
+create an anonymous filehandle, store that in a scalar variable,
+and use it as though it were a normal filehandle.
+
+ use FileHandle;
+ $fh = FileHandle->new();
+
+ use IO::Handle; # 5.004 or higher
+ $fh = IO::Handle->new();
+
+Then use any of those as you would a normal filehandle. Anywhere that
+Perl is expecting a filehandle, an indirect filehandle may be used
+instead. An indirect filehandle is just a scalar variable that contains
+a filehandle. Functions like C<print>, C<open>, C<seek>, or
+the C<E<lt>FHE<gt>> diamond operator will accept either a real filehandle
+or a scalar variable containing one:
+
+ ($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR);
+ print $ofh "Type it: ";
+ $got = <$ifh>;
+ print $efh "What was that: $got";
+
+If you're passing a filehandle to a function, you can write
+the function in two ways:
+
+ sub accept_fh {
+ my $fh = shift;
+ print $fh "Sending to indirect filehandle\n";
}
- # You have to do the { } ugliness when you're specifying the
- # filehandle by anything other than a simple scalar variable.
- print { $files[2] } "Testing\n";
+Or it can localize a typeglob and use the filehandle directly:
- # Passing filehandles to subroutines
- sub debug {
- my $filehandle = shift;
- printf $filehandle "DEBUG: ", @_;
+ sub accept_fh {
+ local *FH = shift;
+ print FH "Sending to localized filehandle\n";
}
- debug($fh, "Testing\n");
+Both styles work with either objects or typeglobs of real filehandles.
+(They might also work with strings under some circumstances, but this
+is risky.)
+
+ accept_fh(*STDOUT);
+ accept_fh($handle);
+
+In the examples above, we assigned the filehandle to a scalar variable
+before using it. That is because only simple scalar variables,
+not expressions or subscripts into hashes or arrays, can be used with
+built-ins like C<print>, C<printf>, or the diamond operator. These are
+illegal and won't even compile:
+
+ @fd = (*STDIN, *STDOUT, *STDERR);
+ print $fd[1] "Type it: "; # WRONG
+ $got = <$fd[0]>; # WRONG
+ print $fd[2] "What was that: $got"; # WRONG
+
+With C<print> and C<printf>, you get around this by using a block and
+an expression where you would place the filehandle:
+
+ print { $fd[1] } "funny stuff\n";
+ printf { $fd[1] } "Pity the poor %x.\n", 3_735_928_559;
+ # Pity the poor deadbeef.
+
+That block is a proper block like any other, so you can put more
+complicated code there. This sends the message out to one of two places:
+
+ $ok = -x "/bin/cat";
+ print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n";
+ print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";
+
+This approach of treating C<print> and C<printf> like object method
+calls doesn't work for the diamond operator. That's because it's a
+real operator, not just a function with a comma-less argument. Assuming
+you've been storing typeglobs in your structure as we did above, you
+can use the built-in function named C<readline> to read a record just
+as C<E<lt>E<gt>> does. Given the initialization shown above for @fd, this
+would work, but only because readline() requires a typeglob. It doesn't
+work with objects or strings, which might be a bug we haven't fixed yet.
+
+ $got = readline($fd[0]);
+
+Let it be noted that the flakiness of indirect filehandles is not
+related to whether they're strings, typeglobs, objects, or anything else.
+It's the syntax of the fundamental operators. Playing the object
+game doesn't help you at all here.
=head2 How can I set up a footer format to be used with write()?
@@ -326,23 +476,72 @@ Within Perl, you may use this directly:
: ( $ENV{HOME} || $ENV{LOGDIR} )
}ex;
-=head2 How come when I open the file read-write it wipes it out?
+=head2 How come when I open a file read-write it wipes it out?
Because you're using something like this, which truncates the file and
I<then> gives you read-write access:
- open(FH, "+> /path/name"); # WRONG
+ open(FH, "+> /path/name"); # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file
-doesn't exist.
+doesn't exist. Using "E<gt>" always clobbers or creates.
+Using "E<lt>" never does either. The "+" doesn't change this.
- open(FH, "+< /path/name"); # open for update
+Here are examples of many kinds of file opens. Those using sysopen()
+all assume
-If this is an issue, try:
+ use Fcntl;
- sysopen(FH, "/path/name", O_RDWR|O_CREAT, 0644);
+To open file for reading:
-Error checking is left as an exercise for the reader.
+ open(FH, "< $path") || die $!;
+ sysopen(FH, $path, O_RDONLY) || die $!;
+
+To open file for writing, create new file if needed or else truncate old file:
+
+ open(FH, "> $path") || die $!;
+ sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT) || die $!;
+ sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT, 0666) || die $!;
+
+To open file for writing, create new file, file must not exist:
+
+ sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT) || die $!;
+ sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT, 0666) || die $!;
+
+To open file for appending, create if necessary:
+
+ open(FH, ">> $path") || die $!;
+ sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT) || die $!;
+ sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT, 0666) || die $!;
+
+To open file for appending, file must exist:
+
+ sysopen(FH, $path, O_WRONLY|O_APPEND) || die $!;
+
+To open file for update, file must exist:
+
+ open(FH, "+< $path") || die $!;
+ sysopen(FH, $path, O_RDWR) || die $!;
+
+To open file for update, create file if necessary:
+
+ sysopen(FH, $path, O_RDWR|O_CREAT) || die $!;
+ sysopen(FH, $path, O_RDWR|O_CREAT, 0666) || die $!;
+
+To open file for update, file must not exist:
+
+ sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT) || die $!;
+ sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT, 0666) || die $!;
+
+To open a file without blocking, creating if necessary:
+
+ sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT)
+ or die "can't open /tmp/somefile: $!";
+
+Be warned that neither creation nor deletion of files is guaranteed to
+be an atomic operation over NFS. That is, two processes might both
+successfully create or unlink the same file! Therefore O_EXCL
+isn't so exclusive as you might wish.
=head2 Why do I sometimes get an "Argument list too long" when I use <*>?
@@ -398,6 +597,8 @@ then delete the old one. This isn't really the same semantics as a
real rename(), though, which preserves metainformation like
permissions, timestamps, inode info, etc.
+Newer versions of File::Copy export a move() function.
+
=head2 How can I lock a file?
Perl's builtin flock() function (see L<perlfunc> for details) will call
@@ -428,10 +629,6 @@ this.
=back
-The CPAN module File::Lock offers similar functionality and (if you
-have dynamic loading) won't require you to rebuild perl if your
-flock() can't lock network files.
-
=head2 What can't I just open(FH, ">file.lock")?
A common bit of code B<NOT TO USE> is this:
@@ -443,7 +640,7 @@ This is a classic race condition: you take two steps to do something
which must be done in one. That's why computer hardware provides an
atomic test-and-set instruction. In theory, this "ought" to work:
- sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT, 0644)
+ sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT)
or die "can't open file.lock: $!":
except that lamentably, file creation (and deletion) is not atomic
@@ -454,11 +651,14 @@ these tend to involve busy-wait, which is also subdesirable.
=head2 I still don't get locking. I just want to increment the number in the file. How can I do this?
Didn't anyone ever tell you web-page hit counters were useless?
+They don't count the number of hits, they're a waste of time, and they serve
+only to stroke the writer's vanity. Better to pick a random number.
+It's more realistic.
-Anyway, this is what to do:
+Anyway, this is what you can do if you can't help yourself.
use Fcntl;
- sysopen(FH, "numfile", O_RDWR|O_CREAT, 0644) or die "can't open numfile: $!";
+ sysopen(FH, "numfile", O_RDWR|O_CREAT) or die "can't open numfile: $!";
flock(FH, 2) or die "can't flock numfile: $!";
$num = <FH> || 0;
seek(FH, 0, 0) or die "can't rewind numfile: $!";
@@ -496,10 +696,6 @@ like this:
Locking and error checking are left as an exercise for the reader.
Don't forget them, or you'll be quite sorry.
-Don't forget to set binmode() under DOS-like platforms when operating
-on files that have anything other than straight text in them. See the
-docs on open() and on binmode() for more details.
-
=head2 How do I get a file's timestamp in perl?
If you want to retrieve the time at which the file was last read,
@@ -558,13 +754,18 @@ of the multiplexing:
open (FH, "| tee file1 file2 file3");
-Otherwise you'll have to write your own multiplexing print function --
-or your own tee program -- or use Tom Christiansen's, at
-http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz, which is
-written in Perl.
+Or even:
+
+ # make STDOUT go to three files, plus original STDOUT
+ open (STDOUT, "| tee file1 file2 file3") or die "Teeing off: $!\n";
+ print "whatever\n" or die "Writing: $!\n";
+ close(STDOUT) or die "Closing: $!\n";
-In theory a IO::Tee class could be written, but to date we haven't
-seen such.
+Otherwise you'll have to write your own multiplexing print
+function -- or your own tee program -- or use Tom Christiansen's,
+at http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz, which is
+written in Perl and offers much greater functionality
+than the stock version.
=head2 How can I read in a file by paragraphs?
@@ -691,7 +892,11 @@ file that worked.
=head2 How can I tell if there's a character waiting on a filehandle?
-You should check out the Frequently Asked Questions list in
+The very first thing you should do is look into getting the Term::ReadKey
+extension from CPAN. It now even has limited support for closed, proprietary
+(read: not open systems, not POSIX, not Unix, etc) systems.
+
+You should also check out the Frequently Asked Questions list in
comp.unix.* for things like this: the answer is essentially the same.
It's very system dependent. Here's one solution that works on BSD
systems:
@@ -702,29 +907,47 @@ systems:
return $nfd = select($rin,undef,undef,0);
}
-You should look into getting the Term::ReadKey extension from CPAN.
+If you want to find out how many characters are waiting,
+there's also the FIONREAD ioctl call to be looked at.
-=head2 How do I open a file without blocking?
+The I<h2ph> tool that comes with Perl tries to convert C include
+files to Perl code, which can be C<require>d. FIONREAD ends
+up defined as a function in the I<sys/ioctl.ph> file:
-You need to use the O_NDELAY or O_NONBLOCK flag from the Fcntl module
-in conjunction with sysopen():
+ require 'sys/ioctl.ph';
- use Fcntl;
- sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT, 0644)
- or die "can't open /tmp/somefile: $!":
+ $size = pack("L", 0);
+ ioctl(FH, FIONREAD(), $size) or die "Couldn't call ioctl: $!\n";
+ $size = unpack("L", $size);
-=head2 How do I create a file only if it doesn't exist?
+If I<h2ph> wasn't installed or doesn't work for you, you can
+I<grep> the include files by hand:
-You need to use the O_CREAT and O_EXCL flags from the Fcntl module in
-conjunction with sysopen():
+ % grep FIONREAD /usr/include/*/*
+ /usr/include/asm/ioctls.h:#define FIONREAD 0x541B
- use Fcntl;
- sysopen(FH, "/tmp/somefile", O_WRONLY|O_EXCL|O_CREAT, 0644)
- or die "can't open /tmp/somefile: $!":
+Or write a small C program using the editor of champions:
-Be warned that neither creation nor deletion of files is guaranteed to
-be an atomic operation over NFS. That is, two processes might both
-successful create or unlink the same file!
+ % cat > fionread.c
+ #include <sys/ioctl.h>
+ main() {
+ printf("%#08x\n", FIONREAD);
+ }
+ ^D
+ % cc -o fionread fionread.c
+ % ./fionread
+ 0x4004667f
+
+And then hard-code it, leaving porting as an exercise to your successor.
+
+ $FIONREAD = 0x4004667f; # XXX: opsys dependent
+
+ $size = pack("L", 0);
+ ioctl(FH, $FIONREAD, $size) or die "Couldn't call ioctl: $!\n";
+ $size = unpack("L", $size);
+
+FIONREAD requires a filehandle connected to a stream, meaning sockets,
+pipes, and tty devices work, but I<not> files.
=head2 How do I do a C<tail -f> in perl?
@@ -765,7 +988,12 @@ Or even with a literal numeric descriptor:
$fd = $ENV{MHCONTEXTFD};
open(MHCONTEXT, "<&=$fd"); # like fdopen(3S)
-Error checking has been left as an exercise for the reader.
+Note that "E<lt>&STDIN" makes a copy, but "E<lt>&=STDIN" makes
+an alias. That means if you close an aliased handle, all
+aliases become inaccessible. This is not true with
+a copied one.
+
+Error checking, as always, has been left as an exercise for the reader.
=head2 How do I close a file descriptor by number?
@@ -797,7 +1025,7 @@ awk, Tcl, Java, or Python, just to mention a few.
Because even on non-Unix ports, Perl's glob function follows standard
Unix globbing semantics. You'll need C<glob("*")> to get all (non-hidden)
-files.
+files. This makes glob() portable.
=head2 Why does Perl let me delete read-only files? Why does C<-i> clobber protected files? Isn't this a bug in Perl?
@@ -821,10 +1049,23 @@ Here's an algorithm from the Camel Book:
rand($.) < 1 && ($line = $_) while <>;
This has a significant advantage in space over reading the whole
-file in.
+file in. A simple proof by induction is available upon
+request if you doubt its correctness.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
-All rights reserved. See L<perlfaq> for distribution information.
-
+Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington.
+All rights reserved.
+
+When included as part of the Standard Version of Perl, or as part of
+its complete documentation whether printed or otherwise, this work
+may be distributed only under the terms of Perl's Artistic License.
+Any distribution of this file or derivatives thereof I<outside>
+of that package requires that special arrangements be made with
+the copyright holder.
+
+Irrespective of its distribution, all code examples in this file
+are hereby placed into the public domain. You are permitted and
+encouraged to use this code in your own programs for fun
+or for profit as you see fit. A simple comment in the code giving
+credit would be courteous but is not required.
diff --git a/pod/perlfaq6.pod b/pod/perlfaq6.pod
index 535e464455..0e94f9b2bb 100644
--- a/pod/perlfaq6.pod
+++ b/pod/perlfaq6.pod
@@ -25,7 +25,7 @@ comments.
# turn the line into the first word, a colon, and the
# number of characters on the rest of the line
- s/^(\w+)(.*)/ lc($1) . ":" . length($2) /ge;
+ s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg;
=item Comments Inside the Regexp
@@ -69,8 +69,9 @@ delimiter within the pattern:
=head2 I'm having trouble matching over more than one line. What's wrong?
-Either you don't have newlines in your string, or you aren't using the
-correct modifier(s) on your pattern.
+Either you don't have more than one line in the string you're looking at
+(probably), or else you aren't using the correct modifier(s) on your
+pattern (possibly).
There are many ways to get multiline data into a string. If you want
it to happen automatically while reading input, you'll want to set $/
@@ -94,7 +95,7 @@ record read in.
$/ = ''; # read in more whole paragraph, not just one line
while ( <> ) {
- while ( /\b(\w\S+)(\s+\1)+\b/gi ) {
+ while ( /\b([\w'-]+)(\s+\1)+\b/gi ) { # word starts alpha
print "Duplicate $1 at paragraph $.\n";
}
}
@@ -133,6 +134,16 @@ But if you want nested occurrences of C<START> through C<END>, you'll
run up against the problem described in the question in this section
on matching balanced text.
+Here's another example of using C<..>:
+
+ while (<>) {
+ $in_header = 1 .. /^$/;
+ $in_body = /^$/ .. eof();
+ # now choose between them
+ } continue {
+ reset if eof(); # fix $.
+ }
+
=head2 I put a regular expression into $/ but it didn't work. What's wrong?
$/ must be a string, not a regular expression. Awk has to be better
@@ -211,7 +222,7 @@ This prints:
this is a SUcCESS case
-=head2 How can I make C<\w> match accented characters?
+=head2 How can I make C<\w> match national character sets?
See L<perllocale>.
@@ -600,6 +611,18 @@ all mixed.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
-All rights reserved. See L<perlfaq> for distribution information.
-
+Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington.
+All rights reserved.
+
+When included as part of the Standard Version of Perl, or as part of
+its complete documentation whether printed or otherwise, this work
+may be distributed only under the terms of Perl's Artistic License.
+Any distribution of this file or derivatives thereof I<outside>
+of that package requires that special arrangements be made with
+the copyright holder.
+
+Irrespective of its distribution, all code examples in this file
+are hereby placed into the public domain. You are permitted and
+encouraged to use this code in your own programs for fun
+or for profit as you see fit. A simple comment in the code giving
+credit would be courteous but is not required.
diff --git a/pod/perlfaq7.pod b/pod/perlfaq7.pod
index d62ee36621..291ca3670d 100644
--- a/pod/perlfaq7.pod
+++ b/pod/perlfaq7.pod
@@ -713,5 +713,18 @@ Use embedded POD to discard it:
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
-All rights reserved. See L<perlfaq> for distribution information.
+Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington.
+All rights reserved.
+
+When included as part of the Standard Version of Perl, or as part of
+its complete documentation whether printed or otherwise, this work
+may be distributed only under the terms of Perl's Artistic License.
+Any distribution of this file or derivatives thereof I<outside>
+of that package requires that special arrangements be made with
+the copyright holder.
+
+Irrespective of its distribution, all code examples in this file
+are hereby placed into the public domain. You are permitted and
+encouraged to use this code in your own programs for fun
+or for profit as you see fit. A simple comment in the code giving
+credit would be courteous but is not required.
diff --git a/pod/perlfaq8.pod b/pod/perlfaq8.pod
index dbc1bcd10e..f4addd8c4c 100644
--- a/pod/perlfaq8.pod
+++ b/pod/perlfaq8.pod
@@ -848,5 +848,18 @@ included with the 5.002 release of Perl.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
-All rights reserved. See L<perlfaq> for distribution information.
+Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington.
+All rights reserved.
+
+When included as part of the Standard Version of Perl, or as part of
+its complete documentation whether printed or otherwise, this work
+may be distributed only under the terms of Perl's Artistic License.
+Any distribution of this file or derivatives thereof I<outside>
+of that package requires that special arrangements be made with
+the copyright holder.
+
+Irrespective of its distribution, all code examples in this file
+are hereby placed into the public domain. You are permitted and
+encouraged to use this code in your own programs for fun
+or for profit as you see fit. A simple comment in the code giving
+credit would be courteous but is not required.
diff --git a/pod/perlfaq9.pod b/pod/perlfaq9.pod
index aa942c2da0..deeb6508ef 100644
--- a/pod/perlfaq9.pod
+++ b/pod/perlfaq9.pod
@@ -166,7 +166,7 @@ C<eval> or C<system> calls. In addition to tainting, never use the
single-argument form of system() or exec(). Instead, supply the
command and arguments as a list, which prevents shell globbing.
-=head2 How do I parse an email header?
+=head2 How do I parse a mail header?
For a quick-and-dirty solution, try this solution derived
from page 222 of the 2nd edition of "Programming Perl":
@@ -193,33 +193,33 @@ CGI.pm or CGI_Lite.pm (available from CPAN), or if you're trapped in
the module-free land of perl1 .. perl4, you might look into cgi-lib.pl
(available from http://www.bio.cam.ac.uk/web/form.html).
-=head2 How do I check a valid email address?
+=head2 How do I check a valid mail address?
You can't.
Without sending mail to the address and seeing whether it bounces (and
even then you face the halting problem), you cannot determine whether
-an email address is valid. Even if you apply the email header
+a mail address is valid. Even if you apply the mail header
standard, you can have problems, because there are deliverable
addresses that aren't RFC-822 (the mail header standard) compliant,
and addresses that aren't deliverable which are compliant.
-Many are tempted to try to eliminate many frequently-invalid email
+Many are tempted to try to eliminate many frequently-invalid mail
addresses with a simple regexp, such as
C</^[\w.-]+\@([\w.-]\.)+\w+$/>. However, this also throws out many
valid ones, and says nothing about potential deliverability, so is not
suggested. Instead, see
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz ,
which actually checks against the full RFC spec (except for nested
-comments), looks for addresses you may not wish to accept email to
+comments), looks for addresses you may not wish to accept mail to
(say, Bill Clinton or your postmaster), and then makes sure that the
hostname given can be looked up in DNS. It's not fast, but it works.
Here's an alternative strategy used by many CGI script authors: Check
-the email address with a simple regexp (such as the one above). If
+the mail address with a simple regexp (such as the one above). If
the regexp matched the address, accept the address. If the regexp
didn't match the address, request confirmation from the user that the
-email address they entered was correct.
+mail address they entered was correct.
=head2 How do I decode a MIME/BASE64 string?
@@ -237,7 +237,7 @@ format after minor transliterations:
$len = pack("c", 32 + 0.75*length); # compute length byte
print unpack("u", $len . $_); # uudecode and print
-=head2 How do I return the user's email address?
+=head2 How do I return the user's mail address?
On systems that support getpwuid, the $E<lt> variable and the
Sys::Hostname module (which is part of the standard perl distribution),
@@ -246,9 +246,9 @@ you can probably try using something like this:
use Sys::Hostname;
$address = sprintf('%s@%s', getpwuid($<), hostname);
-Company policies on email address can mean that this generates addresses
-that the company's email system will not accept, so you should ask for
-users' email addresses when this matters. Furthermore, not all systems
+Company policies on mail addresses can mean that this generates addresses
+that the company's mail system will not accept, so you should ask for
+users' mail addresses when this matters. Furthermore, not all systems
on which Perl runs are so forthcoming with this information as is Unix.
The Mail::Util module from CPAN (part of the MailTools package) provides a
@@ -326,6 +326,18 @@ CPAN). No ONC::RPC module is known.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
-All rights reserved. See L<perlfaq> for distribution information.
-
+Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington.
+All rights reserved.
+
+When included as part of the Standard Version of Perl, or as part of
+its complete documentation whether printed or otherwise, this work
+may be distributed only under the terms of Perl's Artistic License.
+Any distribution of this file or derivatives thereof I<outside>
+of that package require that special arrangements be made with
+copyright holder.
+
+Irrespective of its distribution, all code examples in this file
+are hereby placed into the public domain. You are permitted and
+encouraged to use this code in your own programs for fun
+or for profit as you see fit. A simple comment in the code giving
+credit would be courteous but is not required.
diff --git a/pod/perlform.pod b/pod/perlform.pod
index 0b2a68c3d4..6b65e04303 100644
--- a/pod/perlform.pod
+++ b/pod/perlform.pod
@@ -233,11 +233,11 @@ of the page, however wide it is." You have to specify where it goes.
The truly desperate can generate their own format on the fly, based
on the current number of columns, and then eval() it:
- $format = "format STDOUT = \n";
- . '^' . '<' x $cols . "\n";
- . '$entry' . "\n";
- . "\t^" . "<" x ($cols-8) . "~~\n";
- . '$entry' . "\n";
+ $format = "format STDOUT = \n"
+ . '^' . '<' x $cols . "\n"
+ . '$entry' . "\n"
+ . "\t^" . "<" x ($cols-8) . "~~\n"
+ . '$entry' . "\n"
. ".\n";
print $format if $Debugging;
eval $format;
@@ -295,7 +295,7 @@ For example:
print "Wow, I just stored `$^A' in the accumulator!\n";
-Or to make an swrite() subroutine which is to write() what sprintf()
+Or to make an swrite() subroutine, which is to write() what sprintf()
is to printf(), do this:
use Carp;
@@ -315,18 +315,18 @@ is to printf(), do this:
=head1 WARNINGS
-The lone dot that ends a format can also prematurely end an email
+The lone dot that ends a format can also prematurely end a mail
message passing through a misconfigured Internet mailer (and based on
experience, such misconfiguration is the rule, not the exception). So
-when sending format code through email, you should indent it so that
+when sending format code through mail, you should indent it so that
the format-ending dot is not on the left margin; this will prevent
-email cutoff.
+SMTP cutoff.
Lexical variables (declared with "my") are not visible within a
format unless the format is declared within the scope of the lexical
variable. (They weren't visible at all before version 5.001.)
-Formats are the only part of Perl which unconditionally use information
+Formats are the only part of Perl that unconditionally uses information
from a program's locale; if a program's environment specifies an
LC_NUMERIC locale, it is always used to specify the decimal point
character in formatted output. Perl ignores all other aspects of locale
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index e867a0c65d..25a97ffddb 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -1,4 +1,3 @@
-
=head1 NAME
perlfunc - Perl builtin functions
@@ -53,26 +52,36 @@ nonabortive failure is generally indicated in a scalar context by
returning the undefined value, and in a list context by returning the
null list.
-Remember the following rule:
-
-=over 8
-
-=item I<THERE IS NO GENERAL RULE FOR CONVERTING A LIST INTO A SCALAR!>
-
-=back
-
+Remember the following important rule: There is B<no rule> that relates
+the behavior of an expression in list context to its behavior in scalar
+context, or vice versa. It might do two totally different things.
Each operator and function decides which sort of value it would be most
appropriate to return in a scalar context. Some operators return the
-length of the list that would have been returned in a list context. Some
+length of the list that would have been returned in list context. Some
operators return the first value in the list. Some operators return the
last value in the list. Some operators return a count of successful
operations. In general, they do what you want, unless you want
consistency.
+A named array in scalar context is quite different from what would at
+first glance appear to be a list in scalar context. You can't get a list
+like C<(1,2,3)> into being in scalar context, because the compiler knows
+the context at compile time. It would generate the scalar comma operator
+there, not the list construction version of the comma. That means it
+was never a list to start with.
+
+In general, functions in Perl that serve as wrappers for system calls
+of the same name (like chown(2), fork(2), closedir(2), etc.) all return
+true when they succeed and C<undef> otherwise, as is usually mentioned
+in the descriptions below. This is different from the C interfaces,
+which return -1 on failure. Exceptions to this rule are wait(),
+waitpid(), and syscall(). System calls also set the special C<$!>
+variable on failure. Other functions do not, except accidentally.
+
=head2 Perl Functions by Category
Here are Perl's functions (including things that look like
-functions, like some of the keywords and named operators)
+functions, like some keywords and named operators)
arranged by category. Some functions appear in more
than one place.
@@ -189,7 +198,7 @@ C<qw>, C<readline>, C<readpipe>, C<ref>, C<sub*>, C<sysopen>, C<tie>,
C<tied>, C<uc>, C<ucfirst>, C<untie>, C<use>
* - C<sub> was a keyword in perl4, but in perl5 it is an
-operator which can be used in expressions.
+operator, which can be used in expressions.
=item Functions obsoleted in perl5
@@ -254,7 +263,7 @@ operator may be any of:
The interpretation of the file permission operators C<-r>, C<-R>, C<-w>,
C<-W>, C<-x>, and C<-X> is based solely on the mode of the file and the
uids and gids of the user. There may be other reasons you can't actually
-read, write or execute the file. Also note that, for the superuser,
+read, write, or execute the file, such as AFS access control lists. Also note that, for the superuser,
C<-r>, C<-R>, C<-w>, and C<-W> always return 1, and C<-x> and C<-X> return
1 if any execute bit is set in the mode. Scripts run by the superuser may
thus need to do a stat() to determine the actual mode of the
@@ -265,7 +274,7 @@ Example:
while (<>) {
chop;
next unless -f $_; # ignore specials
- ...
+ #...
}
Note that C<-s/a/b/> does not do a negated substitution. Saying
@@ -274,7 +283,7 @@ following a minus are interpreted as file tests.
The C<-T> and C<-B> switches work as follows. The first block or so of the
file is examined for odd characters such as strange control codes or
-characters with the high bit set. If too many odd characters (E<gt>30%)
+characters with the high bit set. If too many strange characters (E<gt>30%)
are found, it's a C<-B> file, otherwise it's a C<-T> file. Also, any file
containing null in the first block is considered a binary file. If C<-T>
or C<-B> is used on a filehandle, the current stdio buffer is examined
@@ -336,17 +345,18 @@ and sleep() calls.
If you want to use alarm() to time out a system call you need to use an
eval/die pair. You can't rely on the alarm causing the system call to
-fail with $! set to EINTR because Perl sets up signal handlers to
-restart system calls on some systems. Using eval/die always works.
+fail with C<$!> set to EINTR because Perl sets up signal handlers to
+restart system calls on some systems. Using eval/die always works,
+modulo the caveats given in L<perlipc/"Signals">.
eval {
- local $SIG{ALRM} = sub { die "alarm\n" }; # NB \n required
+ local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required
alarm $timeout;
$nread = sysread SOCKET, $buffer, $size;
alarm 0;
};
- die if $@ && $@ ne "alarm\n"; # propagate errors
if ($@) {
+ die unless $@ eq "alarm\n"; # propagate unexpected errors
# timed out
}
else {
@@ -378,7 +388,7 @@ translated to CR LF on output. Binmode has no effect under Unix; in MS-DOS
and similarly archaic systems, it may be imperative--otherwise your
MS-DOS-damaged C library may mangle your file. The key distinction between
systems that need binmode and those that don't is their text file
-formats. Systems like Unix and Plan9 that delimit lines with a single
+formats. Systems like Unix, MacOS, and Plan9 that delimit lines with a single
character, and that encode that character in C as '\n', do not need
C<binmode>. The rest need it. If FILEHANDLE is an expression, the value
is taken as the name of the filehandle.
@@ -392,17 +402,17 @@ an object in the CLASSNAME package--or the current package if no CLASSNAME
is specified, which is often the case. It returns the reference for
convenience, because a bless() is often the last thing in a constructor.
Always use the two-argument version if the function doing the blessing
-might be inherited by a derived class. See L<perlobj> for more about the
-blessing (and blessings) of objects.
+might be inherited by a derived class. See L<perltoot> and L<perlobj>
+for more about the blessing (and blessings) of objects.
=item caller EXPR
=item caller
-Returns the context of the current subroutine call. In a scalar context,
+Returns the context of the current subroutine call. In scalar context,
returns the caller's package name if there is a caller, that is, if
we're in a subroutine or eval() or require(), and the undefined value
-otherwise. In a list context, returns
+otherwise. In list context, returns
($package, $filename, $line) = caller;
@@ -464,7 +474,7 @@ VARIABLE is omitted, it chomps $_. Example:
while (<>) {
chomp; # avoid \n on last field
@array = split(/:/);
- ...
+ # ...
}
You can actually chomp anything that's an lvalue, including an assignment:
@@ -490,7 +500,7 @@ Example:
while (<>) {
chop; # avoid \n on last field
@array = split(/:/);
- ...
+ #...
}
You can actually chop anything that's an lvalue, including an assignment:
@@ -517,13 +527,13 @@ Here's an example that looks up nonnumeric uids in the passwd file:
print "User: ";
chop($user = <STDIN>);
- print "Files: "
+ print "Files: ";
chop($pattern = <STDIN>);
($login,$pass,$uid,$gid) = getpwnam($user)
or die "$user not in passwd file";
- @ary = <${pattern}>; # expand filenames
+ @ary = glob($pattern); # expand filenames
chown $uid, $gid, @ary;
On most systems, you are not allowed to change the ownership of the
@@ -544,12 +554,12 @@ If NUMBER is omitted, uses $_.
=item chroot
-This function works as the system call by the same name: it makes the
+This function works like the system call by the same name: it makes the
named directory the new root directory for all further pathnames that
-begin with a "/" by your process and all of its children. (It doesn't
+begin with a "/" by your process and all its children. (It doesn't
change your current working directory, which is unaffected.) For security
reasons, this call is restricted to the superuser. If FILENAME is
-omitted, does chroot to $_.
+omitted, does a chroot to $_.
=item close FILEHANDLE
@@ -565,26 +575,32 @@ counter ($.), while the implicit close done by open() does not.
If the file handle came from a piped open C<close> will additionally
return FALSE if one of the other system calls involved fails or if the
program exits with non-zero status. (If the only problem was that the
-program exited non-zero $! will be set to 0.) Also, closing a pipe will
-wait for the process executing on the pipe to complete, in case you
+program exited non-zero $! will be set to 0.) Also, closing a pipe
+waits for the process executing on the pipe to complete, in case you
want to look at the output of the pipe afterwards. Closing a pipe
explicitly also puts the exit status value of the command into C<$?>.
+
Example:
open(OUTPUT, '|sort >foo') # pipe to sort
or die "Can't start sort: $!";
- ... # print stuff to output
+ #... # print stuff to output
close OUTPUT # wait for sort to finish
or warn $! ? "Error closing sort pipe: $!"
: "Exit status $? from sort";
open(INPUT, 'foo') # get sort's results
or die "Can't open 'foo' for input: $!";
-FILEHANDLE may be an expression whose value gives the real filehandle name.
+FILEHANDLE may be an expression whose value can be used as an indirect
+filehandle, usually the real filehandle name.
=item closedir DIRHANDLE
-Closes a directory opened by opendir().
+Closes a directory opened by opendir() and returns the success of that
+system call.
+
+DIRHANDLE may be an expression whose value can be used as an indirect
+dirhandle, usually the real dirhandle name.
=item connect SOCKET,NAME
@@ -624,7 +640,7 @@ to check the condition at the top of the loop.
=item cos EXPR
-Returns the cosine of EXPR (expressed in radians). If EXPR is omitted
+Returns the cosine of EXPR (expressed in radians). If EXPR is omitted,
takes cosine of $_.
For the inverse cosine operation, you may use the POSIX::acos()
@@ -725,10 +741,10 @@ doesn't I<necessarily> indicate an exceptional condition: pop()
returns C<undef> when its argument is an empty array, I<or> when the
element to return happens to be C<undef>.
-You may also use defined() to check whether a subroutine exists. On
-the other hand, use of defined() upon aggregates (hashes and arrays)
-is not guaranteed to produce intuitive results, and should probably be
-avoided.
+You may also use defined() to check whether a subroutine exists, by
+saying C<defined &func> without parentheses. On the other hand, use
+of defined() upon aggregates (hashes and arrays) is not guaranteed to
+produce intuitive results, and should probably be avoided.
When used on a hash element, it tells you whether the value is defined,
not whether the key exists in the hash. Use L</exists> for the latter
@@ -749,7 +765,7 @@ defined values. For example, if you say
"ab" =~ /a(.*)b/;
-the pattern match succeeds, and $1 is defined, despite the fact that it
+The pattern match succeeds, and $1 is defined, despite the fact that it
matched "nothing". But it didn't really match nothing--rather, it
matched something that happened to be 0 characters long. This is all
very above-board and honest. When a function returns an undefined value,
@@ -768,11 +784,12 @@ should instead use a simple test for size:
if (%a_hash) { print "has hash members\n" }
Using undef() on these, however, does clear their memory and then report
-them as not defined anymore, but you shoudln't do that unless you don't
+them as not defined anymore, but you shouldn't do that unless you don't
plan to use them again, because it saves time when you load them up
-again to have memory already ready to be filled.
+again to have memory already ready to be filled. The normal way to
+free up space used by an aggregate is to assign the empty list.
-This counterintuitive behaviour of defined() on aggregates may be
+This counterintuitive behavior of defined() on aggregates may be
changed, fixed, or broken in a future release of Perl.
See also L</undef>, L</exists>, L</ref>.
@@ -796,20 +813,20 @@ And so does this:
delete @HASH{keys %HASH}
-(But both of these are slower than the undef() command.) Note that the
-EXPR can be arbitrarily complicated as long as the final operation is a
-hash element lookup or hash slice:
+(But both of these are slower than just assigning the empty list, or
+using undef().) Note that the EXPR can be arbitrarily complicated as
+long as the final operation is a hash element lookup or hash slice:
delete $ref->[$x][$y]{$key};
delete @{$ref->[$x][$y]}{$key1, $key2, @morekeys};
=item die LIST
-Outside of an eval(), prints the value of LIST to C<STDERR> and exits with
+Outside an eval(), prints the value of LIST to C<STDERR> and exits with
the current value of C<$!> (errno). If C<$!> is 0, exits with the value of
C<($? E<gt>E<gt> 8)> (backtick `command` status). If C<($? E<gt>E<gt> 8)>
is 0, exits with 255. Inside an eval(), the error message is stuffed into
-C<$@>, and the eval() is terminated with the undefined value; this makes
+C<$@> and the eval() is terminated with the undefined value. This makes
die() the way to raise an exception.
Equivalent examples:
@@ -842,7 +859,7 @@ This is useful for propagating exceptions:
If $@ is empty then the string "Died" is used.
-You can arrange for a callback to be called just before the die() does
+You can arrange for a callback to be run just before the die() does
its deed, by setting the C<$SIG{__DIE__}> hook. The associated handler
will be called with the error text and can change the error message, if
it sees fit, by calling die() again. See L<perlvar/$SIG{expr}> for details on
@@ -879,7 +896,7 @@ is just like
scalar eval `cat stat.pl`;
-except that it's more efficient, more concise, keeps track of the
+except that it's more efficient and concise, keeps track of the
current filename for error messages, and searches all the B<-I>
libraries if the file isn't in the current directory (see also the @INC
array in L<perlvar/Predefined Names>). It is also different in how
@@ -889,9 +906,21 @@ reparse the file every time you call it, so you probably don't want to
do this inside a loop.
Note that inclusion of library modules is better done with the
-use() and require() operators, which also do error checking
+use() and require() operators, which also do automatic error checking
and raise an exception if there's a problem.
+You might like to use C<do> to read in a program configuration
+file. Manual error checking can be done this way:
+
+ # read in config files: system first, then user
+ for $file ("/share/prog/defaults.rc", "$ENV{HOME}/.someprogrc") {
+ unless ($return = do $file) {
+ warn "couldn't parse $file: $@" if $@;
+ warn "couldn't do $file: $!" unless defined $return;
+ warn "couldn't run $file" unless $return;
+ }
+ }
+
=item dump LABEL
This causes an immediate core dump. Primarily this is so that you can
@@ -900,7 +929,7 @@ after having initialized all your variables at the beginning of the
program. When the new binary is executed it will begin by executing a
C<goto LABEL> (with all the restrictions that C<goto> suffers). Think of
it as a goto with an intervening core dump and reincarnation. If LABEL
-is omitted, restarts the program from the top. WARNING: any files
+is omitted, restarts the program from the top. WARNING: Any files
opened at the time of the dump will NOT be open any more when the
program is reincarnated, with possible resulting confusion on the part
of Perl. See also B<-u> option in L<perlrun>.
@@ -925,18 +954,22 @@ Example:
QUICKSTART:
Getopt('f');
+This operator is largely obsolete, partly because it's very hard to
+convert a core file into an executable, and because the real perl-to-C
+compiler has superseded it.
+
=item each HASH
-When called in a list context, returns a 2-element list consisting of the
+When called in list context, returns a 2-element list consisting of the
key and value for the next element of a hash, so that you can iterate over
-it. When called in a scalar context, returns the key for only the next
+it. When called in scalar context, returns the key for only the "next"
element in the hash. (Note: Keys may be "0" or "", which are logically
false; you may wish to avoid constructs like C<while ($k = each %foo) {}>
for this reason.)
Entries are returned in an apparently random order. When the hash is
entirely read, a null array is returned in list context (which when
-assigned produces a FALSE (0) value), and C<undef> is returned in a
+assigned produces a FALSE (0) value), and C<undef> in
scalar context. The next call to each() after that will start iterating
again. There is a single iterator for each hash, shared by all each(),
keys(), and values() function calls in the program; it can be reset by
@@ -961,14 +994,14 @@ See also keys() and values().
Returns 1 if the next read on FILEHANDLE will return end of file, or if
FILEHANDLE is not open. FILEHANDLE may be an expression whose value
-gives the real filehandle name. (Note that this function actually
-reads a character and then ungetc()s it, so it is not very useful in an
+gives the real filehandle. (Note that this function actually
+reads a character and then ungetc()s it, so isn't very useful in an
interactive context.) Do not read from a terminal file (or call
C<eof(FILEHANDLE)> on it) after end-of-file is reached. Filetypes such
as terminals may lose the end-of-file condition if you do.
An C<eof> without an argument uses the last file read as argument.
-Empty parentheses () may be used to indicate the pseudo file formed of
+Using C<eof()> with empty parentheses is very different. It indicates the pseudo file formed of
the files listed on the command line, i.e., C<eof()> is reasonable to
use inside a C<while (E<lt>E<gt>)> loop to detect the end of only the
last file. Use C<eof(ARGV)> or eof without the parentheses to test
@@ -976,13 +1009,15 @@ I<EACH> file in a while (E<lt>E<gt>) loop. Examples:
# reset line numbering on each input file
while (<>) {
+ next if /^\s*#/; # skip comments
print "$.\t$_";
- close(ARGV) if (eof); # Not eof().
+ } continue {
+ close ARGV if eof; # Not eof()!
}
# insert dashes just before last line of last file
while (<>) {
- if (eof()) {
+ if (eof()) { # check for end of last file
print "--------------\n";
close(ARGV); # close or break; is needed if we
# are reading from the terminal
@@ -991,7 +1026,7 @@ I<EACH> file in a while (E<lt>E<gt>) loop. Examples:
}
Practical hint: you almost never need to use C<eof> in Perl, because the
-input operators return undef when they run out of data.
+input operators return C<undef> when they run out of data.
=item eval EXPR
@@ -999,7 +1034,7 @@ input operators return undef when they run out of data.
In the first form, the return value of EXPR is parsed and executed as if it
were a little Perl program. The value of the expression (which is itself
-determined within a scalar context) is first parsed, and if there are no
+determined within scalar context) is first parsed, and if there weren't any
errors, executed in the context of the current Perl program, so that any
variable settings or subroutine and format definitions remain afterwards.
Note that the value is parsed every time the eval executes. If EXPR is
@@ -1017,9 +1052,9 @@ The final semicolon, if any, may be omitted from the value of EXPR or within
the BLOCK.
In both forms, the value returned is the value of the last expression
-evaluated inside the mini-program, or a return statement may be used, just
+evaluated inside the mini-program; a return statement may also be used, just
as with subroutines. The expression providing the return value is evaluated
-in void, scalar or array context, depending on the context of the eval itself.
+in void, scalar, or list context, depending on the context of the eval itself.
See L</wantarray> for more on how the evaluation context can be determined.
If there is a syntax error or runtime error, or a die() statement is
@@ -1047,7 +1082,7 @@ Examples:
eval '$answer = $a / $b'; warn $@ if $@;
# a compile-time error
- eval { $answer = };
+ eval { $answer = }; # WRONG
# a run-time error
eval '$answer ='; # sets $@
@@ -1079,7 +1114,7 @@ being looked at when:
eval '$x'; # CASE 3
eval { $x }; # CASE 4
- eval "\$$x++" # CASE 5
+ eval "\$$x++"; # CASE 5
$$x++; # CASE 6
Cases 1 and 2 above behave identically: they run the code contained in
@@ -1103,24 +1138,24 @@ returns FALSE only if the command does not exist I<and> it is executed
directly instead of via your system's command shell (see below).
Since it's a common mistake to use system() instead of exec(), Perl
-warns you if there is a following statement which isn't die(), warn()
+warns you if there is a following statement which isn't die(), warn(),
or exit() (if C<-w> is set - but you always do that). If you
I<really> want to follow an exec() with some other statement, you
can use one of these styles to avoid the warning:
- exec ('foo') or print STDERR "couldn't exec foo";
- { exec ('foo') }; print STDERR "couldn't exec foo";
+ exec ('foo') or print STDERR "couldn't exec foo: $!";
+ { exec ('foo') }; print STDERR "couldn't exec foo: $!";
-If there is more than one argument in LIST, or if LIST is an array with
-more than one value, calls execvp(3) with the arguments in LIST. If
-there is only one scalar argument, the argument is checked for shell
-metacharacters, and if there are any, the entire argument is passed to
-the system's command shell for parsing (this is C</bin/sh -c> on Unix
-platforms, but varies on other platforms). If there are no shell
-metacharacters in the argument, it is split into words and passed
-directly to execvp(), which is more efficient. Note: exec() and
-system() do not flush your output buffer, so you may need to set C<$|>
-to avoid lost output. Examples:
+If there is more than one argument in LIST, or if LIST is an array
+with more than one value, calls execvp(3) with the arguments in LIST.
+If there is only one scalar argument or an array with one element in it,
+the argument is checked for shell metacharacters, and if there are any,
+the entire argument is passed to the system's command shell for parsing
+(this is C</bin/sh -c> on Unix platforms, but varies on other platforms).
+If there are no shell metacharacters in the argument, it is split into
+words and passed directly to execvp(), which is more efficient. Note:
+exec() and system() do not flush your output buffer, so you may need to
+set C<$|> to avoid lost output. Examples:
exec '/bin/echo', 'Your arguments are: ', @ARGV;
exec "sort $outfile | uniq";
@@ -1143,6 +1178,21 @@ When the arguments get executed via the system shell, results will
be subject to its quirks and capabilities. See L<perlop/"`STRING`">
for details.
+Using an indirect object with C<exec> or C<system> is also more secure.
+This usage forces interpretation of the arguments as a multivalued list,
+even if the list had just one argument. That way you're safe from the
+shell expanding wildcards or splitting up words with whitespace in them.
+
+ @args = ( "echo surprise" );
+
+ system @args; # subject to shell escapes if @args == 1
+ system { $args[0] } @args; # safe even with one-arg list
+
+The first version, the one without the indirect object, ran the I<echo>
+program, passing it C<"surprise"> as an argument. The second version
+didn't--it tried to run a program literally called I<"echo surprise">,
+didn't find it, and set C<$?> to a non-zero value indicating failure.
+
=item exists EXPR
Returns TRUE if the specified hash key exists in its hash array, even
@@ -1158,7 +1208,13 @@ it exists, but the reverse doesn't necessarily hold true.
Note that the EXPR can be arbitrarily complicated as long as the final
operation is a hash key lookup:
- if (exists $ref->[$x][$y]{$key}) { ... }
+ if (exists $ref->{"A"}{"B"}{$key}) { ... }
+
+Although the last element will not spring into existence just because its
+existence was tested, intervening ones will. Thus C<$ref-E<gt>{"A"}> and
+C<$ref-E<gt>{"A"}{"B"}> will spring into existence due to the existence
+test for a $key element. This autovivification may be fixed in a later
+release.
=item exit EXPR
@@ -1179,6 +1235,8 @@ You shouldn't use exit() to abort a subroutine if there's any chance that
someone might want to trap whatever error happened. Use die() instead,
which can be trapped by an eval().
+All C<END{}> blocks are run at exit time. See L<perlsub> for details.
+
=item exp EXPR
=item exp
@@ -1193,18 +1251,36 @@ Implements the fcntl(2) function. You'll probably have to say
use Fcntl;
first to get the correct function definitions. Argument processing and
-value return works just like ioctl() below. Note that fcntl() will produce
-a fatal error if used on a machine that doesn't implement fcntl(2).
+value return works just like ioctl() below.
For example:
use Fcntl;
- fcntl($filehandle, F_GETLK, $packed_return_buffer);
+ fcntl($filehandle, F_GETFL, $packed_return_buffer)
+ or die "can't fcntl F_GETFL: $!";
+
+You don't have to check for C<defined> on the return from
+fcntl. Like ioctl, it maps a 0 return from the system
+call into "0 but true" in Perl. This string is true in
+boolean context and 0 in numeric context. It is also
+exempt from the normal B<-w> warnings on improper numeric
+conversions.
+
+Note that fcntl() will produce a fatal error if used on a machine that
+doesn't implement fcntl(2).
=item fileno FILEHANDLE
Returns the file descriptor for a filehandle. This is useful for
-constructing bitmaps for select(). If FILEHANDLE is an expression, the
-value is taken as the name of the filehandle.
+constructing bitmaps for select() and low-level POSIX tty-handling
+operations. If FILEHANDLE is an expression, the value is taken as
+an indirect filehandle, generally its name.
+
+You can use this to find out whether two handles refer to the
+same underlying descriptor:
+
+ if (fileno(THIS) == fileno(THAT)) {
+ print "THIS and THAT are dups\n";
+ }
=item flock FILEHANDLE,OPERATION
@@ -1215,10 +1291,11 @@ is Perl's portable file locking interface, although it locks only entire
files, not records.
On many platforms (including most versions or clones of Unix), locks
-established by flock() are B<merely advisory>. This means that files
-locked with flock() may be modified by programs which do not also use
-flock(). Windows NT and OS/2, however, are among the platforms which
-supply mandatory locking. See your local documentation for details.
+established by flock() are B<merely advisory>. Such discretionary locks
+are more flexible, but offer fewer guarantees. This means that files
+locked with flock() may be modified by programs that do not also use
+flock(). Windows NT and OS/2 are among the platforms which
+enforce mandatory locking. See your local documentation for details.
OPERATION is one of LOCK_SH, LOCK_EX, or LOCK_UN, possibly combined with
LOCK_NB. These constants are traditionally valued 1, 2, 8 and 4, but
@@ -1271,8 +1348,9 @@ See also L<DB_File> for other flock() examples.
=item fork
-Does a fork(2) system call. Returns the child pid to the parent process
-and 0 to the child process, or C<undef> if the fork is unsuccessful.
+Does a fork(2) system call. Returns the child pid to the parent process,
+0 to the child process, or C<undef> if the fork is unsuccessful.
+
Note: unflushed buffers remain unflushed in both processes, which means
you may need to set C<$|> ($AUTOFLUSH in English) or call the autoflush()
method of IO::Handle to avoid duplicate output.
@@ -1302,7 +1380,7 @@ moribund children.
Note that if your forked child inherits system file descriptors like
STDIN and STDOUT that are actually connected by a pipe or socket, even
-if you exit, the remote server (such as, say, httpd or rsh) won't think
+if you exit, then the remote server (such as, say, httpd or rsh) won't think
you're done. You should reopen those to /dev/null if it's any issue.
=item format
@@ -1322,10 +1400,9 @@ example:
See L<perlform> for many details and examples.
-
=item formline PICTURE,LIST
-This is an internal function used by C<format>s, though you may call it
+This is an internal function used by C<format>s, though you may call it,
too. It formats (see L<perlform>) a list of values according to the
contents of PICTURE, placing the output into the format output
accumulator, C<$^A> (or $ACCUMULATOR in English).
@@ -1372,14 +1449,15 @@ Determination of whether $BSD_STYLE should be set
is left as an exercise to the reader.
The POSIX::getattr() function can do this more portably on systems
-alleging POSIX compliance.
+purporting POSIX compliance.
See also the C<Term::ReadKey> module from your nearest CPAN site;
details on CPAN can be found on L<perlmod/CPAN>.
=item getlogin
-Returns the current login from F</etc/utmp>, if any. If null, use
-getpwuid().
+Implements the C library function of the same name, which on most
+systems returns the current login from F</etc/utmp>, if any. If null,
+use getpwuid().
$login = getlogin || getpwuid($<) || "Kilroy";
@@ -1476,7 +1554,7 @@ machine that doesn't implement getpriority(2).
=item endservent
These routines perform the same functions as their counterparts in the
-system library. Within a list context, the return values from the
+system library. In list context, the return values from the
various get routines are as follows:
($name,$passwd,$uid,$gid,
@@ -1489,17 +1567,17 @@ various get routines are as follows:
(If the entry doesn't exist you get a null list.)
-Within a scalar context, you get the name, unless the function was a
+In scalar context, you get the name, unless the function was a
lookup by name, in which case you get the other thing, whatever it is.
(If the entry doesn't exist you get the undefined value.) For example:
- $uid = getpwnam
- $name = getpwuid
- $name = getpwent
- $gid = getgrnam
- $name = getgrgid
- $name = getgrent
- etc.
+ $uid = getpwnam($name);
+ $name = getpwuid($num);
+ $name = getpwent();
+ $gid = getgrnam($name);
+ $name = getgrgid($num);
+ $name = getgrent();
+ #etc.
In I<getpw*()> the fields $quota, $comment, and $expire are special
cases in the sense that in many systems they are unsupported. If the
@@ -1529,6 +1607,20 @@ by saying something like:
($a,$b,$c,$d) = unpack('C4',$addr[0]);
+If you get tired of remembering which element of the return list contains
+which return value, by-name interfaces are also provided in modules:
+File::stat, Net::hostent, Net::netent, Net::protoent, Net::servent,
+Time::gmtime, Time::localtime, User::grent, and User::pwent. These override the
+normal built-ins, replacing them with versions that return objects with
+the appropriate names for each field. For example:
+
+ use File::stat;
+ use User::pwent;
+ $is_his = (stat($filename)->uid == getpw($whoever)->uid);
+
+Even though it looks like they're the same method calls (uid),
+they aren't, because a File::stat object is different from a User::pwent object.
+
=item getsockname SOCKET
Returns the packed sockaddr address of this end of the SOCKET connection.
@@ -1539,13 +1631,13 @@ Returns the packed sockaddr address of this end of the SOCKET connection.
=item getsockopt SOCKET,LEVEL,OPTNAME
-Returns the socket option requested, or undefined if there is an error.
+Returns the socket option requested, or undef if there is an error.
=item glob EXPR
=item glob
-Returns the value of EXPR with filename expansions such as a shell would
+Returns the value of EXPR with filename expansions such as the standard Unix shell /bin/sh would
do. This is the internal function implementing the C<E<lt>*.cE<gt>>
operator, but you can use it directly. If EXPR is omitted, $_ is used.
The C<E<lt>*.cE<gt>> operator is discussed in more detail in
@@ -1568,7 +1660,7 @@ years since 1900, I<not> simply the last two digits of the year.
If EXPR is omitted, does C<gmtime(time())>.
-In a scalar context, returns the ctime(3) value:
+In scalar context, returns the ctime(3) value:
$now_string = gmtime; # e.g., "Thu Oct 13 04:54:34 1994"
@@ -1648,7 +1740,7 @@ see L</oct>.) If EXPR is omitted, uses $_.
=item import
-There is no builtin import() function. It is merely an ordinary
+There is no builtin import() function. It is just an ordinary
method (subroutine) defined (or inherited) by modules that wish to export
names to another module. The use() function calls the import() method
for the package used. See also L</use()>, L<perlmod>, and L<Exporter>.
@@ -1668,6 +1760,10 @@ one less than the base, ordinarily -1.
=item int
Returns the integer portion of EXPR. If EXPR is omitted, uses $_.
+You should not use this for rounding, because it truncates
+towards 0, and because machine representations of floating point
+numbers can sometimes produce counterintuitive results. Usually sprintf() or printf(),
+or the POSIX::floor or POSIX::ceil functions, would serve you better.
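Editorial illustration (not part of the patch): a minimal sketch of the difference between truncation and rounding described above. The variable names are illustrative only.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

my $x = -3.7;

my $truncated = int($x);              # drops the fraction, moving toward zero
my $rounded   = sprintf("%.0f", $x);  # rounds to the nearest integer
my $floored   = floor($x);            # rounds toward negative infinity
my $ceiled    = ceil($x);             # rounds toward positive infinity

print "int=$truncated sprintf=$rounded floor=$floored ceil=$ceiled\n";
```

For a negative value, int() and floor() disagree, which is the usual source of surprise when int() is used for rounding.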
=item ioctl FILEHANDLE,FUNCTION,SCALAR
@@ -1678,7 +1774,7 @@ Implements the ioctl(2) function. You'll probably have to say
first to get the correct function definitions. If F<ioctl.ph> doesn't
exist or doesn't have the correct definitions you'll have to roll your
own, based on your C header files such as F<E<lt>sys/ioctl.hE<gt>>.
-(There is a Perl script called B<h2ph> that comes with the Perl kit which
+(There is a Perl script called B<h2ph> that comes with the Perl kit that
may help you in this, but it's nontrivial.) SCALAR will be read and/or
written depending on the FUNCTION--a pointer to the string value of SCALAR
will be passed as the third argument of the actual ioctl call. (If SCALAR
@@ -1714,6 +1810,9 @@ system:
($retval = ioctl(...)) || ($retval = -1);
printf "System returned %d\n", $retval;
+The special string "0 but true" is exempt from B<-w> complaints
+about improper numeric conversions.
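Editorial illustration (not part of the patch): a small sketch of the exemption, assuming warnings are enabled lexically with C<use warnings> rather than B<-w>. The string "0 but false" is an arbitrary counterexample chosen here, not something the documentation mentions.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $retval = "0 but true";
my $sum    = $retval + 5;                    # numifies to 0 without a warning
my $truth  = $retval ? "true" : "false";     # a non-empty string other than "0"

my $other = "0 but false";                   # not special in any way
{ my $ignored = $other + 5; }                # triggers an "isn't numeric" warning

print "sum=$sum truth=$truth warnings=", scalar(@warnings), "\n";
```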
+
=item join EXPR,LIST
Joins the separate strings of LIST into a single string with
@@ -1749,7 +1848,7 @@ or how about sorted by key:
To sort an array by value, you'll need to use a C<sort> function.
Here's a descending numeric sort of a hash by its values:
- foreach $key (sort { $hash{$b} <=> $hash{$a} } keys %hash)) {
+ foreach $key (sort { $hash{$b} <=> $hash{$a} } keys %hash) {
printf "%4d %s\n", $hash{$key}, $key;
}
@@ -1760,7 +1859,8 @@ an array by assigning a larger number to $#array.) If you say
keys %hash = 200;
-then C<%hash> will have at least 200 buckets allocated for it. These
+then C<%hash> will have at least 200 buckets allocated for it--256 of them, in fact, since
+it rounds up to the next power of two. These
buckets will be retained even if you do C<%hash = ()>, use C<undef
%hash> if you want to free the storage while C<%hash> is still in scope.
You can't shrink the number of buckets allocated for the hash using
@@ -1793,7 +1893,7 @@ C<continue> block, if any, is not executed:
LINE: while (<STDIN>) {
last LINE if /^$/; # exit when done with header
- ...
+ #...
}
See also L</continue> for an illustration of how C<last>, C<next>, and
@@ -1823,13 +1923,13 @@ If EXPR is omitted, uses $_.
=item length
-Returns the length in characters of the value of EXPR. If EXPR is
+Returns the length in bytes of the value of EXPR. If EXPR is
omitted, returns length of $_.
=item link OLDFILE,NEWFILE
-Creates a new filename linked to the old filename. Returns 1 for
-success, 0 otherwise.
+Creates a new filename linked to the old filename. Returns TRUE for
+success, FALSE otherwise.
=item listen SOCKET,QUEUESIZE
@@ -1838,10 +1938,10 @@ it succeeded, FALSE otherwise. See example in L<perlipc/"Sockets: Client/Server
=item local EXPR
-A local modifies the listed variables to be local to the enclosing block,
-subroutine, C<eval{}>, or C<do>. If more than one value is listed, the
-list must be placed in parentheses. See L<perlsub/"Temporary Values via
-local()"> for details, including issues with tied arrays and hashes.
+A local modifies the listed variables to be local to the enclosing
+block, file, or eval. If more than one value is listed, the list must
+be placed in parentheses. See L<perlsub/"Temporary Values via local()">
+for details, including issues with tied arrays and hashes.
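Editorial illustration (not part of the patch): a minimal sketch contrasting local() with my(). The names $level, show, with_local, and with_my are hypothetical.

```perl
use strict;
use warnings;

our $level = "global";

sub show { return $level }          # always reads the package variable

sub with_local {
    local $level = "localized";     # temporary value, seen by called subs
    return show();
}

sub with_my {
    my $level = "lexical";          # a new private variable, unseen by show()
    return show();
}

my $from_local = with_local();      # show() saw the temporary value
my $from_my    = with_my();         # show() still saw the package variable
print "local: $from_local, my: $from_my, afterwards: $level\n";
```

The package variable gets its old value back as soon as with_local() returns, which is why local() is better described as "temporary" than as "local".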
You really probably want to be using my() instead, because local() isn't
what most people think of as "local". See L<perlsub/"Private Variables
@@ -1864,7 +1964,7 @@ years since 1900, that is, $year is 123 in year 2023.
If EXPR is omitted, uses the current time (C<localtime(time)>).
-In a scalar context, returns the ctime(3) value:
+In scalar context, returns the ctime(3) value:
$now_string = localtime; # e.g., "Thu Oct 13 04:54:34 1994"
@@ -1873,9 +1973,9 @@ instead a Perl builtin. Also see the Time::Local module, and the
strftime(3) and mktime(3) function available via the POSIX module. To
get somewhat similar but locale dependent date strings, set up your
locale environment variables appropriately (please see L<perllocale>)
-and try for example
+and try for example:
- use POSIX qw(strftime)
+ use POSIX qw(strftime);
$now_string = strftime "%a %b %e %H:%M:%S %Y", localtime;
Note that the C<%a> and C<%b>, the short forms of the day of the week
@@ -1885,7 +1985,7 @@ and the month of the year, may not necessarily be three characters wide.
=item log
-Returns logarithm (base I<e>) of EXPR. If EXPR is omitted, returns log
+Returns the natural logarithm (base I<e>) of EXPR. If EXPR is omitted, returns log
of $_.
=item lstat FILEHANDLE
@@ -1894,9 +1994,10 @@ of $_.
=item lstat
-Does the same thing as the stat() function, but stats a symbolic link
-instead of the file the symbolic link points to. If symbolic links are
-unimplemented on your system, a normal stat() is done.
+Does the same thing as the stat() function (including setting the
+special C<_> filehandle) but stats a symbolic link instead of the file
+the symbolic link points to. If symbolic links are unimplemented on
+your system, a normal stat() is done.
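Editorial illustration (not part of the patch): a sketch of reusing the special C<_> filehandle after lstat(), assuming a system with symbolic links. It uses File::Temp, lexical filehandles, and three-argument open(), which are later-Perl conveniences rather than anything this patch describes.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $target = "$dir/target.txt";
my $link   = "$dir/link";

open(my $fh, '>', $target) or die "Can't create $target: $!";
print $fh "hello\n";
close($fh) or die "Can't close $target: $!";
symlink($target, $link) or die "Can't symlink $link: $!";

lstat($link) or die "Can't lstat $link: $!";
my $link_is_symlink = -l _ ? 1 : 0;   # reuses the lstat() data: the link itself

stat($link) or die "Can't stat $link: $!";
my $target_is_file = -f _ ? 1 : 0;    # reuses the stat() data: the file behind it

print "symlink=$link_is_symlink plain_file=$target_is_file\n";
```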
If EXPR is omitted, stats $_.
@@ -1935,13 +2036,13 @@ original list for which the BLOCK or EXPR evaluates to true.
=item mkdir FILENAME,MODE
Creates the directory specified by FILENAME, with permissions specified
-by MODE (as modified by umask). If it succeeds it returns 1, otherwise
-it returns 0 and sets C<$!> (errno).
+by MODE (as modified by umask). If it succeeds it returns TRUE, otherwise
+it returns FALSE and sets C<$!> (errno).
=item msgctl ID,CMD,ARG
Calls the System V IPC function msgctl(2). If CMD is &IPC_STAT, then ARG
-must be a variable which will hold the returned msqid_ds structure.
+must be a variable that will hold the returned msqid_ds structure.
Returns like ioctl: the undefined value for error, "0 but true" for
zero, or the actual return value otherwise.
@@ -1969,7 +2070,7 @@ an error.
=item my EXPR
A "my" declares the listed variables to be local (lexically) to the
-enclosing block, subroutine, C<eval>, or C<do/require/use>'d file. If
+enclosing block, file, or C<eval>. If
more than one value is listed, the list must be placed in parentheses. See
L<perlsub/"Private Variables via my()"> for details.
@@ -1982,7 +2083,7 @@ the next iteration of the loop:
LINE: while (<STDIN>) {
next LINE if /^#/; # discard comments
- ...
+ #...
}
Note that if there were a C<continue> block on the above, it would get
@@ -2030,17 +2131,20 @@ output. If the filename begins with '>>', the file is opened for
appending. You can put a '+' in front of the '>' or '<' to indicate that
you want both read and write access to the file; thus '+<' is almost
always preferred for read/write updates--the '+>' mode would clobber the
-file first. The prefix and the filename may be separated with spaces.
+file first. You can't usually use either read-write mode for updating
+text files, since they have variable-length records. See the B<-i>
+switch in L<perlrun> for a better approach.
+
+The prefix and the filename may be separated with spaces.
These various prefixes correspond to the fopen(3) modes of 'r', 'r+', 'w',
'w+', 'a', and 'a+'.
-If the filename begins with "|", the filename is interpreted as a command
-to which output is to be piped, and if the filename ends with a "|", the
-filename is interpreted See L<perlipc/"Using open() for IPC"> for more
-examples of this. as command which pipes input to us. (You may not have
-a raw open() to a command that pipes both in I<and> out, but see
-L<IPC::Open2>, L<IPC::Open3>, and L<perlipc/"Bidirectional Communication">
-for alternatives.)
+If the filename begins with "|", the filename is interpreted as a
+command to which output is to be piped, and if the filename ends with a
+"|", the filename is interpreted as a command that pipes its output to
+us. See L<perlipc/"Using open() for IPC"> for more examples of this.
+(You are not allowed to open() to a command that pipes both in I<and>
+out, but see L<IPC::Open2>, L<IPC::Open3>, and
+L<perlipc/"Bidirectional Communication"> for alternatives.)
Opening '-' opens STDIN and opening 'E<gt>-' opens STDOUT. Open returns
nonzero upon success, the undefined value otherwise. If the open
@@ -2051,15 +2155,15 @@ If you're unfortunate enough to be running Perl on a system that
distinguishes between text files and binary files (modern operating
systems don't care), then you should check out L</binmode> for tips for
dealing with this. The key distinction between systems that need binmode
-and those that don't is their text file formats. Systems like Unix and
-Plan9 that delimit lines with a single character, and that encode that
+and those that don't is their text file formats. Systems like Unix, MacOS, and
+Plan9, which delimit lines with a single character, and which encode that
character in C as '\n', do not need C<binmode>. The rest need it.
When opening a file, it's usually a bad idea to continue normal execution
if the request failed, so C<open> is frequently used in connection with
C<die>. Even if C<die> won't do what you want (say, in a CGI script,
where you want to make a nicely formatted error message (but there are
-modules which can help with that problem)) you should always check
+modules that can help with that problem)) you should always check
the return value from opening a file. The infrequent exception is when
working with an unopened filehandle is actually what you want to do.
@@ -2088,25 +2192,26 @@ Examples:
}
sub process {
- local($filename, $input) = @_;
+ my($filename, $input) = @_;
$input++; # this is a string increment
unless (open($input, $filename)) {
print STDERR "Can't open $filename: $!\n";
return;
}
+ local $_;
while (<$input>) { # note use of indirection
if (/^#include "(.*)"/) {
process($1, $input);
next;
}
- ... # whatever
+ #... # whatever
}
}
You may also, in the Bourne shell tradition, specify an EXPR beginning
with "E<gt>&", in which case the rest of the string is interpreted as the
-name of a filehandle (or file descriptor, if numeric) which is to be
+name of a filehandle (or file descriptor, if numeric) to be
duped and opened. You may use & after E<gt>, E<gt>E<gt>, E<lt>, +E<gt>,
+E<gt>E<gt>, and +E<lt>. The
mode you specify should match the mode of the original filehandle.
@@ -2116,8 +2221,8 @@ Here is a script that saves, redirects, and restores STDOUT and
STDERR:
#!/usr/bin/perl
- open(SAVEOUT, ">&STDOUT");
- open(SAVEERR, ">&STDERR");
+ open(OLDOUT, ">&STDOUT");
+ open(OLDERR, ">&STDERR");
open(STDOUT, ">foo.out") || die "Can't redirect stdout";
open(STDERR, ">&STDOUT") || die "Can't dup stdout";
@@ -2131,8 +2236,8 @@ STDERR:
close(STDOUT);
close(STDERR);
- open(STDOUT, ">&SAVEOUT");
- open(STDERR, ">&SAVEERR");
+ open(STDOUT, ">&OLDOUT");
+ open(STDERR, ">&OLDERR");
print STDOUT "stdout 2\n";
print STDERR "stderr 2\n";
@@ -2165,21 +2270,47 @@ The following pairs are more or less equivalent:
See L<perlipc/"Safe Pipe Opens"> for more examples of this.
-NOTE: On any operation which may do a fork, unflushed buffers remain
+NOTE: On any operation that may do a fork, any unflushed buffers remain
unflushed in both processes, which means you may need to set C<$|> to
avoid duplicate output.
Closing any piped filehandle causes the parent process to wait for the
child to finish, and returns the status value in C<$?>.
+The filename passed to open will have leading and trailing
+whitespace deleted, and the normal redirection characters
+honored. This property, known as "magic open",
+can often be used to good effect. A user could specify a filename of
+"rsh cat file |", or you could change certain filenames as needed:
+
+ $filename =~ s/(.*\.gz)\s*$/gzip -dc < $1|/;
+ open(FH, $filename) or die "Can't open $filename: $!";
+
+However, to open a file with arbitrary weird characters in it, it's
+necessary to protect any leading and trailing whitespace:
+
+ $file =~ s#^(\s)#./$1#;
+ open(FOO, "< $file\0");
+
+If you want a "real" C open() (see L<open(2)> on your system), then you
+should use the sysopen() function, which involves no such magic. This is
+another way to protect your filenames from interpretation. For example:
+
+ use Fcntl; # supplies O_RDWR, O_CREAT, and O_EXCL
+ sysopen(HANDLE, $path, O_RDWR|O_CREAT|O_EXCL)
+ or die "sysopen $path: $!";
+ $oldfh = select(HANDLE); $| = 1; select($oldfh);
+ print HANDLE "stuff $$\n";
+ seek(HANDLE, 0, 0);
+ print "File contains: ", <HANDLE>;
+
Using the constructor from the IO::Handle package (or one of its
-subclasses, such as IO::File or IO::Socket),
-you can generate anonymous filehandles which have the scope of whatever
-variables hold references to them, and automatically close whenever
-and however you leave that scope:
+subclasses, such as IO::File or IO::Socket), you can generate anonymous
+filehandles that have the scope of whatever variables hold references to
+them, and automatically close whenever and however you leave that scope:
use IO::File;
- ...
+ #...
sub read_myfile_munged {
my $ALL = shift;
my $handle = new IO::File;
@@ -2191,26 +2322,6 @@ and however you leave that scope:
$first; # Or here.
}
-The filename that is passed to open will have leading and trailing
-whitespace deleted. To open a file with arbitrary weird
-characters in it, it's necessary to protect any leading and trailing
-whitespace thusly:
-
- $file =~ s#^(\s)#./$1#;
- open(FOO, "< $file\0");
-
-If you want a "real" C open() (see L<open(2)> on your system), then
-you should use the sysopen() function. This is another way to
-protect your filenames from interpretation. For example:
-
- use IO::Handle;
- sysopen(HANDLE, $path, O_RDWR|O_CREAT|O_EXCL, 0700)
- or die "sysopen $path: $!";
- HANDLE->autoflush(1);
- HANDLE->print("stuff $$\n");
- seek(HANDLE, 0, 0);
- print "File contains: ", <HANDLE>;
-
See L</seek()> for some details about mixing reading and writing.
=item opendir DIRHANDLE,EXPR
@@ -2283,7 +2394,7 @@ follows:
X Back up a byte.
@ Null fill to absolute position.
-Each letter may optionally be followed by a number which gives a repeat
+Each letter may optionally be followed by a number giving a repeat
count. With all types except "a", "A", "b", "B", "h", "H", and "P" the
pack function will gobble up that many values from the LIST. A * for the
repeat count means to use however many items are left. The "a" and "A"
@@ -2340,6 +2451,8 @@ Examples:
The same template may generally also be used in the unpack function.
+=item package
+
=item package NAMESPACE
Declares the compilation unit as being in the given namespace. The scope
@@ -2350,12 +2463,16 @@ statement affects only dynamic variables--including those you've used
local() on--but I<not> lexical variables created with my(). Typically it
would be the first declaration in a file to be included by the C<require>
or C<use> operator. You can switch into a package in more than one place;
-it influences merely which symbol table is used by the compiler for the
+it merely influences which symbol table is used by the compiler for the
rest of that block. You can refer to variables and filehandles in other
packages by prefixing the identifier with the package name and a double
colon: C<$Package::Variable>. If the package name is null, the C<main>
package as assumed. That is, C<$::sail> is equivalent to C<$main::sail>.
+If NAMESPACE is omitted, then there is no current package, and all
+identifiers must be fully qualified or lexicals. This is stricter
+than C<use strict>, since it also extends to function names.
+
See L<perlmod/"Packages"> for more information about packages, modules,
and classes. See L<perlsub> for other scoping issues.
@@ -2408,11 +2525,11 @@ token is a term, it may be misinterpreted as an operator unless you
interpose a + or put parentheses around the arguments.) If FILEHANDLE is
omitted, prints by default to standard output (or to the last selected
output channel--see L</select>). If LIST is also omitted, prints $_ to
-STDOUT. To set the default output channel to something other than
+the currently selected output channel. To set the default output channel to something other than
STDOUT use the select operation. Note that, because print takes a
-LIST, anything in the LIST is evaluated in a list context, and any
+LIST, anything in the LIST is evaluated in list context, and any
subroutine that you call will have one or more of its expressions
-evaluated in a list context. Also be careful not to follow the print
+evaluated in list context. Also be careful not to follow the print
keyword with a left parenthesis unless you want the corresponding right
parenthesis to terminate the arguments to the print--interpose a + or
put parentheses around all the arguments.
@@ -2434,7 +2551,7 @@ in effect, the character used for the decimal point in formatted real numbers
is affected by the LC_NUMERIC locale. See L<perllocale>.
Don't fall into the trap of using a printf() when a simple
-print() would do. The print() is more efficient, and less
+print() would do. The print() is more efficient and less
error prone.
=item prototype FUNCTION
@@ -2507,15 +2624,15 @@ specified FILEHANDLE. Returns the number of bytes actually read, or
undef if there was an error. SCALAR will be grown or shrunk to the
length actually read. An OFFSET may be specified to place the read
data at some other place than the beginning of the string. This call
-is actually implemented in terms of stdio's fread call. To get a true
-read system call, see sysread().
+is actually implemented in terms of stdio's fread(3) call. To get a true
+read(2) system call, see sysread().
=item readdir DIRHANDLE
Returns the next directory entry for a directory opened by opendir().
-If used in a list context, returns all the rest of the entries in the
+If used in list context, returns all the rest of the entries in the
directory. If there are no more entries, returns an undefined value in
-a scalar context or a null list in a list context.
+scalar context or a null list in list context.
If you're planning to filetest the return values out of a readdir(), you'd
better prepend the directory in question. Otherwise, because we didn't
@@ -2527,7 +2644,7 @@ chdir() there, it would have been testing the wrong file.
=item readline EXPR
-Reads from the file handle EXPR. In scalar context, a single line
+Reads from the filehandle whose typeglob is contained in EXPR. In scalar context, a single line
is read and returned. In list context, reads until end-of-file is
reached and returns a list of lines (however you've defined lines
with $/ or $INPUT_RECORD_SEPARATOR).
@@ -2535,6 +2652,9 @@ This is the internal function implementing the C<E<lt>EXPRE<gt>>
operator, but you can use it directly. The C<E<lt>EXPRE<gt>>
operator is discussed in more detail in L<perlop/"I/O Operators">.
+ $line = <STDIN>;
+ $line = readline(*STDIN); # same thing
+
=item readlink EXPR
=item readlink
@@ -2546,7 +2666,7 @@ omitted, uses $_.
=item readpipe EXPR
-EXPR is interpolated and then executed as a system command.
+EXPR is executed as a system command.
The collected standard output of the command is returned.
In scalar context, it comes back as a single (potentially
multi-line) string. In list context, returns a list of lines
@@ -2584,7 +2704,7 @@ themselves about what was just input:
$front = $_;
while (<STDIN>) {
if (/}/) { # end of comment?
- s|^|$front{|;
+ s|^|$front\{|;
redo LINE;
}
}
@@ -2617,7 +2737,7 @@ name is returned instead. You can think of ref() as a typeof() operator.
if (ref($r) eq "HASH") {
print "r is a reference to a hash.\n";
}
- if (!ref ($r) {
+ if (!ref($r)) {
print "r is not a reference at all.\n";
}
@@ -2642,9 +2762,9 @@ essentially just a variety of eval(). Has semantics similar to the following
subroutine:
sub require {
- local($filename) = @_;
+ my($filename) = @_;
return 1 if $INC{$filename};
- local($realfilename,$result);
+ my($realfilename,$result);
ITER: {
foreach $prefix (@INC) {
$realfilename = "$prefix/$filename";
@@ -2658,7 +2778,7 @@ subroutine:
die $@ if $@;
die "$filename did not return true value" unless $result;
$INC{$filename} = $realfilename;
- $result;
+ return $result;
}
Note that the file will not be included twice under the same specified
@@ -2675,20 +2795,20 @@ modules does not risk altering your namespace.
In other words, if you try this:
- require Foo::Bar ; # a splendid bareword
+ require Foo::Bar; # a splendid bareword
The require function will actually look for the "Foo/Bar.pm" file in the
directories specified in the @INC array.
-But if you try this :
+But if you try this:
$class = 'Foo::Bar';
- require $class ; # $class is not a bareword
-or
- require "Foo::Bar" ; # not a bareword because of the ""
+ require $class; # $class is not a bareword
+ #or
+ require "Foo::Bar"; # not a bareword because of the ""
The require function will look for the "Foo::Bar" file in the @INC array and
-will complain about not finding "Foo::Bar" there. In this case you can do :
+will complain about not finding "Foo::Bar" there. In this case you can do:
eval "require $class";
@@ -2720,20 +2840,20 @@ so you'll probably want to use them instead. See L</my>.
=item return
-Returns from a subroutine, eval(), or do FILE with the value of the
-given EXPR. Evaluation of EXPR may be in a list, scalar, or void
+Returns from a subroutine, eval(), or C<do FILE> with the value
+given in EXPR. Evaluation of EXPR may be in list, scalar, or void
context, depending on how the return value will be used, and the context
may vary from one execution to the next (see wantarray()). If no EXPR
-is given, returns an empty list in a list context, an undefined value in
-a scalar context, or nothing in a void context.
+is given, returns an empty list in list context, an undefined value in
+scalar context, or nothing in a void context.
(Note that in the absence of a return, a subroutine, eval, or do FILE
will automatically return the value of the last expression evaluated.)
=item reverse LIST
-In a list context, returns a list value consisting of the elements
-of LIST in the opposite order. In a scalar context, concatenates the
+In list context, returns a list value consisting of the elements
+of LIST in the opposite order. In scalar context, concatenates the
elements of LIST, and returns a string value consisting of those bytes,
but in the opposite order.
@@ -2767,8 +2887,8 @@ last occurrence at or before that position.
=item rmdir
-Deletes the directory specified by FILENAME if it is empty. If it
-succeeds it returns 1, otherwise it returns 0 and sets C<$!> (errno). If
+Deletes the directory specified by FILENAME if that directory is empty. If it
+succeeds it returns TRUE, otherwise it returns FALSE and sets C<$!> (errno). If
FILENAME is omitted, uses $_.
=item s///
@@ -2777,13 +2897,13 @@ The substitution operator. See L<perlop>.
=item scalar EXPR
-Forces EXPR to be interpreted in a scalar context and returns the value
+Forces EXPR to be interpreted in scalar context and returns the value
of EXPR.
@counts = ( scalar @a, scalar @b, scalar @c );
There is no equivalent operator to force an expression to
-be interpolated in a list context because it's in practice never
+be interpolated in list context because it's in practice never
needed. If you really wanted to do so, however, you could use
the construction C<@{[ (some expression) ]}>, but usually a simple
C<(some expression)> suffices.
@@ -2875,8 +2995,8 @@ If you want to select on many filehandles you might wish to write a
subroutine:
sub fhbits {
- local(@fhlist) = split(' ',$_[0]);
- local($bits);
+ my(@fhlist) = split(' ',$_[0]);
+ my($bits);
for (@fhlist) {
vec($bits,fileno($_),1) = 1;
}
@@ -2894,7 +3014,7 @@ or to block until something becomes ready just do this
$nfound = select($rout=$rin, $wout=$win, $eout=$ein, undef);
Most systems do not bother to return anything useful in $timeleft, so
-calling select() in a scalar context just returns $nfound.
+calling select() in scalar context just returns $nfound.
Any of the bit masks can also be undef. The timeout, if specified, is
in seconds, which may be fractional. Note: not all implementations are
@@ -2905,13 +3025,14 @@ You can effect a sleep of 250 milliseconds this way:
select(undef, undef, undef, 0.25);
-B<WARNING>: Do not attempt to mix buffered I/O (like read() or E<lt>FHE<gt>)
-with select(). You have to use sysread() instead.
+B<WARNING>: One should not attempt to mix buffered I/O (like read()
+or E<lt>FHE<gt>) with select(), except as permitted by POSIX, and even
+then only on POSIX systems. You have to use sysread() instead.
=item semctl ID,SEMNUM,CMD,ARG
Calls the System V IPC function semctl. If CMD is &IPC_STAT or
-&GETALL, then ARG must be a variable which will hold the returned
+&GETALL, then ARG must be a variable that will hold the returned
semid_ds structure or semaphore value array. Returns like ioctl: the
undefined value for error, "0 but true" for zero, or the actual return
value otherwise.
@@ -2984,7 +3105,7 @@ right end.
=item shmctl ID,CMD,ARG
Calls the System V IPC function shmctl. If CMD is &IPC_STAT, then ARG
-must be a variable which will hold the returned shmid_ds structure.
+must be a variable that will hold the returned shmid_ds structure.
Returns like ioctl: the undefined value for error, "0 but true" for
zero, or the actual return value otherwise.
@@ -2999,7 +3120,7 @@ segment id, or the undefined value if there is an error.
Reads or writes the System V shared memory segment ID starting at
position POS for size SIZE by attaching to it, copying in/out, and
-detaching from it. When reading, VAR must be a variable which will
+detaching from it. When reading, VAR must be a variable that will
hold the data read. When writing, if STRING is too long, only SIZE
bytes are used; if STRING is too short, nulls are written to fill out
SIZE bytes. Return TRUE if successful, or FALSE if there is an error.
@@ -3009,6 +3130,16 @@ SIZE bytes. Return TRUE if successful, or FALSE if there is an error.
Shuts down a socket connection in the manner indicated by HOW, which
has the same interpretation as in the system call of the same name.
+ shutdown(SOCKET, 0); # I/we have stopped reading data
+ shutdown(SOCKET, 1); # I/we have stopped writing data
+ shutdown(SOCKET, 2); # I/we have stopped using this socket
+
+This is useful with sockets when you want to tell the other
+side you're done writing but not done reading, or vice versa.
+It's also a more insistent form of close because it also
+disables the file descriptor in any forked copies in other
+processes.
+
=item sin EXPR
=item sin
@@ -3033,7 +3164,9 @@ using alarm().
On some older systems, it may sleep up to a full second less than what
you requested, depending on how it counts seconds. Most modern systems
-always sleep the full amount.
+always sleep the full amount. They may appear to sleep longer than that,
+however, because your process might not be scheduled right away in a
+busy multitasking system.
For delays of finer granularity than one second, you may use Perl's
syscall() interface to access setitimer(2) if your system supports it,
@@ -3055,6 +3188,16 @@ specified type. DOMAIN, TYPE, and PROTOCOL are specified the same as
for the system call of the same name. If unimplemented, yields a fatal
error. Returns TRUE if successful.
+Some systems define pipe() in terms of socketpair(), so that a call
+to C<pipe(Rdr, Wtr)> is essentially:
+
+ use Socket;
+ socketpair(Rdr, Wtr, AF_UNIX, SOCK_STREAM, PF_UNSPEC);
+ shutdown(Rdr, 1); # no more writing for reader
+ shutdown(Wtr, 0); # no more reading for writer
+
+See L<perlipc> for an example of socketpair use.
+
=item sort SUBNAME LIST
=item sort BLOCK LIST
@@ -3186,8 +3329,8 @@ sanity checks in the interest of speed.
=item splice ARRAY,OFFSET
Removes the elements designated by OFFSET and LENGTH from an array, and
-replaces them with the elements of LIST, if any. In a list context,
-returns the elements removed from the array. In a scalar context,
+replaces them with the elements of LIST, if any. In list context,
+returns the elements removed from the array. In scalar context,
returns the last element removed, or C<undef> if no elements are
removed. The array grows or shrinks as necessary. If LENGTH is
omitted, removes everything from OFFSET onward. The following
@@ -3197,13 +3340,13 @@ equivalences hold (assuming C<$[ == 0>):
pop(@a) splice(@a,-1)
shift(@a) splice(@a,0,1)
unshift(@a,$x,$y) splice(@a,0,0,$x,$y)
- $a[$x] = $y splice(@a,$x,1,$y);
+ $a[$x] = $y splice(@a,$x,1,$y)
Example, assuming array lengths are passed before arrays:
sub aeq { # compare two list values
- local(@a) = splice(@_,0,shift);
- local(@b) = splice(@_,0,shift);
+ my(@a) = splice(@_,0,shift);
+ my(@b) = splice(@_,0,shift);
return 0 unless @a == @b; # same len?
while (@a) {
return 0 if pop(@a) ne pop(@b);
@@ -3220,19 +3363,21 @@ Example, assuming array lengths are passed before arrays:
=item split
-Splits a string into an array of strings, and returns it.
+Splits a string into an array of strings, and returns it. By default,
+empty leading fields are preserved, and empty trailing ones are deleted.
-If not in a list context, returns the number of fields found and splits into
-the @_ array. (In a list context, you can force the split into @_ by
+If not in list context, returns the number of fields found and splits into
+the @_ array. (In list context, you can force the split into @_ by
using C<??> as the pattern delimiters, but it still returns the list
-value.) The use of implicit split to @_ is deprecated, however.
+value.) The use of implicit split to @_ is deprecated, however, because
+it clobbers your subroutine arguments.
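The treatment of empty fields is easiest to see on a small string. The following sketch is only an illustration; the sample record and the variable names are made up. It relies on a negative LIMIT, which keeps trailing empty fields.

```perl
use strict;
use warnings;

my $record = ",a,b,,";

my @default = split /,/, $record;       # trailing empty fields are dropped
my @all     = split /,/, $record, -1;   # a negative LIMIT keeps them

print scalar(@default), " fields: ", join("|", map { "<$_>" } @default), "\n";
print scalar(@all),     " fields: ", join("|", map { "<$_>" } @all),     "\n";
```

The first split yields three fields (the empty leading one survives), the second yields five.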
If EXPR is omitted, splits the $_ string. If PATTERN is also omitted,
splits on whitespace (after skipping any leading whitespace). Anything
matching PATTERN is taken to be a delimiter separating the fields. (Note
that the delimiter may be longer than one character.)
-If LIMIT is specified and is positive, splits into no more than that
+If LIMIT is specified and positive, splits into no more than that
many fields (though it may split into fewer). If LIMIT is unspecified
or zero, trailing null fields are stripped (which potential users
of pop() would do well to remember). If LIMIT is negative, it is
@@ -3286,11 +3431,10 @@ really does a C<split(' ', $_)> internally.
Example:
- open(passwd, '/etc/passwd');
- while (<passwd>) {
- ($login, $passwd, $uid, $gid, $gcos,
- $home, $shell) = split(/:/);
- ...
+ open(PASSWD, '/etc/passwd');
+ while (<PASSWD>) {
+ ($login, $passwd, $uid, $gid, $gcos, $home, $shell) = split(/:/);
+ #...
}
(Note that $shell above will still have a newline on it. See L</chop>,
@@ -3302,7 +3446,7 @@ Returns a string formatted by the usual printf conventions of the
C library function sprintf(). See L<sprintf(3)> or L<printf(3)>
on your system for an explanation of the general principles.
-Perl does all of its own sprintf() formatting -- it emulates the C
+Perl does its own sprintf() formatting -- it emulates the C
function sprintf(), but it doesn't use it (except for floating-point
numbers, and even then only the standard modifiers are allowed). As a
result, any non-standard extensions in your local sprintf() are not
@@ -3485,7 +3629,7 @@ the rarest character is selected, based on some static frequency tables
constructed from some C programs and English text. Only those places
that contain this "rarest" character are examined.)
-For example, here is a loop which inserts index producing entries
+For example, here is a loop that inserts index producing entries
before any line containing a certain pattern:
while (<>) {
@@ -3493,11 +3637,11 @@ before any line containing a certain pattern:
print ".IX foo\n" if /\bfoo\b/;
print ".IX bar\n" if /\bbar\b/;
print ".IX blurfl\n" if /\bblurfl\b/;
- ...
+ # ...
print;
}
-In searching for /\bfoo\b/, only those locations in $_ that contain "f"
+In searching for C</\bfoo\b/>, only those locations in $_ that contain "f"
will be looked at, because "f" is rarer than "o". In general, this is
a big win except in pathological cases. The only question is whether
it saves you more time than it took to build the linked list in the
@@ -3549,7 +3693,7 @@ that far from the end of the string. If LEN is omitted, returns
everything to the end of the string. If LEN is negative, leaves that
many characters off the end of the string.
-If you specify a substring which is partly outside the string, the part
+If you specify a substring that is partly outside the string, the part
within the string is returned. If the substring is totally outside
the string a warning is produced.
@@ -3573,7 +3717,7 @@ Returns 1 for success, 0 otherwise. On systems that don't support
symbolic links, produces a fatal error at run time. To check for that,
use eval:
- $symlink_exists = (eval {symlink("","")};, $@ eq '');
+ $symlink_exists = eval { symlink("",""); 1 };
=item syscall LIST
@@ -3589,7 +3733,7 @@ because Perl has to assume that any string pointer might be written
through. If your
integer arguments are not literals and have never been interpreted in a
numeric context, you may need to add 0 to them to force them to look
-like numbers.
+like numbers. This emulates the syswrite() function (or vice versa):
require 'syscall.ph'; # may need to run h2ph
$s = "hi there\n";
@@ -3624,11 +3768,26 @@ system-dependent; they are available via the standard module C<Fcntl>.
However, for historical reasons, some values are universal: zero means
read-only, one means write-only, and two means read/write.
-If the file named by FILENAME does not exist and the C<open> call
-creates it (typically because MODE includes the O_CREAT flag), then
-the value of PERMS specifies the permissions of the newly created
-file. If PERMS is omitted, the default value is 0666, which allows
-read and write for all. This default is reasonable: see C<umask>.
+If the file named by FILENAME does not exist and the C<open> call creates
+it (typically because MODE includes the O_CREAT flag), then the value of
+PERMS specifies the permissions of the newly created file. If you omit
+the PERMS argument to C<sysopen>, Perl uses the octal value C<0666>.
+These permission values need to be in octal, and are modified by your
+process's current C<umask>. The C<umask> value is a number representing
+disabled permission bits--if your C<umask> were 027 (group can't write;
+others can't read, write, or execute), then passing C<sysopen> 0666 would
+create a file with mode 0640 (C<0666 &~ 027> is 0640).
+
+If you find this C<umask> talk confusing, here's some advice: supply a
+creation mode of 0666 for regular files and one of 0777 for directories
+(in C<mkdir>) and executable files. This gives users the freedom of
+choice: if they want protected files, they might choose process umasks
+of 022, 027, or even the particularly antisocial mask of 077. Programs
+should rarely if ever make policy decisions better left to the user.
+The exception to this is when writing files that should be kept private:
+mail files, web browser cookies, I<.rhosts> files, and so on. In short,
+seldom if ever use 0644 as an argument to C<sysopen> because that takes
+away the user's option to have a more permissive umask. Better to omit it.
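The arithmetic behind this advice can be checked directly. This is a small sketch, not part of the patch: C<effective_mode> is a hypothetical helper that applies a umask to a requested mode exactly as described above.

```perl
use strict;
use warnings;

# effective_mode() is a made-up helper: the mode a new file ends up with
# once the process umask has masked off the disabled permission bits.
sub effective_mode {
    my ($requested, $umask) = @_;
    return $requested & ~$umask;
}

for my $umask (022, 027, 077) {
    printf "umask %03o: sysopen(..., 0666) creates mode %04o\n",
        $umask, effective_mode(0666, $umask);
}
```

Requesting 0666 leaves the choice among 0644, 0640, and 0600 to whoever set the umask, whereas hard-coding 0644 can never yield anything more permissive.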
The IO::File module provides a more object-oriented approach, if you're
into that kind of thing.
@@ -3692,40 +3851,16 @@ program they're running doesn't actually interrupt your program.
system(@args) == 0
or die "system @args failed: $?"
-Here's a more elaborate example of analysing the return value from
-system() on a Unix system to check for all possibilities, including for
-signals and core dumps.
+You can check all the failure possibilities by inspecting
+C<$?> like this:
- $! = 0;
- $rc = system @args;
- printf "system(%s) returned %#04x: ", "@args", $rc;
- if ($rc == 0) {
- print "ran with normal exit\n";
- }
- elsif ($rc == 0xff00) {
- # Note that $! can be an empty string if the command that
- # system() tried to execute was not found, not executable, etc.
- # These errors occur in the child process after system() has
- # forked, so the errno value is not visible in the parent.
- printf "command failed: %s\n", ($! || "Unknown system() error");
- }
- elsif (($rc & 0xff) == 0) {
- $rc >>= 8;
- print "ran with non-zero exit status $rc\n";
- }
- else {
- print "ran with ";
- if ($rc & 0x80) {
- $rc &= ~0x80;
- print "core dump from ";
- }
- print "signal $rc\n"
- }
- $ok = ($rc != 0);
+ $exit_value = $? >> 8;
+ $signal_num = $? & 127;
+ $dumped_core = $? & 128;
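To see the three pieces together, here is a hedged sketch. C<decode_status> is a hypothetical helper name, and the child command merely re-runs the current perl (C<$^X>) so that the example does not depend on any external program. It does not cover the case where the command could not be started at all (C<$?> is then -1).

```perl
use strict;
use warnings;

# decode_status() is a hypothetical helper that splits a wait status
# (the value left in $? by system()) into its three parts.
sub decode_status {
    my ($status) = @_;
    return (
        exit_value  => $status >> 8,
        signal_num  => $status & 127,
        dumped_core => ($status & 128) ? 1 : 0,
    );
}

# Run a child that exits with status 3; $^X is the running perl binary.
system($^X, '-e', 'exit 3');
my %result = decode_status($?);
print "exit=$result{exit_value} signal=$result{signal_num} ",
      "core=$result{dumped_core}\n";
```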
When the arguments get executed via the system shell, results will
be subject to its quirks and capabilities. See L<perlop/"`STRING`">
-for details.
+and L</exec> for details.
=item syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET
@@ -3819,7 +3954,7 @@ For further details see L<perltie>, L<tied VARIABLE>.
=item tied VARIABLE
Returns a reference to the object underlying VARIABLE (the same value
-that was originally returned by the tie() call which bound the variable
+that was originally returned by the tie() call that bound the variable
to a package.) Returns the undefined value if VARIABLE isn't tied to a
package.
@@ -3904,6 +4039,8 @@ parameter. Examples:
select undef, undef, undef, 0.25;
($a, $b, undef, $c) = &foo; # Ignore third value returned
+Note that this is a unary operator, not a list operator.
+
=item unlink LIST
=item unlink
@@ -3926,12 +4063,12 @@ If LIST is omitted, uses $_.
Unpack does the reverse of pack: it takes a string representing a
structure and expands it out into a list value, returning the array
-value. (In a scalar context, it returns merely the first value
+value. (In scalar context, it returns merely the first value
produced.) The TEMPLATE has the same format as in the pack function.
Here's a subroutine that does substring:
sub substr {
- local($what,$where,$howmuch) = @_;
+ my($what,$where,$howmuch) = @_;
unpack("x$where a$howmuch", $what);
}
@@ -3989,7 +4126,7 @@ If the first argument to C<use> is a number, it is treated as a version
number instead of a module name. If the version of the Perl interpreter
is less than VERSION, then an error message is printed and Perl exits
immediately. This is often useful if you need to check the current
-Perl version before C<use>ing library modules which have changed in
+Perl version before C<use>ing library modules that have changed in
incompatible ways from older versions of Perl. (We try not to do
this more than we have to.)
@@ -4010,7 +4147,7 @@ If you don't want your namespace altered, explicitly supply an empty list:
That is exactly equivalent to
- BEGIN { require Module; }
+ BEGIN { require Module }
If the VERSION argument is present between Module and LIST, then the
C<use> will call the VERSION method in class Module with the given
@@ -4028,9 +4165,10 @@ are also implemented this way. Currently implemented pragmas are:
use strict qw(subs vars refs);
use subs qw(afunc blurfl);
-These pseudo-modules import semantics into the current block scope, unlike
-ordinary modules, which import symbols into the current package (which are
-effective through the end of the file).
+Some of these pseudo-modules import semantics into the current
+block scope (like C<strict> or C<integer>), unlike ordinary modules,
+which import symbols into the current package (which are effective
+through the end of the file).
There's a corresponding "no" command that unimports meanings imported
by use, i.e., it calls C<unimport Module LIST> instead of C<import>.
@@ -4115,7 +4253,7 @@ of the deceased process, or -1 if there is no such child process. The
status is returned in C<$?>. If you say
use POSIX ":sys_wait_h";
- ...
+ #...
waitpid(-1,&WNOHANG);
then you can do a non-blocking wait for any process. Non-blocking wait
@@ -4125,6 +4263,8 @@ FLAGS of 0 is implemented everywhere. (Perl emulates the system call
by remembering the status values of processes that have exited but have
not been harvested by the Perl script yet.)
+See L<perlipc> for other examples.
+
=item wantarray
Returns TRUE if the context of the currently executing subroutine is
@@ -4184,7 +4324,7 @@ examples.
=item write
-Writes a formatted record (possibly multi-line) to the specified file,
+Writes a formatted record (possibly multi-line) to the specified FILEHANDLE,
using the format associated with that file. By default the format for
a file is the one having the same name as the filehandle, but the
format for the current output channel (see the select() function) may be set
diff --git a/pod/perlipc.pod b/pod/perlipc.pod
index 65818961d8..09b011ee6a 100644
--- a/pod/perlipc.pod
+++ b/pod/perlipc.pod
@@ -163,7 +163,7 @@ systems, mkfifo(1). These may not be in your normal path.
if ( system('mknod', $path, 'p')
&& system('mkfifo', $path) )
{
- die "mk{nod,fifo} $path failed;
+ die "mk{nod,fifo} $path failed";
}
@@ -196,6 +196,33 @@ to find out whether anyone (or anything) has accidentally removed our fifo.
sleep 2; # to avoid dup signals
}
+=head2 WARNING
+
+By installing Perl code to deal with signals, you're exposing yourself
+to danger from two things. First, few system library functions are
+re-entrant. If the signal interrupts while Perl is executing one function
+(like malloc(3) or printf(3)), and your signal handler then calls the
+same function again, you could get unpredictable behavior--often, a
+core dump. Second, Perl isn't itself re-entrant at the lowest levels.
+If the signal interrupts Perl while Perl is changing its own internal
+data structures, similarly unpredictable behavior may result.
+
+There are two things you can do, knowing this: be paranoid or be
+pragmatic. The paranoid approach is to do as little as possible in your
+signal handler. Set an existing integer variable that already has a
+value, and return. This doesn't help you if you're in a slow system call,
+which will just restart. That means you have to C<die> to longjmp(3) out
+of the handler. Even this is a little cavalier for the true paranoiac,
+who avoids C<die> in a handler because the system I<is> out to get you.
+The pragmatic approach is to say ``I know the risks, but prefer the
+convenience'', and to do anything you want in your signal handler,
+prepared to clean up core dumps now and again.
+
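The paranoid approach can be sketched as follows. This is an illustration under assumptions, not code from the patch: it presumes a system that has SIGUSR1, and the variable name C<$got_usr1> is invented. The handler only sets a flag that already exists; all real work happens in ordinary code.

```perl
use strict;
use warnings;

my $got_usr1 = 0;                      # exists and has a value before any signal
$SIG{USR1} = sub { $got_usr1 = 1 };    # the handler does nothing else

kill 'USR1', $$;                       # send ourselves the signal

# Give the handler a chance to run, then act on the flag in normal code.
for (1 .. 50) {
    last if $got_usr1;
    select(undef, undef, undef, 0.1);
}
print $got_usr1 ? "handled USR1 outside the handler\n" : "no signal seen\n";
```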
+To forbid signal handlers altogether would bar you from
+many interesting programs, including virtually everything in this manpage,
+since you could no longer even write SIGCHLD handlers. Their dodginess
+is expected to be addressed in the 5.005 release.
+
=head1 Using open() for IPC
@@ -224,7 +251,7 @@ If one can be sure that a particular program is a Perl script that is
expecting filenames in @ARGV, the clever programmer can write something
like this:
- $ program f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile
+ % program f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile
and irrespective of which shell it's called from, the Perl program will
read from the file F<f1>, the process F<cmd1>, standard input (F<tmpfile>
@@ -254,18 +281,27 @@ while readers of bogus commands return just a quick end of file, writers
to bogus commands will trigger a signal they'd better be prepared to
handle. Consider:
- open(FH, "|bogus");
- print FH "bang\n";
- close FH;
+ open(FH, "|bogus") or die "can't fork: $!";
+ print FH "bang\n" or die "can't write: $!";
+ close FH or die "can't close: $!";
+
+That won't blow up until the close, and it will blow up with a SIGPIPE.
+To keep the signal from killing you, ignore it and check the close instead:
+
+ $SIG{PIPE} = 'IGNORE';
+ open(FH, "|bogus") or die "can't fork: $!";
+ print FH "bang\n" or die "can't write: $!";
+ close FH or die "can't close: status=$?";
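The effect of ignoring SIGPIPE can be shown without a bogus command. The sketch below is an assumption-laden stand-in: it uses a plain pipe() whose read end has been closed, and unbuffered syswrite() so that the failure shows up at once rather than at close time. With the signal ignored, the write fails with EPIPE instead of killing the process.

```perl
use strict;
use warnings;
use Errno qw(EPIPE);

pipe(my $reader, my $writer) or die "pipe: $!";
close $reader;                         # nobody will ever read from the pipe

$SIG{PIPE} = 'IGNORE';                 # otherwise the write below would kill us

my $written   = syswrite($writer, "bang\n");
my $saw_epipe = (!defined $written && $! == EPIPE) ? 1 : 0;
print $saw_epipe ? "write failed with EPIPE\n"
                 : "write unexpectedly succeeded\n";
```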
=head2 Filehandles
-Both the main process and the child process share the same STDIN,
-STDOUT and STDERR filehandles. If both processes try to access them
-at once, strange things can happen. You may want to close or reopen
-the filehandles for the child. You can get around this by opening
-your pipe with open(), but on some systems this means that the child
-process cannot outlive the parent.
+Both the main process and any child processes it forks share the same
+STDIN, STDOUT, and STDERR filehandles. If both processes try to access
+them at once, strange things can happen. You'll certainly want to flush
+any stdio output buffers before forking. You may also want to close
+or reopen the filehandles for the child. You can get around this by
+opening your pipe with open(), but on some systems this means that the
+child process cannot outlive the parent.
=head2 Background Processes
@@ -281,9 +317,15 @@ details).
=head2 Complete Dissociation of Child from Parent
In some cases (starting server processes, for instance) you'll want to
-complete dissociate the child process from the parent. The following
-process is reported to work on most Unixish systems. Non-Unix users
-should check their Your_OS::Process module for other solutions.
+completely dissociate the child process from the parent. The easiest
+way is to use:
+
+ use POSIX qw(setsid);
+ setsid() or die "Can't start a new session: $!";
+
+However, you may not be on POSIX. The following process is reported
+to work on most Unixish systems. Non-Unix users should check their
+Your_OS::Process module for other solutions.
=over 4
@@ -307,6 +349,13 @@ Background yourself like this:
fork && exit;
+=item *
+
+Ignore hangup signals in case you're running on a shell that doesn't
+automatically no-hup you:
+
+ $SIG{HUP} = 'IGNORE'; # or whatever you'd like
+
=back
=head2 Safe Pipe Opens
@@ -416,7 +465,7 @@ awkward select() loop and wouldn't allow you to use normal Perl input
operations.
If you look at its source, you'll see that open2() uses low-level
-primitives like Unix pipe() and exec() to create all the connections.
+primitives like Unix pipe() and exec() calls to create all the connections.
While it might have been slightly more efficient by using socketpair(), it
would have then been even less portable than it already is. The open2()
and open3() functions are unlikely to work anywhere except on a Unix
@@ -426,7 +475,7 @@ Here's an example of using open2():
use FileHandle;
use IPC::Open2;
- $pid = open2( \*Reader, \*Writer, "cat -u -n" );
+ $pid = open2(*Reader, *Writer, "cat -u -n" );
Writer->autoflush(); # default here, actually
print Writer "stuff\n";
$got = <Reader>;
@@ -457,6 +506,74 @@ and interact() functions. Find the library (and we hope its
successor F<IPC::Chat>) at your nearest CPAN archive as detailed
in the SEE ALSO section below.
+=head2 Bidirectional Communication with Yourself
+
+If you want, you may make low-level pipe() and fork() calls
+to stitch this together by hand. This example only
+talks to itself, but you could reopen the appropriate
+handles to STDIN and STDOUT and call other processes.
+
+ #!/usr/bin/perl -w
+ # pipe1 - bidirectional communication using two pipe pairs
+ # designed for the socketpair-challenged
+ use IO::Handle; # thousands of lines just for autoflush :-(
+ pipe(PARENT_RDR, CHILD_WTR); # XXX: failure?
+ pipe(CHILD_RDR, PARENT_WTR); # XXX: failure?
+ CHILD_WTR->autoflush(1);
+ PARENT_WTR->autoflush(1);
+
+ if ($pid = fork) {
+ close PARENT_RDR; close PARENT_WTR;
+ print CHILD_WTR "Parent Pid $$ is sending this\n";
+ chomp($line = <CHILD_RDR>);
+ print "Parent Pid $$ just read this: `$line'\n";
+ close CHILD_RDR; close CHILD_WTR;
+ waitpid($pid,0);
+ } else {
+ die "cannot fork: $!" unless defined $pid;
+ close CHILD_RDR; close CHILD_WTR;
+ chomp($line = <PARENT_RDR>);
+ print "Child Pid $$ just read this: `$line'\n";
+ print PARENT_WTR "Child Pid $$ is sending this\n";
+ close PARENT_RDR; close PARENT_WTR;
+ exit;
+ }
+
+But you don't actually have to make two pipe calls. If you
+have the socketpair() system call, it will do this all for you.
+
+ #!/usr/bin/perl -w
+ # pipe2 - bidirectional communication using socketpair
+ # "the best ones always go both ways"
+
+ use Socket;
+ use IO::Handle; # thousands of lines just for autoflush :-(
+ # We say AF_UNIX because although *_LOCAL is the
+ # POSIX 1003.1g form of the constant, many machines
+ # still don't have it.
+ socketpair(CHILD, PARENT, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
+ or die "socketpair: $!";
+
+ CHILD->autoflush(1);
+ PARENT->autoflush(1);
+
+ if ($pid = fork) {
+ close PARENT;
+ print CHILD "Parent Pid $$ is sending this\n";
+ chomp($line = <CHILD>);
+ print "Parent Pid $$ just read this: `$line'\n";
+ close CHILD;
+ waitpid($pid,0);
+ } else {
+ die "cannot fork: $!" unless defined $pid;
+ close CHILD;
+ chomp($line = <PARENT>);
+ print "Child Pid $$ just read this: `$line'\n";
+ print PARENT "Child Pid $$ is sending this\n";
+ close PARENT;
+ exit;
+ }
+
=head1 Sockets: Client/Server Communication
While not limited to Unix-derived operating systems (e.g., WinSock on PCs
@@ -487,6 +604,17 @@ knows the other has finished when a "\n" is received) or multi-line
messages and responses that end with a period on an empty line
("\n.\n" terminates a message/response).
+=head2 Internet Line Terminators
+
+The Internet line terminator is "\015\012". Under ASCII variants of
+Unix, that could usually be written as "\r\n", but under other systems,
+"\r\n" might at times be "\015\015\012", "\012\012\015", or something
+completely different. The standards specify writing "\015\012" to be
+conformant (be strict in what you provide), but they also recommend
+accepting a lone "\012" on input (but be lenient in what you require).
+We haven't always been very good about that in the code in this manpage,
+but unless you're on a Mac, you'll probably be ok.
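One way to follow both halves of that advice is sketched below. C<strip_eol> is an illustrative helper, not part of any module, and the sample lines are made up: output always uses the explicit octal form, and input accepts either CRLF or a lone LF.

```perl
use strict;
use warnings;

my $EOL = "\015\012";                  # strict in what we send

# strip_eol() is an illustrative helper: lenient in what we accept.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\015?\012\z//;          # remove CRLF or a lone LF
    return $line;
}

my $request = "GET /index.html HTTP/1.0" . $EOL;
print "request ends in CRLF\n" if $request =~ /\015\012\z/;

for my $incoming ("200 OK\015\012", "200 OK\012") {
    print "[", strip_eol($incoming), "]\n";
}
```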
+
=head2 Internet TCP Clients and Servers
Use Internet-domain sockets when you want to do client-server
@@ -495,7 +623,6 @@ communication that might extend to machines outside of your own system.
Here's a sample TCP client using Internet-domain sockets:
#!/usr/bin/perl -w
- require 5.002;
use strict;
use Socket;
my ($remote,$port, $iaddr, $paddr, $proto, $line);
@@ -525,11 +652,11 @@ or firewall machine), you should fill this in with your real address
instead.
#!/usr/bin/perl -Tw
- require 5.002;
use strict;
BEGIN { $ENV{PATH} = '/usr/ucb:/bin' }
use Socket;
use Carp;
+ my $EOL = "\015\012";
sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" }
@@ -558,7 +685,7 @@ instead.
at port $port";
print Client "Hello there, $name, it's now ",
- scalar localtime, "\n";
+ scalar localtime, $EOL;
}
And here's a multithreaded version. It's multithreaded in that
@@ -567,11 +694,11 @@ handle the client request so that the master server can quickly
go back to service a new client.
#!/usr/bin/perl -Tw
- require 5.002;
use strict;
BEGIN { $ENV{PATH} = '/usr/ucb:/bin' }
use Socket;
use Carp;
+ my $EOL = "\015\012";
sub spawn; # forward declaration
sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" }
@@ -612,8 +739,8 @@ go back to service a new client.
at port $port";
spawn sub {
- print "Hello there, $name, it's now ", scalar localtime, "\n";
- exec '/usr/games/fortune'
+ print "Hello there, $name, it's now ", scalar localtime, $EOL;
+ exec '/usr/games/fortune' # XXX: `wrong' line terminators
or confess "can't exec fortune: $!";
};
@@ -661,7 +788,6 @@ service on a number of different machines and shows how far their clocks
differ from the system on which it's being run:
#!/usr/bin/perl -w
- require 5.002;
use strict;
use Socket;
@@ -698,7 +824,7 @@ want to. Unix-domain sockets are local to the current host, and are often
used internally to implement pipes. Unlike Internet domain sockets, Unix
domain sockets can show up in the file system with an ls(1) listing.
- $ ls -l /dev/log
+ % ls -l /dev/log
srw-rw-rw- 1 root 0 Oct 31 07:23 /dev/log
You can test for these with Perl's B<-S> file test:
@@ -710,7 +836,6 @@ You can test for these with Perl's B<-S> file test:
Here's a sample Unix-domain client:
#!/usr/bin/perl -w
- require 5.002;
use Socket;
use strict;
my ($rendezvous, $line);
@@ -723,15 +848,17 @@ Here's a sample Unix-domain client:
}
exit;
-And here's a corresponding server.
+And here's a corresponding server. You don't have to worry about silly
+network terminators here because Unix domain sockets are guaranteed
+to be on the localhost, and thus everything works right.
#!/usr/bin/perl -Tw
- require 5.002;
use strict;
use Socket;
use Carp;
BEGIN { $ENV{PATH} = '/usr/ucb:/bin' }
+ sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" }
my $NAME = '/tmp/catsock';
my $uaddr = sockaddr_un($NAME);
@@ -744,8 +871,17 @@ And here's a corresponding server.
logmsg "server started on $NAME";
+ my $waitedpid;
+
+ sub REAPER {
+ $waitedpid = wait;
+ $SIG{CHLD} = \&REAPER; # loathe sysV
+ logmsg "reaped $waitedpid" . ($? ? " with exit $?" : '');
+ }
+
$SIG{CHLD} = \&REAPER;
+
for ( $waitedpid = 0;
accept(Client,Server) || $waitedpid;
$waitedpid = 0, close Client)
@@ -866,6 +1002,8 @@ something to the server before fetching the server's response.
use IO::Socket;
unless (@ARGV > 1) { die "usage: $0 host document ..." }
$host = shift(@ARGV);
+ $EOL = "\015\012";
+ $BLANK = $EOL x 2;
foreach $document ( @ARGV ) {
$remote = IO::Socket::INET->new( Proto => "tcp",
PeerAddr => $host,
@@ -873,7 +1011,7 @@ something to the server before fetching the server's response.
);
unless ($remote) { die "cannot connect to http daemon on $host" }
$remote->autoflush(1);
- print $remote "GET $document HTTP/1.0\n\n";
+ print $remote "GET $document HTTP/1.0" . $BLANK;
while ( <$remote> ) { print }
close $remote;
}
@@ -900,7 +1038,7 @@ such a request.
Here's an example of running that program, which we'll call I<webget>:
- shell_prompt$ webget www.perl.com /guanaco.html
+ % webget www.perl.com /guanaco.html
HTTP/1.1 404 File Not Found
Date: Thu, 08 May 1997 18:02:32 GMT
Server: Apache/1.2b6
@@ -935,9 +1073,8 @@ simultaneously copies everything from standard input to the socket.
To accomplish the same thing using just one process would be I<much>
harder, because it's easier to code two processes to do one thing than it
is to code one process to do two things. (This keep-it-simple principle
-is one of the cornerstones of the Unix philosophy, and good software
-engineering as well, which is probably why it's spread to other systems
-as well.)
+is a cornerstone of the Unix philosophy, and good software engineering as
+well, which is probably why it's spread to other systems.)
Here's the code:
@@ -997,7 +1134,7 @@ well.
=head1 TCP Servers with IO::Socket
-Setting up server is little bit more involved than running a client.
+As always, setting up a server is a little bit more involved than running a client.
The model is that the server creates a special kind of socket that
does nothing but listen on a particular port for incoming connections.
It does this by calling the C<IO::Socket::INET-E<gt>new()> method with
@@ -1111,7 +1248,6 @@ with TCP, you'd have to use a different socket handle for each host.
#!/usr/bin/perl -w
use strict;
- require 5.002;
use Socket;
use Sys::Hostname;
@@ -1239,22 +1375,15 @@ on CPAN.
=head1 NOTES
-If you are running under version 5.000 (dubious) or 5.001, you can still
-use most of the examples in this document. You may have to remove the
-C<use strict> and some of the my() statements for 5.000, and for both
-you'll have to load in version 1.2 or older of the F<Socket.pm> module, which
-is included in I<perl5.002>.
-
-Most of these routines quietly but politely return C<undef> when they fail
-instead of causing your program to die right then and there due to an
-uncaught exception. (Actually, some of the new I<Socket> conversion
-functions croak() on bad arguments.) It is therefore essential
-that you should check the return values of these functions. Always begin
-your socket programs this way for optimal success, and don't forget to add
-B<-T> taint checking flag to the pound-bang line for servers:
+Most of these routines quietly but politely return C<undef> when they
+fail instead of causing your program to die right then and there due to
+an uncaught exception. (Actually, some of the new I<Socket> conversion
+functions croak() on bad arguments.) It is therefore essential to
+check return values from these functions. Always begin your socket
+programs this way for optimal success, and don't forget to add the B<-T>
+taint-checking flag to the #! line for servers:
- #!/usr/bin/perl -w
- require 5.002;
+ #!/usr/bin/perl -Tw
use strict;
use sigtrap;
use Socket;
@@ -1268,14 +1397,14 @@ signals and to stick with simple TCP and UDP socket operations; e.g., don't
try to pass open file descriptors over a local UDP datagram socket if you
want your code to stand a chance of being portable.
-Because few vendors provide C libraries that are safely re-entrant,
-the prudent programmer will do little else within a handler beyond
-setting a numeric variable that already exists; or, if locked into
-a slow (restarting) system call, using die() to raise an exception
-and longjmp(3) out. In fact, even these may in some cases cause a
-core dump. It's probably best to avoid signals except where they are
-absolutely inevitable. This perilous problems will be addressed in a
-future release of Perl.
+As mentioned in the signals section, because few vendors provide C
+libraries that are safely re-entrant, the prudent programmer will do
+little else within a handler beyond setting a numeric variable that
+already exists; or, if locked into a slow (restarting) system call,
+using die() to raise an exception and longjmp(3) out. In fact, even
+these may in some cases cause a core dump. It's probably best to avoid
+signals except where they are absolutely inevitable. This
+will be addressed in a future release of Perl.
=head1 AUTHOR
@@ -1287,10 +1416,10 @@ version and suggestions from the Perl Porters.
There's a lot more to networking than this, but this should get you
started.
-For intrepid programmers, the classic textbook I<Unix Network Programming>
-by Richard Stevens (published by Addison-Wesley). Note that most books
-on networking address networking from the perspective of a C programmer;
-translation to Perl is left as an exercise for the reader.
+For intrepid programmers, the indispensable textbook is I<Unix Network
+Programming> by W. Richard Stevens (published by Addison-Wesley). Note
+that most books on networking address networking from the perspective of
+a C programmer; translation to Perl is left as an exercise for the reader.
The IO::Socket(3) manpage describes the object library, and the Socket(3)
manpage describes the low-level interface to sockets. Besides the obvious
diff --git a/pod/perllocale.pod b/pod/perllocale.pod
index 2a08835fe8..f4ca0dd607 100644
--- a/pod/perllocale.pod
+++ b/pod/perllocale.pod
@@ -4,17 +4,18 @@ perllocale - Perl locale handling (internationalization and localization)
=head1 DESCRIPTION
-Perl supports language-specific notions of data such as "is this a
-letter", "what is the uppercase equivalent of this letter", and "which
-of these letters comes first". These are important issues, especially
-for languages other than English - but also for English: it would be
-very naE<iuml>ve to think that C<A-Za-z> defines all the "letters". Perl
-is also aware that some character other than '.' may be preferred as a
-decimal point, and that output date representations may be
-language-specific. The process of making an application take account of
-its users' preferences in such matters is called B<internationalization>
-(often abbreviated as B<i18n>); telling such an application about a
-particular set of preferences is known as B<localization> (B<l10n>).
+Perl supports language-specific notions of data such as "is this
+a letter", "what is the uppercase equivalent of this letter", and
+"which of these letters comes first". These are important issues,
+especially for languages other than English--but also for English: it
+would be naE<iuml>ve to imagine that C<A-Za-z> defines all the "letters"
+needed to write in English. Perl is also aware that some character other
+than '.' may be preferred as a decimal point, and that output date
+representations may be language-specific. The process of making an
+application take account of its users' preferences in such matters is
+called B<internationalization> (often abbreviated as B<i18n>); telling
+such an application about a particular set of preferences is known as
+B<localization> (B<l10n>).
Perl can understand language-specific data via the standardized (ISO C,
XPG4, POSIX 1.c) method called "the locale system". The locale system is
@@ -22,13 +23,13 @@ controlled per application using one pragma, one function call, and
several environment variables.
B<NOTE>: This feature is new in Perl 5.004, and does not apply unless an
-application specifically requests it - see L<Backward compatibility>.
+application specifically requests it--see L<Backward compatibility>.
The one exception is that write() now B<always> uses the current locale
- see L<"NOTES">.
=head1 PREPARING TO USE LOCALES
-If Perl applications are to be able to understand and present your data
+If Perl applications are to understand and present your data
correctly according to a locale of your choice, B<all> of the following
must be true:
@@ -42,15 +43,15 @@ its C library.
=item *
-B<Definitions for the locales which you use must be installed>. You, or
+B<Definitions for locales that you use must be installed>. You, or
your system administrator, must make sure that this is the case. The
available locales, the location in which they are kept, and the manner
-in which they are installed, vary from system to system. Some systems
-provide only a few, hard-wired, locales, and do not allow more to be
-added; others allow you to add "canned" locales provided by the system
-supplier; still others allow you or the system administrator to define
+in which they are installed all vary from system to system. Some systems
+provide only a few, hard-wired locales and do not allow more to be
+added. Others allow you to add "canned" locales provided by the system
+supplier. Still others allow you or the system administrator to define
and add arbitrary locales. (You may have to ask your supplier to
-provide canned locales which are not delivered with your operating
+provide canned locales that are not delivered with your operating
system.) Read your system documentation for further illumination.
=item *
@@ -71,8 +72,8 @@ appropriate, and B<at least one> of the following must be true:
=item *
B<The locale-determining environment variables (see L<"ENVIRONMENT">)
-must be correctly set up>, either by yourself, or by the person who set
-up your system account, at the time the application is started.
+must be correctly set up> at the time the application is started, either
+by yourself or by whoever set up your system account.
=item *
@@ -94,16 +95,16 @@ pragma tells Perl to use the current locale for some operations:
B<The comparison operators> (C<lt>, C<le>, C<cmp>, C<ge>, and C<gt>) and
the POSIX string collation functions strcoll() and strxfrm() use
-C<LC_COLLATE>. sort() is also affected if it is used without an
-explicit comparison function because it uses C<cmp> by default.
+C<LC_COLLATE>. sort() is also affected if used without an
+explicit comparison function, because it uses C<cmp> by default.
-B<Note:> C<eq> and C<ne> are unaffected by the locale: they always
+B<Note:> C<eq> and C<ne> are unaffected by locale: they always
perform a byte-by-byte comparison of their scalar operands. What's
more, if C<cmp> finds that its operands are equal according to the
collation sequence specified by the current locale, it goes on to
perform a byte-by-byte comparison, and only returns I<0> (equal) if the
operands are bit-for-bit identical. If you really want to know whether
-two strings - which C<eq> and C<cmp> may consider different - are equal
+two strings--which C<eq> and C<cmp> may consider different--are equal
as far as collation in the locale is concerned, see the discussion in
L<Category LC_COLLATE: Collation>.
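Note on the hunk above: the difference between C<eq>, C<cmp>, and strcoll() can be seen in a small script. This is a minimal sketch, not part of the patch; it pins C<LC_COLLATE> to the "C" locale only so that the outcome does not depend on which locales happen to be installed, and the variable names are illustrative.

```perl
use strict;
use warnings;
use POSIX qw(locale_h strcoll);

# Pin collation to the "C" locale so the result is predictable everywhere.
setlocale(LC_COLLATE, "C") or die "cannot set LC_COLLATE to C\n";

my ($x, $y) = ("apple", "Apple");

# eq/ne: always a byte-by-byte comparison, whatever the locale says.
my $bytes_equal = ($x eq $y) ? 1 : 0;

# strcoll(): locale collation only, with no byte-by-byte tie-break.
my $collate_equal = (strcoll($x, $y) == 0) ? 1 : 0;

my $cmp_result;
{
    use locale;    # inside this block, cmp follows LC_COLLATE
    $cmp_result = $x cmp $y;
}

print "eq: $bytes_equal, strcoll equal: $collate_equal, cmp: $cmp_result\n";
```

Under a locale whose collation ignores case, strcoll() could report the two strings as equal while C<eq> and C<cmp> still distinguish them; in the "C" locale all three agree that they differ.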
@@ -126,10 +127,10 @@ B<The POSIX date formatting function> (strftime()) uses C<LC_TIME>.
C<LC_COLLATE>, C<LC_CTYPE>, and so on, are discussed further in L<LOCALE
CATEGORIES>.
-The default behavior returns with S<C<no locale>> or on reaching the
-end of the enclosing block.
+The default behavior is restored with the S<C<no locale>> pragma, or
+upon reaching the end of the block enclosing C<use locale>.
-Note that the string result of any operation that uses locale
+The string result of any operation that uses locale
information is tainted, as it is possible for a locale to be
untrustworthy. See L<"SECURITY">.
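Note on the hunk above: the lexical scoping of C<use locale> and C<no locale> can be sketched as follows. The subroutine name classify() is hypothetical, and the input is plain ASCII on the assumption that any reasonable locale treats ASCII letters, digits, and underscore as word characters.

```perl
use strict;
use warnings;

# Hypothetical helper: tests the same string with and without locale rules.
sub classify {
    my ($s) = @_;
    use locale;    # locale rules apply until the end of the enclosing block
    my $with_locale = ($s =~ /^\w+$/) ? 1 : 0;
    {
        no locale;    # default behavior is restored inside this inner block
        my $without_locale = ($s =~ /^\w+$/) ? 1 : 0;
        return ($with_locale, $without_locale);
    }
}

my @result = classify("abc_123");
print "with locale: $result[0], without locale: $result[1]\n";
```

For accented input the two results may differ, depending on the C<LC_CTYPE> locale in force. Under taint mode (B<-T>), strings produced by the locale-aware operations are additionally marked as tainted, as the paragraph above describes.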
@@ -173,17 +174,17 @@ the current locale for the category. You can use this value as the
second argument in a subsequent call to setlocale(). If a second
argument is given and it corresponds to a valid locale, the locale for
the category is set to that value, and the function returns the
-now-current locale value. You can use this in a subsequent call to
+now-current locale value. You can then use this in yet another call to
setlocale(). (In some implementations, the return value may sometimes
-differ from the value you gave as the second argument - think of it as
-an alias for the value that you gave.)
+differ from the value you gave as the second argument--think of it as
+an alias for the value you gave.)
As the example shows, if the second argument is an empty string, the
category's locale is returned to the default specified by the
corresponding environment variables. Generally, this results in a
-return to the default which was in force when Perl started up: changes
+return to the default that was in force when Perl started up: changes
to the environment made by the application after startup may or may not
-be noticed, depending on the implementation of your system's C library.
+be noticed, depending on your system's C library.
If the second argument does not correspond to a valid locale, the locale
for the category is not changed, and the function returns I<undef>.
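Note on the hunk above: the query, set, and restore sequence it describes might look like the sketch below. It uses only POSIX::setlocale() and the C<LC_CTYPE> constant; the variable names are illustrative, and "C" is used because it is the one locale that should always be available.

```perl
use strict;
use warnings;
use POSIX qw(locale_h);    # setlocale() and the LC_* constants

# One argument: query the current locale without changing it.
my $saved = setlocale(LC_CTYPE);

# Two arguments: set the locale; the return value is the now-current
# locale, or undef if the requested locale is not valid.
my $now = setlocale(LC_CTYPE, "C");
die "the C locale should always be available\n" unless defined $now;

# Empty string: return to the default named by the environment variables.
# This can yield undef if the environment names an unusable locale.
my $default = setlocale(LC_CTYPE, "");

# Restore whatever was in force at the start.
setlocale(LC_CTYPE, $saved) if defined $saved;

print "was: ", (defined $saved ? $saved : "(undef)"), ", then: $now\n";
```

Always check the return value for undef before relying on the switch having happened.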
@@ -192,10 +193,9 @@ For further information about the categories, consult L<setlocale(3)>.
=head2 Finding locales
-For the locales available in your system, also consult L<setlocale(3)>
-and see whether it leads you to the list of the available locales
-(search for the I<SEE ALSO> section). If that fails, try the following
-command lines:
+For locales available in your system, consult also L<setlocale(3)> to
+see whether it leads to the list of available locales (search for the
+I<SEE ALSO> section). If that fails, try the following command lines:
locale -a
@@ -215,25 +215,25 @@ and see whether they list something resembling these
english german russian
english.iso88591 german.iso88591 russian.iso88595
-Sadly, even though the calling interface for setlocale() has been
-standardized, the names of the locales and the directories where the
-configuration is, have not. The basic form of the name is
-I<language_country/territory>B<.>I<codeset>, but the latter parts
-after the I<language> are not always present. The I<language> and the
-I<country> are usually from the standards B<ISO 3166> and B<ISO 639>,
-respectively, the two-letter abbreviations for the countries and the
-languages of the world. The I<codeset> part often mentions some B<ISO
-8859> character set, the Latin codesets. For example the C<ISO
-8859-1> is the so-called "Western codeset" that can be used to encode
-most of the Western European languages. Again, sadly, as you can see,
-there are several ways to write even the name of that one standard.
+Sadly, even though the calling interface for setlocale() has
+been standardized, names of locales and the directories where the
+configuration resides have not been. The basic form of the name is
+I<language_country/territory>B<.>I<codeset>, but the latter parts after
+I<language> are not always present. The I<language> and I<country> are
+usually from the standards B<ISO 639> and B<ISO 3166>, the two-letter
+abbreviations for the languages and the countries of the world,
+respectively. The I<codeset> part often mentions some B<ISO 8859>
+character set, the Latin codesets. For example, C<ISO 8859-1> is the
+so-called "Western codeset" that can be used to encode most Western
+European languages. Again, there are several ways to write even the
+name of that one standard. Lamentably.
Two special locales are worth particular mention: "C" and "POSIX".
Currently these are effectively the same locale: the difference is
-mainly that the first one is defined by the C standard and the second by
-the POSIX standard. What they define is the B<default locale> in which
+mainly that the first one is defined by the C standard, the second by
+the POSIX standard. They define the B<default locale> in which
every program starts in the absence of locale information in its
-environment. (The default default locale, if you will.) Its language
+environment. (The I<default> default locale, if you will.) Its language
is (American) English and its character codeset ASCII.
B<NOTE>: Not all systems have the "POSIX" locale (not all systems are
@@ -242,7 +242,7 @@ default locale.
=head2 LOCALE PROBLEMS
-You may meet the following warning message at Perl startup:
+You may encounter the following warning message at Perl startup:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
@@ -251,83 +251,80 @@ You may meet the following warning message at Perl startup:
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
-This means that your locale settings were that LC_ALL equals "En_US"
-and LANG exists but has no value. Perl tried to believe you but it
-could not. Instead Perl gave up and fell back to the "C" locale, the
-default locale that is supposed to work no matter what. This usually
-means either or both of the two problems: either your locale settings
-were wrong, they talk of locales your system has never heard of, or
-that the locale installation in your system has problems, for example
-some system files are broken or missing. For the problems there are
-quick and temporary fixes and more thorough and lasting fixes.
+This means that your locale settings had LC_ALL set to "En_US" and
+LANG exists but has no value. Perl tried to believe you but could not.
+Instead, Perl gave up and fell back to the "C" locale, the default locale
+that is supposed to work no matter what. This usually means your locale
+settings were wrong, they mention locales your system has never heard
+of, or the locale installation in your system has problems (for example,
+some system files are broken or missing). There are quick and temporary
+fixes to these problems, as well as more thorough and lasting fixes.
=head2 Temporarily fixing locale problems
-The two quickest fixes are either to make Perl be silent about any
+The two quickest fixes are either to render Perl silent about any
locale inconsistencies or to run Perl under the default locale "C".
Perl's moaning about locale problems can be silenced by setting the
environment variable PERL_BADLANG to a non-zero value, for example
"1". This method really just sweeps the problem under the carpet: you
tell Perl to shut up even when Perl sees that something is wrong. Do
-not be surprised if later something locale-dependent works funny.
+not be surprised if later something locale-dependent misbehaves.
Perl can be run under the "C" locale by setting the environment
-variable LC_ALL to "C". This method is perhaps a bit more civilised
-than the PERL_BADLANG one but please note that setting the LC_ALL (or
-the other locale variables) may affect also other programs, not just
-Perl. Especially external programs run from within Perl will see
+variable LC_ALL to "C". This method is perhaps a bit more civilized
+than the PERL_BADLANG approach, but setting LC_ALL (or
+other locale variables) may affect other programs as well, not just
+Perl. In particular, external programs run from within Perl will see
these changes. If you make the new settings permanent (read on), all
-the programs you run will see the changes. See L<ENVIRONMENT> for for
-the full list of all the environment variables and L<USING LOCALES>
-for their effects in Perl. The effects in other programs are quite
-easily deducible: for example the variable LC_COLLATE may well affect
-your "sort" program (or whatever the program that arranges `records'
+programs you run see the changes. See L<ENVIRONMENT> for
+the full list of relevant environment variables and L<USING LOCALES>
+for their effects in Perl. Effects in other programs are
+easily deducible. For example, the variable LC_COLLATE may well affect
+your B<sort> program (or whatever the program that arranges `records'
alphabetically in your system is called).
-You can first try out changing these variables temporarily and if the
-new settings seem to help then put the settings into the startup files
-of your environment. Please consult your local documentation for the
-exact details but very shortly for UNIXish systems: in Bourneish
-shells (sh, ksh, bash, zsh) for example
+You can test out changing these variables temporarily, and if the
+new settings seem to help, put those settings into your shell startup
+files. Consult your local documentation for the exact details. For example, in
+Bourne-like shells (B<sh>, B<ksh>, B<bash>, B<zsh>):
LC_ALL=en_US.ISO8859-1
export LC_ALL
-We assume here that we saw with the above discussed commands the
-locale "en_US.ISO8859-1" and decided to try that instead of the above
-faulty locale "En_US" -- and in Cshish shells (csh, tcsh)
+This assumes that we saw the locale "en_US.ISO8859-1" using the commands
+discussed above. We decided to try that instead of the above faulty
+locale "En_US"--and in Cshish shells (B<csh>, B<tcsh>)
setenv LC_ALL en_US.ISO8859-1
-If you do not know what shell you have, please consult your local
+If you do not know what shell you have, consult your local
helpdesk or the equivalent.
=head2 Permanently fixing locale problems
-Then the slower but better fixes: the misconfiguration of your own
-environment variables you may be able to fix yourself; the
+The slower but superior fixes are the permanent ones. You may be able to
+fix the misconfiguration of your own environment variables yourself. The
mis(sing)configuration of the whole system's locales usually requires
the help of your friendly system administrator.
-First, see earlier in this document about L<Finding locales>. That
-tells how you can find which locales really are supported and more
-importantly, installed, in your system. In our example error message
-the environment variables affecting the locale are listed in the order
-of decreasing importance and unset variables do not matter, therefore
-in the above error message the LC_ALL being "En_US" must have been the
-bad choice. Always try fixing first the locale settings listed first.
+First, see earlier in this document about L<Finding locales>. That tells
+how to find which locales are really supported--and more importantly,
+installed--on your system. In our example error message, environment
+variables affecting the locale are listed in the order of decreasing
+importance (and unset variables do not matter). Therefore, having
+LC_ALL set to "En_US" must have been the bad choice, as shown by the
+error message. First try fixing locale settings listed first.
-Second, if you see with the listed commands something B<exactly> (for
-example prefix matches do not count and case usually matters) like
-"En_US" (without the quotes), then you should be okay because you are
-using a locale name that should be installed and available in your
-system. In this case skip forward to L<Fixing the system locale
-configuration>.
+Second, if using the listed commands you see something B<exactly>
+(prefix matches do not count and case usually counts) like "En_US"
+without the quotes, then you should be okay because you are using a
+locale name that should be installed and available in your system.
+In this case, see L<Fixing system locale configuration>.
-=head2 Permantently fixing your locale configuration
+=head2 Permanently fixing your locale configuration
-This is the case when for example you see
+This is when you see something like:
perl: warning: Please check that your locale settings:
LC_ALL = "En_US",
@@ -335,21 +332,21 @@ This is the case when for example you see
are supported and installed on your system.
but then cannot see that "En_US" listed by the above-mentioned
-commands. You may see things like "en_US.ISO8859-1" but that is not
-the same thing. In this case you might try running under a locale
-that you could list and somehow matches with what you tried. The
+commands. You may see things like "en_US.ISO8859-1", but that isn't
+the same. In this case, try running under a locale
+that you can list and which somehow matches what you tried. The
rules for matching locale names are a bit vague because
-standardisation is weak in this area. See again the L<Finding
-locales> about the general rules.
+standardization is weak in this area. See again L<Finding
+locales> for the general rules.
-=head2 Permanently fixing the system locale configuration
+=head2 Permanently fixing system locale configuration
-Please contact your system administrator and tell her the exact error
-message you get and ask her to read this same documentation you are
-now reading. She should be able to check whether there is something
-wrong with the locale configuration of the system. The L<Finding
-locales> section is unfortunately a bit vague about the exact commands
-and places because these things are not that standardised.
+Contact a system administrator (preferably your own) and report the exact
+error message you get, and ask them to read this same documentation you
+are now reading. They should be able to check whether there is something
+wrong with the locale configuration of the system. The L<Finding locales>
+section is unfortunately a bit vague about the exact commands and places
+because these things are not that standardized.
=head2 The localeconv function
@@ -357,7 +354,7 @@ The POSIX::localeconv() function allows you to get particulars of the
locale-dependent numeric formatting information specified by the current
C<LC_NUMERIC> and C<LC_MONETARY> locales. (If you just want the name of
the current locale for a particular category, use POSIX::setlocale()
-with a single parameter - see L<The setlocale function>.)
+with a single parameter--see L<The setlocale function>.)
use POSIX qw(locale_h);
@@ -370,16 +367,15 @@ with a single parameter - see L<The setlocale function>.)
}
localeconv() takes no arguments, and returns B<a reference to> a hash.
-The keys of this hash are formatting variable names such as
-C<decimal_point> and C<thousands_sep>; the values are the corresponding
-values. See L<POSIX (3)/localeconv> for a longer example, which lists
-all the categories an implementation might be expected to provide; some
-provide more and others fewer, however. Note that you don't need C<use
-locale>: as a function with the job of querying the locale, localeconv()
-always observes the current locale.
+The keys of this hash are variable names for formatting, such as
+C<decimal_point> and C<thousands_sep>. The values are the corresponding,
+er, values. See L<POSIX (3)/localeconv> for a longer example listing
+the categories an implementation might be expected to provide; some
+provide more and others fewer, however. You don't need an explicit C<use
+locale>, because localeconv() always observes the current locale.
-Here's a simple-minded example program which rewrites its command line
-parameters as integers formatted correctly in the current locale:
+Here's a simple-minded example program that rewrites its command-line
+parameters as integers correctly formatted in the current locale:
# See comments in previous example
require 5.004;
@@ -404,20 +400,20 @@ parameters as integers formatted correctly in the current locale:
=head1 LOCALE CATEGORIES
-The subsections which follow describe basic locale categories. As well
-as these, there are some combination categories which allow the
-manipulation of more than one basic category at a time. See
-L<"ENVIRONMENT"> for a discussion of these.
+The following subsections describe basic locale categories. Beyond these,
+some combination categories allow manipulation of more than one
+basic category at a time. See L<"ENVIRONMENT"> for a discussion of these.
=head2 Category LC_COLLATE: Collation
-When in the scope of S<C<use locale>>, Perl looks to the C<LC_COLLATE>
-environment variable to determine the application's notions on the
-collation (ordering) of characters. ('b' follows 'a' in Latin
-alphabets, but where do 'E<aacute>' and 'E<aring>' belong?)
+In the scope of S<C<use locale>>, Perl looks to the C<LC_COLLATE>
+environment variable to determine the application's notions on collation
+(ordering) of characters. For example, 'b' follows 'a' in Latin
+alphabets, but where do 'E<aacute>' and 'E<aring>' belong? And while
+'color' follows 'chocolate' in English, what about in Spanish?
-Here is a code snippet that will tell you what are the alphanumeric
-characters in the current locale, in the locale order:
+Here is a code snippet to tell what alphanumeric
+characters are in the current locale, in that locale's order:
use locale;
print +(sort grep /\w/, map { chr() } 0..255), "\n";
@@ -435,7 +431,7 @@ first example is useful for natural text.
As noted in L<USING LOCALES>, C<cmp> compares according to the current
collation locale when C<use locale> is in effect, but falls back to a
-byte-by-byte comparison for strings which the locale says are equal. You
+byte-by-byte comparison for strings that the locale says are equal. You
can use POSIX::strcoll() if you don't want this fall-back:
use POSIX qw(strcoll);
@@ -443,10 +439,10 @@ can use POSIX::strcoll() if you don't want this fall-back:
!strcoll("space and case ignored", "SpaceAndCaseIgnored");
$equal_in_locale will be true if the collation locale specifies a
-dictionary-like ordering which ignores space characters completely, and
+dictionary-like ordering that ignores space characters completely and
which folds case.
-If you have a single string which you want to check for "equality in
+If you have a single string that you want to check for "equality in
locale" against several others, you might think you could gain a little
efficiency by using POSIX::strxfrm() in conjunction with C<eq>:
@@ -462,30 +458,30 @@ efficiency by using POSIX::strxfrm() in conjunction with C<eq>:
strxfrm() takes a string and maps it into a transformed string for use
in byte-by-byte comparisons against other transformed strings during
collation. "Under the hood", locale-affected Perl comparison operators
-call strxfrm() for both their operands, then do a byte-by-byte
-comparison of the transformed strings. By calling strxfrm() explicitly,
+call strxfrm() for both operands, then do a byte-by-byte
+comparison of the transformed strings. By calling strxfrm() explicitly
and using a non locale-affected comparison, the example attempts to save
-a couple of transformations. In fact, it doesn't save anything: Perl
+a couple of transformations. But in fact, it doesn't save anything: Perl
magic (see L<perlguts/Magic Variables>) creates the transformed version of a
-string the first time it's needed in a comparison, then keeps it around
+string the first time it's needed in a comparison, then keeps this version around
in case it's needed again. An example rewritten the easy way with
C<cmp> runs just about as fast. It also copes with null characters
embedded in strings; if you call strxfrm() directly, it treats the first
-null it finds as a terminator. And don't expect the transformed strings
-it produces to be portable across systems - or even from one revision
+null it finds as a terminator. Don't expect the transformed strings
+it produces to be portable across systems--or even from one revision
of your operating system to the next. In short, don't call strxfrm()
directly: let Perl do it for you.
-Note: C<use locale> isn't shown in some of these examples, as it isn't
+Note: C<use locale> isn't shown in some of these examples because it isn't
needed: strcoll() and strxfrm() exist only to generate locale-dependent
results, and so always obey the current C<LC_COLLATE> locale.
=head2 Category LC_CTYPE: Character Types
-When in the scope of S<C<use locale>>, Perl obeys the C<LC_CTYPE> locale
+In the scope of S<C<use locale>>, Perl obeys the C<LC_CTYPE> locale
setting. This controls the application's notion of which characters are
alphabetic. This affects Perl's C<\w> regular expression metanotation,
-which stands for alphanumeric characters - that is, alphabetic and
+which stands for alphanumeric characters--that is, alphabetic and
numeric characters. (Consult L<perlre> for more information about
regular expressions.) Thanks to C<LC_CTYPE>, depending on your locale
setting, characters like 'E<aelig>', 'E<eth>', 'E<szlig>', and
@@ -493,35 +489,34 @@ setting, characters like 'E<aelig>', 'E<eth>', 'E<szlig>', and
The C<LC_CTYPE> locale also provides the map used in transliterating
characters between lower and uppercase. This affects the case-mapping
-functions - lc(), lcfirst, uc() and ucfirst(); case-mapping
-interpolation with C<\l>, C<\L>, C<\u> or C<\U> in double-quoted strings
-and in C<s///> substitutions; and case-independent regular expression
+functions--lc(), lcfirst(), uc(), and ucfirst(); case-mapping
+interpolation with C<\l>, C<\L>, C<\u>, or C<\U> in double-quoted strings
+and C<s///> substitutions; and case-independent regular expression
pattern matching using the C<i> modifier.
-Finally, C<LC_CTYPE> affects the POSIX character-class test functions -
-isalpha(), islower() and so on. For example, if you move from the "C"
-locale to a 7-bit Scandinavian one, you may find - possibly to your
-surprise - that "|" moves from the ispunct() class to isalpha().
+Finally, C<LC_CTYPE> affects the POSIX character-class test
+functions--isalpha(), islower(), and so on. For example, if you move
+from the "C" locale to a 7-bit Scandinavian one, you may find--possibly
+to your surprise--that "|" moves from the ispunct() class to isalpha().
B<Note:> A broken or malicious C<LC_CTYPE> locale definition may result
in clearly ineligible characters being considered to be alphanumeric by
-your application. For strict matching of (unaccented) letters and
-digits - for example, in command strings - locale-aware applications
+your application. For strict matching of (mundane) letters and
+digits--for example, in command strings--locale-aware applications
should use C<\w> inside a C<no locale> block. See L<"SECURITY">.
=head2 Category LC_NUMERIC: Numeric Formatting
-When in the scope of S<C<use locale>>, Perl obeys the C<LC_NUMERIC>
-locale information, which controls application's idea of how numbers
-should be formatted for human readability by the printf(), sprintf(),
-and write() functions. String to numeric conversion by the
-POSIX::strtod() function is also affected. In most implementations the
-only effect is to change the character used for the decimal point -
-perhaps from '.' to ',': these functions aren't aware of such niceties
-as thousands separation and so on. (See L<The localeconv function> if
-you care about these things.)
-
-Note that output produced by print() is B<never> affected by the
+In the scope of S<C<use locale>>, Perl obeys the C<LC_NUMERIC> locale
+information, which controls an application's idea of how numbers should
+be formatted for human readability by the printf(), sprintf(), and
+write() functions. String-to-numeric conversion by the POSIX::strtod()
+function is also affected. In most implementations the only effect is to
+change the character used for the decimal point--perhaps from '.' to ','.
+These functions aren't aware of such niceties as thousands separation and
+so on. (See L<The localeconv function> if you care about these things.)
+
+Output produced by print() is B<never> affected by the
current locale: it is independent of whether C<use locale> or C<no
locale> is in effect, and corresponds to what you'd get from printf()
in the "C" locale. The same is true for Perl's internal conversions
@@ -543,23 +538,23 @@ between numeric and string formats:
=head2 Category LC_MONETARY: Formatting of monetary amounts
-The C standard defines the C<LC_MONETARY> category, but no function that
-is affected by its contents. (Those with experience of standards
+The C standard defines the C<LC_MONETARY> category, but no function
+that is affected by its contents. (Those with experience of standards
committees will recognize that the working group decided to punt on the
issue.) Consequently, Perl takes no notice of it. If you really want
-to use C<LC_MONETARY>, you can query its contents - see L<The localeconv
-function> - and use the information that it returns in your
-application's own formatting of currency amounts. However, you may well
-find that the information, though voluminous and complex, does not quite
-meet your requirements: currency formatting is a hard nut to crack.
+to use C<LC_MONETARY>, you can query its contents--see L<The localeconv
+function>--and use the information that it returns in your application's
+own formatting of currency amounts. However, you may well find that
+the information, voluminous and complex though it may be, still does not
+quite meet your requirements: currency formatting is a hard nut to crack.
=head2 LC_TIME
-The output produced by POSIX::strftime(), which builds a formatted
+Output produced by POSIX::strftime(), which builds a formatted
human-readable date/time string, is affected by the current C<LC_TIME>
locale. Thus, in a French locale, the output produced by the C<%B>
format element (full month name) for the first month of the year would
-be "janvier". Here's how to get a list of the long month names in the
+be "janvier". Here's how to get a list of long month names in the
current locale:
use POSIX qw(strftime);
@@ -568,24 +563,24 @@ current locale:
strftime("%B", 0, 0, 0, 1, $_, 96);
}
-Note: C<use locale> isn't needed in this example: as a function which
+Note: C<use locale> isn't needed in this example: as a function that
exists only to generate locale-dependent results, strftime() always
obeys the current C<LC_TIME> locale.
=head2 Other categories
-The remaining locale category, C<LC_MESSAGES> (possibly supplemented by
-others in particular implementations) is not currently used by Perl -
-except possibly to affect the behavior of library functions called by
-extensions which are not part of the standard Perl distribution.
+The remaining locale category, C<LC_MESSAGES> (possibly supplemented
+by others in particular implementations) is not currently used by
+Perl--except possibly to affect the behavior of library functions called
+by extensions outside the standard Perl distribution.
=head1 SECURITY
-While the main discussion of Perl security issues can be found in
+Although the main discussion of Perl security issues can be found in
L<perlsec>, a discussion of Perl's locale handling would be incomplete
if it did not draw your attention to locale-dependent security issues.
-Locales - particularly on systems which allow unprivileged users to
-build their own locales - are untrustworthy. A malicious (or just plain
+Locales--particularly on systems that allow unprivileged users to
+build their own locales--are untrustworthy. A malicious (or just plain
broken) locale can make a locale-aware application give unexpected
results. Here are a few possibilities:
@@ -594,7 +589,7 @@ results. Here are a few possibilities:
=item *
Regular expression checks for safe file names or mail addresses using
-C<\w> may be spoofed by an C<LC_CTYPE> locale which claims that
+C<\w> may be spoofed by an C<LC_CTYPE> locale that claims that
characters such as "E<gt>" and "|" are alphanumeric.
=item *
@@ -618,32 +613,32 @@ A sneaky C<LC_COLLATE> locale could result in the names of students with
=item *
-An application which takes the trouble to use the information in
+An application that takes the trouble to use information in
C<LC_MONETARY> may format debits as if they were credits and vice versa
-if that locale has been subverted. Or it make may make payments in US
+if that locale has been subverted. Or it might make payments in US
dollars instead of Hong Kong dollars.
=item *
The date and day names in dates formatted by strftime() could be
manipulated to advantage by a malicious user able to subvert the
-C<LC_DATE> locale. ("Look - it says I wasn't in the building on
+C<LC_DATE> locale. ("Look--it says I wasn't in the building on
Sunday.")
=back
Such dangers are not peculiar to the locale system: any aspect of an
-application's environment which may maliciously be modified presents
+application's environment which may be modified maliciously presents
similar challenges. Similarly, they are not specific to Perl: any
-programming language which allows you to write programs which take
+programming language that allows you to write programs that take
account of their environment exposes you to these issues.
-Perl cannot protect you from all of the possibilities shown in the
-examples - there is no substitute for your own vigilance - but, when
+Perl cannot protect you from all possibilities shown in the
+examples--there is no substitute for your own vigilance--but, when
C<use locale> is in effect, Perl uses the tainting mechanism (see
-L<perlsec>) to mark string results which become locale-dependent, and
+L<perlsec>) to mark string results that become locale-dependent, and
which may be untrustworthy in consequence. Here is a summary of the
-tainting behavior of operators and functions which may be affected by
+tainting behavior of operators and functions that may be affected by
the locale:
=over 4
@@ -661,11 +656,11 @@ C<use locale> is in effect.
Scalar true/false result never tainted.
-Subpatterns, either delivered as an array-context result, or as $1 etc.
+Subpatterns, either delivered as a list-context result or as $1 etc.
are tainted if C<use locale> is in effect, and the subpattern regular
expression contains C<\w> (to match an alphanumeric character), C<\W>
(non-alphanumeric character), C<\s> (white-space character), or C<\S>
-(non white-space character). The matched pattern variable, $&, $`
+(non white-space character). The matched-pattern variable, $&, $`
(pre-match), $' (post-match), and $+ (last match) are also tainted if
C<use locale> is in effect and the regular expression contains C<\w>,
C<\W>, C<\s>, or C<\S>.
@@ -673,8 +668,8 @@ C<\W>, C<\s>, or C<\S>.
=item B<Substitution operator> (C<s///>):
Has the same behavior as the match operator. Also, the left
-operand of C<=~> becomes tainted when C<use locale> in effect,
-if it is modified as a result of a substitution based on a regular
+operand of C<=~> becomes tainted when C<use locale> is in effect
+if modified as a result of a substitution based on a regular
expression match involving C<\w>, C<\W>, C<\s>, or C<\S>; or of
case-mapping with C<\l>, C<\L>, C<\u> or C<\U>.
@@ -718,8 +713,8 @@ when taint checks are enabled.
or warn "Open of $untainted_output_file failed: $!\n";
The program can be made to run by "laundering" the tainted value through
-a regular expression: the second example - which still ignores locale
-information - runs, creating the file named on its command line
+a regular expression: the second example--which still ignores locale
+information--runs, creating the file named on its command line
if it can.
#/usr/local/bin/perl -T
@@ -731,7 +726,7 @@ if it can.
open(F, ">$untainted_output_file")
or warn "Open of $untainted_output_file failed: $!\n";
-Compare this with a very similar program which is locale-aware:
+Compare this with a similar but locale-aware program:
#/usr/local/bin/perl -T
@@ -744,7 +739,7 @@ Compare this with a very similar program which is locale-aware:
or warn "Open of $localized_output_file failed: $!\n";
This third program fails to run because $& is tainted: it is the result
-of a match involving C<\w> when C<use locale> is in effect.
+of a match involving C<\w> while C<use locale> is in effect.
=head1 ENVIRONMENT
@@ -754,10 +749,10 @@ of a match involving C<\w> when C<use locale> is in effect.
A string that can suppress Perl's warning about failed locale settings
at startup. Failure can occur if the locale support in the operating
-system is lacking (broken) in some way - or if you mistyped the name of
+system is lacking (broken) in some way--or if you mistyped the name of
a locale when you set up your environment. If this environment variable
-is absent, or has a value which does not evaluate to integer zero - that
-is "0" or "" - Perl will complain about locale setting failures.
+is absent, or has a value that does not evaluate to integer zero--that
+is, "0" or ""--Perl will complain about locale setting failures.
B<NOTE>: PERL_BADLANG only gives you a way to hide the warning message.
The message tells about some problem in your system's locale support,
@@ -773,7 +768,7 @@ for controlling an application's opinion on data.
=item LC_ALL
-C<LC_ALL> is the "override-all" locale environment variable. If it is
+C<LC_ALL> is the "override-all" locale environment variable. If
set, it overrides all the rest of the locale environment variables.
=item LC_CTYPE
@@ -819,23 +814,22 @@ category-specific C<LC_...>.
=head2 Backward compatibility
Versions of Perl prior to 5.004 B<mostly> ignored locale information,
-generally behaving as if something similar to the C<"C"> locale (see
-L<The setlocale function>) was always in force, even if the program
-environment suggested otherwise. By default, Perl still behaves this
-way so as to maintain backward compatibility. If you want a Perl
-application to pay attention to locale information, you B<must> use
-the S<C<use locale>> pragma (see L<The use locale Pragma>) to
-instruct it to do so.
+generally behaving as if something similar to the C<"C"> locale were
+always in force, even if the program environment suggested otherwise
+(see L<The setlocale function>). By default, Perl still behaves this
+way for backward compatibility. If you want a Perl application to pay
+attention to locale information, you B<must> use the S<C<use locale>>
+pragma (see L<The use locale Pragma>) to instruct it to do so.
Versions of Perl from 5.002 to 5.003 did use the C<LC_CTYPE>
-information if that was available, that is, C<\w> did understand what
-are the letters according to the locale environment variables.
+information if available; that is, C<\w> did understand what
+were the letters according to the locale environment variables.
The problem was that the user had no control over the feature:
if the C library supported locales, Perl used them.
=head2 I18N:Collate obsolete
-In versions of Perl prior to 5.004 per-locale collation was possible
+In versions of Perl prior to 5.004, per-locale collation was possible
using the C<I18N::Collate> library module. This module is now mildly
obsolete and should be avoided in new applications. The C<LC_COLLATE>
functionality is now integrated into the Perl core language: One can
@@ -856,7 +850,7 @@ system's implementation of the locale system than by Perl.
=head2 write() and LC_NUMERIC
-Formats are the only part of Perl which unconditionally use information
+Formats are the only part of Perl that unconditionally use information
from a program's locale; if a program's environment specifies an
LC_NUMERIC locale, it is always used to specify the decimal point
character in formatted output. Formatted output cannot be controlled by
@@ -869,7 +863,7 @@ structure.
There is a large collection of locale definitions at
C<ftp://dkuug.dk/i18n/WG15-collection>. You should be aware that it is
unsupported, and is not claimed to be fit for any purpose. If your
-system allows the installation of arbitrary locales, you may find the
+system allows installation of arbitrary locales, you may find the
definitions useful as they are, or as a basis for the development of
your own locales.
@@ -895,12 +889,12 @@ standard we've got. This may be construed as a bug.
=head2 Broken systems
-In certain system environments the operating system's locale support
+In certain systems, the operating system's locale support
is broken and cannot be fixed or used by Perl. Such deficiencies can
and will result in mysterious hangs and/or Perl core dumps when
C<use locale> is in effect. When confronted with such a system,
please report in excruciating detail to <F<perlbug@perl.com>>, and
-complain to your vendor: maybe some bug fixes exist for these problems
+complain to your vendor: bug fixes may exist for these problems
in your operating system. Sometimes such bug fixes are called an
operating system upgrade.
@@ -941,6 +935,7 @@ L<POSIX (3)/strxfrm>
=head1 HISTORY
Jarkko Hietaniemi's original F<perli18n.pod> heavily hacked by Dominic
-Dunlop, assisted by the perl5-porters.
+Dunlop, assisted by the perl5-porters. Prose worked over a bit by
+Tom Christiansen.
-Last update: Mon Nov 17 22:48:48 EET 1997
+Last update: Thu Jun 11 08:44:13 MDT 1998
diff --git a/pod/perllol.pod b/pod/perllol.pod
index 1de3b1ad74..0e6796b50f 100644
--- a/pod/perllol.pod
+++ b/pod/perllol.pod
@@ -26,7 +26,7 @@ a declaration of the array:
bart
Now you should be very careful that the outer bracket type
-is a round one, that is, parentheses. That's because you're assigning to
+is a round one, that is, a parenthesis. That's because you're assigning to
an @list, so you need parentheses. If you wanted there I<not> to be an @LoL,
but rather just a reference to it, you could do something more like this:
@@ -144,17 +144,7 @@ you'd have to do something like this:
push @$ref_to_LoL, [ split ];
}
-Actually, if you were using strict, you'd have to declare not only
-$ref_to_LoL as you had to declare @LoL, but you'd I<also> having to
-initialize it to a reference to an empty list. (This was a bug in
-perl version 5.001m that's been fixed for the 5.002 release.)
-
- my $ref_to_LoL = [];
- while (<>) {
- push @$ref_to_LoL, [ split ];
- }
-
-Ok, now you can add new rows. What about adding new columns? If you're
+Now you can add new rows. What about adding new columns? If you're
dealing with just matrices, it's often easiest to use simple assignment:
for $x (1 .. 10) {
@@ -310,4 +300,4 @@ perldata(1), perlref(1), perldsc(1)
Tom Christiansen <F<tchrist@perl.com>>
-Last udpate: Sat Oct 7 19:35:26 MDT 1995
+Last update: Thu Jun 4 16:16:23 MDT 1998
diff --git a/pod/perlmod.pod b/pod/perlmod.pod
index 942f216dda..2a0f6fecb6 100644
--- a/pod/perlmod.pod
+++ b/pod/perlmod.pod
@@ -7,19 +7,20 @@ perlmod - Perl modules (packages and symbol tables)
=head2 Packages
Perl provides a mechanism for alternative namespaces to protect packages
-from stomping on each other's variables. In fact, apart from certain
-magical variables, there's really no such thing as a global variable
-in Perl. The package statement declares the compilation unit as
+from stomping on each other's variables. In fact, there's really no such
+thing as a global variable in Perl (although some identifiers default
+to the main package instead of the current one). The package statement
+declares the compilation unit as
being in the given namespace. The scope of the package declaration
is from the declaration itself through the end of the enclosing block,
C<eval>, C<sub>, or end of file, whichever comes first (the same scope
as the my() and local() operators). All further unqualified dynamic
-identifiers will be in this namespace. A package statement affects
-only dynamic variables--including those you've used local() on--but
+identifiers will be in this namespace. A package statement only affects
+dynamic variables--including those you've used local() on--but
I<not> lexical variables created with my(). Typically it would be
the first declaration in a file to be included by the C<require> or
C<use> operator. You can switch into a package in more than one place;
-it influences merely which symbol table is used by the compiler for the
+it merely influences which symbol table is used by the compiler for the
rest of that block. You can refer to variables and filehandles in other
packages by prefixing the identifier with the package name and a double
colon: C<$Package::Variable>. If the package name is null, the C<main>
@@ -39,13 +40,13 @@ It would treat package C<INNER> as a totally separate global package.
Only identifiers starting with letters (or underscore) are stored in a
package's symbol table. All other symbols are kept in package C<main>,
-including all of the punctuation variables like $_. In addition, the
-identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, and SIG are
-forced to be in package C<main>, even when used for other purposes than
-their builtin one. Note also that, if you have a package called C<m>,
-C<s>, or C<y>, then you can't use the qualified form of an identifier
-because it will be interpreted instead as a pattern match, a substitution,
-or a transliteration.
+including all of the punctuation variables like $_. In addition, when
+unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV,
+INC, and SIG are forced to be in package C<main>, even when used for other
+purposes than their builtin one. Note also that, if you have a package
+called C<m>, C<s>, or C<y>, then you can't use the qualified form of an
+identifier because it will be interpreted instead as a pattern match,
+a substitution, or a transliteration.
(Variables beginning with underscore used to be forced into package
main, but we decided it was more useful for package writers to be able
@@ -85,62 +86,29 @@ table lookups at compile time:
local $main::{foo} = $main::{bar};
You can use this to print out all the variables in a package, for
-instance. Here is F<dumpvar.pl> from the Perl library:
-
- package dumpvar;
- sub main::dumpvar {
- ($package) = @_;
- local(*stab) = eval("*${package}::");
- while (($key,$val) = each(%stab)) {
- local(*entry) = $val;
- if (defined $entry) {
- print "\$$key = '$entry'\n";
- }
-
- if (defined @entry) {
- print "\@$key = (\n";
- foreach $num ($[ .. $#entry) {
- print " $num\t'",$entry[$num],"'\n";
- }
- print ")\n";
- }
-
- if ($key ne "${package}::" && defined %entry) {
- print "\%$key = (\n";
- foreach $key (sort keys(%entry)) {
- print " $key\t'",$entry{$key},"'\n";
- }
- print ")\n";
- }
- }
- }
-
-Note that even though the subroutine is compiled in package C<dumpvar>,
-the name of the subroutine is qualified so that its name is inserted into
-package C<main>. While popular many years ago, this is now considered
-very poor style; in general, you should be writing modules and using the
-normal export mechanism instead of hammering someone else's namespace,
-even main's.
+instance. The standard F<dumpvar.pl> library and the CPAN module
+Devel::Symdump make use of this.
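As a rough sketch of what walking a symbol table looks like, the following lists the identifiers stored in a small package. The package name `Demo` and its contents are hypothetical, made up for illustration; this is not how F<dumpvar.pl> or Devel::Symdump is written.

```perl
use strict;
use warnings;

package Demo;
our $x = 1;          # scalar slot of *Demo::x
our @y = (1, 2);     # array slot of *Demo::y
sub z { return }     # code slot of *Demo::z

package main;

# %Demo:: is the symbol table (stash) of package Demo; its keys are
# the identifiers and its values are the corresponding typeglobs.
my @names = sort keys %Demo::;
print "$_\n" for @names;
```

Each key names a typeglob, so one entry can stand for a scalar, an array, a hash, a subroutine, and a filehandle all at once.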
Assignment to a typeglob performs an aliasing operation, i.e.,
*dick = *richard;
-causes variables, subroutines, and file handles accessible via the
-identifier C<richard> to also be accessible via the identifier C<dick>. If
-you want to alias only a particular variable or subroutine, you can
-assign a reference instead:
+causes variables, subroutines, formats, and file and directory handles
+accessible via the identifier C<richard> also to be accessible via the
+identifier C<dick>. If you want to alias only a particular variable or
+subroutine, you can assign a reference instead:
*dick = \$richard;
-makes $richard and $dick the same variable, but leaves
+Which makes $richard and $dick the same variable, but leaves
@richard and @dick as separate arrays. Tricky, eh?
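A small sketch of that difference, using C<our> declarations (a later addition to Perl than this text, assumed here so the example runs under C<use strict>):

```perl
use strict;
use warnings;

our ($richard, $dick, @richard, @dick);
$richard = "original";
@richard = (1, 2, 3);

*dick = \$richard;    # alias only the scalar slot of *dick

$dick = "changed";    # writes through to $richard
print "$richard\n";             # changed
print scalar(@dick), "\n";      # 0: @dick is still its own, empty array
```

Assigning the whole glob instead, as in `*dick = *richard`, would have made @dick an alias for @richard as well.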
This mechanism may be used to pass and return cheap references
into or from subroutines if you don't want to copy the whole
-thing.
+thing. It only works when assigning to dynamic variables, not
+lexicals.
- %some_hash = ();
+ %some_hash = (); # can't be my()
*some_hash = fn( \%another_hash );
sub fn {
local *hashsym = shift;
@@ -161,14 +129,15 @@ Another use of symbol tables is for making "constant" scalars.
*PI = \3.14159265358979;
Now you cannot alter $PI, which is probably a good thing all in all.
-This isn't the same as a constant subroutine (one prototyped to
-take no arguments and to return a constant expression), which is
-subject to optimization at compile-time. This isn't. See L<perlsub>
-for details on these.
+This isn't the same as a constant subroutine, which is subject to
+optimization at compile-time. This isn't. A constant subroutine is one
+prototyped to take no arguments and to return a constant expression.
+See L<perlsub> for details on these. The C<use constant> pragma is a
+convenient shorthand for these.
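The three flavors of "constant" mentioned here can be compared side by side. This is a sketch only; the names `E` and `TAU` are invented for illustration.

```perl
use strict;
use warnings;

# 1. Read-only scalar made by aliasing a glob to a literal.
our $PI;
*PI = \3.14159265358979;
my $ok = eval { $PI = 3; 1 };    # dies: the literal is read-only
print "assignment refused: $@" unless $ok;

# 2. Constant subroutine: empty prototype, constant expression.
sub E () { 2.718281828459045 }

# 3. The same thing, spelled with the constant pragma.
use constant TAU => 6.283185307179586;

print $PI, " ", E, " ", TAU, "\n";
```

Only the second and third forms are candidates for compile-time inlining; the first is an ordinary variable lookup that happens to refuse assignment.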
You can say C<*foo{PACKAGE}> and C<*foo{NAME}> to find out what name and
package the *foo symbol table entry comes from. This may be useful
-in a subroutine which is passed typeglobs as arguments
+in a subroutine that gets passed typeglobs as arguments:
sub identify_typeglob {
my $glob = shift;
@@ -200,27 +169,32 @@ files in time to be visible to the rest of the file. Once a C<BEGIN>
has run, it is immediately undefined and any code it used is returned to
Perl's memory pool. This means you can't ever explicitly call a C<BEGIN>.
-An C<END> subroutine is executed as late as possible, that is, when the
-interpreter is being exited, even if it is exiting as a result of a
-die() function. (But not if it's is being blown out of the water by a
-signal--you have to trap that yourself (if you can).) You may have
-multiple C<END> blocks within a file--they will execute in reverse
-order of definition; that is: last in, first out (LIFO).
+An C<END> subroutine is executed as late as possible, that is, when
+the interpreter is being exited, even if it is exiting as a result of
+a die() function. (But not if it's polymorphing into another program
+via C<exec>, or being blown out of the water by a signal--you have to
+trap that yourself (if you can).) You may have multiple C<END> blocks
+within a file--they will execute in reverse order of definition; that is:
+last in, first out (LIFO).
-Inside an C<END> subroutine C<$?> contains the value that the script is
+Inside an C<END> subroutine, C<$?> contains the value that the script is
going to pass to C<exit()>. You can modify C<$?> to change the exit
value of the script. Beware of changing C<$?> by accident (e.g. by
running something via C<system>).
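Both points, the LIFO ordering and the effect of assigning to C<$?>, can be observed from the outside by running a child interpreter. The sketch below assumes a system where a list-form pipe open works (most Unix-like systems); the one-liners passed to the child are invented for illustration.

```perl
use strict;
use warnings;

# Two END blocks: the one defined last runs first.
my $code = 'END { print "a" } END { print "b" } print "main "';
open(my $child, '-|', $^X, '-e', $code) or die "cannot run $^X: $!";
my $out = do { local $/; <$child> };
close $child;
print "$out\n";                  # main ba

# An END block that overrides the exit value of the script.
system($^X, '-e', 'END { $? = 3 }');
my $status = $? >> 8;
print "child exited with $status\n";   # child exited with 3
```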
-Note that when you use the B<-n> and B<-p> switches to Perl, C<BEGIN>
-and C<END> work just as they do in B<awk>, as a degenerate case.
+Note that when you use the B<-n> and B<-p> switches to Perl, C<BEGIN> and
+C<END> work just as they do in B<awk>, as a degenerate case. As currently
+implemented (and subject to change, since it's inconvenient at best),
+both C<BEGIN> I<and> C<END> blocks are run when you use the B<-c> switch
+for a compile-only syntax check, although your main code is not.
=head2 Perl Classes
There is no special class syntax in Perl, but a package may function
-as a class if it provides subroutines that function as methods. Such a
-package may also derive some of its methods from another class package
-by listing the other package name in its @ISA array.
+as a class if it provides subroutines to act as methods. Such a
+package may also derive some of its methods from another class (package)
+by listing the other package name in its global @ISA array (which
+must be a package global, not a lexical).
For more on this, see L<perltoot> and L<perlobj>.
@@ -310,11 +284,11 @@ or
This is exactly equivalent to
- BEGIN { require "Module.pm"; import Module; }
+ BEGIN { require Module; import Module; }
or
- BEGIN { require "Module.pm"; import Module LIST; }
+ BEGIN { require Module; import Module LIST; }
As a special case
@@ -322,7 +296,7 @@ As a special case
is exactly equivalent to
- BEGIN { require "Module.pm"; }
+ BEGIN { require Module; }
All Perl module files have the extension F<.pm>. C<use> assumes this so
that you don't have to spell out "F<Module.pm>" in quotes. This also
@@ -331,6 +305,19 @@ Module names are also capitalized unless they're functioning as pragmas,
"Pragmas" are in effect compiler directives, and are sometimes called
"pragmatic modules" (or even "pragmata" if you're a classicist).
+The two statements:
+
+ require SomeModule;
+ require "SomeModule.pm";
+
+differ from each other in two ways. In the first case, any double
+colons in the module name, such as C<Some::Module>, are translated
+into your system's directory separator, usually "/". The second
+case does not, and would have to be specified literally. The other difference
+is that seeing the first C<require> clues in the compiler that uses of
+indirect object notation involving "SomeModule", as in C<$ob = purge SomeModule>,
+are method calls, not function calls. (Yes, this really can make a difference.)
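The first difference is easy to check with a module from the standard distribution. The sketch below uses File::Spec only as a convenient example.

```perl
use strict;
use warnings;

# Bareword form: "::" is translated into the directory separator,
# so Perl looks for File/Spec.pm along @INC.
require File::Spec;
print "loaded as File/Spec.pm\n" if exists $INC{'File/Spec.pm'};

# String form: the name is taken literally, so it must already be a path.
require "File/Spec.pm";          # fine, and a no-op since it is loaded

# A string containing "::" is looked up as a file of exactly that name.
my $ok = eval { require "File::Spec.pm"; 1 };
print "literal lookup failed\n" unless $ok;
```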
+
Because the C<use> statement implies a C<BEGIN> block, the importation
of semantics happens at the moment the C<use> statement is compiled,
before the rest of the file is compiled. This is how it is able
@@ -348,7 +335,11 @@ instead of C<use>. With require you can get into this problem:
require Cwd; # make Cwd:: accessible
$here = getcwd(); # oops! no main::getcwd()
-In general C<use Module ();> is recommended over C<require Module;>.
+In general, C<use Module ()> is recommended over C<require Module>,
+because it determines module availability at compile time, not in the
+middle of your program's execution. An exception would be if two modules
+each tried to C<use> each other, and each also called a function from
+that other module. In that case, it's easy to use C<require>s instead.
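For comparison with the C<require Cwd> fragment above, here is a sketch of the recommended form. With an empty import list nothing is imported, so the call is written fully qualified.

```perl
use strict;
use warnings;

use Cwd ();                  # loaded at compile time, imports nothing

my $here = Cwd::getcwd();    # qualified call, no main::getcwd needed
print "$here\n";

# Nothing was imported into main.
my $imported = defined &main::getcwd;
print "getcwd was ", ($imported ? "" : "not "), "imported\n";
```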
Perl packages may be nested inside other package names, so we can have
package names containing C<::>. But if we used that package name
diff --git a/pod/perlmodlib.pod b/pod/perlmodlib.pod
index 6e4da5e307..9511f55df4 100644
--- a/pod/perlmodlib.pod
+++ b/pod/perlmodlib.pod
@@ -271,7 +271,7 @@ supply object methods for filehandles
=item FindBin
-locate directory of original perl script
+locate directory of original Perl script
=item GDBM_File
@@ -368,7 +368,7 @@ by-name interface to Perl's builtin getserv*() functions
=item Opcode
-disable named opcodes when compiling or running perl code
+disable named opcodes when compiling or running Perl code
=item Pod::Text
@@ -400,7 +400,7 @@ load functions only on demand
=item Shell
-run shell commands transparently within perl
+run shell commands transparently within Perl
=item Socket
@@ -432,7 +432,7 @@ interface to various C<readline> packages
=item Test::Harness
-run perl standard test scripts with statistics
+run Perl standard test scripts with statistics
=item Text::Abbrev
@@ -503,7 +503,7 @@ by-name interface to Perl's builtin getpw*() functions
To find out I<all> the modules installed on your system, including
those without documentation or outside the standard release, do this:
- find `perl -e 'print "@INC"'` -name '*.pm' -print
+ % find `perl -e 'print "@INC"'` -name '*.pm' -print
They should all have their own documentation installed and accessible via
your system man(1) command. If that fails, try the I<perldoc> program.
@@ -762,7 +762,7 @@ Avoid C<$r-E<gt>Class::func()> where using C<@ISA=qw(... Class ...)> and
C<$r-E<gt>func()> would work (see L<perlbot> for more details).
Use autosplit so little used or newly added functions won't be a
-burden to programs which don't use them. Add test functions to
+burden to programs that don't use them. Add test functions to
the module after __END__ either using AutoSplit or by saying:
eval join('',<main::DATA>) || die $@ unless caller();
@@ -779,12 +779,12 @@ information in objects.
Always use B<-w>. Try to C<use strict;> (or C<use strict qw(...);>).
Remember that you can add C<no strict qw(...);> to individual blocks
-of code which need less strictness. Always use B<-w>. Always use B<-w>!
+of code that need less strictness. Always use B<-w>. Always use B<-w>!
Follow the guidelines in the perlstyle(1) manual.
=item Some simple style guidelines
-The perlstyle manual supplied with perl has many helpful points.
+The perlstyle manual supplied with Perl has many helpful points.
Coding style is a matter of personal taste. Many people evolve their
style over several years as they learn what helps them write and
@@ -804,7 +804,7 @@ use mixed case with no underscores (need to be short and portable).
You may find it helpful to use letter case to indicate the scope
or nature of a variable. For example:
- $ALL_CAPS_HERE constants only (beware clashes with perl vars)
+ $ALL_CAPS_HERE constants only (beware clashes with Perl vars)
$Some_Caps_Here package-wide global/static
$no_caps_here function scope my() or local() variables
@@ -934,7 +934,7 @@ GPL and The Artistic Licence (see the files README, Copying, and
Artistic). Larry has good reasons for NOT just using the GNU GPL.
My personal recommendation, out of respect for Larry, Perl, and the
-perl community at large is to state something simply like:
+Perl community at large is to state something simply like:
Copyright (c) 1995 Your Name. All rights reserved.
This program is free software; you can redistribute it and/or
@@ -969,7 +969,7 @@ If possible you should place the module into a major ftp archive and
include details of its location in your announcement.
Some notes about ftp archives: Please use a long descriptive file
-name which includes the version number. Most incoming directories
+name that includes the version number. Most incoming directories
will not be readable/listable, i.e., you won't be able to see your
file after uploading it. Remember to send your email notification
message as soon as possible after uploading else your file may get
@@ -1019,7 +1019,7 @@ there is no need to convert a .pl file into a Module for just that.
=item Consider the implications.
-All the perl applications which make use of the script will need to
+All Perl applications that make use of the script will need to
be changed (slightly) if the script is converted into a module. Is
it worth it unless you plan to make other changes at the same time?
@@ -1062,7 +1062,7 @@ Don't delete the original .pl file till the new .pm one works!
=item Complete applications rarely belong in the Perl Module Library.
-=item Many applications contain some perl code which could be reused.
+=item Many applications contain some Perl code that could be reused.
Help save the world! Share your code in a form that makes it easy
to reuse.
@@ -1076,9 +1076,9 @@ to reuse.
fragment of code built on top of the reusable modules. In these cases
the application could be invoked as:
- perl -e 'use Module::Name; method(@ARGV)' ...
+ % perl -e 'use Module::Name; method(@ARGV)' ...
or
- perl -mModule::Name ... (in perl5.002 or higher)
+ % perl -mModule::Name ... (in perl5.002 or higher)
=back
diff --git a/pod/perlobj.pod b/pod/perlobj.pod
index 3d7bee8647..f10fbdfe2e 100644
--- a/pod/perlobj.pod
+++ b/pod/perlobj.pod
@@ -44,12 +44,28 @@ constructor:
package Critter;
sub new { bless {} }
-The C<{}> constructs a reference to an anonymous hash containing no
-key/value pairs. The bless() takes that reference and tells the object
-it references that it's now a Critter, and returns the reference.
-This is for convenience, because the referenced object itself knows that
-it has been blessed, and the reference to it could have been returned
-directly, like this:
+That word C<new> isn't special. You could have written
+a constructor this way, too:
+
+ package Critter;
+ sub spawn { bless {} }
+
+In fact, this might even be preferable, because the C++ programmers won't
+be tricked into thinking that C<new> works in Perl as it does in C++.
+It doesn't. We recommend that you name your constructors whatever
+makes sense in the context of the problem you're solving. For example,
+constructors in the Tk extension to Perl are named after the widgets
+they create.
+
+One thing that's different about Perl constructors compared with those in
+C++ is that in Perl, they have to allocate their own memory. (The other
+thing is that they don't automatically call overridden base-class
+constructors.) The C<{}> allocates an anonymous hash containing no
+key/value pairs, and returns it. The bless() takes that reference and
+tells the object it references that it's now a Critter, and returns
+the reference. This is for convenience, because the referenced object
+itself knows that it has been blessed, and the reference to it could
+have been returned directly, like this:
sub new {
my $self = {};
@@ -61,7 +77,7 @@ In fact, you often see such a thing in more complicated constructors
that wish to call methods in the class as part of the construction:
sub new {
- my $self = {}
+ my $self = {};
bless $self;
$self->initialize();
return $self;
@@ -75,7 +91,7 @@ so that your constructors may be inherited:
sub new {
my $class = shift;
my $self = {};
- bless $self, $class
+ bless $self, $class;
$self->initialize();
return $self;
}
@@ -89,7 +105,7 @@ object into:
my $this = shift;
my $class = ref($this) || $this;
my $self = {};
- bless $self, $class
+ bless $self, $class;
$self->initialize();
return $self;
}
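A constructor of this shape can be called either on the class name or on an existing object. The following sketch fills in a hypothetical `initialize` method (the `legs` field is invented for illustration) so that it runs on its own.

```perl
use strict;
use warnings;

package Critter;

sub new {
    my $this  = shift;
    my $class = ref($this) || $this;   # object or class name
    my $self  = {};
    bless $self, $class;
    $self->initialize();
    return $self;
}

# Hypothetical initializer, not part of the original text.
sub initialize {
    my $self = shift;
    $self->{legs} = 4;
    return;
}

package main;

my $first  = Critter->new;    # invoked as a class method
my $second = $first->new;     # invoked on an instance
print ref($second), "\n";     # Critter
```

Note that `$second` is a fresh object of the same class, not a copy of `$first`.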
@@ -103,7 +119,8 @@ A constructor may re-bless a referenced object currently belonging to
another class, but then the new class is responsible for all cleanup
later. The previous blessing is forgotten, as an object may belong
to only one class at a time. (Although of course it's free to
-inherit methods from many classes.)
+inherit methods from many classes.) If you find yourself having to
+do this, the parent class is probably misbehaving, though.
A clarification: Perl objects are blessed. References are not. Objects
know which package they belong to. References do not. The bless()
@@ -124,7 +141,7 @@ Unlike say C++, Perl doesn't provide any special syntax for class
definitions. You use a package as a class by putting method
definitions into the class.
-There is a special array within each package called @ISA which says
+There is a special array within each package called @ISA, which says
where else to look for a method if you can't find it in the current
package. This is how Perl implements inheritance. Each element of the
@ISA array is just the name of another package that happens to be a
@@ -132,33 +149,44 @@ class package. The classes are searched (depth first) for missing
methods in the order that they occur in @ISA. The classes accessible
through @ISA are known as base classes of the current class.
+All classes implicitly inherit from class C<UNIVERSAL> as their
+last base class. Several commonly used methods are automatically
+supplied in the UNIVERSAL class; see L<"Default UNIVERSAL methods"> for
+more details.
+
If a missing method is found in one of the base classes, it is cached
in the current class for efficiency. Changing @ISA or defining new
subroutines invalidates the cache and causes Perl to do the lookup again.
-If a method isn't found, but an AUTOLOAD routine is found, then
-that is called on behalf of the missing method.
-
-If neither a method nor an AUTOLOAD routine is found in @ISA, then one
-last try is made for the method (or an AUTOLOAD routine) in a class
-called UNIVERSAL. (Several commonly used methods are automatically
-supplied in the UNIVERSAL class; see L<"Default UNIVERSAL methods"> for
-more details.) If that doesn't work, Perl finally gives up and
-complains.
-
-Perl classes do only method inheritance. Data inheritance is left
-up to the class itself. By and large, this is not a problem in Perl,
-because most classes model the attributes of their object using
-an anonymous hash, which serves as its own little namespace to be
-carved up by the various classes that might want to do something
-with the object.
+If neither the current class, its named base classes, nor the UNIVERSAL
+class contains the requested method, these three places are searched
+all over again, this time looking for a method named AUTOLOAD(). If an
+AUTOLOAD is found, this method is called on behalf of the missing method,
+setting the package global $AUTOLOAD to be the fully qualified name of
+the method that was intended to be called.
+
+If none of that works, Perl finally gives up and complains.
+
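The lookup order described above can be sketched with a small example. The package names C<Base> and C<Derived> and the method name C<frobnicate> are hypothetical, chosen only for illustration; the sketch assumes nothing beyond core Perl.

```perl
use strict;
use warnings;

package Base;
our $AUTOLOAD;                          # package global that Perl sets before calling AUTOLOAD

sub new { my $class = shift; return bless {}, $class }

sub AUTOLOAD {
    my $self = shift;
    my $name = $AUTOLOAD;               # fully qualified name of the method that was requested
    return if $name =~ /::DESTROY$/;    # object destruction also lands here; ignore it
    return "AUTOLOAD caught $name";
}

package Derived;
our @ISA = ('Base');                    # Derived names Base as its base class

package main;

my $obj    = Derived->new;              # new() is found in Base through @ISA
my $result = $obj->frobnicate;          # no frobnicate anywhere, so the AUTOLOAD search kicks in
print "$result\n";
```

Because DESTROY is looked up like any other method, an AUTOLOAD that does not filter it out will be invoked whenever an object is destroyed.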
+Perl classes do method inheritance only. Data inheritance is left up
+to the class itself. By and large, this is not a problem in Perl,
+because most classes model the attributes of their object using an
+anonymous hash, which serves as its own little namespace to be carved up
+by the various classes that might want to do something with the object.
+The only problem with this is that you can't be sure that you aren't using
+a piece of the hash that is already used. A reasonable workaround
+is to prepend your fieldname in the hash with the package name.
+
+ sub bump {
+ my $self = shift;
+ $self->{ __PACKAGE__ . ".count"}++;
+ }
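To see why the package-name prefix helps, here is a minimal sketch in which a base class and a subclass both keep a "count" field in the same object hash. The class names C<Counter> and C<LoudCounter> and the method C<shout> are made up for this illustration.

```perl
use strict;
use warnings;

package Counter;
sub new  { my $class = shift; return bless {}, $class }
sub bump {
    my $self = shift;
    return ++$self->{ __PACKAGE__ . ".count" };   # key is "Counter.count"
}

package LoudCounter;
our @ISA = ('Counter');
sub shout {
    my $self = shift;
    return ++$self->{ __PACKAGE__ . ".count" };   # key is "LoudCounter.count"
}

package main;

my $obj = LoudCounter->new;
$obj->bump for 1 .. 3;      # increments only "Counter.count"
$obj->shout;                # increments only "LoudCounter.count"
print join(", ", map { "$_=$obj->{$_}" } sort keys %$obj), "\n";
```

Since __PACKAGE__ is resolved where the method is compiled, each class ends up with its own key even though both share one hash.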
=head2 A Method is Simply a Subroutine
Unlike say C++, Perl doesn't provide any special syntax for method
definition. (It does provide a little syntax for method invocation
though. More on that later.) A method expects its first argument
-to be the object or package it is being invoked on. There are just two
+to be the object (reference) or package (string) it is being invoked on. There are just two
types of methods, which we'll call class and instance.
(Sometimes you'll hear these called static and virtual, in honor of
the two C++ method types they most closely resemble.)
@@ -291,7 +319,7 @@ allows the ability to check what a reference points to. Example
use UNIVERSAL qw(isa);
if(isa($ref, 'ARRAY')) {
- ...
+ #...
}
=item can(METHOD)
@@ -352,18 +380,47 @@ elsewhere.
=head2 WARNING
-An indirect object is limited to a name, a scalar variable, or a block,
-because it would have to do too much lookahead otherwise, just like any
-other postfix dereference in the language. The left side of -E<gt> is not so
-limited, because it's an infix operator, not a postfix operator.
+While indirect object syntax may well be appealing to English speakers and
+to C++ programmers, be not seduced! It suffers from two grave problems.
-That means that in the following, A and B are equivalent to each other, and
-C and D are equivalent, but A/B and C/D are different:
+The first problem is that an indirect object is limited to a name,
+a scalar variable, or a block, because it would have to do too much
+lookahead otherwise, just like any other postfix dereference in the
+language. (These are the same quirky rules as are used for the filehandle
+slot in functions like C<print> and C<printf>.) This can lead to horribly
+confusing precedence problems, as in these next two lines:
- A: method $obref->{"fieldname"}
- B: (method $obref)->{"fieldname"}
- C: $obref->{"fieldname"}->method()
- D: method {$obref->{"fieldname"}}
+ move $obj->{FIELD}; # probably wrong!
+ move $ary[$i]; # probably wrong!
+
+Those actually parse as the very surprising:
+
+ $obj->move->{FIELD}; # Well, lookee here
+ $ary->move->[$i]; # Didn't expect this one, eh?
+
+Rather than what you might have expected:
+
+ $obj->{FIELD}->move(); # You should be so lucky.
+ $ary[$i]->move; # Yeah, sure.
+
+The left side of ``-E<gt>'' is not so limited, because it's an infix operator,
+not a postfix operator.
+
+As if that weren't bad enough, think about this: Perl must guess I<at
+compile time> whether C<name> and C<move> above are functions or methods.
+Usually Perl gets it right, but when it doesn't, you get a function
+call compiled as a method, or vice versa. This can introduce subtle
+bugs that are hard to unravel. For example, calling a method C<new>
+in indirect notation--as C++ programmers are so wont to do--can
+be miscompiled into a subroutine call if there's already a C<new>
+function in scope. You'd end up calling the current package's C<new>
+as a subroutine, rather than the desired class's method. The compiler
+tries to cheat by remembering bareword C<require>s, but the grief if it
+messes up just isn't worth the years of debugging it would likely take
+you to track such subtle bugs down.
+
+The infix arrow notation using ``C<-E<gt>>'' doesn't suffer from either
+of these disturbing ambiguities, so we recommend you use it exclusively.
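As a rough illustration of the recommended arrow notation, the sketch below calls a method on objects stored in a hash element and in an array element, the two cases that trip up indirect object syntax. The class C<Shape> and its C<move> method are hypothetical.

```perl
use strict;
use warnings;

package Shape;
sub new  { my ($class, %args) = @_; return bless { %args }, $class }
sub move { my ($self, $dx) = @_; $self->{x} += $dx; return $self }

package main;

my $holder = { FIELD => Shape->new(x => 1) };
my @ary    = ( Shape->new(x => 10) );

$holder->{FIELD}->move(5);    # unambiguous: move() is called on the object stored in the hash
$ary[0]->move(5);             # unambiguous: move() is called on the first array element

print "$holder->{FIELD}{x} $ary[0]{x}\n";
```

With the arrow form, the whole expression to the left of the final arrow is the invocant, so no guessing is needed at compile time.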
=head2 Summary
@@ -470,6 +527,11 @@ C<-DDEBUGGING> was enabled during perl build time.
A more complete garbage collection strategy will be implemented
at a future date.
+In the meantime, the best solution is to create a non-recursive container
+class that holds a pointer to the self-referential data structure.
+Define a DESTROY method for the containing object's class that manually
+breaks the circularities in the self-referential structure.
+
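One possible shape for such a container is sketched below. The class name C<Ring> and the two-node cycle are invented for the example, and Scalar::Util's weaken() is used only so the script can observe whether the node was freed.

```perl
use strict;
use warnings;

package Ring;                           # hypothetical non-recursive container class

sub new {
    my $class  = shift;
    my $node_a = { name => 'a' };
    my $node_b = { name => 'b' };
    $node_a->{next} = $node_b;          # a -> b
    $node_b->{next} = $node_a;          # b -> a, which closes a reference cycle
    return bless { head => $node_a }, $class;
}

sub DESTROY {
    my $self = shift;
    my $node = $self->{head};
    # Walk the ring, cutting each link so the reference counts can reach zero.
    while ($node) {
        my $next = delete $node->{next};
        $node = $next;
    }
}

package main;
use Scalar::Util qw(weaken);

my $probe;
{
    my $ring = Ring->new;
    $probe = $ring->{head};
    weaken($probe);                     # watch the node without keeping it alive
}                                       # $ring goes out of scope here and DESTROY runs
print defined $probe ? "leaked\n" : "freed\n";
```

The container itself is not part of the cycle, so ordinary reference counting destroys it on time, and its DESTROY method gets the chance to dismantle the cycle it owns.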
=head1 SEE ALSO
A kinder, gentler tutorial on object-oriented programming in Perl can
diff --git a/pod/perlop.pod b/pod/perlop.pod
index cae38ebf55..fe6ba1e90f 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -38,6 +38,8 @@ operate on scalar values only, not array values.
In the following sections, these operators are covered in precedence order.
+Many operators can be overloaded for objects. See L<overload>.
+
=head1 DESCRIPTION
=head2 Terms and List Operators (Leftward)
@@ -114,7 +116,7 @@ The auto-increment operator has a little extra builtin magic to it. If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment. If, however, the
variable has been used in only string contexts since it was set, and
-has a value that is not null and matches the pattern
+has a value that is not the empty string and matches the pattern
C</^[a-zA-Z]*[0-9]*$/>, the increment is done as a string, preserving each
character within its range, with carry:
@@ -144,8 +146,9 @@ starts with a plus or minus, a string starting with the opposite sign
is returned. One effect of these rules is that C<-bareword> is equivalent
to C<"-bareword">.
-Unary "~" performs bitwise negation, i.e., 1's complement.
-(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)
+Unary "~" performs bitwise negation, i.e., 1's complement. For example,
+C<0666 &~ 027> is 0640. (See also L<Integer Arithmetic> and L<Bitwise
+String Operators>.)
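The C<0666 &~ 027> arithmetic can be checked with a short script. The variable names are illustrative only; the values mirror a file-creation mode and a umask.

```perl
use strict;
use warnings;

my $requested = 0666;                   # rw-rw-rw-
my $mask      = 027;                    # group write plus all "other" bits
my $effective = $requested & ~$mask;    # clear every bit that is set in the mask
printf "%04o\n", $effective;            # prints 0640
```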
Unary "+" has no effect whatsoever, even on strings. It is useful
syntactically for separating a function name from a parenthesized expression
@@ -184,16 +187,18 @@ operands C<$a> and C<$b>: If C<$b> is positive, then C<$a % $b> is
C<$a> minus the largest multiple of C<$b> that is not greater than
C<$a>. If C<$b> is negative, then C<$a % $b> is C<$a> minus the
smallest multiple of C<$b> that is not less than C<$a> (i.e. the
-result will be less than or equal to zero).
+result will be less than or equal to zero). If C<use integer> is
+in effect, the native hardware will be used instead of this rule,
+which may be construed a bug that will be fixed at some point.
-Note than when C<use integer> is in scope "%" give you direct access
+Note that when C<use integer> is in scope, "%" gives you direct access
to the modulus operator as implemented by your C compiler. This
operator is not as well defined for negative operands, but it will
execute faster.
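The sign rules for "%" are easier to see with concrete numbers. This sketch checks only the default behavior; the value computed under C<use integer> is shown but deliberately not relied upon, since it depends on the C compiler.

```perl
use strict;
use warnings;

my @results = (
     7 %  3,    #  1
    -7 %  3,    #  2: a positive right operand gives a non-negative result
     7 % -3,    # -2: a negative right operand gives a non-positive result
    -7 % -3,    # -1
);
print "@results\n";

# Under "use integer" the C compiler's operator is used instead;
# the result for negative operands is platform-dependent.
my $native = do { use integer; -7 % 3 };
print "native: $native\n";
```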
-Binary "x" is the repetition operator. In a scalar context, it
+Binary "x" is the repetition operator. In scalar context, it
returns a string consisting of the left operand repeated the number of
-times specified by the right operand. In a list context, if the left
+times specified by the right operand. In list context, if the left
operand is a list in parentheses, it repeats the list.
print '-' x 80; # print row of dashes
@@ -336,11 +341,18 @@ way to find out the home directory (assuming it's not "0") might be:
$home = $ENV{'HOME'} || $ENV{'LOGDIR'} ||
(getpwuid($<))[7] || die "You're homeless!\n";
-As more readable alternatives to C<&&> and C<||>, Perl provides "and" and
-"or" operators (see below). The short-circuit behavior is identical. The
-precedence of "and" and "or" is much lower, however, so that you can
-safely use them after a list operator without the need for
-parentheses:
+In particular, this means that you shouldn't use this
+for selecting between two aggregates for assignment:
+
+ @a = @b || @c; # this is wrong
+ @a = scalar(@b) || @c; # really meant this
+ @a = @b ? @b : @c; # this works fine, though
+
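A small runnable sketch of the aggregate pitfall, using made-up array contents:

```perl
use strict;
use warnings;

my @b = (10, 20, 30);
my @c = (1, 2);

my @wrong = @b || @c;           # @b is evaluated in scalar context, yielding its element count
my @right = @b ? @b : @c;       # the conditional operator returns the chosen array as a list

print "wrong: @wrong\n";        # prints "wrong: 3"
print "right: @right\n";        # prints "right: 10 20 30"
```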
+As more readable alternatives to C<&&> and C<||> when used for
+control flow, Perl provides C<and> and C<or> operators (see below).
+The short-circuit behavior is identical. The precedence of "and" and
+"or" is much lower, however, so that you can safely use them after a
+list operator without the need for parentheses:
unlink "alpha", "beta", "gamma"
or gripe(), next LINE;
@@ -350,10 +362,12 @@ With the C-style operators that would have been written like this:
unlink("alpha", "beta", "gamma")
|| (gripe(), next LINE);
-=head2 Range Operator
+Using "or" for assignment is unlikely to do what you want; see below.
+
+=head2 Range Operators
Binary ".." is the range operator, which is really two different
-operators depending on the context. In a list context, it returns an
+operators depending on the context. In list context, it returns an
array of values counting (by ones) from the left value to the right
value. This is useful for writing C<for (1..10)> loops and for doing
slice operations on arrays. Be aware that under the current implementation,
@@ -364,7 +378,7 @@ write something like this:
# code
}
-In a scalar context, ".." returns a boolean value. The operator is
+In scalar context, ".." returns a boolean value. The operator is
bistable, like a flip-flop, and emulates the line-range (comma) operator
of B<sed>, B<awk>, and various editors. Each ".." operator maintains its
own boolean state. It is false as long as its left operand is false.
@@ -378,7 +392,7 @@ If you don't want it to test the right operand till the next evaluation
operand is not evaluated while the operator is in the "false" state, and
the left operand is not evaluated while the operator is in the "true"
state. The precedence is a little lower than || and &&. The value
-returned is either the null string for false, or a sequence number
+returned is either the empty string for false, or a sequence number
(beginning with 1) for true. The sequence number is reset for each range
encountered. The final sequence number in a range has the string "E0"
appended to it, which doesn't affect its numeric value, but gives you
@@ -394,13 +408,22 @@ As a scalar operator:
next line if (1 .. /^$/); # skip header lines
s/^/> / if (/^$/ .. eof()); # quote body
+ # parse mail messages
+ while (<>) {
+ $in_header = 1 .. /^$/;
+ $in_body = /^$/ .. eof();
+ # do something based on those
+ } continue {
+ close ARGV if eof; # reset $. each file
+ }
+
As a list operator:
for (101 .. 200) { print; } # print $_ 100 times
@foo = @foo[0 .. $#foo]; # an expensive no-op
@foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items
-The range operator (in a list context) makes use of the magical
+The range operator (in list context) makes use of the magical
auto-increment algorithm if the operands are strings. You
can say
@@ -443,6 +466,19 @@ legal lvalues (meaning that you can assign to them):
This is not necessarily guaranteed to contribute to the readability of your program.
+Because this operator produces an assignable result, using assignments
+without parentheses will get you in trouble. For example, this:
+
+ $a % 2 ? $a += 10 : $a += 2
+
+Really means this:
+
+ (($a % 2) ? ($a += 10) : $a) += 2
+
+Rather than this:
+
+ ($a % 2) ? ($a += 10) : ($a += 2)
+
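The difference between the two parses shows up as soon as the condition is true. The sketch uses C<$x> and C<$y> rather than C<$a>, purely to stay clear of the variables sort() uses.

```perl
use strict;
use warnings;

my $x = 3;
$x % 2 ? $x += 10 : $x += 2;            # parsed as (($x % 2) ? ($x += 10) : $x) += 2
my $surprise = $x;                      # 15: both additions were applied

my $y = 3;
($y % 2) ? ($y += 10) : ($y += 2);      # explicit parentheses: only one branch runs
my $intended = $y;                      # 13

print "$surprise $intended\n";
```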
=head2 Assignment Operators
"=" is the ordinary assignment operator.
@@ -485,11 +521,11 @@ is equivalent to
=head2 Comma Operator
-Binary "," is the comma operator. In a scalar context it evaluates
+Binary "," is the comma operator. In scalar context it evaluates
its left argument, throws that value away, then evaluates its right
argument and returns that value. This is just like C's comma operator.
-In a list context, it's just the list argument separator, and inserts
+In list context, it's just the list argument separator, and inserts
both its arguments into the list.
The =E<gt> digraph is mostly just a synonym for the comma operator. It's useful for
@@ -524,9 +560,27 @@ expression is evaluated only if the left expression is true.
=head2 Logical or and Exclusive Or
Binary "or" returns the logical disjunction of the two surrounding
-expressions. It's equivalent to || except for the very low
-precedence. This means that it short-circuits: i.e., the right
-expression is evaluated only if the left expression is false.
+expressions. It's equivalent to || except for the very low precedence.
+This makes it useful for control flow:
+
+ print FH $data or die "Can't write to FH: $!";
+
+This means that it short-circuits: i.e., the right expression is evaluated
+only if the left expression is false. Due to its precedence, you should
+probably avoid using this for assignment, only for control flow.
+
+ $a = $b or $c; # bug: this is wrong
+ ($a = $b) or $c; # really means this
+ $a = $b || $c; # better written this way
+
+However, when it's a list context assignment and you're trying to use
+"||" for control flow, you probably need "or" so that the assignment
+takes higher precedence.
+
+ @info = stat($file) || die; # oops, scalar sense of stat!
+ @info = stat($file) or die; # better, now @info gets its due
+
+Then again, you could always use parentheses.
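The context difference can be demonstrated without touching the filesystem. Here C<fake_stat> is a hypothetical stand-in for stat() that returns a list in list context and a simple flag in scalar context.

```perl
use strict;
use warnings;

# Hypothetical stand-in for stat(): a list in list context, a flag in scalar context.
sub fake_stat { return wantarray ? (2049, 1234, 0644) : 1 }

my @oops   = fake_stat() || die "failed";   # "||" imposes scalar context: @oops is (1)
my @better = fake_stat() or die "failed";   # assignment binds tighter than "or"

print scalar(@oops), " ", scalar(@better), "\n";
```

Writing C<(@better = fake_stat()) || die> with explicit parentheses would behave like the C<or> form.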
Binary "xor" returns the exclusive-OR of the two surrounding expressions.
It cannot short circuit, of course.
@@ -586,7 +640,7 @@ or "C<@>" are interpolated, as are the following sequences. Within
a transliteration, the first ten of these sequences may be used.
\t tab (HT, TAB)
- \n newline (LF, NL)
+ \n newline (NL)
\r return (CR)
\f form feed (FF)
\b backspace (BS)
@@ -606,6 +660,20 @@ a transliteration, the first ten of these sequences may be used.
If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
and C<\U> is taken from the current locale. See L<perllocale>.
+All systems use the virtual C<"\n"> to represent a line terminator,
+called a "newline". There is no such thing as an unvarying, physical
+newline character. It is an illusion that the operating system,
+device drivers, C libraries, and Perl all conspire to preserve. Not all
+systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF. For example,
+on a Mac, these are reversed, and on systems without a line terminator,
+printing C<"\n"> may emit no actual data. In general, use C<"\n"> when
+you mean a "newline" for your system, but use the literal ASCII when you
+need an exact character. For example, most networking protocols expect
+and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators,
+and although they often accept just C<"\012">, they seldom tolerate just
+C<"\015">. If you get in the habit of using C<"\n"> for networking,
+you may be burned some day.
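A minimal sketch of spelling out the terminator explicitly, assuming a protocol that wants CR followed by LF. The request line is only a placeholder.

```perl
use strict;
use warnings;

my $CRLF    = "\015\012";                       # CR then LF, in octal so it never varies by platform
my $request = "HEAD / HTTP/1.0" . $CRLF . $CRLF;
my @codes   = map { ord } split //, $CRLF;      # (13, 10)

print "terminator bytes: @codes\n";
```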
+
You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
An unescaped C<$> or C<@> interpolates the corresponding variable,
while escaping will cause the literal string C<\$> to be inserted.
@@ -637,6 +705,14 @@ optimization when you want to see only the first occurrence of
something in each file of a set of files, for instance. Only C<??>
patterns local to the current package are reset.
+ while (<>) {
+ if (?^$?) {
+ # blank line between header and body
+ }
+ } continue {
+ reset if eof; # clear ?? status for next file
+ }
+
This usage is vaguely deprecated, and may be removed in some future
version of Perl.
@@ -644,13 +720,13 @@ version of Perl.
=item /PATTERN/cgimosx
-Searches a string for a pattern match, and in a scalar context returns
+Searches a string for a pattern match, and in scalar context returns
true (1) or false (''). If no string is specified via the C<=~> or
C<!~> operator, the $_ string is searched. (The string specified with
C<=~> need not be an lvalue--it may be the result of an expression
evaluation, but remember the C<=~> binds rather tightly.) See also
L<perlre>.
-See L<perllocale> for discussion of additional considerations which apply
+See L<perllocale> for discussion of additional considerations that apply
when C<use locale> is in effect.
Options are:
@@ -680,8 +756,8 @@ interpolating won't change over the life of the script. However, mentioning
C</o> constitutes a promise that you won't change the variables in the pattern.
If you change them, Perl won't even notice.
-If the PATTERN evaluates to a null string, the last
-successfully matched regular expression is used instead.
+If the PATTERN evaluates to the empty string, the last
+I<successfully> matched regular expression is used instead.
If used in a context that requires a list value, a pattern match returns a
list consisting of the subexpressions matched by the parentheses in the
@@ -714,12 +790,12 @@ the pattern matched.
The C</g> modifier specifies global pattern matching--that is, matching
as many times as possible within the string. How it behaves depends on
-the context. In a list context, it returns a list of all the
+the context. In list context, it returns a list of all the
substrings matched by all the parentheses in the regular expression.
If there are no parentheses, it returns a list of all the matched
strings, as if there were parentheses around the whole pattern.
-In a scalar context, C<m//g> iterates through the string, returning TRUE
+In scalar context, C<m//g> iterates through the string, returning TRUE
each time it matches, and FALSE when it eventually runs out of matches.
(In other words, it remembers where it left off last time and restarts
the search at that point. You can actually find the current match
@@ -823,17 +899,53 @@ A double-quoted, interpolated string.
=item `STRING`
-A string which is interpolated and then executed as a system command.
-The collected standard output of the command is returned. In scalar
-context, it comes back as a single (potentially multi-line) string.
-In list context, returns a list of lines (however you've defined lines
-with $/ or $INPUT_RECORD_SEPARATOR).
+A string which is (possibly) interpolated and then executed as a system
+command with C</bin/sh> or its equivalent. Shell wildcards, pipes,
+and redirections will be honored. The collected standard output of the
+command is returned; standard error is unaffected. In scalar context,
+it comes back as a single (potentially multi-line) string. In list
+context, returns a list of lines (however you've defined lines with $/
+or $INPUT_RECORD_SEPARATOR).
+
+Because backticks do not affect standard error, use shell file descriptor
+syntax (assuming the shell supports this) if you care to address this.
+To capture a command's STDERR and STDOUT together:
- $today = qx{ date };
+ $output = `cmd 2>&1`;
+
+To capture a command's STDOUT but discard its STDERR:
+
+ $output = `cmd 2>/dev/null`;
+
+To capture a command's STDERR but discard its STDOUT (ordering is
+important here):
+
+ $output = `cmd 2>&1 1>/dev/null`;
+
+To exchange a command's STDOUT and STDERR in order to capture the STDERR
+but leave its STDOUT to come out the old STDERR:
+
+ $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
+
+To read both a command's STDOUT and its STDERR separately, it's easiest
+and safest to redirect them separately to files, and then read from those
+files when the program is done:
+
+ system("program args 1>/tmp/program.stdout 2>/tmp/program.stderr");
+
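The first three redirections can be tried with a throwaway command. This sketch assumes a POSIX-style C<sh> is available; the command in C<$cmd> is invented for the demonstration and simply writes one line to each stream.

```perl
use strict;
use warnings;

# Hypothetical command that writes "out" to STDOUT and "err" to STDERR.
my $cmd = q{sh -c 'echo out; echo err 1>&2'};

my $both     = `$cmd 2>&1`;                 # STDOUT and STDERR together
my $out_only = `$cmd 2>/dev/null`;          # STDOUT only; STDERR is discarded
my $err_only = `$cmd 2>&1 1>/dev/null`;     # STDERR only; the order of redirections matters

print "both: $both", "out: $out_only", "err: $err_only";
```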
+Using single-quote as a delimiter protects the command from Perl's
+double-quote interpolation, passing it on to the shell instead:
+
+ $perl_info = qx(ps $$); # that's Perl's $$
+ $shell_info = qx'ps $$'; # that's the new shell's $$
+
+Note that how the string gets evaluated is entirely subject to the command
+interpreter on your system. On most platforms, you will have to protect
+shell metacharacters if you want them treated literally. This is in
+practice difficult to do, as it's unclear how to escape which characters.
+See L<perlsec> for a clean and safe example of a manual fork() and exec()
+to emulate backticks safely.
-Note that how the string gets evaluated is entirely subject to the
-command interpreter on your system. On most platforms, you will have
-to protect shell metacharacters if you want them treated literally.
On some platforms (notably DOS-like ones), the shell may not be
capable of dealing with multiline commands, so putting newlines in
the string may not get you what you want. You may be able to evaluate
@@ -846,8 +958,14 @@ of the command line. You must ensure your strings don't exceed this
limit after any necessary interpolations. See the platform-specific
release notes for more details about your particular environment.
-Also realize that using this operator frequently leads to unportable
-programs.
+Using this operator can lead to programs that are difficult to port,
+because the shell commands called vary between systems, and may in
+fact not be present at all. As one example, the C<type> command under
+the POSIX shell is very different from the C<type> command under DOS.
+That doesn't mean you should go out of your way to avoid backticks
+when they're the right way to get something done. Perl was made to be
+a glue language, and one of the things it glues together is commands.
+Just understand what you're getting yourself into.
See L<"I/O Operators"> for more discussion.
@@ -858,13 +976,16 @@ whitespace as the word delimiters. It is exactly equivalent to
split(' ', q/STRING/);
+This equivalency means that if used in scalar context, you'll get split's
+(unfortunate) scalar context behavior, complete with mysterious warnings.
+
Some frequently seen examples:
use POSIX qw( setlocale localeconv )
@EXPORT = qw( foo bar baz );
A common mistake is to try to separate the words with comma or to put
-comments into a multi-line qw-string. For this reason the C<-w>
+comments into a multi-line C<qw>-string. For this reason the C<-w>
switch produces warnings if the STRING contains the "," or the "#"
character.
@@ -876,7 +997,7 @@ made. Otherwise it returns false (specifically, the empty string).
If no string is specified via the C<=~> or C<!~> operator, the C<$_>
variable is searched and modified. (The string specified with C<=~> must
-be a scalar variable, an array element, a hash element, or an assignment
+be a scalar variable, an array element, a hash element, or an assignment
to one of those, i.e., an lvalue.)
If the delimiter chosen is single quote, no variable interpolation is
@@ -885,9 +1006,9 @@ PATTERN contains a $ that looks like a variable rather than an
end-of-string test, the variable will be interpolated into the pattern
at run-time. If you want the pattern compiled only once the first time
the variable is interpolated, use the C</o> option. If the pattern
-evaluates to a null string, the last successfully executed regular
+evaluates to the empty string, the last successfully executed regular
expression is used instead. See L<perlre> for further explanation on these.
-See L<perllocale> for discussion of additional considerations which apply
+See L<perllocale> for discussion of additional considerations that apply
when C<use locale> is in effect.
Options are:
@@ -920,9 +1041,9 @@ Examples:
s/Login: $foo/Login: $bar/; # run-time pattern
- ($foo = $bar) =~ s/this/that/;
+ ($foo = $bar) =~ s/this/that/; # copy first, then change
- $count = ($paragraph =~ s/Mister\b/Mr./g);
+ $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count
$_ = 'abc123xyz';
s/\d+/$&*2/e; # yields 'abc246xyz'
@@ -933,18 +1054,27 @@ Examples:
s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
s/^=(\w+)/&pod($1)/ge; # use function call
+ # expand variables in $_, but dynamics only, using
+ # symbolic dereferencing
+ s/\$(\w+)/${$1}/g;
+
# /e's can even nest; this will expand
- # simple embedded variables in $_
+ # any embedded scalar variable (including lexicals) in $_
s/(\$\w+)/$1/eeg;
- # Delete C comments.
+ # Delete (most) C comments.
$program =~ s {
/\* # Match the opening delimiter.
.*? # Match a minimal number of characters.
\*/ # Match the closing delimiter.
} []gsx;
- s/^\s*(.*?)\s*$/$1/; # trim white space
+ s/^\s*(.*?)\s*$/$1/; # trim white space in $_, expensively
+
+ for ($variable) { # trim white space in $variable, cheap
+ s/^\s+//;
+ s/\s+$//;
+ }
s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
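Because C<for> aliases C<$_> to each element in turn, the cheap trimming idiom shown above extends naturally to several variables at once. The variable names and values here are made up.

```perl
use strict;
use warnings;

my ($name, $city) = ("  Larry  ", "\tBoulder\n");

for ($name, $city) {    # $_ is an alias for each variable in turn
    s/^\s+//;           # strip leading white space
    s/\s+$//;           # strip trailing white space
}

print "[$name] [$city]\n";
```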
@@ -998,7 +1128,7 @@ character.
If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter
than the SEARCHLIST, the final character is replicated till it is long
-enough. If the REPLACEMENTLIST is null, the SEARCHLIST is replicated.
+enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
This latter is useful for counting characters in a class or for
squashing character sequences in a class.
@@ -1045,8 +1175,8 @@ There are several I/O operators you should know about.
A string enclosed by backticks (grave accents) first undergoes
variable substitution just like a double quoted string. It is then
interpreted as a command, and the output of that command is the value
-of the pseudo-literal, like in a shell. In a scalar context, a single
-string consisting of all the output is returned. In a list context,
+of the pseudo-literal, like in a shell. In scalar context, a single
+string consisting of all the output is returned. In list context,
a list of values is returned, one for each line of output. (You can
set C<$/> to use a different line terminator.) The command is executed
each time the pseudo-literal is evaluated. The status value of the
@@ -1066,7 +1196,7 @@ situation where an automatic assignment happens. I<If and ONLY if> the
input symbol is the only thing inside the conditional of a C<while> or
C<for(;;)> loop, the value is automatically assigned to the variable
C<$_>. In these loop constructs, the assigned value (whether assignment
-is automatic or explcit) is then tested to see if it is defined.
+is automatic or explicit) is then tested to see if it is defined.
The defined test avoids problems where line has a string value
that would be treated as false by perl e.g. "" or "0" with no trailing
newline. (This may seem like an odd thing to you, but you'll use the
@@ -1086,12 +1216,12 @@ and this also behaves similarly, but avoids the use of $_ :
while (my $line = <STDIN>) { print $line }
If you really mean such values to terminate the loop they should be
-tested for explcitly:
+tested for explicitly:
while (($_ = <STDIN>) ne '0') { ... }
while (<STDIN>) { last unless $_; ... }
-In other boolean contexts C<E<lt>I<filehandle>E<gt>> without explcit C<defined>
+In other boolean contexts, C<E<lt>I<filehandle>E<gt>> without explicit C<defined>
test or comparison will solicit a warning if C<-w> is in effect.
The filehandles STDIN, STDOUT, and STDERR are predefined. (The
@@ -1109,7 +1239,7 @@ The null filehandle E<lt>E<gt> is special and can be used to emulate the
behavior of B<sed> and B<awk>. Input from E<lt>E<gt> comes either from
standard input, or from each file listed on the command line. Here's
how it works: the first time E<lt>E<gt> is evaluated, the @ARGV array is
-checked, and if it is null, C<$ARGV[0]> is set to "-", which when opened
+checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened
gives you standard input. The @ARGV array is then processed as a list
of filenames. The loop
@@ -1136,10 +1266,19 @@ doesn't work because it treats E<lt>ARGVE<gt> as non-magical.)
You can modify @ARGV before the first E<lt>E<gt> as long as the array ends up
containing the list of filenames you really want. Line numbers (C<$.>)
continue as if the input were one big happy file. (But see example
-under eof() for how to reset line numbers on each file.)
+under C<eof> for how to reset line numbers on each file.)
+
+If you want to set @ARGV to your own list of files, go right ahead.
+This sets @ARGV to all plain text files if no @ARGV was given:
+
+ @ARGV = grep { -f && -T } glob('*') unless @ARGV;
-If you want to set @ARGV to your own list of files, go right ahead. If
-you want to pass switches into your script, you can use one of the
+You can even set them to pipe commands. For example, this automatically
+filters compressed arguments through B<gzip>:
+
+ @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
+
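To see what that map produces without running B<gzip>, the same expression can be applied to an ordinary list. The filenames are hypothetical.

```perl
use strict;
use warnings;

my @args    = ('notes.txt', 'log.gz', 'old.Z');    # hypothetical command-line arguments
my @sources = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @args;

print "$_\n" for @sources;
```

This works because the null filehandle opens each element of @ARGV with the two-argument form of open, so a trailing "|" means "read from this command". For the same reason, filenames from untrusted sources should not be placed in @ARGV unchecked.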
+If you want to pass switches into your script, you can use one of the
Getopts modules or put a loop on the front like this:
while ($_ = $ARGV[0], /^-/) {
@@ -1147,10 +1286,11 @@ Getopts modules or put a loop on the front like this:
last if /^--$/;
if (/^-D(.*)/) { $debug = $1 }
if (/^-v/) { $verbose++ }
- ... # other switches
+ # ... # other switches
}
+
while (<>) {
- ... # code for each line
+ # ... # code for each line
}
The E<lt>E<gt> symbol will return C<undef> for end-of-file only once.
@@ -1159,22 +1299,28 @@ If you call it again after this it will assume you are processing another
If the string inside the angle brackets is a reference to a scalar
variable (e.g., E<lt>$fooE<gt>), then that variable contains the name of the
-filehandle to input from, or a reference to the same. For example:
+filehandle to input from, or its typeglob, or a reference to the same. For example:
$fh = \*STDIN;
$line = <$fh>;
-If the string inside angle brackets is not a filehandle or a scalar
-variable containing a filehandle name or reference, then it is interpreted
-as a filename pattern to be globbed, and either a list of filenames or the
-next filename in the list is returned, depending on context. One level of
-$ interpretation is done first, but you can't say C<E<lt>$fooE<gt>>
-because that's an indirect filehandle as explained in the previous
-paragraph. (In older versions of Perl, programmers would insert curly
-brackets to force interpretation as a filename glob: C<E<lt>${foo}E<gt>>.
-These days, it's considered cleaner to call the internal function directly
-as C<glob($foo)>, which is probably the right way to have done it in the
-first place.) Example:
+If what's within the angle brackets is neither a filehandle nor a simple
+scalar variable containing a filehandle name, typeglob, or typeglob
+reference, it is interpreted as a filename pattern to be globbed, and
+either a list of filenames or the next filename in the list is returned,
+depending on context. This distinction is determined on syntactic
+grounds alone. That means C<E<lt>$xE<gt>> is always a readline from
+an indirect handle, but C<E<lt>$hash{key}E<gt>> is always a glob.
+That's because $x is a simple scalar variable, but C<$hash{key}> is
+not--it's a hash element.
+
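When a filehandle lives in a hash element, two simple ways around the syntactic rule are to copy it into a plain scalar first or to call readline() directly. The sketch below uses an in-memory file so that it is self-contained; the hash and key names are illustrative.

```perl
use strict;
use warnings;

my $text = "first\nsecond\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";

my %handles = (input => $fh);

my $x      = $handles{input};               # a simple scalar, so <$x> is a readline
my $first  = <$x>;
my $second = readline($handles{input});     # or skip the angle brackets entirely

close $fh;
print $first, $second;
```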
+One level of double-quote interpretation is done first, but you can't
+say C<E<lt>$fooE<gt>> because that's an indirect filehandle as explained
+in the previous paragraph. (In older versions of Perl, programmers
+would insert curly brackets to force interpretation as a filename glob:
+C<E<lt>${foo}E<gt>>. These days, it's considered cleaner to call the
+internal function directly as C<glob($foo)>, which is probably the right
+way to have done it in the first place.) Example:
while (<*.c>) {
chmod 0644, $_;
@@ -1202,7 +1348,7 @@ long" errors (unless you've installed tcsh(1L) as F</bin/csh>).
A glob evaluates its (embedded) argument only when it is starting a new
list. All values must be read before it will start over. In a list
context this isn't important, because you automatically get them all
-anyway. In a scalar context, however, the operator returns the next value
+anyway. In scalar context, however, the operator returns the next value
each time it is called, or a C<undef> value if you've just run out. As
for filehandles an automatic C<defined> is generated when the glob
occurs in the test part of a C<while> or C<for> - because legal glob returns
@@ -1229,7 +1375,7 @@ to become confused with the indirect filehandle notation.
=head2 Constant Folding
Like C, Perl does a certain amount of expression evaluation at
-compile time, whenever it determines that all of the arguments to an
+compile time, whenever it determines that all arguments to an
operator are static and have no side effects. In particular, string
concatenation happens at compile time between literals that don't do
variable substitution. Backslash interpretation also happens at
@@ -1242,7 +1388,7 @@ and this all reduces to one string internally. Likewise, if
you say
foreach $file (@filenames) {
- if (-s $file > 5 + 100 * 2**16) { ... }
+ if (-s $file > 5 + 100 * 2**16) { }
}
the compiler will precompute the number that
@@ -1299,7 +1445,7 @@ However, C<use integer> still has meaning
for them. By default, their results are interpreted as unsigned
integers. However, if C<use integer> is in effect, their results are
interpreted as signed integers. For example, C<~0> usually evaluates
-to a large integral value. However, C<use integer; ~0> is -1.
+to a large integral value. However, C<use integer; ~0> is -1 on twos-complement machines.
=head2 Floating-point Arithmetic
@@ -1308,6 +1454,27 @@ similar ways to provide rounding or truncation at a certain number of
decimal places. For rounding to a certain number of digits, sprintf()
or printf() is usually the easiest route.
+Floating-point numbers are only approximations to what a mathematician
+would call real numbers. There are infinitely more reals than floats,
+so some corners must be cut. For example:
+
+ printf "%.20g\n", 123456789123456789;
+ # produces 123456789123456784
+
+Testing for exact floating-point equality or inequality is
+not a good idea. Here's a (relatively expensive) work-around to compare
+whether two floating-point numbers are equal to a particular number of
+significant digits. See Knuth, volume II, for a more robust treatment of
+this topic.
+
+ sub fp_equal {
+ my ($X, $Y, $POINTS) = @_;
+ my ($tX, $tY);
+ $tX = sprintf("%.${POINTS}g", $X);
+ $tY = sprintf("%.${POINTS}g", $Y);
+ return $tX eq $tY;
+ }
+
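As a minimal sketch of how fp_equal() might be used, the script below compares the result of C<0.1 + 0.2> with C<0.3>, first with C<==> and then with fp_equal() at 10 significant digits. It assumes the usual binary (IEEE-style) double representation; the values and the choice of 10 digits are illustrative only.

```perl
use strict;
use warnings;

# fp_equal as given in the text: format both numbers to $POINTS
# significant digits with sprintf's %g conversion, then compare strings.
sub fp_equal {
    my ($X, $Y, $POINTS) = @_;
    my $tX = sprintf("%.${POINTS}g", $X);
    my $tY = sprintf("%.${POINTS}g", $Y);
    return $tX eq $tY;
}

my $sum = 0.1 + 0.2;    # typically not exactly 0.3 in binary floating point

print "== says:       ", ($sum == 0.3             ? "equal" : "different"), "\n";
print "fp_equal says: ", (fp_equal($sum, 0.3, 10) ? "equal" : "different"), "\n";
```

The string comparison is what makes this relatively expensive: each call formats both numbers before comparing them.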
The POSIX module (part of the standard perl distribution) implements
ceil(), floor(), and a number of other mathematical and trigonometric
functions. The Math::Complex module (part of the standard perl
@@ -1320,3 +1487,17 @@ the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.
+
+=head2 Bigger Numbers
+
+The standard Math::BigInt and Math::BigFloat modules provide
+variable precision arithmetic and overloaded operators.
+At the cost of some space and considerable speed, they
+avoid the normal pitfalls associated with limited-precision
+representations.
+
+ use Math::BigInt;
+ $x = Math::BigInt->new('123456789123456789');
+ print $x * $x;
+
+ # prints +15241578780673678515622620750190521
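A further sketch, under the assumption that a reasonably recent Math::BigInt is installed: computing 25 factorial with the overloaded C<*=> operator and, for contrast, with ordinary floating-point arithmetic. The variable names are hypothetical, and the exact printed form of a Math::BigInt value (for example, a leading plus sign) may depend on the module version.

```perl
use strict;
use warnings;
use Math::BigInt;

# Exact result: every digit of 25! is kept.
my $exact = Math::BigInt->new(1);
$exact *= $_ for 2 .. 25;

# Limited-precision result: a native number keeps only the leading digits.
my $approx = 1;
$approx *= $_ for 2 .. 25;

print "Math::BigInt: $exact\n";
printf "native:       %.0f\n", $approx;
```

The two lines of output can be compared digit by digit to see where the limited-precision result departs from the exact one.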
diff --git a/pod/perlre.pod b/pod/perlre.pod
index da32f873fc..927d088edb 100644
--- a/pod/perlre.pod
+++ b/pod/perlre.pod
@@ -10,7 +10,7 @@ operations, plus various examples of the same, see C<m//> and C<s///> in
L<perlop>.
The matching operations can have various modifiers. The modifiers
-which relate to the interpretation of the regular expression inside
+that relate to the interpretation of the regular expression inside
are listed below. For the modifiers that alter the behaviour of the
operation, see L<perlop/"m//"> and L<perlop/"s//">.
@@ -34,8 +34,8 @@ line anywhere within the string,
Treat string as single line. That is, change "." to match any character
whatsoever, even a newline, which it normally would not match.
-The /s and /m modifiers both override the C<$*> setting. That is, no matter
-what C<$*> contains, /s (without /m) will force "^" to match only at the
+The C</s> and C</m> modifiers both override the C<$*> setting. That is, no matter
+what C<$*> contains, C</s> without C</m> will force "^" to match only at the
beginning of the string and "$" to match only at the end (or just before a
newline at the end) of the string. Together, as /ms, they let the "." match
any character whatsoever, while yet allowing "^" and "$" to match,
@@ -58,18 +58,19 @@ backslashed nor within a character class. You can use this to break up
your regular expression into (slightly) more readable parts. The C<#>
character is also treated as a metacharacter introducing a comment,
just as in ordinary Perl code. This also means that if you want real
-whitespace or C<#> characters in the pattern that you'll have to either
+whitespace or C<#> characters in the pattern (outside of a character
+class, where they are unaffected by C</x>), you'll either have to
escape them or encode them using octal or hex escapes. Taken together,
these features go a long way towards making Perl's regular expressions
more readable. Note that you have to be careful not to include the
pattern delimiter in the comment--perl has no way of knowing you did
-not intend to close the pattern early. See the C comment deletion code
+not intend to close the pattern early. See the C-comment deletion code
in L<perlop>.
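The following is an illustrative sketch of the C</x> rules just described, not an example taken from the documentation itself. The pattern and test strings are made up; the comments inside the pattern deliberately avoid the pattern delimiter, for the reason given above.

```perl
use strict;
use warnings;

# Under the x modifier, unescaped whitespace and # comments in the
# pattern are ignored, so literal ones must be escaped or put in a class.
my $re = qr{
    ^ \d+      # one or more digits
    [ ]        # a literal space: whitespace in a character class still counts
    \#         # a literal hash sign, escaped so it does not start a comment
    \w+ $      # a word
}x;

for my $s ("42 #tag", "42#tag") {
    print "'$s' ", ($s =~ $re ? "matches" : "does not match"), "\n";
}
```

Only the first string contains the literal space the pattern asks for, so only it matches.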
=head2 Regular Expressions
The patterns used in pattern matching are regular expressions such as
-those supplied in the Version 8 regexp routines. (In fact, the
+those supplied in the Version 8 regex routines. (In fact, the
routines are derived (distantly) from Henry Spencer's freely
redistributable reimplementation of the V8 routines.)
See L<Version 8 Regular Expressions> for details.
@@ -146,7 +147,7 @@ also work:
\L lowercase till \E (think vi)
\U uppercase till \E (think vi)
\E end case modification (think vi)
- \Q quote (disable) regexp metacharacters till \E
+ \Q quote (disable) pattern metacharacters till \E
If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
and C<\U> is taken from the current locale. See L<perllocale>.
@@ -165,7 +166,7 @@ In addition, Perl defines the following:
\d Match a digit character
\D Match a non-digit character
-Note that C<\w> matches a single alphanumeric character, not a whole
+A C<\w> matches a single alphanumeric character, not a whole
word. To match a word you'd need to say C<\w+>. If C<use locale> is in
effect, the list of alphabetic characters generated by C<\w> is taken
from the current locale. See L<perllocale>. You may use C<\w>, C<\W>,
@@ -185,7 +186,7 @@ has a C<\w> on one side of it and a C<\W> on the other side of it (in
either order), counting the imaginary characters off the beginning and
end of the string as matching a C<\W>. (Within character classes C<\b>
represents backspace rather than a word boundary.) The C<\A> and C<\Z> are
-just like "^" and "$" except that they won't match multiple times when the
+just like "^" and "$", except that they won't match multiple times when the
C</m> modifier is used, while "^" and "$" will match at every internal line
boundary. To match the actual end of the string, not ignoring newline,
you can use C<\Z(?!\n)>. The C<\G> assertion can be used to chain global
@@ -193,7 +194,7 @@ matches (using C<m//g>), as described in
L<perlop/"Regexp Quote-Like Operators">.
It is also useful when writing C<lex>-like scanners, when you have several
-regexps which you want to match against consequent substrings of your
+patterns that you want to match against consecutive substrings of your
string, see the previous reference.
The actual location where C<\G> will match can also be influenced
by using C<pos()> as an lvalue. See L<perlfunc/pos>.
@@ -233,21 +234,21 @@ Once perl sees that you need one of C<$&>, C<$`> or C<$'> anywhere in
the program, it has to provide them on each and every pattern match.
This can slow your program down. The same mechanism that handles
these provides for the use of $1, $2, etc., so you pay the same price
-for each regexp that contains capturing parentheses. But if you never
-use $&, etc., in your script, then regexps I<without> capturing
+for each pattern that contains capturing parentheses. But if you never
+use $&, etc., in your script, then patterns I<without> capturing
parentheses won't be penalized. So avoid $&, $', and $` if you can,
but if you can't (and some algorithms really appreciate them), once
you've used them once, use them at will, because you've already paid
-the price.
+the price. As of 5.005, $& is not so costly as the other two.
-You will note that all backslashed metacharacters in Perl are
+Backslashed metacharacters in Perl are
alphanumeric, such as C<\b>, C<\w>, C<\n>. Unlike some other regular
expression languages, there are no backslashed symbols that aren't
alphanumeric. So anything that looks like \\, \(, \), \E<lt>, \E<gt>,
\{, or \} is always interpreted as a literal character, not a
metacharacter. This was once used in a common idiom to disable or
quote the special meanings of regular expression metacharacters in a
-string that you want to use for a pattern. Simply quote all the
+string that you want to use for a pattern. Simply quote all
non-alphanumeric characters:
$pattern =~ s/(\W)/\\$1/g;
@@ -271,31 +272,32 @@ function of the extension. Several extensions are already supported:
A comment. The text is ignored. If the C</x> switch is used to enable
whitespace formatting, a simple C<#> will suffice.
-=item C<(?:regexp)>
+=item C<(?:pattern)>
-This groups things like "()" but doesn't make backreferences like "()" does. So
+This is for clustering, not capturing; it groups subexpressions like
+"()", but doesn't make backreferences as "()" does. So
- split(/\b(?:a|b|c)\b/)
+ @fields = split(/\b(?:a|b|c)\b/)
is like
- split(/\b(a|b|c)\b/)
+ @fields = split(/\b(a|b|c)\b/)
but doesn't spit out extra fields.
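To make the difference concrete, here is a small sketch with a made-up input string. With clustering parentheses split() returns only the fields; with capturing parentheses it also returns each separator.

```perl
use strict;
use warnings;

my $line = "x,y;z";

my @plain = split /(?:,|;)/, $line;   # clustering only
my @extra = split /(,|;)/,   $line;   # capturing: separators are returned too

print scalar(@plain), " fields: @plain\n";
print scalar(@extra), " fields: @extra\n";
```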
-=item C<(?=regexp)>
+=item C<(?=pattern)>
A zero-width positive lookahead assertion. For example, C</\w+(?=\t)/>
matches a word followed by a tab, without including the tab in C<$&>.
-=item C<(?!regexp)>
+=item C<(?!pattern)>
A zero-width negative lookahead assertion. For example C</foo(?!bar)/>
matches any occurrence of "foo" that isn't followed by "bar". Note
however that lookahead and lookbehind are NOT the same thing. You cannot
use this for lookbehind.
-If you are looking for a "bar" which isn't preceded by a "foo", C</(?!foo)bar/>
+If you are looking for a "bar" that isn't preceded by a "foo", C</(?!foo)bar/>
will not do what you want. That's because the C<(?!foo)> is just saying that
the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will
match. You would have to do something like C</(?!foo)...bar/> for that. We
@@ -307,13 +309,13 @@ Sometimes it's still easier just to say:
For lookbehind see below.
-=item C<(?<=regexp)>
+=item C<(?E<lt>=pattern)>
-A zero-width positive lookbehind assertion. For example, C</(?=\t)\w+/>
+A zero-width positive lookbehind assertion. For example, C</(?E<lt>=\t)\w+/>
matches a word following a tab, without including the tab in C<$&>.
Works only for fixed-width lookbehind.
-=item C<(?<!regexp)>
+=item C<(?E<lt>!pattern)>
A zero-width negative lookbehind assertion. For example C</(?<!bar)foo/>
matches any occurrence of "foo" that isn't following "bar".
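A short sketch contrasting the two assertions on the "bar not preceded by foo" problem discussed earlier. The test strings are illustrative.

```perl
use strict;
use warnings;

for my $s ("foobar", "bazbar") {
    # Negative lookahead only checks what FOLLOWS the current position.
    my $ahead  = $s =~ /(?!foo)bar/  ? "matches" : "fails";
    # Negative lookbehind checks what PRECEDES the current position.
    my $behind = $s =~ /(?<!foo)bar/ ? "matches" : "fails";
    print "$s: lookahead $ahead, lookbehind $behind\n";
}
```

On "foobar" the lookahead version still matches, because at the position where "bar" starts the next thing is not "foo"; only the lookbehind version rejects it.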
@@ -325,70 +327,80 @@ Experimental "evaluate any Perl code" zero-width assertion. Always
succeeds. C<code> is not interpolated. Currently the rules to
determine where the C<code> ends are somewhat convoluted.
-=item C<(?E<gt>regexp)>
+B<WARNING>: This is a grave security risk for arbitrarily interpolated
+patterns. It introduces security holes in previously safe programs.
+A fix to Perl, and to this documentation, will be forthcoming prior
+to the actual 5.005 release.
-An "independend" subexpression. Matches the substring which a
-I<standalone> C<regexp> would match if anchored at the given position,
+=item C<(?E<gt>pattern)>
+
+An "independent" subexpression. Matches the substring that a
+I<standalone> C<pattern> would match if anchored at the given position,
B<and only this substring>.
Say, C<^(?E<gt>a*)ab> will never match, since C<(?E<gt>a*)> (anchored
-at the beginning of string, as above) will match I<all> the characters
+at the beginning of string, as above) will match I<all> characters
C<a> at the beginning of string, leaving no C<a> for C<ab> to match.
In contrast, C<a*ab> will match the same as C<a+b>, since the match of
the subgroup C<a*> is influenced by the following group C<ab> (see
L<"Backtracking">). In particular, C<a*> inside C<a*ab> will match
fewer characters than a standalone C<a*>, since this makes the tail match.
-Note that a similar effect to C<(?E<gt>regexp)> may be achieved by
+An effect similar to C<(?E<gt>pattern)> may be achieved by
- (?=(regexp))\1
+ (?=(pattern))\1
since the lookahead is in I<"logical"> context, thus matches the same
substring as a standalone C<a+>. The following C<\1> eats the matched
string, thus making a zero-length assertion into an analogue of
-C<(?>...)>. (The difference of these two constructions is that the
-second one uses a catching group, thus shifts ordinals of
+C<(?E<gt>...)>. (The difference between these two constructs is that the
+second one uses a capturing group, thus shifting ordinals of
backreferences in the rest of a regular expression.)
-This construction is very useful for optimizations of "eternal"
-matches, since it will not backtrack (see L<"Backtracking">). Say,
+This construct is useful for optimizations of "eternal"
+matches, because it will not backtrack (see L<"Backtracking">).
- / \( (
+ m{ \( (
[^()]+
|
\( [^()]* \)
)+
- \) /x
-
-will match a nonempty group with matching two-or-less-level-deep
-parentheses. It is very efficient in finding such groups. However,
-if there is no such group, it is going to take forever (on reasonably
-long string), since there are so many different ways to split a long
-string into several substrings (this is essentially what C<(.+)+> is
-doing, and this is a subpattern of the above pattern). Say, on
-C<((()aaaaaaaaaaaaaaaaaa> the above pattern detects no-match in 5sec
-(on kitchentop'96 processor), and each extra letter doubles this time.
-
-However, a tiny modification of this
-
- / \( (
+ \)
+ }x
+
+That will efficiently match a nonempty group with matching
+two-or-less-level-deep parentheses. However, if there is no such group,
+it will take virtually forever on a long string. That's because there are
+so many different ways to split a long string into several substrings.
+This is essentially what C<(.+)+> is doing, and this is a subpattern
+of the above pattern. Consider that the pattern above detects no-match
+on C<((()aaaaaaaaaaaaaaaaaa> in several seconds, but that each extra
+letter doubles this time. This exponential performance will make it
+appear that your program has hung.
+
+However, a tiny modification of this pattern
+
+ m{ \( (
(?> [^()]+ )
|
\( [^()]* \)
)+
- \) /x
+ \)
+ }x
-which uses (?>...) matches exactly when the above one does (it is a
-good excercise to check this), but finishes in a fourth of the above
-time on a similar string with 1000000 C<a>s.
+which uses C<(?E<gt>...)> matches exactly when the one above does (verifying
+this yourself would be a productive exercise), but finishes in a fourth
+of the time when used on a similar string with 1000000 C<a>s. Be aware,
+however, that this pattern currently triggers a warning message under
+B<-w> saying it C<"matches the null string many times">.
-Note that on simple groups like the above C<(?> [^()]+ )> a similar
+On simple groups, such as the pattern C<(?E<gt> [^()]+ )>, a comparable
effect may be achieved by negative lookahead, as in C<[^()]+ (?! [^()] )>.
This was only 4 times slower on a string with 1000000 C<a>s.
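The basic behaviour of an independent subexpression can be checked with the small case mentioned at the start of this item. This is only a sketch of the matching semantics on an illustrative string; it says nothing about timing.

```perl
use strict;
use warnings;

my $s = "aaab";

# Ordinary group: a* gives back one "a" so that "ab" can match.
print "plain:       ", ($s =~ /^a*ab/     ? "matches" : "fails"), "\n";

# Independent group: a* keeps every "a", so "ab" has nothing to match.
print "independent: ", ($s =~ /^(?>a*)ab/ ? "matches" : "fails"), "\n";
```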
-=item C<(?(condition)yes-regexp|no-regexp)>
+=item C<(?(condition)yes-pattern|no-pattern)>
-=item C<(?(condition)yes-regexp)>
+=item C<(?(condition)yes-pattern)>
Conditional expression. C<(condition)> should be either an integer in
parentheses (which is valid if the corresponding pair of parentheses
@@ -396,14 +408,15 @@ matched), or lookahead/lookbehind/evaluate zero-width assertion.
Say,
- / ( \( )?
+ m{ ( \( )?
[^()]+
- (?(1) \) )/x
+ (?(1) \) )
+ }x
matches a chunk of non-parentheses, possibly included in parentheses
themselves.
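As a sketch of how the conditional behaves, the pattern above can be anchored at both ends (an addition made here so that partial matches do not hide the effect) and tried against a few made-up strings.

```perl
use strict;
use warnings;

# If group 1 (the opening parenthesis) matched, require a closing one.
my $re = qr{ ^ ( \( )?
             [^()]+
             (?(1) \) ) $
           }x;

for my $s ("abc", "(abc)", "(abc", "abc)") {
    print "$s: ", ($s =~ $re ? "matches" : "fails"), "\n";
}
```

Only the balanced forms, "abc" and "(abc)", match.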
-=item C<(?imstx)>
+=item C<(?imsx)>
One or more embedded pattern-match modifiers. This is particularly
useful for patterns that are specified in a table somewhere, some of
@@ -412,15 +425,14 @@ insensitive ones need to include merely C<(?i)> at the front of the
pattern. For example:
$pattern = "foobar";
- if ( /$pattern/i )
+ if ( /$pattern/i ) { }
# more flexible:
$pattern = "(?i)foobar";
- if ( /$pattern/ )
+ if ( /$pattern/ ) { }
-Note that these modifiers are localized inside an enclosing group (if
-any). Say,
+These modifiers are localized inside an enclosing group (if any). Say,
( (?i) blah ) \s+ \1
@@ -430,15 +442,15 @@ case.
=back
-The specific choice of question mark for this and the new minimal
-matching construct was because 1) question mark is pretty rare in older
-regular expressions, and 2) whenever you see one, you should stop
-and "question" exactly what is going on. That's psychology...
+A question mark was chosen for this and for the new minimal-matching
+construct because 1) question mark is pretty rare in older regular
+expressions, and 2) whenever you see one, you should stop and "question"
+exactly what is going on. That's psychology...
=head2 Backtracking
A fundamental feature of regular expression matching involves the
-notion called I<backtracking>. which is currently used (when needed)
+notion called I<backtracking>, which is currently used (when needed)
by all regular expression quantifiers, namely C<*>, C<*?>, C<+>,
C<+?>, C<{n,m}>, and C<{n,m}?>.
@@ -539,7 +551,8 @@ As you see, this can be a bit tricky. It's important to realize that a
regular expression is merely a set of assertions that gives a definition
of success. There may be 0, 1, or several different ways that the
definition might succeed against a particular string. And if there are
-multiple ways it might succeed, you need to understand backtracking to know which variety of success you will achieve.
+multiple ways it might succeed, you need to understand backtracking to
+know which variety of success you will achieve.
When using lookahead assertions and negations, this can all get even
trickier. Imagine you'd like to find a sequence of non-digits not
@@ -578,14 +591,13 @@ non-digits, you have something that's not 123?" If the pattern matcher had
let C<\D*> expand to "ABC", this would have caused the whole pattern to
fail.
The search engine will initially match C<\D*> with "ABC". Then it will
-try to match C<(?!123> with "123" which, of course, fails. But because
+try to match C<(?!123)> with "123", which of course fails. But because
a quantifier (C<\D*>) has been used in the regular expression, the
search engine can backtrack and retry the match differently
in the hope of matching the complete regular expression.
-Well now,
-the pattern really, I<really> wants to succeed, so it uses the
-standard regexp back-off-and-retry and lets C<\D*> expand to just "AB" this
+The pattern really, I<really> wants to succeed, so it uses the
+standard pattern back-off-and-retry and lets C<\D*> expand to just "AB" this
time. Now there's indeed something following "AB" that is not
"123". It's in fact "C123", which suffices.
@@ -601,7 +613,7 @@ you'd expect; that is, case 5 will fail, but case 6 succeeds:
6: got ABC
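The behaviour discussed here can be reproduced with a short sketch. The variable names and messages are illustrative; the patterns are the unguarded C<\D*(?!123)> form and the form with an added C<(?=\d)> assertion.

```perl
use strict;
use warnings;

my ($x, $y) = ("ABC123", "ABC445");

# Without a second assertion, \D* backs off to "AB" so that (?!123) holds.
if ($x =~ /^(\D*)(?!123)/) {
    print "unguarded on $x: got $1\n";
}

# Requiring a digit next means \D* cannot back off to make the match succeed.
for my $s ($x, $y) {
    if ($s =~ /^(\D*)(?=\d)(?!123)/) {
        print "guarded on $s: got $1\n";
    } else {
        print "guarded on $s: no match\n";
    }
}
```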
-In other words, the two zero-width assertions next to each other work like
+In other words, the two zero-width assertions next to each other work as though
they're ANDed together, just as you'd use any builtin assertions: C</^$/>
matches only if you're at the beginning of the line AND the end of the
line simultaneously. The deeper underlying truth is that juxtaposition in
@@ -621,31 +633,31 @@ And if you used C<*>'s instead of limiting it to 0 through 5 matches, then
it would take literally forever--or until you ran out of stack space.
A powerful tool for optimizing such beasts is "independent" groups,
-which do not backtrace (see L<C<(?E<gt>regexp)>>). Note also that
+which do not backtrack (see L<C<(?E<gt>pattern)>>). Note also that
zero-length lookahead/lookbehind assertions will not backtrack to make
the tail match, since they are in "logical" context: only the fact
whether they match or not is considered relevant. For an example
where side-effects of a lookahead I<might> have influenced the
-following match, see L<C<(?E<gt>regexp)>>.
+following match, see L<C<(?E<gt>pattern)>>.
=head2 Version 8 Regular Expressions
-In case you're not familiar with the "regular" Version 8 regexp
+In case you're not familiar with the "regular" Version 8 regex
routines, here are the pattern-matching rules not described above.
Any single character matches itself, unless it is a I<metacharacter>
with a special meaning described here or above. You can cause
-characters which normally function as metacharacters to be interpreted
+characters that normally function as metacharacters to be interpreted
literally by prefixing them with a "\" (e.g., "\." matches a ".", not any
character; "\\" matches a "\"). A series of characters matches that
series of characters in the target string, so the pattern C<blurfl>
would match "blurfl" in the target string.
You can specify a character class, by enclosing a list of characters
-in C<[]>, which will match any one of the characters in the list. If the
+in C<[]>, which will match any one character from the list. If the
first character after the "[" is "^", the class matches any character not
in the list. Within a list, the "-" character is used to specify a
-range, so that C<a-z> represents all the characters between "a" and "z",
+range, so that C<a-z> represents all characters between "a" and "z",
inclusive. If you want "-" itself to be a member of a class, put it
at the start or end of the list, or escape it with a backslash. (The
following all specify the same class of three characters: C<[-az]>,
@@ -663,7 +675,7 @@ character except "\n" (unless you use C</s>).
You can specify a series of alternatives for a pattern using "|" to
separate them, so that C<fee|fie|foe> will match any of "fee", "fie",
-or "foe" in the target string (as would C<f(e|i|o)e>). Note that the
+or "foe" in the target string (as would C<f(e|i|o)e>). The
first alternative includes everything from the last pattern delimiter
("(", "[", or the beginning of the pattern) up to the first "|", and
the last alternative contains everything from the last "|" to the next
@@ -671,7 +683,7 @@ pattern delimiter. For this reason, it's common practice to include
alternatives in parentheses, to minimize confusion about where they
start and end.
-Note that alternatives are tried from left to right, so the first
+Alternatives are tried from left to right, so the first
alternative found for which the entire expression matches, is the one that
is chosen. This means that alternatives are not necessarily greedy. For
example: when matching C<foo|foot> against "barefoot", only the "foo"
@@ -679,23 +691,23 @@ part will match, as that is the first alternative tried, and it successfully
matches the target string. (This might not seem important, but it is
important when you are capturing matched text using parentheses.)
-Also note that "|" is interpreted as a literal within square brackets,
+Also remember that "|" is interpreted as a literal within square brackets,
so if you write C<[fee|fie|foe]> you're really only matching C<[feio|]>.
Within a pattern, you may designate subpatterns for later reference by
enclosing them in parentheses, and you may refer back to the I<n>th
subpattern later in the pattern using the metacharacter \I<n>.
Subpatterns are numbered based on the left to right order of their
-opening parenthesis. Note that a backreference matches whatever
+opening parenthesis. A backreference matches whatever
actually matched the subpattern in the string being examined, not the
rules for that subpattern. Therefore, C<(0|0x)\d*\s\1\d*> will
-match "0x1234 0x4321",but not "0x1234 01234", because subpattern 1
+match "0x1234 0x4321", but not "0x1234 01234", because subpattern 1
actually matched "0x", even though the rule C<0|0x> could
potentially match the leading 0 in the second number.
=head2 WARNING on \1 vs $1
-Some people get too used to writing things like
+Some people get too used to writing things like:
$pattern =~ s/(\W)/\\\1/g;
@@ -707,7 +719,7 @@ meaning of C<\1> is kludged in for C<s///>. However, if you get into the habit
of doing that, you get yourself into trouble if you then add an C</e>
modifier.
- s/(\d+)/ \1 + 1 /eg;
+ s/(\d+)/ \1 + 1 /eg; # causes warning under -w
Or if you try to do
@@ -726,4 +738,4 @@ L<perlfunc/pos>.
L<perllocale>.
-"Mastering Regular Expressions" (see L<perlbook>) by Jeffrey Friedl.
+I<Mastering Regular Expressions> (see L<perlbook>) by Jeffrey Friedl.
diff --git a/pod/perlref.pod b/pod/perlref.pod
index 34c071fcfe..50400b7807 100644
--- a/pod/perlref.pod
+++ b/pod/perlref.pod
@@ -5,13 +5,13 @@ perlref - Perl references and nested data structures
=head1 DESCRIPTION
Before release 5 of Perl it was difficult to represent complex data
-structures, because all references had to be symbolic, and even that was
-difficult to do when you wanted to refer to a variable rather than a
-symbol table entry. Perl not only makes it easier to use symbolic
-references to variables, but lets you have "hard" references to any piece
-of data. Any scalar may hold a hard reference. Because arrays and hashes
-contain scalars, you can now easily build arrays of arrays, arrays of
-hashes, hashes of arrays, arrays of hashes of functions, and so on.
+structures, because all references had to be symbolic--and even then
+it was difficult to refer to a variable instead of a symbol table entry.
+Perl now not only makes it easier to use symbolic references to variables,
+but also lets you have "hard" references to any piece of data or code.
+Any scalar may hold a hard reference. Because arrays and hashes contain
+scalars, you can now easily build arrays of arrays, arrays of hashes,
+hashes of arrays, arrays of hashes of functions, and so on.
Hard references are smart--they keep track of reference counts for you,
automatically freeing the thing referred to when its reference count goes
@@ -32,7 +32,7 @@ them that; references are confusing enough without useless synonyms.)
In contrast, hard references are more like hard links in a Unix file
system: They are used to access an underlying object without concern for
what its (other) name is. When the word "reference" is used without an
-adjective, like in the following paragraph, it usually is talking about a
+adjective, as in the following paragraph, it is usually talking about a
hard reference.
References are easy to use in Perl. There is just one overriding
@@ -41,7 +41,9 @@ scalar is holding a reference, it always behaves as a simple scalar. It
doesn't magically start being an array or hash or subroutine; you have to
tell it explicitly to do so, by dereferencing it.
-References can be constructed in several ways.
+=head2 Making References
+
+References can be created in several ways.
=over 4
@@ -60,20 +62,20 @@ reference that the backslash returned. Here are some examples:
$coderef = \&handler;
$globref = \*foo;
-It isn't possible to create a true reference to an IO handle (filehandle or
-dirhandle) using the backslash operator. See the explanation of the
-*foo{THING} syntax below. (However, you're apt to find Perl code
-out there using globrefs as though they were IO handles, which is
-grandfathered into continued functioning.)
+It isn't possible to create a true reference to an IO handle (filehandle
+or dirhandle) using the backslash operator. The most you can get is a
+reference to a typeglob, which is actually a complete symbol table entry.
+But see the explanation of the C<*foo{THING}> syntax below. However,
+you can still use typeglobs and globrefs as though they were IO handles.
=item 2.
-A reference to an anonymous array can be constructed using square
+A reference to an anonymous array can be created using square
brackets:
$arrayref = [1, 2, ['a', 'b', 'c']];
-Here we've constructed a reference to an anonymous array of three elements
+Here we've created a reference to an anonymous array of three elements
whose final element is itself a reference to another anonymous array of three
elements. (The multidimensional syntax described later can be used to
access this. For example, after the above, C<$arrayref-E<gt>[2][1]> would have
@@ -91,7 +93,7 @@ of C<@foo>, not a reference to C<@foo> itself. Likewise for C<%foo>.
=item 3.
-A reference to an anonymous hash can be constructed using curly
+A reference to an anonymous hash can be created using curly
brackets:
$hashref = {
@@ -99,7 +101,7 @@ brackets:
'Clyde' => 'Bonnie',
};
-Anonymous hash and array constructors can be intermixed freely to
+Anonymous hash and array composers like these can be intermixed freely to
produce as complicated a structure as you want. The multidimensional
syntax described below works for these too. The values above are
literals, but variables and expressions would work just as well, because
@@ -131,7 +133,7 @@ the expression to mean either the HASH reference, or the BLOCK.
=item 4.
-A reference to an anonymous subroutine can be constructed by using
+A reference to an anonymous subroutine can be created by using
C<sub> without a subname:
$coderef = sub { print "Boink!\n" };
@@ -139,7 +141,7 @@ C<sub> without a subname:
Note the presence of the semicolon. Except for the fact that the code
inside isn't executed immediately, a C<sub {}> is not so much a
declaration as it is an operator, like C<do{}> or C<eval{}>. (However, no
-matter how many times you execute that line (unless you're in an
+matter how many times you execute that particular line (unless you're in an
C<eval("...")>), C<$coderef> will still have a reference to the I<SAME>
anonymous subroutine.)
@@ -183,7 +185,7 @@ newprint() I<despite> the fact that the "my $x" has seemingly gone out of
scope by the time the anonymous subroutine runs. That's what closure
is all about.
-This applies to only lexical variables, by the way. Dynamic variables
+This applies only to lexical variables, by the way. Dynamic variables
continue to work as they have always worked. Closure is not something
that most Perl programmers need trouble themselves about to begin with.
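For readers who do want to see a closure at work, here is a minimal sketch. make_counter() is a hypothetical helper, not something from the Perl distribution; each call creates a fresh lexical that the returned anonymous subroutine keeps alive.

```perl
use strict;
use warnings;

sub make_counter {
    my $count = shift;               # lexical, private to this call
    return sub { return $count++ };  # the closure remembers its own $count
}

my $from_ten  = make_counter(10);
my $from_zero = make_counter(0);

print $from_ten->(),  "\n";   # uses the $count that started at 10
print $from_ten->(),  "\n";
print $from_zero->(), "\n";   # independent of the first counter
```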
@@ -194,11 +196,23 @@ Perl objects are just references to a special kind of object that happens to kno
which package it's associated with. Constructors are just special
subroutines that know how to create that association. They do so by
starting with an ordinary reference, and it remains an ordinary reference
-even while it's also being an object. Constructors are customarily
-named new(), but don't have to be:
+even while it's also being an object. Constructors are often
+named new() and called indirectly:
$objref = new Doggie (Tail => 'short', Ears => 'long');
+But they don't have to be:
+
+ $objref = Doggie->new(Tail => 'short', Ears => 'long');
+
+ use Term::Cap;
+ $terminal = Term::Cap->Tgetent( { OSPEED => 9600 });
+
+ use Tk;
+ $main = MainWindow->new();
+ $menubar = $main->Frame(-relief => "raised",
+ -borderwidth => 2);
+
=item 6.
References of the appropriate type can spring into existence if you
@@ -230,36 +244,34 @@ except in the case of scalars. *foo{SCALAR} returns a reference to an
anonymous scalar if $foo hasn't been used yet. This might change in a
future release.
-The use of *foo{IO} is the best way to pass bareword filehandles into or
-out of subroutines, or to store them in larger data structures.
+*foo{IO} is an alternative to the \*HANDLE mechanism given in
+L<perldata/"Typeglobs and Filehandles"> for passing filehandles
+into or out of subroutines, or storing into larger data structures.
+Its disadvantage is that it won't create a new filehandle for you.
+Its advantage is that you have no risk of clobbering more than you want
+to with a typeglob assignment, although if you assign to a scalar instead
+of a typeglob, you're ok.
+ splutter(*STDOUT);
splutter(*STDOUT{IO});
+
sub splutter {
my $fh = shift;
print $fh "her um well a hmmm\n";
}
+ $rec = get_rec(*STDIN);
$rec = get_rec(*STDIN{IO});
+
sub get_rec {
my $fh = shift;
return scalar <$fh>;
}
-Beware, though, that you can't do this with a routine which is going to
-open the filehandle for you, because *HANDLE{IO} will be undef if HANDLE
-hasn't been used yet. Use \*HANDLE for that sort of thing instead.
-
-Using \*HANDLE (or *HANDLE) is another way to use and store non-bareword
-filehandles (before perl version 5.002 it was the only way). The two
-methods are largely interchangeable, you can do
-
- splutter(\*STDOUT);
- $rec = get_rec(\*STDIN);
-
-with the above subroutine definitions.
-
=back
+=head2 Using References
+
That's it for creating references. By now you're probably dying to
know how to use references to get back to your long-lost data. There
are several basic methods.
@@ -347,6 +359,7 @@ statement, C<$array[$x]> may have been undefined. If so, it's
automatically defined with a hash reference so that we can look up
C<{"foo"}> in it. Likewise C<$array[$x]-E<gt>{"foo"}> will automatically get
defined with an array reference so that we can look up C<[0]> in it.
+This process is called I<autovivification>.
One more thing here. The arrow is optional I<BETWEEN> brackets
subscripts, so you can shrink the above down to
@@ -376,8 +389,8 @@ civility though.
The ref() operator may be used to determine what type of thing the
reference is pointing to. See L<perlfunc>.
-The bless() operator may be used to associate a reference with a package
-functioning as an object class. See L<perlobj>.
+The bless() operator may be used to associate the object a reference
+points to with a package functioning as an object class. See L<perlobj>.
A typeglob may be dereferenced the same way a reference can, because
the dereference syntax always indicates the kind of reference desired.
@@ -430,11 +443,11 @@ block. An inner block may countermand that with
no strict 'refs';
-Only package variables are visible to symbolic references. Lexical
-variables (declared with my()) aren't in a symbol table, and thus are
-invisible to this mechanism. For example:
+Only package variables (globals, even if localized) are visible to
+symbolic references. Lexical variables (declared with my()) aren't in
+a symbol table, and thus are invisible to this mechanism. For example:
- local($value) = 10;
+ local $value = 10;
$ref = \$value;
{
my $value = 20;
@@ -498,6 +511,79 @@ The B<-w> switch will warn you if it interprets a reserved word as a string.
But it will no longer warn you about using lowercase words, because the
string is effectively quoted.
+=head2 Function Templates
+
+As explained above, a closure is an anonymous function with access to the
+lexical variables visible when that function was compiled. It retains
+access to those variables even though it doesn't get run until later,
+such as in a signal handler or a Tk callback.
+
+Using a closure as a function template allows us to generate many functions
+that act similarly. Suppose you wanted functions named after the colors
+that generated HTML font changes for the various colors:
+
+ print "Be ", red("careful"), " with that ", green("light");
+
+The red() and green() functions would be very similar. To create these,
+we'll assign a closure to a typeglob of the name of the function we're
+trying to build.
+
+ @colors = qw(red blue green yellow orange purple violet);
+ for my $name (@colors) {
+ no strict 'refs'; # allow symbol table manipulation
+ *$name = *{uc $name} = sub { "<FONT COLOR='$name'>@_</FONT>" };
+ }
+
+Now all those different functions appear to exist independently. You can
+call red(), RED(), blue(), BLUE(), green(), etc. This technique saves on
+both compile time and memory use, and is less error-prone as well, since
+syntax checks happen at compile time. It's critical that any variables in
+the anonymous subroutine be lexicals in order to create a proper closure.
+That's the reason for the C<my> on the loop iteration variable.
+
+This is one of the only places where giving a prototype to a closure makes
+much sense. If you wanted to impose scalar context on the arguments of
+these functions (probably not a wise idea for this particular example),
+you could have written it this way instead:
+
+ *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };
+
+However, since prototype checking happens at compile time, the assignment
+above happens too late to be of much use. You could address this by
+putting the whole loop of assignments within a BEGIN block, forcing it
+to occur during compilation.
+
+Access to lexicals that change over time--like those in the C<for> loop
+above--only works with closures, not general subroutines. In the general
+case, then, named subroutines do not nest properly, although anonymous
+ones do. If you are accustomed to using nested subroutines in other
+programming languages with their own private variables, you'll have to
+work at it a bit in Perl. The intuitive coding of this kind of thing
+incurs mysterious warnings about ``will not stay shared''. For example,
+this won't work:
+
+ sub outer {
+ my $x = $_[0] + 35;
+ sub inner { return $x * 19 } # WRONG
+ return $x + inner();
+ }
+
+A work-around is the following:
+
+ sub outer {
+ my $x = $_[0] + 35;
+ local *inner = sub { return $x * 19 };
+ return $x + inner();
+ }
+
+Now inner() can only be called from within outer(), because of the
+temporary assignment of the closure (anonymous subroutine). But when
+it is called, it has normal access to the lexical variable $x from the scope
+of outer().
+
+This has the interesting effect of creating a function local to another
+function, something not normally supported in Perl.
+
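The function-template technique described in this hunk can be tried in a small standalone script. This is only a sketch: the tag names C<bold> and C<italic> and the markup they emit are assumptions made up for illustration, not part of the patched text.

```perl
use strict;
use warnings;

# Hypothetical tag names, chosen only for this illustration.
my @tags = qw(bold italic);

for my $name (@tags) {
    no strict 'refs';                       # allow symbol table manipulation
    # Each pass closes over its own copy of the lexical $name.
    *{"main::$name"} = sub { "<$name>@_</$name>" };
}

# Parenthesized calls are resolved at run time, so the generated
# functions need no predeclaration.
my $html = bold("careful") . " " . italic("light");
print "$html\n";
```

Because $name is declared with C<my> on the loop, each generated function keeps its own tag name instead of sharing one variable.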
=head1 WARNING
You may not (usefully) use a reference as the key to a hash. It will be
@@ -515,6 +601,8 @@ more like
And then at least you can use the values(), which will be
real refs, instead of the keys(), which won't.
+The standard Tie::RefHash module provides a convenient workaround to this.
+
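A minimal sketch of the Tie::RefHash workaround mentioned above, assuming the module is installed as part of the standard distribution. The hash name and the data stored in it are hypothetical.

```perl
use strict;
use warnings;
use Tie::RefHash;

# With the tie in place, keys that are references stay references.
tie my %owner_of, 'Tie::RefHash';

my $list = [1, 2, 3];
$owner_of{$list} = 'alice';                 # hypothetical data

for my $key (keys %owner_of) {
    # $key is still a real array reference, not a string like "ARRAY(0x...)".
    print ref($key), " with ", scalar(@$key), " elements\n";
}
```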
=head1 SEE ALSO
Besides the obvious documents, source code can be instructive.
@@ -522,5 +610,5 @@ Some rather pathological examples of the use of references can be found
in the F<t/op/ref.t> regression test in the Perl source directory.
See also L<perldsc> and L<perllol> for how to use references to create
-complex data structures, and L<perlobj> for how to use them to create
-objects.
+complex data structures, and L<perltoot>, L<perlobj>, and L<perlbot>
+for how to use them to create objects.
diff --git a/pod/perlrun.pod b/pod/perlrun.pod
index 84ce270e3e..72c772e4bb 100644
--- a/pod/perlrun.pod
+++ b/pod/perlrun.pod
@@ -512,14 +512,15 @@ support that).
=item B<-u>
causes Perl to dump core after compiling your script. You can then
-take this core dump and turn it into an executable file by using the
+in theory take this core dump and turn it into an executable file by using the
B<undump> program (not supplied). This speeds startup at the expense of
some disk space (which you can minimize by stripping the executable).
(Still, a "hello world" executable comes out to about 200K on my
machine.) If you want to execute a portion of your script before dumping,
use the dump() operator instead. Note: availability of B<undump> is
platform specific and may not be available for a specific port of
-Perl.
+Perl. It has been superseded by the new perl-to-C compiler, which is more
+portable, even though it's still only considered beta.
=item B<-U>
diff --git a/pod/perlsec.pod b/pod/perlsec.pod
index 3fd903412d..4a743c7430 100644
--- a/pod/perlsec.pod
+++ b/pod/perlsec.pod
@@ -225,15 +225,14 @@ never call the shell at all.
} else {
my @temp = ($EUID, $EGID);
$EUID = $UID;
- $EGID = $GID; # XXX: initgroups() not called
+ $EGID = $GID; # initgroups() also called!
# Make sure privs are really gone
($EUID, $EGID) = @temp;
- die "Can't drop privileges" unless
- $UID == $EUID and
- $GID eq $EGID; # String test
+ die "Can't drop privileges"
+ unless $UID == $EUID && $GID eq $EGID;
$ENV{PATH} = "/bin:/usr/bin";
- exec 'myprog', 'arg1', 'arg2' or
- die "can't exec myprog: $!";
+ exec 'myprog', 'arg1', 'arg2'
+ or die "can't exec myprog: $!";
}
A similar strategy would work for wildcard expansion via C<glob>, although
@@ -320,9 +319,10 @@ First of all, however, you I<can't> take away read permission, because
the source code has to be readable in order to be compiled and
interpreted. (That doesn't mean that a CGI script's source is
readable by people on the web, though.) So you have to leave the
-permissions at the socially friendly 0755 level.
+permissions at the socially friendly 0755 level. This lets
+people on your local system only see your source.
-Some people regard this as a security problem. If your program does
+Some people mistakenly regard this as a security problem. If your program does
insecure things, and relies on people not knowing how to exploit those
insecurities, it is not secure. It is often possible for someone to
determine the insecure things and exploit them without viewing the
@@ -345,3 +345,7 @@ statements like "This is unpublished proprietary software of XYZ Corp.
Your access to it does not give you permission to use it blah blah
blah." You should see a lawyer to be sure your licence's wording will
stand up in court.
+
+=head1 SEE ALSO
+
+L<perlrun> for its description of cleaning up environment variables.
diff --git a/pod/perlsub.pod b/pod/perlsub.pod
index 1d7660c20e..5baff89473 100644
--- a/pod/perlsub.pod
+++ b/pod/perlsub.pod
@@ -14,7 +14,8 @@ To declare subroutines:
To define an anonymous subroutine at runtime:
- $subref = sub BLOCK;
+ $subref = sub BLOCK; # no proto
+ $subref = sub (PROTO) BLOCK; # with proto
To import subroutines:
@@ -24,7 +25,7 @@ To call subroutines:
NAME(LIST); # & is optional with parentheses.
NAME LIST; # Parentheses optional if predeclared/imported.
- &NAME; # Passes current @_ to subroutine.
+ &NAME; # Makes current @_ visible to called subroutine.
=head1 DESCRIPTION
@@ -33,7 +34,7 @@ may be located anywhere in the main program, loaded in from other files
via the C<do>, C<require>, or C<use> keywords, or even generated on the
fly using C<eval> or anonymous subroutines (closures). You can even call
a function indirectly using a variable containing its name or a CODE reference
-to it, as in C<$var = \&function>.
+to it.
The Perl model for function call and return values is simple: all
functions are passed as parameters one single flat list of scalars, and
@@ -190,6 +191,14 @@ disables any prototype checking on the arguments you do provide. This
is partly for historical reasons, and partly for having a convenient way
to cheat if you know what you're doing. See the section on Prototypes below.
+Functions whose names are in all upper case are reserved to the Perl core,
+just as are modules whose names are in all lower case. A function in
+all capitals is a loosely-held convention meaning it will be called
+indirectly by the run-time system itself. Functions that do special,
+pre-defined things include BEGIN, END, AUTOLOAD, and DESTROY--plus all the
+functions mentioned in L<perltie>. The 5.005 release adds INIT
+to this list.
+
=head2 Private Variables via my()
Synopsis:
@@ -207,11 +216,20 @@ must be placed in parentheses. All listed elements must be legal lvalues.
Only alphanumeric identifiers may be lexically scoped--magical
builtins like $/ must currently be localized with "local" instead.
-Unlike dynamic variables created by the "local" statement, lexical
+Unlike dynamic variables created by the "local" operator, lexical
variables declared with "my" are totally hidden from the outside world,
including any called subroutines (even if it's the same subroutine called
from itself or elsewhere--every call gets its own copy).
+This doesn't mean that a my() variable declared in a statically
+I<enclosing> lexical scope would be invisible. Only the dynamic scopes
+are cut off. For example, the bumpx() function below has access to the
+lexical $x variable because both the my and the sub occurred at the same
+scope, presumably the file scope.
+
+ my $x = 10;
+ sub bumpx { $x++ }
+
(An eval(), however, can see the lexical variables of the scope it is
being evaluated in so long as the names aren't hidden by declarations within
the eval() itself. See L<perlref>.)
@@ -236,7 +254,7 @@ The "my" is simply a modifier on something you might assign to. So when
you do assign to the variables in its argument list, the "my" doesn't
change whether those variables is viewed as a scalar or an array. So
- my ($foo) = <STDIN>;
+ my ($foo) = <STDIN>; # WRONG?
my @FOO = <STDIN>;
both supply a list context to the right-hand side, while
@@ -245,7 +263,7 @@ both supply a list context to the right-hand side, while
supplies a scalar context. But the following declares only one variable:
- my $foo, $bar = 1;
+ my $foo, $bar = 1; # WRONG
That has the same effect as
@@ -342,13 +360,13 @@ lexical of the same name is also visible:
That will print out 20 and 10.
-You may declare "my" variables at the outermost scope of a file to
-hide any such identifiers totally from the outside world. This is similar
+You may declare "my" variables at the outermost scope of a file to hide
+any such identifiers totally from the outside world. This is similar
to C's static variables at the file level. To do this with a subroutine
-requires the use of a closure (anonymous function). If a block (such as
-an eval(), function, or C<package>) wants to create a private subroutine
-that cannot be called from outside that block, it can declare a lexical
-variable containing an anonymous sub reference:
+requires the use of a closure (anonymous function with lexical access).
+If a block (such as an eval(), function, or C<package>) wants to create
+a private subroutine that cannot be called from outside that block,
+it can declare a lexical variable containing an anonymous sub reference:
my $secret_version = '1.001-beta';
my $secret_sub = sub { print $secret_version };
@@ -363,13 +381,28 @@ unqualified and unqualifiable.
This does not work with object methods, however; all object methods have
to be in the symbol table of some package to be found.
-Just because the lexical variable is lexically (also called statically)
-scoped doesn't mean that within a function it works like a C static. It
-normally works more like a C auto. But here's a mechanism for giving a
-function private variables with both lexical scoping and a static
-lifetime. If you do want to create something like C's static variables,
-just enclose the whole function in an extra block, and put the
-static variable outside the function but in the block.
+=head2 Persistent Private Variables
+
+Just because a lexical variable is lexically (also called statically)
+scoped to its enclosing block, eval, or do FILE, this doesn't mean that
+within a function it works like a C static. It normally works more
+like a C auto, but with implicit garbage collection.
+
+Unlike local variables in C or C++, Perl's lexical variables don't
+necessarily get recycled just because their scope has exited.
+If something more permanent is still aware of the lexical, it will
+stick around. So long as something else references a lexical, that
+lexical won't be freed--which is as it should be. You wouldn't want
+memory being freed until you were done using it, or kept around once you
+were done. Automatic garbage collection takes care of this for you.
+
+This means that you can pass back or save away references to lexical
+variables, whereas to return a pointer to a C auto is a grave error.
+It also gives us a way to simulate C's function statics. Here's a
+mechanism for giving a function private variables with both lexical
+scoping and a static lifetime. If you do want to create something like
+C's static variables, just enclose the whole function in an extra block,
+and put the static variable outside the function but in the block.
{
my $secret_val = 0;
@@ -397,6 +430,12 @@ starts to run:
See L<perlrun> about the BEGIN function.
+If declared at the outermost scope, the file scope, then lexicals work
+somewhat like C's file statics. They are available to all functions in
+that same file declared below them, but are inaccessible from outside of
+the file. This is sometimes used in modules to create private variables
+for the whole module.
+
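The point made earlier in this hunk, that a lexical survives its scope for as long as something still references it, can be sketched as follows. The helper make_counter() is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# make_counter() is a hypothetical helper, not part of the patched text.
sub make_counter {
    my $count = shift;                # a fresh lexical on every call
    return sub { return $count++ };   # the closure keeps $count alive
}

my $from_five = make_counter(5);
my $from_ten  = make_counter(10);

# Each closure advances its own private $count.
my @got = ($from_five->(), $from_five->(), $from_ten->());
print "@got\n";
```

Returning the equivalent of a pointer to a C auto variable would be an error; here each $count simply lives on until its closure is discarded.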
=head2 Temporary Values via local()
B<NOTE>: In general, you should be using "my" instead of "local", because
@@ -419,11 +458,12 @@ Synopsis:
local *merlyn = 'randal'; # SAME THING: promote 'randal' to *randal
local *merlyn = \$randal; # just alias $merlyn, not @merlyn etc
-A local() modifies its listed variables to be local to the enclosing
-block, (or subroutine, C<eval{}>, or C<do>) and I<any called from
-within that block>. A local() just gives temporary values to global
-(meaning package) variables. This is known as dynamic scoping. Lexical
-scoping is done with "my", which works more like C's auto declarations.
+A local() modifies its listed variables to be "local" to the enclosing
+block, eval, or C<do FILE>--and to I<any subroutine called from within that block>.
+A local() just gives temporary values to global (meaning package)
+variables. It does not create a local variable. This is known as
+dynamic scoping. Lexical scoping is done with "my", which works more
+like C's auto declarations.
If more than one variable is given to local(), they must be placed in
parentheses. All listed elements must be legal lvalues. This operator works
@@ -541,9 +581,6 @@ Perl will print
This is a test only a test.
The array has 6 elements: 0, 1, 2, undef, undef, 5
-In short, be careful when manipulating the containers for composite types
-whose elements have been localized.
-
=head2 Passing Symbol Table Entries (typeglobs)
[Note: The mechanism described in this section was originally the only
@@ -586,6 +623,77 @@ mechanism will merge all the array values so that you can't extract out
the individual arrays. For more on typeglobs, see
L<perldata/"Typeglobs and Filehandles">.
+=head2 When to Still Use local()
+
+Despite the existence of my(), there are three places where the
+local() operator still shines. In fact, in these three places, you
+I<must> use C<local> instead of C<my>.
+
+=over
+
+=item 1. You need to give a global variable a temporary value, especially $_.
+
+The global variables, like @ARGV or the punctuation variables, must be
+localized with local(). This block reads in I</etc/motd>, and splits
+it up into chunks separated by lines of equal signs, which are placed
+in @Fields.
+
+ {
+ local @ARGV = ("/etc/motd");
+ local $/ = undef;
+ local $_ = <>;
+ @Fields = split /^\s*=+\s*$/;
+ }
+
+In particular, it's important to localize $_ in any routine that assigns
+to it. Look out for implicit assignments in C<while> conditionals.
+
+=item 2. You need to create a local file or directory handle or a local function.
+
+A function that needs a filehandle of its own must use local()
+on a complete typeglob. This can be used to create new symbol
+table entries:
+
+ sub ioqueue {
+ local (*READER, *WRITER); # not my!
+ pipe (READER, WRITER) or die "pipe: $!";
+ return (*READER, *WRITER);
+ }
+ ($head, $tail) = ioqueue();
+
+See the Symbol module for a way to create anonymous symbol table
+entries.
+
+Because assignment of a reference to a typeglob creates an alias, this
+can be used to create what is effectively a local function, or at least,
+a local alias.
+
+ {
+ local *grow = \&shrink; # only until this block exits
+ grow(); # really calls shrink()
+ move(); # if move() grow()s, it shrink()s too
+ }
+ grow(); # get the real grow() again
+
+See L<perlref/"Function Templates"> for more about manipulating
+functions by name in this way.
+
+=item 3. You want to temporarily change just one element of an array or hash.
+
+You can localize just one element of an aggregate. Usually this
+is done on dynamics:
+
+ {
+ local $SIG{INT} = 'IGNORE';
+ funct(); # uninterruptible
+ }
+ # interruptibility automatically restored here
+
+But it also works on lexically declared aggregates. Prior to 5.005,
+this operation could on occasion misbehave.
+
+=back
+
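Item 3 above, localizing a single element of an aggregate, might look like this in practice. The %Settings hash and both subroutines are assumptions made up for this sketch.

```perl
use strict;
use warnings;

our %Settings = (verbose => 0);       # hypothetical package variable

sub report { return $Settings{verbose} ? "loud" : "quiet" }

sub noisy_report {
    local $Settings{verbose} = 1;     # temporary value for this call only
    return report();                  # report() sees the localized element
}

# The old value is restored as soon as noisy_report() returns.
my @results = (noisy_report(), report());
print "@results\n";
```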
=head2 Pass by Reference
If you want to pass more than one array or hash into a function--or
@@ -852,7 +960,7 @@ without C<&> or C<do>. Calls made using C<&> or C<do> are never
inlined. (See constant.pm for an easy way to declare most
constants.)
-All of the following functions would be inlined.
+The following functions would all be inlined:
sub pi () { 3.14159 } # Not exact, but close.
sub PI () { 4 * atan2 1, 1 } # As good as it gets,
@@ -881,7 +989,7 @@ All of the following functions would be inlined.
sub N_FACTORIAL () { $prod }
}
-If you redefine a subroutine which was eligible for inlining you'll get
+If you redefine a subroutine that was eligible for inlining, you'll get
a mandatory warning. (You can use this warning to tell whether or not a
particular subroutine is considered constant.) The warning is
considered severe enough not to be optional because previously compiled
@@ -991,7 +1099,7 @@ library.
If you call a subroutine that is undefined, you would ordinarily get an
immediate fatal error complaining that the subroutine doesn't exist.
(Likewise for subroutines being used as methods, when the method
-doesn't exist in any of the base classes of the class package.) If,
+doesn't exist in any base class of the class package.) If,
however, there is an C<AUTOLOAD> subroutine defined in the package or
packages that were searched for the original subroutine, then that
C<AUTOLOAD> subroutine is called with the arguments that would have been
@@ -1036,7 +1144,6 @@ functions to perl code in L<perlxs>.
=head1 SEE ALSO
-See L<perlref> for more on references. See L<perlxs> if you'd
-like to learn about calling C subroutines from perl. See
-L<perlmod> to learn about bundling up your functions in
-separate files.
+See L<perlref> for more about references and closures. See L<perlxs> if
+you'd like to learn about calling C subroutines from perl. See L<perlmod>
+to learn about bundling up your functions in separate files.
diff --git a/pod/perlsyn.pod b/pod/perlsyn.pod
index 1d0f5d694f..7c932578bb 100644
--- a/pod/perlsyn.pod
+++ b/pod/perlsyn.pod
@@ -95,10 +95,25 @@ conditional is evaluated. This is so that you can write loops like:
...
} until $line eq ".\n";
-See L<perlfunc/do>. Note also that the loop control
-statements described later will I<NOT> work in this construct, because
-modifiers don't take loop labels. Sorry. You can always wrap
-another block around it to do that sort of thing.
+See L<perlfunc/do>. Note also that the loop control statements described
+later will I<NOT> work in this construct, because modifiers don't take
+loop labels. Sorry. You can always put another block inside of it
+(for C<next>) or around it (for C<last>) to do that sort of thing.
+For next, just double the braces:
+
+ do {{
+ next if $x == $y;
+ # do something here
+ }} until $x++ > $z;
+
+For last, you have to be more elaborate:
+
+ LOOP: {
+ do {
+ last if $x == $y**2;
+ # do something here
+ } while $x++ <= $z;
+ }
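Both brace idioms from this hunk can be exercised in one short script. The variable names and loop bounds below are assumptions chosen so that each loop visibly skips or stops once.

```perl
use strict;
use warnings;

# next: the doubled braces give "next" an inner block to leave early.
my @seen;
my $x = 0;
do {{
    next if $x == 2;                  # skip the rest of this pass only
    push @seen, $x;
}} until $x++ >= 4;

# last: the labeled outer block gives "last" something to exit.
my @tried;
my $y = 0;
LOOP: {
    do {
        last LOOP if $y == 3;         # leave the whole construct
        push @tried, $y;
    } while $y++ < 10;
}

print "seen: @seen\n";
print "tried: @tried\n";
```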
=head2 Compound statements
@@ -201,31 +216,34 @@ which is Perl short-hand for the more explicitly written version:
# now process $line
}
-Or here's a simpleminded Pascal comment stripper (warning: assumes no
-{ or } in strings).
+Note that if there were a C<continue> block on the above code, it would get
+executed even on discarded lines. This is often used to reset line counters
+or C<?pat?> one-time matches.
- LINE: while (<STDIN>) {
- while (s|({.*}.*){.*}|$1 |) {}
- s|{.*}| |;
- if (s|{.*| |) {
- $front = $_;
- while (<STDIN>) {
- if (/}/) { # end of comment?
- s|^|$front{|;
- redo LINE;
- }
- }
- }
- print;
+ # inspired by :1,$g/fred/s//WILMA/
+ while (<>) {
+ ?(fred)? && s//WILMA $1 WILMA/;
+ ?(barney)? && s//BETTY $1 BETTY/;
+ ?(homer)? && s//MARGE $1 MARGE/;
+ } continue {
+ print "$ARGV $.: $_";
+ close ARGV if eof(); # reset $.
+ reset if eof(); # reset ?pat?
}
-Note that if there were a C<continue> block on the above code, it would get
-executed even on discarded lines.
-
If the word C<while> is replaced by the word C<until>, the sense of the
test is reversed, but the conditional is still tested before the first
iteration.
+The loop control statements don't work in an C<if> or C<unless>, since
+they aren't loops. You can double the braces to make them such, though.
+
+ if (/pattern/) {{
+ next if /fred/;
+ next if /barney/;
+ # do something here
+ }}
+
The form C<while/if BLOCK BLOCK>, available in Perl 4, is no longer
available. Replace any occurrence of C<if BLOCK> by C<if (do BLOCK)>.
@@ -276,11 +294,12 @@ if you have subroutine or format declarations within the loop which
refer to it.)
The C<foreach> keyword is actually a synonym for the C<for> keyword, so
-you can use C<foreach> for readability or C<for> for brevity. If VAR is
-omitted, $_ is set to each value. If any element of LIST is an lvalue,
-you can modify it by modifying VAR inside the loop. That's because
-the C<foreach> loop index variable is an implicit alias for each item
-in the list that you're looping over.
+you can use C<foreach> for readability or C<for> for brevity. (Or because
+the Bourne shell is more familiar to you than I<csh>, so writing C<for>
+comes more naturally.) If VAR is omitted, $_ is set to each value.
+If any element of LIST is an lvalue, you can modify it by modifying VAR
+inside the loop. That's because the C<foreach> loop index variable is
+an implicit alias for each item in the list that you're looping over.
If any part of LIST is an array, C<foreach> will get very confused if
you add or remove elements within the loop body, for example with
@@ -409,18 +428,6 @@ or
$nothing = 1;
}
-or, using experimental C<EVAL blocks> of regular expressions
-(see L<perlre/"(?{ code })">),
-
- / ^abc (?{ $abc = 1 })
- |
- ^def (?{ $def = 1 })
- |
- ^xyz (?{ $xyz = 1 })
- |
- (?{ $nothing = 1 })
- /x;
-
or even, horrors,
if (/^abc/)
@@ -432,7 +439,6 @@ or even, horrors,
else
{ $nothing = 1 }
-
A common idiom for a switch statement is to use C<foreach>'s aliasing to make
a temporary assignment to $_ for convenient matching:
@@ -447,7 +453,7 @@ Another interesting approach to a switch statement is arrange
for a C<do> block to return the proper value:
$amode = do {
- if ($flag & O_RDONLY) { "r" }
+ if ($flag & O_RDONLY) { "r" } # XXX: isn't this 0?
elsif ($flag & O_WRONLY) { ($flag & O_APPEND) ? "a" : "w" }
elsif ($flag & O_RDWR) {
if ($flag & O_CREAT) { "w+" }
@@ -455,6 +461,38 @@ for a C<do> block to return the proper value:
}
};
+Or
+
+ print do {
+ ($flags & O_WRONLY) ? "write-only" :
+ ($flags & O_RDWR) ? "read-write" :
+ "read-only";
+ };
+
+Or if you are certain that all the C<&&> clauses are true, you can use
+something like this, which "switches" on the value of the
+HTTP_USER_AGENT envariable.
+
+ #!/usr/bin/perl
+ # pick out jargon file page based on browser
+ $dir = 'http://www.wins.uva.nl/~mes/jargon';
+ for ($ENV{HTTP_USER_AGENT}) {
+ $page = /Mac/ && 'm/Macintrash.html'
+ || /Win(dows )?NT/ && 'e/evilandrude.html'
+ || /Win|MSIE|WebTV/ && 'm/MicroslothWindows.html'
+ || /Linux/ && 'l/Linux.html'
+ || /HP-UX/ && 'h/HP-SUX.html'
+ || /SunOS/ && 's/ScumOS.html'
+ || 'a/AppendixB.html';
+ }
+ print "Location: $dir/$page\015\012\015\012";
+
+That kind of switch statement only works when you know the C<&&> clauses
+will be true. If you don't, the previous C<?:> example should be used.
+
+You might also consider writing a hash instead of synthesizing a switch
+statement.
+
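As a rough sketch of the hash alternative suggested above, a lookup table can replace a chain of tests when the cases are exact string matches. The table contents and the describe_mode() helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical lookup table; the keys and labels are illustrative only.
my %label_for = (
    r    => "read-only",
    w    => "write-only",
    'w+' => "read-write",
);

sub describe_mode {
    my $mode = shift;
    # exists() distinguishes a missing key from a false value.
    return exists $label_for{$mode} ? $label_for{$mode} : "unknown";
}

print describe_mode("w+"), "\n";
print describe_mode("x"), "\n";
```

This only suits exact matches; the pattern-based C<&&>/C<||> chain shown earlier is still needed when order-dependent regular expressions decide the case.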
=head2 Goto
Although not for the faint of heart, Perl does support a C<goto> statement.
@@ -539,8 +577,8 @@ of code.
=head2 Plain Old Comments (Not!)
-Much like the C preprocessor, perl can process line directives. Using
-this, one can control perl's idea of filenames and line numbers in
+Much like the C preprocessor, Perl can process line directives. Using
+this, one can control Perl's idea of filenames and line numbers in
error or warning messages (especially for strings that are processed
with eval()). The syntax for this mechanism is the same as for most
C preprocessors: it matches the regular expression
diff --git a/pod/perltie.pod b/pod/perltie.pod
index da4fbe99cf..cae0a15a54 100644
--- a/pod/perltie.pod
+++ b/pod/perltie.pod
@@ -23,7 +23,7 @@ Now you can.
The tie() function binds a variable to a class (package) that will provide
the implementation for access methods for that variable. Once this magic
has been performed, accessing a tied variable automatically triggers
-method calls in the proper class. All of the complexity of the class is
+method calls in the proper class. The complexity of the class is
hidden behind magic methods calls. The method names are in ALL CAPS,
which is a convention that Perl uses to indicate that they're called
implicitly rather than explicitly--just like the BEGIN() and END()
diff --git a/pod/perltoot.pod b/pod/perltoot.pod
index 90ef81ae26..c77a971b57 100644
--- a/pod/perltoot.pod
+++ b/pod/perltoot.pod
@@ -1753,27 +1753,25 @@ L<perltie>,
and
L<overload>.
-=head1 COPYRIGHT
+=head1 AUTHOR AND COPYRIGHT
+
+Copyright (c) 1997, 1998 Tom Christiansen
+All rights reserved.
+
+When included as part of the Standard Version of Perl, or as part of
+its complete documentation whether printed or otherwise, this work
+may be distributed only under the terms of Perl's Artistic License.
+Any distribution of this file or derivatives thereof I<outside>
+of that package require that special arrangements be made with
+copyright holder.
-I I<really> hate to have to say this, but recent unpleasant
-experiences have mandated its inclusion:
-
- Copyright 1996 Tom Christiansen. All Rights Reserved.
-
-This work derives in part from the second edition of I<Programming Perl>.
-Although destined for release as a manpage with the standard Perl
-distribution, it is not public domain (nor is any of Perl and its docset:
-publishers beware). It's expected to someday make its way into a revision
-of the Camel Book. While it is copyright by me with all rights reserved,
-permission is granted to freely distribute verbatim copies of this
-document provided that no modifications outside of formatting be made,
-and that this notice remain intact. You are permitted and encouraged to
-use its code and derivatives thereof in your own source code for fun or
-for profit as you see fit. But so help me, if in six months I find some
-book out there with a hacked-up version of this material in it claiming to
-be written by someone else, I'll tell all the world that you're a jerk.
-Furthermore, your lawyer will meet my lawyer (or O'Reilly's) over lunch
-to arrange for you to receive your just deserts. Count on it.
+Irrespective of its distribution, all code examples in this file
+are hereby placed into the public domain. You are permitted and
+encouraged to use this code in your own programs for fun
+or for profit as you see fit. A simple comment in the code giving
+credit would be courteous but is not required.
+
+=head1 COPYRIGHT
=head2 Acknowledgments
diff --git a/pod/perlvar.pod b/pod/perlvar.pod
index d9edffa58a..1a120118d0 100644
--- a/pod/perlvar.pod
+++ b/pod/perlvar.pod
@@ -6,15 +6,15 @@ perlvar - Perl predefined variables
=head2 Predefined Names
-The following names have special meaning to Perl. Most of the
+The following names have special meaning to Perl. Most
punctuation names have reasonable mnemonics, or analogues in one of
-the shells. Nevertheless, if you wish to use the long variable names,
+the shells. Nevertheless, if you wish to use long variable names,
you just need to say
use English;
at the top of your program. This will alias all the short names to the
-long names in the current package. Some of them even have medium names,
+long names in the current package. Some even have medium names,
generally borrowed from B<awk>.
To go a step further, those variables that depend on the currently
@@ -28,7 +28,7 @@ after which you may use either
method HANDLE EXPR
-or
+or more safely,
HANDLE->method(EXPR)
@@ -112,11 +112,11 @@ test. Note that outside of a C<while> test, this will not happen.
=over 8
-=item $E<lt>I<digit>E<gt>
+=item $E<lt>I<digits>E<gt>
Contains the subpattern from the corresponding set of parentheses in
the last pattern matched, not counting patterns matched in nested
-blocks that have been exited already. (Mnemonic: like \digit.)
+blocks that have been exited already. (Mnemonic: like \digits.)
These variables are all read-only.
=item $MATCH
@@ -176,7 +176,8 @@ is 0. (Mnemonic: * matches multiple things.) Note that this variable
influences the interpretation of only "C<^>" and "C<$>". A literal newline can
be searched for even when C<$* == 0>.
-Use of "C<$*>" is deprecated in modern perls.
+Use of "C<$*>" is deprecated in modern Perls, supplanted by
+the C</s> and C</m> modifiers on pattern matching.
=item input_line_number HANDLE EXPR
@@ -427,12 +428,11 @@ L<perlfunc/formline()>.
=item $?
The status returned by the last pipe close, backtick (C<``>) command,
-or system() operator. Note that this is the status word returned by
-the wait() system call (or else is made up to look like it). Thus,
-the exit value of the subprocess is actually (C<$? E<gt>E<gt> 8>), and
-C<$? & 255> gives which signal, if any, the process died from, and
-whether there was a core dump. (Mnemonic: similar to B<sh> and
-B<ksh>.)
+or system() operator. Note that this is the status word returned by the
+wait() system call (or else is made up to look like it). Thus, the exit
+value of the subprocess is actually (C<$? E<gt>E<gt> 8>), and C<$? & 127>
+gives which signal, if any, the process died from, and C<$? & 128> reports
+whether there was a core dump. (Mnemonic: similar to B<sh> and B<ksh>.)
Additionally, if the C<h_errno> variable is supported in C, its value
is returned via $? if any of the C<gethost*()> functions fail.
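The masks and shift given for $? in this hunk can be checked with a small sketch. The helper decode_status() is hypothetical, and the status words are constructed by hand rather than taken from a real child process, so the example does not depend on any platform's commands.

```perl
use strict;
use warnings;

# decode_status() is a hypothetical helper; it only applies the masks
# and shift described in the text to a wait()-style status word.
sub decode_status {
    my $status = shift;
    return (
        exit_value => $status >> 8,
        signal     => $status & 127,
        core_dump  => ($status & 128) ? 1 : 0,
    );
}

my %normal = decode_status(3 << 8);     # exited with value 3
my %killed = decode_status(11 | 128);   # signal 11, core dumped

print "exit=$normal{exit_value} signal=$killed{signal} core=$killed{core_dump}\n";
```

In real use the argument would be $? itself, read immediately after system(), backticks, or a pipe close.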