author Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2003-10-15 19:26:41 +0000
committer Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2003-10-15 19:26:41 +0000
commit 0ab0a39e5d9094e86ba7729ee50d9d47b5ea2fc3
tree 0add3484515734bba146762c6d4afedcbe12d830
parent 39ad2eb0d7630d34fa4a6642281edbae6544f79d
Grab the newest perlfaq.

It's assumed that the perlfaqs must not be version-dependent.

p4raw-id: //depot/maint-5.6/perl-5.6.2@21457
-rw-r--r-- pod/perlfaq.pod 304
-rw-r--r-- pod/perlfaq1.pod 99
-rw-r--r-- pod/perlfaq2.pod 269
-rw-r--r-- pod/perlfaq3.pod 509
-rw-r--r-- pod/perlfaq4.pod 952
-rw-r--r-- pod/perlfaq5.pod 629
-rw-r--r-- pod/perlfaq6.pod 364
-rw-r--r-- pod/perlfaq7.pod 297
-rw-r--r-- pod/perlfaq8.pod 265
-rw-r--r-- pod/perlfaq9.pod 229
10 files changed, 2186 insertions, 1731 deletions
diff --git a/pod/perlfaq.pod b/pod/perlfaq.pod
index bc29c694f2..7acdf613d0 100644
--- a/pod/perlfaq.pod
+++ b/pod/perlfaq.pod
@@ -1,19 +1,106 @@
=head1 NAME
-perlfaq - frequently asked questions about Perl ($Date: 1999/05/23 20:38:02 $)
+perlfaq - frequently asked questions about Perl ($Date: 2003/01/31 17:37:17 $)
=head1 DESCRIPTION
-The perlfaq is structured into the following documents:
+The perlfaq is divided into several documents based on topics. A table
+of contents is at the end of this document.
+=head2 Where to get the perlfaq
+
+Extracts of the perlfaq are posted regularly to
+comp.lang.perl.misc. It is available on many web sites:
+http://www.perldoc.com/ and http://faq.perl.org/
+
+=head2 How to contribute to the perlfaq
+
+You may mail corrections, additions, and suggestions to
+perlfaq-workers@perl.org . This alias should not be used to
+I<ask> FAQs. It's for fixing the current FAQ. Send
+questions to the comp.lang.perl.misc newsgroup. You can
+view the source tree at http://cvs.perl.org/cvsweb/perlfaq/
+(which is outside of the main Perl source tree). The CVS
+repository notes all changes to the FAQ.
+
+=head2 What will happen if you mail your Perl programming problems to the authors
+
+Your questions will probably go unread, unless they're
+suggestions of new questions to add to the FAQ, in which
+case they should have gone to the perlfaq-workers@perl.org
+instead.
-=head2 perlfaq: Structural overview of the FAQ.
+You should have read section 2 of this faq. There you would
+have learned that comp.lang.perl.misc is the appropriate
+place to go for free advice. If your question is really
+important and you require a prompt and correct answer, you
+should hire a consultant.
-This document.
+=head1 Credits
+
+The original perlfaq was written by Tom Christiansen, then expanded
+by collaboration between Tom and Nathan Torkington. The current
+document is maintained by the perlfaq-workers (perlfaq-workers@perl.org).
+Several people have contributed answers, corrections, and comments.
+
+=head1 Author and Copyright Information
+
+Copyright (c) 1997-2003 Tom Christiansen, Nathan Torkington, and
+other contributors noted in the answers.
+
+All rights reserved.
+
+=head2 Bundled Distributions
+
+This documentation is free; you can redistribute it and/or modify it
+under the same terms as Perl itself.
+
+Irrespective of its distribution, all code examples in these files
+are hereby placed into the public domain. You are permitted and
+encouraged to use this code in your own programs for fun
+or for profit as you see fit. A simple comment in the code giving
+credit would be courteous but is not required.
+
+=head2 Disclaimer
+
+This information is offered in good faith and in the hope that it may
+be of use, but is not guaranteed to be correct, up to date, or suitable
+for any particular purpose whatsoever. The authors accept no liability
+in respect of this information or its use.
+
+=head1 Table of Contents
+
+=over 4
+
+=item perlfaq - this document
+
+=item perlfaq1 - General Questions About Perl
+
+=item perlfaq2 - Obtaining and Learning about Perl
+
+=item perlfaq3 - Programming Tools
+
+=item perlfaq4 - Data Manipulation
+
+=item perlfaq5 - Files and Formats
+
+=item perlfaq6 - Regular Expressions
+
+=item perlfaq7 - General Perl Language Issues
+
+=item perlfaq8 - System Interaction
+
+=item perlfaq9 - Networking
+
+
+=back
+
+
+=head1 The Questions
=head2 L<perlfaq1>: General Questions About Perl
-Very general, high-level information about Perl.
+Very general, high-level questions about Perl.
=over 4
@@ -75,14 +162,14 @@ Where can I get a list of Larry Wall witticisms?
=item *
-How can I convince my sysadmin/supervisor/employees to use version 5/5.005/Perl instead of some other language?
+How can I convince my sysadmin/supervisor/employees to use version 5/5.6.1/Perl instead of some other language?
=back
=head2 L<perlfaq2>: Obtaining and Learning about Perl
-Where to find source and documentation to Perl, support,
+Where to find source and documentation for Perl, support,
and related matters.
=over 4
@@ -157,7 +244,7 @@ Where do I send bug reports?
=item *
-What is perl.com? Perl Mongers? pm.org? perl.org?
+What is perl.com? Perl Mongers? pm.org? perl.org? cpan.org?
=back
@@ -182,6 +269,10 @@ Is there a Perl shell?
=item *
+How do I find which modules are installed on my system?
+
+=item *
+
How do I debug my Perl programs?
=item *
@@ -226,10 +317,6 @@ How can I generate simple menus without using CGI or Tk?
=item *
-What is undump?
-
-=item *
-
How can I make my Perl program run faster?
=item *
@@ -238,7 +325,7 @@ How can I make my Perl program take less memory?
=item *
-Is it unsafe to return a pointer to local data?
+Is it safe to return a reference to local or lexical data?
=item *
@@ -291,8 +378,7 @@ my C program; what am I doing wrong?
=item *
-When I tried to run my script, I got this message. What does it
-mean?
+When I tried to run my script, I got this message. What does it mean?
=item *
@@ -322,7 +408,7 @@ Does Perl have a round() function? What about ceil() and floor()? Trig functio
=item *
-How do I convert bits into ints?
+How do I convert between numeric representations?
=item *
@@ -346,7 +432,11 @@ Why aren't my random numbers random?
=item *
-How do I find the week-of-the-year/day-of-the-year?
+How do I get a random number between X and Y?
+
+=item *
+
+How do I find the day or week of the year?
=item *
@@ -406,7 +496,7 @@ How do I reformat a paragraph?
=item *
-How can I access/change the first N letters of a string?
+How can I access or change N characters of a string?
=item *
@@ -422,8 +512,7 @@ How do I capitalize all the words on one line?
=item *
-How can I split a [character] delimited string except when inside
-[character]? (Comma-separated files)
+How can I split a [character] delimited string except when inside [character]?
=item *
@@ -451,7 +540,7 @@ What's wrong with always quoting "$vars"?
=item *
-Why don't my <<HERE documents work?
+Why don't my E<lt>E<lt>HERE documents work?
=item *
@@ -467,7 +556,7 @@ How can I remove duplicate elements from a list or array?
=item *
-How can I tell whether a list or array contains a certain element?
+How can I tell whether a certain element is contained in a list or array?
=item *
@@ -610,7 +699,7 @@ How do I pack arrays of doubles or floats for XS code?
=head2 L<perlfaq5>: Files and Formats
-I/O and the "f" issues: filehandles, flushing, formats and footers.
+I/O and the "f" issues: filehandles, flushing, formats, and footers.
=over 4
@@ -628,6 +717,10 @@ How do I count the number of lines in a file?
=item *
+How can I use Perl's C<-i> option from within a program?
+
+=item *
+
How do I make a temporary file name?
=item *
@@ -664,7 +757,7 @@ How come when I open a file read-write it wipes it out?
=item *
-Why do I sometimes get an "Argument list too long" when I use <*>?
+Why do I sometimes get an "Argument list too long" when I use E<lt>*E<gt>?
=item *
@@ -684,7 +777,7 @@ How can I lock a file?
=item *
-Why can't I just open(FH, ">file.lock")?
+Why can't I just open(FH, "E<gt>file.lock")?
=item *
@@ -692,6 +785,10 @@ I still don't get locking. I just want to increment the number in the file. Ho
=item *
+All I want to do is append a small amount of text to the end of a file. Do I still have to use locking?
+
+=item *
+
How do I randomly update a binary file?
=item *
@@ -757,7 +854,7 @@ Why do I get weird spaces when I print an array of lines?
=back
-=head2 L<perlfaq6>: Regexps
+=head2 L<perlfaq6>: Regular Expressions
Pattern matching and regular expressions.
@@ -939,7 +1036,7 @@ What's the difference between deep and shallow binding?
=item *
-Why doesn't "my($foo) = <FILE>;" work right?
+Why doesn't "my($foo) = E<lt>FILEE<gt>;" work right?
=item *
@@ -955,7 +1052,7 @@ How do I create a switch or case statement?
=item *
-How can I catch accesses to undefined variables/functions/methods?
+How can I catch accesses to undefined variables, functions, or methods?
=item *
@@ -977,6 +1074,10 @@ How do I clear a package?
How can I use a variable as a variable name?
+=item *
+
+What does "bad interpreter" mean?
+
=back
@@ -1188,12 +1289,16 @@ What is socket.ph and where do I get it?
=head2 L<perlfaq9>: Networking
-Networking, the Internet, and a few on the web.
+Networking, the internet, and a few on the web.
=over 4
=item *
+What is the correct form of response from a CGI script?
+
+=item *
+
My CGI script runs from the command line but not the browser. (500 Server Error)
=item *
@@ -1270,6 +1375,10 @@ How do I send mail?
=item *
+How do I use MIME to make an attachment to a mail message?
+
+=item *
+
How do I read mail?
=item *
@@ -1291,140 +1400,3 @@ How can I do RPC in Perl?
=back
-=head1 About the perlfaq documents
-
-=head2 Where to get the perlfaq
-
-This document is posted regularly to comp.lang.perl.announce and
-several other related newsgroups. It is available in a variety of
-formats from CPAN in the /CPAN/doc/FAQs/FAQ/ directory or on the web
-at http://www.perl.com/perl/faq/ .
-
-=head2 How to contribute to the perlfaq
-
-You may mail corrections, additions, and suggestions to
-perlfaq-suggestions@perl.com . This alias should not be
-used to I<ask> FAQs. It's for fixing the current FAQ.
-Send questions to the comp.lang.perl.misc newsgroup.
-
-=head2 What will happen if you mail your Perl programming problems to the authors
-
-Your questions will probably go unread, unless they're suggestions of
-new questions to add to the FAQ, in which case they should have gone
-to the perlfaq-suggestions@perl.com instead.
-
-You should have read section 2 of this faq. There you would have
-learned that comp.lang.perl.misc is the appropriate place to go for
-free advice. If your question is really important and you require a
-prompt and correct answer, you should hire a consultant.
-
-=head1 Credits
-
-When I first began the Perl FAQ in the late 80s, I never realized it
-would have grown to over a hundred pages, nor that Perl would ever become
-so popular and widespread. This document could not have been written
-without the tremendous help provided by Larry Wall and the rest of the
-Perl Porters.
-
-=head1 Author and Copyright Information
-
-Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
-All rights reserved.
-
-=head2 Bundled Distributions
-
-When included as part of the Standard Version of Perl or as part of
-its complete documentation whether printed or otherwise, this work
-may be distributed only under the terms of Perl's Artistic License.
-Any distribution of this file or derivatives thereof I<outside>
-of that package requires that special arrangements be made with
-copyright holder.
-
-Irrespective of its distribution, all code examples in these files
-are hereby placed into the public domain. You are permitted and
-encouraged to use this code in your own programs for fun
-or for profit as you see fit. A simple comment in the code giving
-credit would be courteous but is not required.
-
-=head2 Disclaimer
-
-This information is offered in good faith and in the hope that it may
-be of use, but is not guaranteed to be correct, up to date, or suitable
-for any particular purpose whatsoever. The authors accept no liability
-in respect of this information or its use.
-
-=head1 Changes
-
-=over 4
-
-=item 1/November/2000
-
-A few grammatical fixes and updates implemented by John Borwick.
-
-=item 23/May/99
-
-Extensive updates from the net in preparation for 5.6 release.
-
-=item 13/April/99
-
-More minor touch-ups. Added new question at the end
-of perlfaq7 on variable names within variables.
-
-=item 7/January/99
-
-Small touchups here and there. Added all questions in this
-document as a sort of table of contents.
-
-=item 22/June/98
-
-Significant changes throughout in preparation for the 5.005
-release.
-
-=item 24/April/97
-
-Style and whitespace changes from Chip, new question on reading one
-character at a time from a terminal using POSIX from Tom.
-
-=item 23/April/97
-
-Added http://www.oasis.leo.org/perl/ to L<perlfaq2>. Style fix to
-L<perlfaq3>. Added floating point precision, fixed complex number
-arithmetic, cross-references, caveat for Text::Wrap, alternative
-answer for initial capitalizing, fixed incorrect regexp, added example
-of Tie::IxHash to L<perlfaq4>. Added example of passing and storing
-filehandles, added commify to L<perlfaq5>. Restored variable suicide,
-and added mass commenting to L<perlfaq7>. Added Net::Telnet, fixed
-backticks, added reader/writer pair to telnet question, added FindBin,
-grouped module questions together in L<perlfaq8>. Expanded caveats
-for the simple URL extractor, gave LWP example, added CGI security
-question, expanded on the mail address answer in L<perlfaq9>.
-
-=item 25/March/97
-
-Added more info to the binary distribution section of L<perlfaq2>.
-Added Net::Telnet to L<perlfaq6>. Fixed typos in L<perlfaq8>. Added
-mail sending example to L<perlfaq9>. Added Merlyn's columns to
-L<perlfaq2>.
-
-=item 18/March/97
-
-Added the DATE to the NAME section, indicating which sections have
-changed.
-
-Mentioned SIGPIPE and L<perlipc> in the forking open answer in
-L<perlfaq8>.
-
-Fixed description of a regular expression in L<perlfaq4>.
-
-=item 17/March/97 Version
-
-Various typos fixed throughout.
-
-Added new question on Perl BNF on L<perlfaq7>.
-
-=item Initial Release: 11/March/97
-
-This is the initial release of version 3 of the FAQ; consequently there
-have been no changes since its initial release.
-
-=back
diff --git a/pod/perlfaq1.pod b/pod/perlfaq1.pod
index 68c6bfd928..13f8f421dd 100644
--- a/pod/perlfaq1.pod
+++ b/pod/perlfaq1.pod
@@ -1,6 +1,6 @@
=head1 NAME
-perlfaq1 - General Questions About Perl ($Revision: 1.23 $, $Date: 1999/05/23 16:08:30 $)
+perlfaq1 - General Questions About Perl ($Revision: 1.12 $, $Date: 2003/07/09 15:47:28 $)
=head1 DESCRIPTION
@@ -33,13 +33,17 @@ distribution for more details. See L<perlhist> (new as of 5.005)
for Perl's milestone releases.
In particular, the core development team (known as the Perl Porters)
-are a rag-tag band of highly altruistic individuals committed
-to producing better software for free than you could hope to
-purchase for money. You may snoop on pending developments via
-nntp://news.perl.com/perl.porters-gw/ and the Deja archive at
-http://www.deja.com/ using the perl.porters-gw newsgroup, or you can
-subscribe to the mailing list by sending perl5-porters-request@perl.org
-a subscription request.
+are a rag-tag band of highly altruistic individuals committed to
+producing better software for free than you could hope to purchase for
+money. You may snoop on pending developments via the archives at
+http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/
+and http://archive.develooper.com/perl5-porters@perl.org/
+or the news gateway nntp://nntp.perl.org/perl.perl5.porters or
+its web interface at http://nntp.perl.org/group/perl.perl5.porters ,
+or read the faq at http://simon-cozens.org/writings/p5p-faq ,
+or you can subscribe to the mailing list by sending
+perl5-porters-request@perl.org a subscription request
+(an empty message with no subject is fine).
While the GNU project includes Perl in its distributions, there's no
such thing as "GNU Perl". Perl is not produced nor maintained by the
@@ -55,14 +59,15 @@ users the informal support will more than suffice. See the answer to
You should definitely use version 5. Version 4 is old, limited, and
no longer maintained; its last patch (4.036) was in 1992, long ago and
far away. Sure, it's stable, but so is anything that's dead; in fact,
-perl4 had been called a dead, flea-bitten camel carcass. The most recent
-production release is 5.6 (although 5.005_03 is still supported).
-The most cutting-edge development release is 5.7. Further references
-to the Perl language in this document refer to the production release
-unless otherwise specified. There may be one or more official bug fixes
-by the time you read this, and also perhaps some experimental versions
-on the way to the next release. All releases prior to 5.004 were subject
-to buffer overruns, a grave security issue.
+perl4 had been called a dead, flea-bitten camel carcass. The most
+recent production release is 5.8.1 (although 5.005_03 and 5.6.1 are
+still supported). The most cutting-edge development release is 5.9.
+Further references to the Perl language in this document refer to the
+production release unless otherwise specified. There may be one or
+more official bug fixes by the time you read this, and also perhaps
+some experimental versions on the way to the next release.
+All releases prior to 5.004 were subject to buffer overruns, a grave
+security issue.
=head2 What are perl4 and perl5?
@@ -78,7 +83,7 @@ The 5.0 release is, essentially, a ground-up rewrite of the original
perl source code from releases 1 through 4. It has been modularized,
object-oriented, tweaked, trimmed, and optimized until it almost doesn't
look like the old code. However, the interface is mostly the same, and
-compatibility with previous releases is very high.
+compatibility with previous releases is very high.
See L<perltrap/"Perl4 to Perl5 Traps">.
To avoid the "what language is perl5?" confusion, some people prefer to
@@ -87,20 +92,31 @@ simply use "perl" to refer to the latest version of perl and avoid using
See L<perlhist> for a history of Perl revisions.
+=head2 What is Ponie?
+
+At The O'Reilly Open Source Software Convention in 2003, Artur
+Bergman, Fotango, and The Perl Foundation announced a project to
+run perl5 on the Parrot virtual machine named Ponie. Ponie stands for
+Perl On New Internal Engine. The Perl 5.10 language implementation
+will be used for Ponie, and there will be no language level
+differences between perl5 and ponie. Ponie is not a complete rewrite
+of perl5.
+
=head2 What is perl6?
-At The Second O'Reilly Open Source Software Convention, Larry Wall
+At The Second O'Reilly Open Source Software Convention, Larry Wall
announced Perl6 development would begin in earnest. Perl6 was an oft
used term for Chip Salzenberg's project to rewrite Perl in C++ named
-Topaz. However, Topaz should not be confused with the nisus to rewrite
-Perl while keeping the lessons learned from other software, as well as
-Perl5, in mind.
+Topaz. However, Topaz provided valuable insights to the next version
+of Perl and its implementation, but was ultimately abandoned.
-If you have a desire to help in the crusade to make Perl a better place
-then peruse the Perl6 developers page at http://www.perl.org/perl6/ and
-get involved.
+If you want to learn more about Perl6, or have a desire to help in
+the crusade to make Perl a better place then peruse the Perl6 developers
+page at http://dev.perl.org/perl6/ and get involved.
-The first alpha release is expected by Summer 2001.
+Perl6 is not scheduled for release yet, and Perl5 will still be supported
+for quite awhile after its release. Do not wait for Perl6 to do whatever
+you need to do.
"We're really serious about reinventing everything that needs reinventing."
--Larry Wall
@@ -211,7 +227,7 @@ i.e. the current interpreter. Hence Tom's quip that "Nothing but perl
can parse Perl." You may or may not choose to follow this usage. For
example, parallelism means "awk and perl" and "Python and Perl" look
OK, while "awk and Perl" and "Python and perl" do not. But never
-write "PERL", because perl isn't really an acronym, apocryphal
+write "PERL", because perl is not an acronym, apocryphal
folklore and post-facto expansions notwithstanding.
=head2 Is it a Perl program or a Perl script?
@@ -252,18 +268,14 @@ programmers prefer to avoid them altogether.
These are the "just another perl hacker" signatures that some people
sign their postings with. Randal Schwartz made these famous. About
100 of the earlier ones are available from
-http://www.perl.com/CPAN/misc/japh .
+http://www.cpan.org/misc/japh .
=head2 Where can I get a list of Larry Wall witticisms?
Over a hundred quips by Larry, from postings of his or source code,
-can be found at http://www.perl.com/CPAN/misc/lwall-quotes.txt.gz .
-
-Newer examples can be found by perusing Larry's postings:
-
- http://x1.dejanews.com/dnquery.xp?QRY=*&DBS=2&ST=PS&defaultOp=AND&LNG=ALL&format=terse&showsort=date&maxhits=100&subjects=&groups=&authors=larry@*wall.org&fromdate=&todate=
+can be found at http://www.cpan.org/misc/lwall-quotes.txt.gz .
-=head2 How can I convince my sysadmin/supervisor/employees to use version 5/5.005/Perl instead of some other language?
+=head2 How can I convince my sysadmin/supervisor/employees to use version 5/5.6.1/Perl instead of some other language?
If your manager or employees are wary of unsupported software, or
software which doesn't officially ship with your operating system, you
@@ -295,11 +307,12 @@ for any given task. Also mention that the difference between version
(Well, OK, maybe it's not quite that distinct, but you get the idea.)
If you want support and a reasonable guarantee that what you're
developing will continue to work in the future, then you have to run
-the supported version. As of April 2001 that probably means
-running either of the releases 5.6.1 (released in April 2001) or
-5.005_03 (released in March 1999), although 5.004_05 isn't that bad
-if you B<absolutely> need such an old version (released in April 1999)
-for stability reasons. Anything older than 5.004_05 shouldn't be used.
+the supported version. As of October 2003 that means running either
+5.8.1 (released in September 2003), or one of the older releases like
+5.6.1 (released in April 2001) or 5.005_03 (released in March 1999),
+although 5.004_05 isn't that bad if you B<absolutely> need such an old
+version (released in April 1999) for stability reasons.
+Anything older than 5.004_05 shouldn't be used.
Of particular note is the massive bug hunt for buffer overflow
problems that went into the 5.004 release. All releases prior to
@@ -310,16 +323,18 @@ In August 2000 in all Linux distributions a new security problem was
found in the optional 'suidperl' (not built or installed by default)
in all the Perl branches 5.6, 5.005, and 5.004, see
http://www.cpan.org/src/5.0/sperl-2000-08-05/
+Perl maintenance releases 5.6.1 and 5.8.0 have this security hole closed.
+Most, if not all, Linux distribution have patches for this
+vulnerability available, see http://www.linuxsecurity.com/advisories/ ,
+but the most recommendable way is to upgrade to at least Perl 5.6.1.
=head1 AUTHOR AND COPYRIGHT
Copyright (c) 1997, 1998, 1999, 2000, 2001 Tom Christiansen and Nathan
Torkington. All rights reserved.
-When included as an integrated part of the Standard Distribution
-of Perl or of its documentation (printed or otherwise), this works is
-covered under Perl's Artistic Licence. For separate distributions of
-all or part of this FAQ outside of that, see L<perlfaq>.
+This documentation is free; you can redistribute it and/or modify it
+under the same terms as Perl itself.
Irrespective of its distribution, all code examples here are in the public
domain. You are permitted and encouraged to use this code and any
diff --git a/pod/perlfaq2.pod b/pod/perlfaq2.pod
index aecc1fc4c3..8649ca8882 100644
--- a/pod/perlfaq2.pod
+++ b/pod/perlfaq2.pod
@@ -1,6 +1,6 @@
=head1 NAME
-perlfaq2 - Obtaining and Learning about Perl ($Revision: 1.32 $, $Date: 1999/10/14 18:46:09 $)
+perlfaq2 - Obtaining and Learning about Perl ($Revision: 1.20 $, $Date: 2003/01/26 17:50:56 $)
=head1 DESCRIPTION
@@ -41,7 +41,7 @@ get free compilers for, not for Unix systems.
Some URLs that might help you are:
http://www.cpan.org/ports/
- http://language.perl.com/info/software.html
+ http://www.perl.com/pub/language/info/software.html
Someone looking for a Perl for Win16 might look to Laszlo Molnar's djgpp
port in http://www.cpan.org/ports/#msdos , which comes with clear
@@ -69,7 +69,7 @@ approaches are doomed to failure.
One simple way to check that things are in the right place is to print out
the hard-coded @INC that perl looks through for libraries:
- % perl -e 'print join("\n",@INC)'
+ % perl -le 'print for @INC'
If this command lists any paths that don't exist on your system, then you
may need to move the appropriate libraries to these locations, or create
@@ -90,14 +90,14 @@ architecture.
=head2 What modules and extensions are available for Perl? What is CPAN? What does CPAN/src/... mean?
-CPAN stands for Comprehensive Perl Archive Network, a ~700mb archive
+CPAN stands for Comprehensive Perl Archive Network, a ~1.2Gb archive
replicated on nearly 200 machines all over the world. CPAN contains
source code, non-native ports, documentation, scripts, and many
third-party modules and extensions, designed for everything from
commercial database interfaces to keyboard/screen control to web
walking and CGI scripts. The master web site for CPAN is
http://www.cpan.org/ and there is the CPAN Multiplexer at
-http://www.perl.com/CPAN/CPAN.html which will choose a mirror near you
+http://www.cpan.org/CPAN.html which will choose a mirror near you
via DNS. See http://www.perl.com/CPAN (without a slash at the
end) for how this process works. Also, http://mirror.cpan.org/
has a nice interface to the http://www.cpan.org/MIRRORED.BY
@@ -129,6 +129,7 @@ miscellaneous modules.
See http://www.cpan.org/modules/00modlist.long.html or
http://search.cpan.org/ for a more complete list of modules by category.
+CPAN is not affiliated with O'Reilly and Associates.
=head2 Is there an ISO or ANSI certified version of Perl?
@@ -166,25 +167,41 @@ assistance:
http://perldoc.cpan.org/
http://www.perldoc.com/
- http://reference.perl.com/query.cgi?tutorials
http://bookmarks.cpan.org/search.cgi?cat=Training%2FTutorials
=head2 What are the Perl newsgroups on Usenet? Where do I post questions?
-The now defunct comp.lang.perl newsgroup has been superseded by the
-following groups:
+Several groups devoted to the Perl language are on Usenet:
comp.lang.perl.announce Moderated announcement group
- comp.lang.perl.misc Very busy group about Perl in general
- comp.lang.perl.moderated Moderated discussion group
+ comp.lang.perl.misc High traffic general Perl discussion
+ comp.lang.perl.moderated Moderated discussion group
comp.lang.perl.modules Use and development of Perl modules
comp.lang.perl.tk Using Tk (and X) from Perl
comp.infosystems.www.authoring.cgi Writing CGI scripts for the Web.
-There is also Usenet gateway to the mailing list used by the crack
-Perl development team (perl5-porters) at
-news://news.perl.com/perl.porters-gw/ .
+Some years ago, comp.lang.perl was divided into those groups, and
+comp.lang.perl itself officially removed. While that group may still
+be found on some news servers, it is unwise to use it, because
+postings there will not appear on news servers which honour the
+official list of group names. Use comp.lang.perl.misc for topics
+which do not have a more-appropriate specific group.
+
+There is also a Usenet gateway to Perl mailing lists sponsored by
+perl.org at nntp://nntp.perl.org , a web interface to the same lists
+at http://nntp.perl.org/group/ and these lists are also available
+under the C<perl.*> hierarchy at http://groups.google.com . Other
+groups are listed at http://lists.perl.org/ ( also known as
+http://lists.cpan.org/ ).
+
+A nice place to ask questions is the PerlMonks site,
+http://www.perlmonks.org/ , or the Perl Beginners mailing list
+http://lists.perl.org/showlist.cgi?name=beginners .
+
+Note that none of the above are supposed to write your code for you:
+asking questions about particular problems or general advice is fine,
+but asking someone to write your code for free is not very cool.
=head2 Where should I post source code?
@@ -192,12 +209,12 @@ You should post source code to whichever group is most appropriate, but
feel free to cross-post to comp.lang.perl.misc. If you want to cross-post
to alt.sources, please make sure it follows their posting standards,
including setting the Followup-To header line to NOT include alt.sources;
-see their FAQ (http://www.faqs.org/faqs/alt-sources-intro/) for details.
+see their FAQ ( http://www.faqs.org/faqs/alt-sources-intro/ ) for details.
If you're just looking for software, first use Google
-(http://www.google.com), Deja (http://www.deja.com), and
-CPAN Search (http://search.cpan.org). This is faster and more
-productive than just posting a request.
+( http://www.google.com ), Google's usenet search interface
+( http://groups.google.com ), and CPAN Search ( http://search.cpan.org ).
+This is faster and more productive than just posting a request.
=head2 Perl Books
@@ -222,32 +239,50 @@ of real-world examples, mini-tutorials, and complete programs is:
by Tom Christiansen and Nathan Torkington,
with Foreword by Larry Wall
ISBN 1-56592-243-3 [1st Edition August 1998]
- http://perl.oreilly.com/cookbook/
+ http://perl.oreilly.com/catalog/cookbook/
-If you're already a hard-core systems programmer, then the Camel Book
-might suffice for you to learn Perl from. If you're not, check out
+If you're already a seasoned programmer, then the Camel Book might
+suffice for you to learn Perl from. If you're not, check out the
+Llama book:
- Learning Perl (the "Llama Book"):
- by Randal Schwartz and Tom Christiansen
- with Foreword by Larry Wall
- ISBN 1-56592-284-0 [2nd Edition July 1997]
- http://www.oreilly.com/catalog/lperl2/
+ Learning Perl (the "Llama Book")
+ by Randal L. Schwartz and Tom Phoenix
+ ISBN 0-596-00132-0 [3rd edition July 2001]
+ http://www.oreilly.com/catalog/lperl3/
-Despite the picture at the URL above, the second edition of "Llama
-Book" really has a blue cover and was updated for the 5.004 release
-of Perl. Various foreign language editions are available, including
-I<Learning Perl on Win32 Systems> (the "Gecko Book").
+And for more advanced information on writing larger programs,
+presented in the same style as the Llama book, continue your education
+with the Alpaca book:
-If you're not an accidental programmer, but a more serious and possibly
-even degreed computer scientist who doesn't need as much hand-holding as
-we try to provide in the Llama or its defurred cousin the Gecko, please
-check out the delightful book, I<Perl: The Programmer's Companion>,
-written by Nigel Chapman.
+ Learning Perl Objects, References, and Modules (the "Alpaca Book")
+ by Randal L. Schwartz, with Tom Phoenix (foreword by Damian Conway)
+ ISBN 0-596-00478-8 [1st edition June 2003]
+ http://www.oreilly.com/catalog/lrnperlorm/
+
+If you're not an accidental programmer, but a more serious and
+possibly even degreed computer scientist who doesn't need as much
+hand-holding as we try to provide in the Llama, please check out the
+delightful book
+
+ Perl: The Programmer's Companion
+ by Nigel Chapman
+ ISBN 0-471-97563-X [1997, 3rd printing Spring 1998]
+ http://www.wiley.com/compbooks/catalog/97563-X.htm
+ http://www.wiley.com/compbooks/chapman/perl/perltpc.html (errata etc)
-Addison-Wesley (http://www.awlonline.com/) and Manning
-(http://www.manning.com/) are also publishers of some fine Perl books
-such as Object Oriented Programming with Perl by Damian Conway and
-Network Programming with Perl by Lincoln Stein.
+If you are more at home in Windows, the following is available
+(though unfortunately rather dated):
+
+ Learning Perl on Win32 Systems (the "Gecko Book")
+ by Randal L. Schwartz, Erik Olson, and Tom Christiansen,
+ with foreword by Larry Wall
+ ISBN 1-56592-324-3 [1st edition August 1997]
+ http://www.oreilly.com/catalog/lperlwin/
+
+Addison-Wesley ( http://www.awlonline.com/ ) and Manning
+( http://www.manning.com/ ) are also publishers of some fine Perl books
+such as I<Object Oriented Programming with Perl> by Damian Conway and
+I<Network Programming with Perl> by Lincoln Stein.
An excellent technical book discounter is Bookpool at
http://www.bookpool.com/ where a 30% discount or more is not unusual.
@@ -267,12 +302,12 @@ Recommended books on (or mostly on) Perl follow.
http://www.oreilly.com/catalog/pperl3/
Perl 5 Pocket Reference
- by Johan Vromans
+ by Johan Vromans
ISBN 0-596-00032-4 [3rd edition May 2000]
http://www.oreilly.com/catalog/perlpr3/
Perl in a Nutshell
- by Ellen Siever, Stephan Spainhour, and Nathan Patwardhan
+ by Ellen Siever, Stephan Spainhour, and Nathan Patwardhan
ISBN 1-56592-286-7 [1st edition December 1998]
http://www.oreilly.com/catalog/perlnut/
@@ -280,14 +315,18 @@ Recommended books on (or mostly on) Perl follow.
Elements of Programming with Perl
by Andrew L. Johnson
- ISBN 1884777805 [1st edition October 1999]
+ ISBN 1-884777-80-5 [1st edition October 1999]
http://www.manning.com/Johnson/
Learning Perl
- by Randal L. Schwartz and Tom Christiansen
- with foreword by Larry Wall
- ISBN 1-56592-284-0 [2nd edition July 1997]
- http://www.oreilly.com/catalog/lperl2/
+ by Randal L. Schwartz and Tom Phoenix
+ ISBN 0-596-00132-0 [3rd edition July 2001]
+ http://www.oreilly.com/catalog/lperl3/
+
+ Learning Perl Objects, References, and Modules
+ by Randal L. Schwartz, with Tom Phoenix (foreword by Damian Conway)
+ ISBN 0-596-00478-8 [1st edition June 2003]
+ http://www.oreilly.com/catalog/lrnperlorm/
Learning Perl on Win32 Systems
by Randal L. Schwartz, Erik Olson, and Tom Christiansen,
@@ -297,8 +336,9 @@ Recommended books on (or mostly on) Perl follow.
Perl: The Programmer's Companion
by Nigel Chapman
- ISBN 0-471-97563-X [1st edition October 1997]
- http://catalog.wiley.com/title.cgi?isbn=047197563X
+ ISBN 0-471-97563-X [1997, 3rd printing Spring 1998]
+ http://www.wiley.com/compbooks/catalog/97563-X.htm
+ http://www.wiley.com/compbooks/chapman/perl/perltpc.html (errata etc)
Cross-Platform Perl
by Eric Foster-Johnson
@@ -329,8 +369,8 @@ Recommended books on (or mostly on) Perl follow.
Mastering Regular Expressions
by Jeffrey E. F. Friedl
- ISBN 1-56592-257-3 [1st edition January 1997]
- http://www.oreilly.com/catalog/regex/
+ ISBN 0-596-00289-0 [2nd edition July 2002]
+ http://www.oreilly.com/catalog/regex2/
Network Programming with Perl
by Lincoln Stein
@@ -340,61 +380,59 @@ Recommended books on (or mostly on) Perl follow.
Object Oriented Perl
Damian Conway
with foreword by Randal L. Schwartz
- ISBN 1884777791 [1st edition August 1999]
+ ISBN 1-884777-79-1 [1st edition August 1999]
http://www.manning.com/Conway/
Data Munging with Perl
- Dave Cross
- ISBN 1930110006 [1st edition 2001]
- http://www.manning.com/cross
+ Dave Cross
+ ISBN 1-930110-00-6 [1st edition 2001]
+ http://www.manning.com/cross
- Learning Perl/Tk
- by Nancy Walsh
- ISBN 1-56592-314-6 [1st edition January 1999]
- http://www.oreilly.com/catalog/lperltk/
+ Mastering Perl/Tk
+ by Steve Lidie and Nancy Walsh
+ ISBN 1-56592-716-8 [1st edition January 2002]
+ http://www.oreilly.com/catalog/mastperltk/
+
+ Extending and Embedding Perl
+ by Tim Jenness and Simon Cozens
+ ISBN 1-930110-82-0 [1st edition August 2002]
+ http://www.manning.com/jenness
=back
=head2 Perl in Magazines
-The first and only periodical devoted to All Things Perl, I<The
-Perl Journal> contained tutorials, demonstrations, case studies,
-announcements, contests, and much more. I<TPJ> had columns on web
+The first (and for a long time, only) periodical devoted to All Things Perl,
+I<The Perl Journal> contains tutorials, demonstrations, case studies,
+announcements, contests, and much more. I<TPJ> has columns on web
development, databases, Win32 Perl, graphical programming, regular
-expressions, and networking, and sponsored the Obfuscated Perl
-Contest. Sadly, this publication is no longer in circulation, but
-should it be resurrected, it will most likely be announced on
-http://use.perl.org/ .
-
-Beyond this, magazines that frequently carry high-quality articles
-on Perl are I<Web Techniques> (see http://www.webtechniques.com/),
-I<Performance Computing> (http://www.performance-computing.com/), and Usenix's
-newsletter/magazine to its members, I<login:>, at http://www.usenix.org/.
-Randal's Web Technique's columns are available on the web at
-http://www.stonehenge.com/merlyn/WebTechniques/ .
+expressions, and networking, and sponsors the Obfuscated Perl Contest
+and the Perl Poetry Contests. As of mid-2001, the dead tree version
+of TPJ will be published as a quarterly supplement of SysAdmin
+magazine ( http://www.sysadminmag.com/ ). For more details on TPJ,
+see http://www.tpj.com/ .
+
+Beyond this, magazines that frequently carry quality articles on
+Perl are I<The Perl Review> ( http://www.theperlreview.com ),
+I<Unix Review> ( http://www.unixreview.com/ ),
+I<Linux Magazine> ( http://www.linuxmagazine.com/ ),
+and Usenix's newsletter/magazine to its members, I<;login:>
+( http://www.usenix.org/ ).
+
+The Perl columns of Randal L. Schwartz are available on the web at
+http://www.stonehenge.com/merlyn/WebTechniques/ ,
+http://www.stonehenge.com/merlyn/UnixReview/ , and
+http://www.stonehenge.com/merlyn/LinuxMag/ .
=head2 Perl on the Net: FTP and WWW Access
-To get the best performance, pick a site from
-the list below and use it to grab the complete list of mirror sites
-which is at /CPAN/MIRRORED.BY or at http://mirror.cpan.org/.
-From there you can find the quickest site for you. Remember, the
-following list is I<not> the complete list of CPAN mirrors
-(the complete list contains 165 sites as of January 2001):
-
- http://www.cpan.org/
- http://www.perl.com/CPAN/
- http://download.sourceforge.net/mirrors/CPAN/
- ftp://ftp.digital.com/pub/plan/perl/CPAN/
- ftp://ftp.flirble.org/pub/languages/perl/CPAN/
- ftp://ftp.uvsq.fr/pub/perl/CPAN/
- ftp://ftp.funet.fi/pub/languages/perl/CPAN/
- ftp://ftp.dti.ad.jp/pub/lang/CPAN/
- ftp://mirror.aarnet.edu.au/pub/perl/CPAN/
- ftp://cpan.if.usp.br/pub/mirror/CPAN/
-
-One may also use xx.cpan.org where "xx" is the 2-letter country code
-for your domain; e.g. Australia would use au.cpan.org.
+To get the best performance, pick a site from the list at
+http://www.cpan.org/SITES.html . From there you can find the quickest
+site for you.
+
+You may also use xx.cpan.org where "xx" is the 2-letter country code
+for your domain; e.g. Australia would use au.cpan.org. [Note: This
+only applies to countries that host at least one mirror.]
=head2 What mailing lists are there for Perl?
@@ -402,26 +440,21 @@ Most of the major modules (Tk, CGI, libwww-perl) have their own
mailing lists. Consult the documentation that came with the module for
subscription information.
- http://lists.cpan.org/
+A comprehensive list of Perl related mailing lists can be found at:
-=head2 Archives of comp.lang.perl.misc
-
-Have you tried Deja or AltaVista? Those are the
-best archives. Just look up "*perl*" as a newsgroup.
+ http://lists.perl.org/
- http://www.deja.com/dnquery.xp?QRY=&DBS=2&ST=PS&defaultOp=AND&LNG=ALL&format=terse&showsort=date&maxhits=25&subjects=&groups=*perl*&authors=&fromdate=&todate=
+=head2 Archives of comp.lang.perl.misc
-You might want to trim that down a bit, though.
+The Google search engine now carries archived and searchable newsgroup
+content.
-You'll probably want more a sophisticated query and retrieval mechanism
-than a file listing, preferably one that allows you to retrieve
-articles using a fast-access indices, keyed on at least author, date,
-subject, thread (as in "trn") and probably keywords. The best
-solution the FAQ authors know of is the MH pick command, but it is
-very slow to select on 18000 articles.
+http://groups.google.com/groups?group=comp.lang.perl.misc
-If you have, or know where can be found, the missing sections, please
-let perlfaq-suggestions@perl.com know.
+If you have a question, you can be sure someone has already asked the
+same question at some point on c.l.p.m. It requires some time and patience
+to sift through all the content but often you will find the answer you
+seek.
=head2 Where can I buy a commercial version of Perl?
@@ -431,7 +464,7 @@ in releases and comes in well-defined packages. There is a very large
user community and an extensive literature. The comp.lang.perl.*
newsgroups and several of the mailing lists provide free answers to your
questions in near real-time. Perl has traditionally been supported by
-Larry, scores of software designers and developers, and myriads of
+Larry, scores of software designers and developers, and myriad
programmers, all working for free to create a useful thing to make life
better for everyone.
@@ -484,15 +517,10 @@ bugs.
Read the perlbug(1) man page (perl5.004 or later) for more information.
-=head2 What is perl.com? Perl Mongers? pm.org? perl.org?
+=head2 What is perl.com? Perl Mongers? pm.org? perl.org? cpan.org?
-The Perl Home Page at http://www.perl.com/ is currently hosted on a
-T3 line courtesy of Songline Systems, a software-oriented subsidiary of
-O'Reilly and Associates. Other starting points include
-
- http://language.perl.com/
- http://conference.perl.com/
- http://reference.perl.com/
+The Perl Home Page at http://www.perl.com/ is currently hosted by
+The O'Reilly Network, a subsidiary of O'Reilly and Associates.
Perl Mongers is an advocacy organization for the Perl language which
maintains the web site http://www.perl.org/ as a general advocacy
@@ -512,18 +540,19 @@ and there are many other sub-domains for special topics, such as
http://bugs.perl.org/
http://history.perl.org/
http://lists.perl.org/
- http://news.perl.org/
http://use.perl.org/
+http://www.cpan.org/ is the Comprehensive Perl Archive Network,
+a replicated worldwide repository of Perl software; see
+the I<What is CPAN?> question earlier in this document.
+
=head1 AUTHOR AND COPYRIGHT
Copyright (c) 1997-2001 Tom Christiansen and Nathan Torkington.
All rights reserved.
-When included as an integrated part of the Standard Distribution
-of Perl or of its documentation (printed or otherwise), this works is
-covered under Perl's Artistic License. For separate distributions of
-all or part of this FAQ outside of that, see L<perlfaq>.
+This documentation is free; you can redistribute it and/or modify it
+under the same terms as Perl itself.
Irrespective of its distribution, all code examples here are in the public
domain. You are permitted and encouraged to use this code and any
diff --git a/pod/perlfaq3.pod b/pod/perlfaq3.pod
index 49cae1a209..8fd484fea2 100644
--- a/pod/perlfaq3.pod
+++ b/pod/perlfaq3.pod
@@ -1,6 +1,6 @@
=head1 NAME
-perlfaq3 - Programming Tools ($Revision: 1.38 $, $Date: 1999/05/23 16:08:30 $)
+perlfaq3 - Programming Tools ($Revision: 1.35 $, $Date: 2003/08/24 05:26:59 $)
=head1 DESCRIPTION
@@ -11,7 +11,7 @@ and programming support.
Have you looked at CPAN (see L<perlfaq2>)? The chances are that
someone has already written a module that can solve your problem.
-Have you read the appropriate man pages? Here's a brief index:
+Have you read the appropriate manpages? Here's a brief index:
Basics perldata, perlvar, perlsyn, perlop, perlsub
Execution perlrun, perldebug
@@ -22,15 +22,16 @@ Have you read the appropriate man pages? Here's a brief index:
Regexes perlre, perlfunc, perlop, perllocale
Moving to perl5 perltrap, perl
Linking w/C perlxstut, perlxs, perlcall, perlguts, perlembed
- Various http://www.perl.com/CPAN/doc/FMTEYEWTK/index.html
- (not a man-page but still useful)
+ Various http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz
+ (not a man-page but still useful, a collection
+ of various essays on Perl techniques)
-A crude table of contents for the Perl man page set is found in L<perltoc>.
+A crude table of contents for the Perl manpage set is found in L<perltoc>.
=head2 How can I use Perl interactively?
The typical approach uses the Perl debugger, described in the
-perldebug(1) man page, on an ``empty'' program, like this:
+perldebug(1) manpage, on an ``empty'' program, like this:
perl -de 42
@@ -41,14 +42,70 @@ operations typically found in symbolic debuggers.
=head2 Is there a Perl shell?
-In general, no. The Shell.pm module (distributed with Perl) makes
-Perl try commands which aren't part of the Perl language as shell
-commands. perlsh from the source distribution is simplistic and
-uninteresting, but may still be what you want.
+The psh (Perl sh) is currently at version 1.8. The Perl Shell is a
+shell that combines the interactive nature of a Unix shell with the
+power of Perl. The goal is a full featured shell that behaves as
+expected for normal shell activity and uses Perl syntax and
+functionality for control-flow statements and other things.
+You can get psh at http://www.focusresearch.com/gregor/psh/ .
+
+Zoidberg is a similar project and provides a shell written in Perl,
+configured in Perl and operated in Perl. It is intended as a login shell
+and development environment. It can be found at http://zoidberg.sf.net/
+or your local CPAN mirror.
+
+The Shell.pm module (distributed with Perl) makes Perl try commands
+which aren't part of the Perl language as shell commands. perlsh
+from the source distribution is simplistic and uninteresting, but
+may still be what you want.
+
+=head2 How do I find which modules are installed on my system?
+
+You can use the ExtUtils::Installed module to show all
+installed distributions, although it can take a while to do
+its magic. The standard library which comes with Perl just
+shows up as "Perl" (although you can get those with
+Module::CoreList).
+
+ use ExtUtils::Installed;
+
+ my $inst = ExtUtils::Installed->new();
+ my @modules = $inst->modules();
+
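For the modules that are lumped together as "Perl", a query against Module::CoreList might look like the sketch below. It assumes Module::CoreList is installed (it is bundled with later perls and is otherwise on CPAN); File::Find is used only as an example module name.

```perl
use strict;
use warnings;
use Module::CoreList;

# Ask which perl release first bundled a given core module.
# 'File::Find' is only an example; substitute the module you care about.
my $module = 'File::Find';
my $first  = Module::CoreList->first_release($module);

if (defined $first) {
    print "$module first shipped with perl $first\n";
}
else {
    print "$module is not listed as a core module\n";
}
```

A module that Module::CoreList does not know about yields undef, so the else branch is the "not core" case rather than an error.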
+If you want a list of all of the Perl module filenames, you
+can use File::Find::Rule.
+
+ use File::Find::Rule;
+
+ my @files = File::Find::Rule->file()->name( '*.pm' )->in( @INC );
+
+If you do not have that module, you can do the same thing
+with File::Find which is part of the standard library.
+
+ use File::Find;
+ my @files;
+
+ find sub { push @files, $File::Find::name if /\.pm$/ && -f $_ },
+ @INC;
+
+ print join "\n", @files;
+
+If you simply need to quickly check to see if a module is
+available, you can check for its documentation. If you can
+read the documentation, the module is most likely installed.
+If you cannot, the module is either not installed or (in rare
+cases) has no documentation.
+
+ prompt% perldoc Module::Name
+
+You can also try to include the module in a one-liner to see if
+perl finds it.
+
+ perl -MModule::Name -e1
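
The same check can be made from inside a program. The following is only a sketch, not part of the FAQ's own answer: the helper name module_available is made up for illustration, and it assumes the argument is an ordinary package name such as File::Find.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the FAQ): returns true if the named
# module can be loaded from @INC, false otherwise.
sub module_available {
    my ($module) = @_;

    # Accept only plain package names, so nothing unexpected reaches require.
    return 0 unless defined $module && $module =~ /\A\w+(?:::\w+)*\z/;

    # Turn Module::Name into Module/Name.pm, the form require expects
    # when it is handed a string instead of a bareword.
    (my $file = "$module.pm") =~ s{::}{/}g;

    return eval { require $file; 1 } ? 1 : 0;
}

for my $name ('File::Find', 'No::Such::Module::Xyzzy') {
    printf "%s is %savailable\n", $name, module_available($name) ? '' : 'not ';
}
```

Note that a successful check actually loads the module, so this suits "use it if it is there" logic better than a pure inventory.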
=head2 How do I debug my Perl programs?
-Have you tried C<use warnings> or used C<-w>? They enable warnings
+Have you tried C<use warnings> or used C<-w>? They enable warnings
to detect dubious practices.
Have you tried C<use strict>? It prevents you from using symbolic
@@ -74,9 +131,9 @@ why what it's doing isn't what it should be doing.
=head2 How do I profile my Perl programs?
You should get the Devel::DProf module from the standard distribution
-(or separately on CPAN) and also use Benchmark.pm from the standard
-distribution. The Benchmark module lets you time specific portions of
-your code, while Devel::DProf gives detailed breakdowns of where your
+(or separately on CPAN) and also use Benchmark.pm from the standard
+distribution. The Benchmark module lets you time specific portions of
+your code, while Devel::DProf gives detailed breakdowns of where your
code spends its time.
Here's a sample use of Benchmark:
@@ -89,10 +146,8 @@ Here's a sample use of Benchmark:
timethese($count, {
'map' => sub { my @a = @junk;
map { s/a/b/ } @a;
- return @a
- },
+ return @a },
'for' => sub { my @a = @junk;
- local $_;
for (@a) { s/a/b/ };
return @a },
});
@@ -110,50 +165,50 @@ of contrasting algorithms.
=head2 How do I cross-reference my Perl programs?
-The B::Xref module, shipped with the new, alpha-release Perl compiler
-(not the general distribution prior to the 5.005 release), can be used
-to generate cross-reference reports for Perl programs.
+The B::Xref module can be used to generate cross-reference reports
+for Perl programs.
perl -MO=Xref[,OPTIONS] scriptname.plx
=head2 Is there a pretty-printer (formatter) for Perl?
-There is no program that will reformat Perl as much as indent(1) does
-for C. The complex feedback between the scanner and the parser (this
-feedback is what confuses the vgrind and emacs programs) makes it
-challenging at best to write a stand-alone Perl parser.
-
-Of course, if you simply follow the guidelines in L<perlstyle>, you
-shouldn't need to reformat. The habit of formatting your code as you
-write it will help prevent bugs. Your editor can and should help you
-with this. The perl-mode or newer cperl-mode for emacs can provide
-remarkable amounts of help with most (but not all) code, and even less
-programmable editors can provide significant assistance. Tom swears
-by the following settings in vi and its clones:
+Perltidy is a Perl script which indents and reformats Perl scripts
+to make them easier to read by trying to follow the rules of
+L<perlstyle>. If you write Perl scripts, or spend much time reading
+them, you will probably find it useful. It is available at
+http://perltidy.sourceforge.net
+
+Of course, if you simply follow the guidelines in L<perlstyle>,
+you shouldn't need to reformat. The habit of formatting your code
+as you write it will help prevent bugs. Your editor can and should
+help you with this. The perl-mode or newer cperl-mode for emacs
+can provide remarkable amounts of help with most (but not all)
+code, and even less programmable editors can provide significant
+assistance. Tom Christiansen and many other vi users swear by
+the following settings in vi and its clones:
set ai sw=4
map! ^O {^M}^[O^T
-Now put that in your F<.exrc> file (replacing the caret characters
+Put that in your F<.exrc> file (replacing the caret characters
with control characters) and away you go. In insert mode, ^T is
for indenting, ^D is for undenting, and ^O is for blockdenting--
-as it were. If you haven't used the last one, you're missing
-a lot. A more complete example, with comments, can be found at
-http://www.perl.com/CPAN-local/authors/id/TOMC/scripts/toms.exrc.gz
-
-If you are used to using the I<vgrind> program for printing out nice code
-to a laser printer, you can take a stab at this using
-http://www.perl.com/CPAN/doc/misc/tips/working.vgrind.entry, but the
-results are not particularly satisfying for sophisticated code.
+as it were. A more complete example, with comments, can be found at
+http://www.cpan.org/authors/id/TOMC/scripts/toms.exrc.gz
-The a2ps at http://www.infres.enst.fr/%7Edemaille/a2ps/ does lots of things
-related to generating nicely printed output of documents.
+The a2ps program ( http://www-inf.enst.fr/%7Edemaille/a2ps/black+white.ps.gz )
+does lots of things related to generating nicely printed output of
+documents, as does enscript ( http://people.ssh.fi/mtr/genscript/ ).
=head2 Is there a ctags for Perl?
-There's a simple one at
-http://www.perl.com/CPAN/authors/id/TOMC/scripts/ptags.gz which may do
-the trick. And if not, it's easy to hack into what you want.
+Recent versions of ctags do much more than older versions did.
+Exuberant Ctags is available from http://ctags.sourceforge.net/
+and does a good job of making tags files for Perl code.
+
+There is also a simple one at
+http://www.cpan.org/authors/id/TOMC/scripts/ptags.gz which may do
+the trick. It can be easy to hack this into what you want.
=head2 Is there an IDE or Windows Perl Editor?
@@ -163,39 +218,46 @@ If you're on Unix, you already have an IDE--Unix itself. The UNIX
philosophy is the philosophy of several small tools that each do one
thing and do it well. It's like a carpenter's toolbox.
-If you want a Windows IDE, check the following:
+If you want an IDE, check the following:
=over 4
-=item CodeMagicCD
-
-http://www.codemagiccd.com/
-
=item Komodo
-ActiveState's cross-platform, multi-language IDE has Perl support,
-including a regular expression debugger and remote debugging
-(http://www.ActiveState.com/Products/Komodo/index.html).
-(Visual Perl, a Visual Studio.NET plug-in is currently (early 2001)
-in beta (http://www.ActiveState.com/Products/VisualPerl/index.html)).
+ActiveState's cross-platform (as of April 2001 Windows and Linux),
+multi-language IDE has Perl support, including a regular expression
+debugger and remote debugging
+( http://www.ActiveState.com/Products/Komodo/index.html ). (Visual
+Perl, a Visual Studio.NET plug-in, is currently (early 2001) in beta
+( http://www.ActiveState.com/Products/VisualPerl/index.html )).
=item The Object System
-(http://www.castlelink.co.uk/object_system/) is a Perl web
-applications development IDE.
+( http://www.castlelink.co.uk/object_system/ ) is a Perl web
+applications development IDE, apparently for any platform
+that runs Perl.
+
+=item Open Perl IDE
+
+( http://open-perl-ide.sourceforge.net/ )
+Open Perl IDE is an integrated development environment for writing
+and debugging Perl scripts with ActiveState's ActivePerl distribution
+under Windows 95/98/NT/2000.
=item PerlBuilder
-(http://www.solutionsoft.com/perl.htm) is an integrated development
+( http://www.solutionsoft.com/perl.htm ) is an integrated development
environment for Windows that supports Perl development.
-=item Perl code magic
+=item visiPerl+
-(http://www.petes-place.com/codemagic.html).
+( http://helpconsulting.net/visiperl/ )
+From Help Consulting, for Windows.
-=item visiPerl+
+=item OptiPerl
-http://helpconsulting.net/visiperl/, from Help Consulting.
+( http://www.optiperl.com/ ) is a Windows IDE with simulated CGI
+environment, including debugger and syntax highlighting editor.
=back
@@ -204,7 +266,21 @@ and possibly an emacs too, so you may not need to download anything.
In any emacs the cperl-mode (M-x cperl-mode) gives you perhaps the
best available Perl editing mode in any editor.
-For Windows editors: you can download an Emacs
+If you are using Windows, you can use any editor that lets
+you work with plain text, such as NotePad or WordPad. Word
+processors, such as Microsoft Word or WordPerfect, typically
+do not work since they insert all sorts of behind-the-scenes
+information, although some allow you to save files as "Text
+Only". You can also download text editors designed
+specifically for programming, such as Textpad
+( http://www.textpad.com/ ) and UltraEdit
+( http://www.ultraedit.com/ ), among others.
+
+If you are using MacOS, the same concerns apply. MacPerl
+(for Classic environments) comes with a simple editor.
+Popular external editors are BBEdit ( http://www.bbedit.com/ )
+or Alpha ( http://www.kelehers.org/alpha/ ). MacOS X users can
+use Unix editors as well.
=over 4
@@ -214,12 +290,16 @@ http://www.gnu.org/software/emacs/windows/ntemacs.html
=item MicroEMACS
-http://members.nbci.com/uemacs/
+http://www.microemacs.de/
=item XEmacs
http://www.xemacs.org/Download/index.html
+=item Jed
+
+http://space.mit.edu/~davis/jed/
+
=back
or a vi clone such as
@@ -232,20 +312,19 @@ ftp://ftp.cs.pdx.edu/pub/elvis/ http://www.fh-wedel.de/elvis/
=item Vile
-http://vile.cx/
+http://dickey.his.com/vile/vile.html
=item Vim
http://www.vim.org/
-win32: http://www.cs.vu.nl/%7Etmgil/vi.html
-
=back
For vi lovers in general, Windows or elsewhere:
-http://www.thomer.com/thomer/vi/vi.html.
-nvi (http://www.bostic.com/vi/, available from CPAN in src/misc/) is
+ http://www.thomer.com/thomer/vi/vi.html
+
+nvi ( http://www.bostic.com/vi/ , available from CPAN in src/misc/) is
yet another vi clone, unfortunately not available for Windows, but in
UNIX platforms you might be interested in trying it out, firstly because
strictly speaking it is not a vi clone, it is the real vi, or the new
@@ -273,9 +352,9 @@ http://www.slickedit.com/
There is also a toyedit Text widget based editor written in Perl
that is distributed with the Tk module on CPAN. The ptkdb
-(http://world.std.com/~aep/ptkdb/) is a Perl/tk based debugger that
+( http://world.std.com/~aep/ptkdb/ ) is a Perl/tk based debugger that
acts as a development environment of sorts. Perl Composer
-(http://perlcomposer.sourceforge.net/vperl.html) is an IDE for Perl/Tk
+( http://perlcomposer.sourceforge.net/ ) is an IDE for Perl/Tk
GUI creation.
In addition to an editor/IDE you might be interested in a more
@@ -285,21 +364,21 @@ powerful shell environment for Win32. Your options include
=item Bash
-from the Cygwin package (http://sources.redhat.com/cygwin/)
+from the Cygwin package ( http://sources.redhat.com/cygwin/ )
=item Ksh
-from the MKS Toolkit (http://www.mks.com/), or the Bourne shell of
-the U/WIN environment (http://www.research.att.com/sw/tools/uwin/)
+from the MKS Toolkit ( http://www.mks.com/ ), or the Bourne shell of
+the U/WIN environment ( http://www.research.att.com/sw/tools/uwin/ )
=item Tcsh
-ftp://ftp.astron.com/pub/tcsh/, see also
+ftp://ftp.astron.com/pub/tcsh/ , see also
http://www.primate.wisc.edu/software/csh-tcsh-book/
=item Zsh
-ftp://ftp.blarg.net/users/amol/zsh/, see also http://www.zsh.org/
+ftp://ftp.blarg.net/users/amol/zsh/ , see also http://www.zsh.org/
=back
@@ -323,26 +402,26 @@ no 32k limit).
=item BBEdit and BBEdit Lite
are text editors for Mac OS that have a Perl sensitivity mode
-(http://web.barebones.com/).
+( http://web.barebones.com/ ).
=item Alpha
is an editor, written and extensible in Tcl, that nonetheless has
built in support for several popular markup and programming languages
-including Perl and HTML (http://alpha.olm.net/).
+including Perl and HTML ( http://alpha.olm.net/ ).
=back
Pepper and Pe are programming language sensitive text editors for Mac
-OS X and BeOS respectively (http://www.hekkelman.com/).
+OS X and BeOS respectively ( http://www.hekkelman.com/ ).
=head2 Where can I get Perl macros for vi?
For a complete version of Tom Christiansen's vi configuration file,
-see http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/toms.exrc.gz ,
+see http://www.cpan.org/authors/Tom_Christiansen/scripts/toms.exrc.gz ,
the standard benchmark file for vi emulators. The file runs best with nvi,
the current version of vi out of Berkeley, which incidentally can be built
-with an embedded Perl interpreter--see http://www.perl.com/CPAN/src/misc.
+with an embedded Perl interpreter--see http://www.cpan.org/src/misc/ .
=head2 Where can I get perl-mode for emacs?
@@ -363,7 +442,7 @@ shouldn't be an issue.
The Curses module from CPAN provides a dynamically loadable object
module interface to a curses library. A small demo can be found at the
-directory http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/rep;
+directory http://www.cpan.org/authors/Tom_Christiansen/scripts/rep.gz ;
this program repeats a command and updates the screen as needed, rendering
B<rep ps axu> similar to B<top>.
@@ -372,65 +451,51 @@ B<rep ps axu> similar to B<top>.
Tk is a completely Perl-based, object-oriented interface to the Tk toolkit
that doesn't force you to use Tcl just to get at Tk. Sx is an interface
to the Athena Widget set. Both are available from CPAN. See the
-directory http://www.perl.com/CPAN/modules/by-category/08_User_Interfaces/
+directory http://www.cpan.org/modules/by-category/08_User_Interfaces/
Invaluable for Perl/Tk programming are the Perl/Tk FAQ at
http://w4.lns.cornell.edu/%7Epvhp/ptk/ptkTOC.html , the Perl/Tk Reference
Guide available at
-http://www.perl.com/CPAN-local/authors/Stephen_O_Lidie/ , and the
+http://www.cpan.org/authors/Stephen_O_Lidie/ , and the
online manpages at
http://www-users.cs.umn.edu/%7Eamundson/perl/perltk/toc.html .
=head2 How can I generate simple menus without using CGI or Tk?
-The http://www.perl.com/CPAN/authors/id/SKUNZ/perlmenu.v4.0.tar.gz
+The http://www.cpan.org/authors/id/SKUNZ/perlmenu.v4.0.tar.gz
module, which is curses-based, can help with this.
-=head2 What is undump?
-
-See the next question on ``How can I make my Perl program run faster?''
-
=head2 How can I make my Perl program run faster?
The best way to do this is to come up with a better algorithm. This
can often make a dramatic difference. Jon Bentley's book
-``Programming Pearls'' (that's not a misspelling!) has some good tips
+I<Programming Pearls> (that's not a misspelling!) has some good tips
on optimization, too. Advice on benchmarking boils down to: benchmark
and profile to make sure you're optimizing the right part, look for
better algorithms instead of microtuning your code, and when all else
-fails consider just buying faster hardware.
+fails consider just buying faster hardware. You will probably want to
+read the answer to the earlier question ``How do I profile my Perl
+programs?'' if you haven't done so already.
A different approach is to autoload seldom-used Perl code. See the
AutoSplit and AutoLoader modules in the standard distribution for
that. Or you could locate the bottleneck and think about writing just
that part in C, the way we used to take bottlenecks in C code and
-write them in assembler. Similar to rewriting in C,
-modules that have critical sections can be written in C (for instance, the
-PDL module from CPAN).
-
-In some cases, it may be worth it to use the backend compiler to
-produce byte code (saving compilation time) or compile into C, which
-will certainly save compilation time and sometimes a small amount (but
-not much) execution time. See the question about compiling your Perl
-programs for more on the compiler--the wins aren't as obvious as you'd
-hope.
-
-If you're currently linking your perl executable to a shared I<libc.so>,
-you can often gain a 10-25% performance benefit by rebuilding it to
-link with a static libc.a instead. This will make a bigger perl
-executable, but your Perl programs (and programmers) may thank you for
-it. See the F<INSTALL> file in the source distribution for more
-information.
-
-Unsubstantiated reports allege that Perl interpreters that use sfio
-outperform those that don't (for I/O intensive applications). To try
-this, see the F<INSTALL> file in the source distribution, especially
-the ``Selecting File I/O mechanisms'' section.
-
-The undump program was an old attempt to speed up your Perl program
-by storing the already-compiled form to disk. This is no longer
-a viable option, as it only worked on a few architectures, and
-wasn't a good solution anyway.
+write them in assembler. Similar to rewriting in C, modules that have
+critical sections can be written in C (for instance, the PDL module
+from CPAN).
+
+If you're currently linking your perl executable to a shared
+I<libc.so>, you can often gain a 10-25% performance benefit by
+rebuilding it to link with a static libc.a instead. This will make a
+bigger perl executable, but your Perl programs (and programmers) may
+thank you for it. See the F<INSTALL> file in the source distribution
+for more information.
+
+The undump program was an ancient attempt to speed up Perl programs by
+storing the already-compiled form to disk. This is no longer a viable
+option, as it only worked on a few architectures, and wasn't a good
+solution anyway.
=head2 How can I make my Perl program take less memory?
@@ -457,16 +522,112 @@ Information about malloc is in the F<INSTALL> file in the source
distribution. You can find out whether you are using perl's malloc by
typing C<perl -V:usemymalloc>.
-=head2 Is it unsafe to return a pointer to local data?
+Of course, the best way to save memory is to not do anything to waste
+it in the first place. Good programming practices can go a long way
+toward this:
+
+=over 4
+
+=item * Don't slurp!
+
+Don't read an entire file into memory if you can process it line
+by line. Or more concretely, use a loop like this:
+
+ #
+ # Good Idea
+ #
+ while (<FILE>) {
+ # ...
+ }
+
+instead of this:
-No, Perl's garbage collection system takes care of this.
+ #
+ # Bad Idea
+ #
+ @data = <FILE>;
+ foreach (@data) {
+ # ...
+ }
+
+When the files you're processing are small, it doesn't much matter which
+way you do it, but it makes a huge difference when they start getting
+larger.
+
+=item * Use map and grep selectively
+
+Remember that both map and grep expect a LIST argument, so doing this:
+
+ @wanted = grep {/pattern/} <FILE>;
+
+will cause the entire file to be slurped. For large files, it's better
+to loop:
+
+ while (<FILE>) {
+ push(@wanted, $_) if /pattern/;
+ }
+
+=item * Avoid unnecessary quotes and stringification
+
+Don't quote large strings unless absolutely necessary:
+
+ my $copy = "$large_string";
+
+makes 2 copies of $large_string (one for $copy and another for the
+quotes), whereas
+
+ my $copy = $large_string;
+
+only makes one copy.
+
+Ditto for stringifying large arrays:
+
+ {
+ local $, = "\n";
+ print @big_array;
+ }
+
+is much more memory-efficient than either
+
+ print join "\n", @big_array;
+
+or
+
+ {
+ local $" = "\n";
+ print "@big_array";
+ }
+
+
+=item * Pass by reference
+
+Pass arrays and hashes by reference, not by value. For one thing, it's
+the only way to pass multiple lists or hashes (or both) in a single
+call/return. It also avoids creating a copy of all the contents. This
+requires some judgment, however, because any changes will be propagated
+back to the original data. If you really want to mangle (er, modify) a
+copy, you'll have to sacrifice the memory needed to make one.
+
+=item * Tie large variables to disk.
+
+For "big" data stores (i.e. ones that exceed available memory) consider
+using one of the DB modules to store it on disk instead of in RAM. This
+will incur a penalty in access time, but that's probably better than
+causing your hard disk to thrash due to massive swapping.
+
+=back
+
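The "Don't slurp!" advice above can be sketched as a small self-contained script. This is only an illustration: the sample data and the pattern are made up, and an in-memory filehandle (available from Perl 5.8) stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical sample data; a real program would open a file on disk.
my $text = join '', map { "line $_\n" } 1 .. 1000;
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my $count = 0;
my @wanted;
while (my $line = <$fh>) {
    # Only the current line is held in $line; the file is never slurped.
    $count++;
    push @wanted, $line if $line =~ /^line 99\d$/;
}
close $fh;

print "read $count lines, kept ", scalar(@wanted), "\n";
```

Only the matching lines accumulate in @wanted, so memory use follows the size of the result rather than the size of the input.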
+=head2 Is it safe to return a reference to local or lexical data?
+
+Yes. Perl's garbage collection system takes care of this so
+everything works out right.
sub makeone {
my @a = ( 1 .. 10 );
return \@a;
}
- for $i ( 1 .. 10 ) {
+ for ( 1 .. 10 ) {
push @many, makeone();
}
@@ -476,17 +637,14 @@ No, Perl's garbage collection system takes care of this.
=head2 How can I free an array or hash so my program shrinks?
-You can't. On most operating systems, memory allocated to a program
-can never be returned to the system. That's why long-running programs
-sometimes re-exec themselves. Some operating systems (notably,
-FreeBSD and Linux) allegedly reclaim large chunks of memory that is no
-longer used, but it doesn't appear to happen with Perl (yet). The Mac
-appears to be the only platform that will reliably (albeit, slowly)
-return memory to the OS.
-
-We've had reports that on Linux (Redhat 5.1) on Intel, C<undef
-$scalar> will return memory to the system, while on Solaris 2.6 it
-won't. In general, try it yourself and see.
+You usually can't. On most operating systems, memory
+allocated to a program can never be returned to the system.
+That's why long-running programs sometimes re-exec
+themselves. Some operating systems (notably, systems that
+use mmap(2) for allocating large chunks of memory) can
+reclaim memory that is no longer used, but on such systems,
+perl must be configured and compiled to use the OS's malloc,
+not perl's.
However, judicious use of my() on your variables will help make sure
that they go out of scope so that Perl can free up that space for
@@ -508,7 +666,7 @@ you> because the process start-up overhead is where the bottleneck is.
There are two popular ways to avoid this overhead. One solution
involves running the Apache HTTP server (available from
-http://www.apache.org/) with either of the mod_perl or mod_fastcgi
+http://www.apache.org/ ) with either of the mod_perl or mod_fastcgi
plugin modules.
With mod_perl and the Apache::Registry module (distributed with
@@ -520,14 +678,14 @@ anything a module written in C can. For more on mod_perl, see
http://perl.apache.org/
With the FCGI module (from CPAN) and the mod_fastcgi
-module (available from http://www.fastcgi.com/) each of your Perl
+module (available from http://www.fastcgi.com/ ) each of your Perl
programs becomes a permanent CGI daemon process.
Both of these solutions can have far-reaching effects on your system
and on the way you write your CGI programs, so investigate them with
care.
-See http://www.perl.com/CPAN/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/ .
+See http://www.cpan.org/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/ .
A non-free, commercial product, ``The Velocity Engine for Perl'',
(http://www.binevolve.com/ or http://www.binevolve.com/velocigen/ )
@@ -557,14 +715,21 @@ determine the insecure things and exploit them without viewing the
source. Security through obscurity, the name for hiding your bugs
instead of fixing them, is little security indeed.
-You can try using encryption via source filters (Filter::* from CPAN),
-but any decent programmer will be able to decrypt it. You can try using
-the byte code compiler and interpreter described below, but the curious
-might still be able to de-compile it. You can try using the native-code
-compiler described below, but crackers might be able to disassemble it.
-These pose varying degrees of difficulty to people wanting to get at
-your code, but none can definitively conceal it (true of every
-language, not just Perl).
+You can try using encryption via source filters (starting with Perl
+5.8, the Filter::Simple and Filter::Util::Call modules are included in
+the standard distribution), but any decent programmer will be able to
+decrypt it. You can try using the byte code compiler and interpreter
+described below, but the curious might still be able to de-compile it.
+You can try using the native-code compiler described below, but
+crackers might be able to disassemble it. These pose varying degrees
+of difficulty to people wanting to get at your code, but none can
+definitively conceal it (true of every language, not just Perl).
+
+It is very easy to recover the source of Perl programs. You simply
+feed the program to the perl interpreter and use the modules in
+the B:: hierarchy. The B::Deparse module should be able to
+defeat most attempts to hide source. Again, this is not
+unique to Perl.
If you're concerned about people profiting from your code, then the
bottom line is that nothing but a restrictive license will give you
@@ -630,8 +795,8 @@ For OS/2 just use
as the first line in C<*.cmd> file (C<-S> due to a bug in cmd.exe's
`extproc' handling). For DOS one should first invent a corresponding
-batch file and codify it in C<ALTERNATIVE_SHEBANG> (see the
-F<INSTALL> file in the source distribution for more information).
+batch file and codify it in C<ALTERNATE_SHEBANG> (see the
+F<dosish.h> file in the source distribution for more information).
The Win95/NT installation, when using the ActiveState port of Perl,
will modify the Registry to associate the C<.pl> extension with the
@@ -696,6 +861,9 @@ For example:
print "Hello world\n"
(then Run "Myscript" or Shift-Command-R)
+ # MPW
+ perl -e 'print "Hello world\n"'
+
# VMS
perl -e "print ""Hello world\n"""
@@ -714,8 +882,7 @@ characters as control characters.
Using qq(), q(), and qx(), instead of "double quotes", 'single
quotes', and `backticks`, may make one-liners easier to write.
-There is no general solution to all of this. It is a mess, pure and
-simple. Sucks to be away from Unix, huh? :-)
+There is no general solution to all of this. It is a mess.
[Some of this answer was contributed by Kenneth Albanowski.]
@@ -725,36 +892,21 @@ For modules, get the CGI or LWP modules from CPAN. For textbooks,
see the two especially dedicated to web stuff in the question on
books. For problems and questions related to the web, like ``Why
do I get 500 Errors'' or ``Why doesn't it run from the browser right
-when it runs fine on the command line'', see these sources:
-
- WWW Security FAQ
- http://www.w3.org/Security/Faq/
-
- Web FAQ
- http://www.boutell.com/faq/
-
- CGI FAQ
- http://www.webthing.com/tutorials/cgifaq.html
+when it runs fine on the command line'', see the troubleshooting
+guides and references in L<perlfaq9> or in the CGI MetaFAQ:
- HTTP Spec
- http://www.w3.org/pub/WWW/Protocols/HTTP/
-
- HTML Spec
- http://www.w3.org/TR/REC-html40/
- http://www.w3.org/pub/WWW/MarkUp/
-
- CGI Spec
- http://www.w3.org/CGI/
-
- CGI Security FAQ
- http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt
+ http://www.perl.org/CGI_MetaFAQ.html
=head2 Where can I learn about object-oriented Perl programming?
A good place to start is L<perltoot>, and you can use L<perlobj>,
-L<perlboot>, and L<perlbot> for reference. Perltoot didn't come out
-until the 5.004 release; you can get a copy (in pod, html, or
-postscript) from http://www.perl.com/CPAN/doc/FMTEYEWTK/ .
+L<perlboot>, L<perltooc>, and L<perlbot> for reference.
+(If you are using a really old Perl, you may not have all of these;
+try http://www.perldoc.com/ , but consider upgrading your perl.)
+
+A good book on OO in Perl is "Object-Oriented Perl"
+by Damian Conway, from Manning Publications,
+http://www.manning.com/Conway/index.html
=head2 Where can I learn about linking C with Perl? [h2xs, xsubpp]
@@ -773,8 +925,7 @@ the tests pass, read the pods again and again and again. If they
fail, see L<perlbug> and send a bug report with the output of
C<make test TEST_VERBOSE=1> along with C<perl -V>.
-=head2 When I tried to run my script, I got this message. What does it
-mean?
+=head2 When I tried to run my script, I got this message. What does it mean?
A complete list of Perl's error messages and warnings with explanatory
text can be found in L<perldiag>. You can also use the splain program
@@ -799,13 +950,11 @@ information, see L<ExtUtils::MakeMaker>.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
+Copyright (c) 1997-2002 Tom Christiansen and Nathan Torkington.
All rights reserved.
-When included as an integrated part of the Standard Distribution
-of Perl or of its documentation (printed or otherwise), this works is
-covered under Perl's Artistic License. For separate distributions of
-all or part of this FAQ outside of that, see L<perlfaq>.
+This documentation is free; you can redistribute it and/or modify it
+under the same terms as Perl itself.
Irrespective of its distribution, all code examples here are in the public
domain. You are permitted and encouraged to use this code and any
diff --git a/pod/perlfaq4.pod b/pod/perlfaq4.pod
index 8c570c2683..61503b6c57 100644
--- a/pod/perlfaq4.pod
+++ b/pod/perlfaq4.pod
@@ -1,64 +1,83 @@
=head1 NAME
-perlfaq4 - Data Manipulation ($Revision: 1.49 $, $Date: 1999/05/23 20:37:49 $)
+perlfaq4 - Data Manipulation ($Revision: 1.52 $, $Date: 2003/10/02 04:44:33 $)
=head1 DESCRIPTION
-The section of the FAQ answers questions related to the manipulation
-of data as numbers, dates, strings, arrays, hashes, and miscellaneous
-data issues.
+This section of the FAQ answers questions related to manipulating
+numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
=head1 Data: Numbers
=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
-The infinite set that a mathematician thinks of as the real numbers can
-only be approximated on a computer, since the computer only has a finite
-number of bits to store an infinite number of, um, numbers.
-
-Internally, your computer represents floating-point numbers in binary.
-Floating-point numbers read in from a file or appearing as literals
-in your program are converted from their decimal floating-point
-representation (eg, 19.95) to an internal binary representation.
-
-However, 19.95 can't be precisely represented as a binary
-floating-point number, just like 1/3 can't be exactly represented as a
-decimal floating-point number. The computer's binary representation
-of 19.95, therefore, isn't exactly 19.95.
-
-When a floating-point number gets printed, the binary floating-point
-representation is converted back to decimal. These decimal numbers
-are displayed in either the format you specify with printf(), or the
-current output format for numbers. (See L<perlvar/"$#"> if you use
-print. C<$#> has a different default value in Perl5 than it did in
-Perl4. Changing C<$#> yourself is deprecated.)
-
-This affects B<all> computer languages that represent decimal
-floating-point numbers in binary, not just Perl. Perl provides
-arbitrary-precision decimal numbers with the Math::BigFloat module
-(part of the standard Perl distribution), but mathematical operations
-are consequently slower.
-
-To get rid of the superfluous digits, just use a format (eg,
-C<printf("%.2f", 19.95)>) to get the required precision.
-See L<perlop/"Floating-point Arithmetic">.
+Internally, your computer represents floating-point numbers
+in binary. Digital (as in powers of two) computers cannot
+store all numbers exactly. Some real numbers lose precision
+in the process. This is a problem with how computers store
+numbers and affects all computer languages, not just Perl.
+
+L<perlnumber> shows the gory details of number
+representations and conversions.
+
+To limit the number of decimal places in your numbers, you
+can use the printf or sprintf function. See the
+L<"Floating Point Arithmetic"|perlop> for more details.
+
+ printf "%.2f", 10/3;
+
+ my $number = sprintf "%.2f", 10/3;
+
+=head2 Why is int() broken?
+
+Your int() is most probably working just fine. It's the numbers that
+aren't quite what you think.
+
+First, see the above item "Why am I getting long decimals
+(eg, 19.9499999999999) instead of the numbers I should be getting
+(eg, 19.95)?".
+
+For example, this
+
+ print int(0.6/0.2-2), "\n";
+
+will on most computers print 0, not 1, because even such simple
+numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
+numbers. What you think of above as 'three' is really more like
+2.9999999999999995559.
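A minimal sketch of the difference between truncating and rounding the value discussed above, assuming a perl built with ordinary IEEE double-precision numbers (a build using long doubles may behave differently):

```perl
use strict;
use warnings;

my $x = 0.6 / 0.2 - 2;    # mathematically 1, but stored slightly below 1

my $truncated = int($x);              # int() truncates toward zero
my $rounded   = sprintf("%.0f", $x);  # sprintf rounds to the nearest integer

printf "%.17g truncates to %d but rounds to %s\n", $x, $truncated, $rounded;
```

If what you want is the nearest integer rather than the integer part, sprintf (or adding 0.5 before calling int on positive numbers) is usually the better tool.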
=head2 Why isn't my octal data interpreted correctly?
-Perl only understands octal and hex numbers as such when they occur
-as literals in your program. If they are read in from somewhere and
-assigned, no automatic conversion takes place. You must explicitly
-use oct() or hex() if you want the values converted. oct() interprets
-both hex ("0x350") numbers and octal ones ("0350" or even without the
-leading "0", like "377"), while hex() only converts hexadecimal ones,
-with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
+Perl only understands octal and hex numbers as such when they occur as
+literals in your program. Octal literals in perl must start with a
+leading "0" and hexadecimal literals must start with a leading "0x".
+If they are read in from somewhere and assigned, no automatic
+conversion takes place. You must explicitly use oct() or hex() if you
+want the values converted to decimal. oct() interprets hex ("0x350"),
+octal ("0350" or even without the leading "0", like "377") and binary
+("0b1010") numbers, while hex() only converts hexadecimal ones, with
+or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
+The inverse mapping from decimal to octal can be done with either the
+"%o" or "%O" sprintf() formats.
This problem shows up most often when people try using chmod(), mkdir(),
-umask(), or sysopen(), which all want permissions in octal.
+umask(), or sysopen(), which by widespread tradition typically take
+permissions in octal.
- chmod(644, $file); # WRONG -- perl -w catches this
+ chmod(644, $file); # WRONG
chmod(0644, $file); # right
+Note the mistake in the first line was specifying the decimal literal
+644, rather than the intended octal literal 0644. The problem can
+be seen with:
+
+ printf("%#o",644); # prints 01204
+
+Surely you had not intended C<chmod(01204, $file);> - did you? If you
+want to use numeric literals as arguments to chmod() et al. then please
+try to express them as octal constants, that is with a leading zero and
+with the following digits restricted to the set 0..7.
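When the permission digits arrive as a string rather than as a literal, oct() does the conversion. The following is a small sketch; the variable $input is hypothetical and stands for a value read from a config file or the command line.

```perl
use strict;
use warnings;

# Hypothetical permission string, e.g. taken from @ARGV or a config file.
my $input = "644";

# Interpret the digits as octal: octal 644 is decimal 420.
my $mode = oct($input);

printf "decimal %d, octal %#o\n", $mode, $mode;
# chmod($mode, $file) would now set rw-r--r-- on $file.
```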
+
=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?
Remember that int() merely truncates toward 0. For rounding to a
@@ -93,7 +112,7 @@ alternation:
for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
- 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
+ 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
0.8 0.8 0.9 0.9 1.0 1.0
Don't blame Perl. It's the same as in C. IEEE says we have to do this.
@@ -101,24 +120,143 @@ Perl numbers whose absolute values are integers under 2**31 (on 32 bit
machines) will work pretty much like mathematical integers. Other numbers
are not guaranteed.
-=head2 How do I convert bits into ints?
+=head2 How do I convert between numeric representations/bases/radixes?
+
+As always with Perl there is more than one way to do it. Below
+are a few examples of approaches to making common conversions
+between number representations. This is intended to be representative
+rather than exhaustive.
+
+Some of the examples below use the Bit::Vector module from CPAN.
+The reason you might choose Bit::Vector over the perl built in
+functions is that it works with numbers of ANY size, that it is
+optimized for speed on some operations, and for at least some
+programmers the notation might be familiar.
+
+=over 4
+
+=item How do I convert hexadecimal into decimal
+
+Using perl's built in conversion of 0x notation:
+
+ $dec = 0xDEADBEEF;
+
+Using the hex function:
+
+ $dec = hex("DEADBEEF");
+
+Using pack:
+
+ $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
+
+Using the CPAN module Bit::Vector:
+
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
+ $dec = $vec->to_Dec();
+
+=item How do I convert from decimal to hexadecimal
+
+Using sprintf:
+
+ $hex = sprintf("%X", 3735928559); # upper case A-F
+ $hex = sprintf("%x", 3735928559); # lower case a-f
+
+Using unpack:
+
+ $hex = unpack("H*", pack("N", 3735928559));
+
+Using Bit::Vector:
+
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Dec(32, -559038737);
+ $hex = $vec->to_Hex();
-To turn a string of 1s and 0s like C<10110110> into a scalar containing
-its binary value, use the pack() and unpack() functions (documented in
-L<perlfunc/"pack"> and L<perlfunc/"unpack">):
+And Bit::Vector supports odd bit counts:
- $decimal = unpack('c', pack('B8', '10110110'));
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Dec(33, 3735928559);
+ $vec->Resize(32); # suppress leading 0 if unwanted
+ $hex = $vec->to_Hex();
-This packs the string C<10110110> into an eight bit binary structure.
-This is then unpacked as a character, which returns its ordinal value.
+=item How do I convert from octal to decimal
-This does the same thing:
+Using Perl's built in conversion of numbers with leading zeros:
+
+ $dec = 033653337357; # note the leading 0!
+
+Using the oct function:
+
+ $dec = oct("33653337357");
+
+Using Bit::Vector:
+
+ use Bit::Vector;
+ $vec = Bit::Vector->new(32);
+ $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
+ $dec = $vec->to_Dec();
+
+=item How do I convert from decimal to octal
+
+Using sprintf:
+
+ $oct = sprintf("%o", 3735928559);
+
+Using Bit::Vector:
+
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Dec(32, -559038737);
+ $oct = reverse join('', $vec->Chunk_List_Read(3));
+
+=item How do I convert from binary to decimal
+
+Perl 5.6 lets you write binary numbers directly with
+the 0b notation:
+
+ $number = 0b10110110;
+
+Using oct:
+
+ my $input = "10110110";
+ $decimal = oct( "0b$input" );
+
+Using pack and ord:
$decimal = ord(pack('B8', '10110110'));
-Here's an example of going the other way:
+Using pack and unpack for larger strings:
- $binary_string = unpack('B*', "\x29");
+ $int = unpack("N", pack("B32",
+ substr("0" x 32 . "11110101011011011111011101111", -32)));
+ $dec = sprintf("%d", $int);
+
+ # substr() is used to left pad a 32 character string with zeros.
+
+Using Bit::Vector:
+
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
+ $dec = $vec->to_Dec();
+
+=item How do I convert from decimal to binary
+
+Using sprintf (perl 5.6+):
+
+ $bin = sprintf("%b", 3735928559);
+
+Using unpack:
+
+ $bin = unpack("B*", pack("N", 3735928559));
+
+Using Bit::Vector:
+
+ use Bit::Vector;
+ $vec = Bit::Vector->new_Dec(32, -559038737);
+ $bin = $vec->to_Bin();
+
+The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
+are left as an exercise to the inclined reader.
+
+=back
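As a hint for the remaining transformations, the built-in functions shown above can be chained, going through a plain number in the middle. This is one possible sketch using only core functions (the "%b" format needs Perl 5.6 or later), not the only way to do it:

```perl
use strict;
use warnings;

my $hex = "DEADBEEF";

my $oct  = sprintf("%o", hex($hex));        # hex -> octal
my $bin  = sprintf("%b", hex($hex));        # hex -> binary
my $back = sprintf("%X", oct("0b$bin"));    # binary -> hex

print "$oct\n$bin\n$back\n";
```

The intermediate value must fit in a native integer for this to work; for larger numbers a module such as Bit::Vector is the safer route.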
=head2 Why doesn't & work the way I want it to?
@@ -129,7 +267,7 @@ C<00110011>). The operators work with the binary form of a number
(the number C<3> is treated as the bit pattern C<00000011>).
So, saying C<11 & 3> performs the "and" operation on numbers (yielding
-C<1>). Saying C<"11" & "3"> performs the "and" operation on strings
+C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).
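The two results quoted above can be checked with a short sketch. It assumes the classic behaviour of C<&>, where the operator looks at whether its operands are numbers or strings:

```perl
use strict;
use warnings;

my $num = 11 & 3;        # numeric "and": binary 1011 & 0011
my $str = "11" & "3";    # string "and": works on the characters' bit patterns

print "numeric: $num\n";
print "string:  $str\n";

# Adding 0 forces numeric treatment of data that arrived as strings.
my ($x, $y) = ("11", "3");
my $forced = ($x + 0) & ($y + 0);
print "forced:  $forced\n";
```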
Most problems with C<&> and C<|> arise because the programmer thinks
@@ -194,21 +332,25 @@ will not create a list of 500,000 integers.
=head2 How can I output Roman numerals?
-Get the http://www.perl.com/CPAN/modules/by-module/Roman module.
+Get the http://www.cpan.org/modules/by-module/Roman module.
=head2 Why aren't my random numbers random?
If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.
+
+ BEGIN { srand() if $] < 5.004 }
+
5.004 and later automatically call C<srand> at the beginning. Don't
-call C<srand> more than once--you make your numbers less random, rather
+call C<srand> more than once---you make your numbers less random, rather
than more.
Computers are good at being predictable and bad at being random
-(despite appearances caused by bugs in your programs :-).
-http://www.perl.com/CPAN/doc/FMTEYEWTK/random , courtesy of Tom
-Phoenix, talks more about this. John von Neumann said, ``Anyone who
-attempts to generate random numbers by deterministic means is, of
+(despite appearances caused by bugs in your programs :-). The
+F<random> article in the "Far More Than You Ever Wanted To Know"
+collection at http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy of
+Tom Phoenix, talks more about this. John von Neumann said, ``Anyone
+who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin.''
If you want numbers that are more random than C<rand> with C<srand>
@@ -218,48 +360,75 @@ random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
``Numerical Recipes in C'' at http://www.nr.com/ .
+=head2 How do I get a random number between X and Y?
+
+Use the following simple function. It selects a random integer between
+(and possibly including!) the two given integers, e.g.,
+C<random_int_in(50,120)>
+
+ sub random_int_in ($$) {
+ my($min, $max) = @_;
+ # Assumes that the two arguments are integers themselves!
+ return $min if $min == $max;
+ ($min, $max) = ($max, $min) if $min > $max;
+ return $min + int rand(1 + $max - $min);
+ }
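A possible usage sketch for the function above: simulating rolls of a six-sided die. The function body is repeated (without the prototype) only so that the example runs on its own; the tally loop is illustrative.

```perl
use strict;
use warnings;

# Same logic as random_int_in above, repeated to keep the sketch standalone.
sub random_int_in {
    my ($min, $max) = @_;
    return $min if $min == $max;
    ($min, $max) = ($max, $min) if $min > $max;
    return $min + int rand(1 + $max - $min);
}

# Roll the die 1000 times and count how often each face comes up.
my %tally;
$tally{ random_int_in(1, 6) }++ for 1 .. 1000;

print "$_: $tally{$_}\n" for sort { $a <=> $b } keys %tally;
```

Both endpoints can come up, which is what you want for dice but worth remembering when the range is used as an array index.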
+
=head1 Data: Dates
-=head2 How do I find the week-of-the-year/day-of-the-year?
+=head2 How do I find the day or week of the year?
+
+The localtime function returns the day of the year. Without an
+argument localtime uses the current time.
-The day of the year is in the array returned by localtime() (see
-L<perlfunc/"localtime">):
+ $day_of_year = (localtime)[7];
- $day_of_year = (localtime(time()))[7];
+The POSIX module can also format a date as the day of the year or
+week of the year.
-or more legibly (in 5.004 or higher):
+ use POSIX qw/strftime/;
+ my $day_of_year = strftime "%j", localtime;
+ my $week_of_year = strftime "%W", localtime;
- use Time::localtime;
- $day_of_year = localtime(time())->yday;
+To get the day or week of the year for any date, use the Time::Local
+module to get a time in epoch seconds for the argument to localtime.
-You can find the week of the year by dividing this by 7:
+ use POSIX qw/strftime/;
+ use Time::Local;
+ my $week_of_year = strftime "%W",
+ localtime( timelocal( 0, 0, 0, 18, 11, 1987 ) );
- $week_of_year = int($day_of_year / 7);
+The Date::Calc module provides two functions to calculate these.
-Of course, this believes that weeks start at zero. The Date::Calc
-module from CPAN has a lot of date calculation functions, including
-day of the year, week of the year, and so on. Note that not
-all businesses consider ``week 1'' to be the same; for example,
-American businesses often consider the first week with a Monday
-in it to be Work Week #1, despite ISO 8601, which considers
-WW1 to be the first week with a Thursday in it.
+ use Date::Calc qw(Day_of_Year Week_of_Year);
+ my $day_of_year = Day_of_Year( 1987, 12, 18 );
+ my $week_of_year = Week_of_Year( 1987, 12, 18 );
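If Date::Calc is not installed, the same day of the year can be obtained with core modules only. A minimal sketch, using the same example date (18 December 1987) and noon as the time of day to stay clear of any daylight-saving edge cases:

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Months are 0-based for timelocal, so 11 means December.
my $epoch = timelocal(0, 0, 12, 18, 11, 1987);

my $yday        = (localtime $epoch)[7];    # 0-based: 1 January is 0
my $day_of_year = $yday + 1;                # 1-based, like strftime's %j

print "day of year: $day_of_year\n";
```

Note the off-by-one difference: element 7 of localtime counts from zero, while strftime's %j and Date::Calc's Day_of_Year count from one.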
=head2 How do I find the current century or millennium?
Use the following simple functions:
- sub get_century {
+ sub get_century {
return int((((localtime(shift || time))[5] + 1999))/100);
- }
- sub get_millennium {
+ }
+ sub get_millennium {
return 1+int((((localtime(shift || time))[5] + 1899))/1000);
- }
+ }
-On some systems, you'll find that the POSIX module's strftime() function
-has been extended in a non-standard way to use a C<%C> format, which they
-sometimes claim is the "century". It isn't, because on most such systems,
-this is only the first two digits of the four-digit year, and thus cannot
-be used to reliably determine the current century or millennium.
+You can also use the POSIX strftime() function which may be a bit
+slower but is easier to read and maintain.
+
+ use POSIX qw/strftime/;
+
+ my $week_of_the_year = strftime "%W", localtime;
+ my $day_of_the_year = strftime "%j", localtime;
+
+On some systems, the POSIX module's strftime() function has
+been extended in a non-standard way to use a C<%C> format,
+which they sometimes claim is the "century". It isn't,
+because on most such systems, this is only the first two
+digits of the four-digit year, and thus cannot be used to
+reliably determine the current century or millennium.
=head2 How can I compare two dates and find the difference?
@@ -285,76 +454,80 @@ and Date::Manip modules from CPAN.
Use the Time::JulianDay module (part of the Time-modules bundle
available from CPAN.)
-Before you immerse yourself too deeply in this, be sure to verify that it
-is the I<Julian> Day you really want. Are you really just interested in
-a way of getting serial days so that they can do date arithmetic? If you
+Before you immerse yourself too deeply in this, be sure to verify that
+it is the I<Julian> Day you really want. Are you interested in a way
+of getting serial days so that you can just tell how many days apart
+they are, or so that you can also do other date arithmetic? If you
are interested in performing date arithmetic, this can be done using
-either Date::Manip or Date::Calc, without converting to Julian Day first.
-
-There is too much confusion on this issue to cover in this FAQ, but the
-term is applied (correctly) to a calendar now supplanted by the Gregorian
-Calendar, with the Julian Calendar failing to adjust properly for leap
-years on centennial years (among other annoyances). The term is also used
-(incorrectly) to mean: [1] days in the Gregorian Calendar; and [2] days
-since a particular starting time or `epoch', usually 1970 in the Unix
-world and 1980 in the MS-DOS/Windows world. If you find that it is not
-the first meaning that you really want, then check out the Date::Manip
-and Date::Calc modules. (Thanks to David Cassell for most of this text.)
+modules Date::Manip or Date::Calc.
+
+There are too many details and too much confusion on this issue to cover in
+this FAQ, but the term is applied (correctly) to a calendar now
+supplanted by the Gregorian Calendar, with the Julian Calendar failing
+to adjust properly for leap years on centennial years (among other
+annoyances). The term is also used (incorrectly) to mean: [1] days in
+the Gregorian Calendar; and [2] days since a particular starting time
+or `epoch', usually 1970 in the Unix world and 1980 in the
+MS-DOS/Windows world. If you find that it is not the first meaning
+that you really want, then check out the Date::Manip and Date::Calc
+modules. (Thanks to David Cassell for most of this text.)
=head2 How do I find yesterday's date?
-The C<time()> function returns the current time in seconds since the
-epoch. Take twenty-four hours off that:
+If you only need to find the date (and not the same time), you
+can use the Date::Calc module.
+
+ use Date::Calc qw(Today Add_Delta_Days);
- $yesterday = time() - ( 24 * 60 * 60 );
+ my @date = Add_Delta_Days( Today(), -1 );
-Then you can pass this to C<localtime()> and get the individual year,
-month, day, hour, minute, seconds values.
+ print "@date\n";
-Note very carefully that the code above assumes that your days are
-twenty-four hours each. For most people, there are two days a year
-when they aren't: the switch to and from summer time throws this off.
-A solution to this issue is offered by Russ Allbery.
+Most people try to use the time rather than the calendar to
+figure out dates, but that assumes that your days are
+twenty-four hours each. For most people, there are two days
+a year when they aren't: the switch to and from summer time
+throws this off. Russ Allbery offers this solution.
sub yesterday {
- my $now = defined $_[0] ? $_[0] : time;
- my $then = $now - 60 * 60 * 24;
- my $ndst = (localtime $now)[8] > 0;
- my $tdst = (localtime $then)[8] > 0;
- $then - ($tdst - $ndst) * 60 * 60;
- }
- # Should give you "this time yesterday" in seconds since epoch relative to
- # the first argument or the current time if no argument is given and
- # suitable for passing to localtime or whatever else you need to do with
- # it. $ndst is whether we're currently in daylight savings time; $tdst is
- # whether the point 24 hours ago was in daylight savings time. If $tdst
- # and $ndst are the same, a boundary wasn't crossed, and the correction
- # will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more
- # from yesterday's time since we gained an extra hour while going off
- # daylight savings time. If $tdst is 0 and $ndst is 1, subtract a
- # negative hour (add an hour) to yesterday's time since we lost an hour.
- #
- # All of this is because during those days when one switches off or onto
- # DST, a "day" isn't 24 hours long; it's either 23 or 25.
- #
- # The explicit settings of $ndst and $tdst are necessary because localtime
- # only says it returns the system tm struct, and the system tm struct at
- # least on Solaris doesn't guarantee any particular positive value (like,
- # say, 1) for isdst, just a positive value. And that value can
- # potentially be negative, if DST information isn't available (this sub
- # just treats those cases like no DST).
- #
- # Note that between 2am and 3am on the day after the time zone switches
- # off daylight savings time, the exact hour of "yesterday" corresponding
- # to the current hour is not clearly defined. Note also that if used
- # between 2am and 3am the day after the change to daylight savings time,
- # the result will be between 3am and 4am of the previous day; it's
- # arguable whether this is correct.
- #
- # This sub does not attempt to deal with leap seconds (most things don't).
- #
- # Copyright relinquished 1999 by Russ Allbery <rra@stanford.edu>
- # This code is in the public domain
+ my $now = defined $_[0] ? $_[0] : time;
+ my $then = $now - 60 * 60 * 24;
+ my $ndst = (localtime $now)[8] > 0;
+ my $tdst = (localtime $then)[8] > 0;
+ $then - ($tdst - $ndst) * 60 * 60;
+ }
+
+This sub should give you "this time yesterday" in seconds since the
+epoch, relative to the first argument or to the current time if no
+argument is given. The result is suitable for passing to localtime or
+whatever else you need to do with it. $ndst is whether we're currently
+in daylight savings time; $tdst is
+whether the point 24 hours ago was in daylight savings time. If $tdst
+and $ndst are the same, a boundary wasn't crossed, and the correction
+will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more
+from yesterday's time since we gained an extra hour while going off
+daylight savings time. If $tdst is 0 and $ndst is 1, subtract a
+negative hour (add an hour) to yesterday's time since we lost an hour.
+
+All of this is because during those days when one switches off or onto
+DST, a "day" isn't 24 hours long; it's either 23 or 25.
+
+The explicit settings of $ndst and $tdst are necessary because localtime
+only says it returns the system tm struct, and the system tm struct at
+least on Solaris doesn't guarantee any particular positive value (like,
+say, 1) for isdst, just a positive value. And that value can
+potentially be negative, if DST information isn't available (this sub
+just treats those cases like no DST).
+
+Note that between 2am and 3am on the day after the time zone switches
+off daylight savings time, the exact hour of "yesterday" corresponding
+to the current hour is not clearly defined. Note also that if used
+between 2am and 3am the day after the change to daylight savings time,
+the result will be between 3am and 4am of the previous day; it's
+arguable whether this is correct.
+
+This sub does not attempt to deal with leap seconds (most things don't).
+
+
=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
@@ -422,14 +595,6 @@ a subroutine call (in list context) into a string:
print "My sub returned @{[mysub(1,2,3)]} that time.\n";
-If you prefer scalar context, similar chicanery is also useful for
-arbitrary expressions:
-
- print "That yields ${\($n + 5)} widgets\n";
-
-Version 5.004 of Perl had a bug that gave list context to the
-expression in C<${...}>, but this is fixed in version 5.005.
-
See also ``How can I expand variables in text strings?'' in this
section of the FAQ.
@@ -440,20 +605,22 @@ matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multiple ones, then something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
-nested patterns, nor can they. For that you'll have to write a
-parser.
+nested patterns. For balanced expressions using C<(>, C<{>, C<[>
+or C<< < >> as delimiters, use the CPAN module Regexp::Common, or see
+L<perlre/(??{ code })>. For other cases, you'll have to write a parser.
If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
the CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced;
-and the byacc program.
+and the byacc program. Starting with Perl 5.8, Text::Balanced
+is part of the standard distribution.
One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:
while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
# do something with $1
- }
+ }
A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy Dean Inada, and
@@ -467,7 +634,7 @@ really does work:
@( = ('(','');
@) = (')','');
($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
- @$ = (eval{/$re/},$@!~/unmatched/);
+ @$ = (eval{/$re/},$@!~/unmatched/i);
print join("\n",@$[0..$#$]) if( $$[-1] );
=head2 How do I reverse a string?
@@ -499,22 +666,33 @@ Use Text::Wrap (part of the standard Perl distribution):
The paragraphs you give to Text::Wrap should not contain embedded
newlines. Text::Wrap doesn't justify the lines (flush-right).
-=head2 How can I access/change the first N letters of a string?
+Or use the CPAN module Text::Autoformat. Formatting files can be easily
+done by making a shell alias, like so:
+
+ alias fmt="perl -i -MText::Autoformat -n0777 \
+ -e 'print autoformat $_, {all=>1}' $*"
+
+See the documentation for Text::Autoformat to appreciate its many
+capabilities.
+
+=head2 How can I access or change N characters of a string?
+
+You can access the first characters of a string with substr().
+To get the first character, for example, start at position 0
+and grab the string of length 1.
-There are many ways. If you just want to grab a copy, use
-substr():
- $first_byte = substr($a, 0, 1);
+ $string = "Just another Perl Hacker";
+ $first_char = substr( $string, 0, 1 ); # 'J'
-If you want to modify part of a string, the simplest way is often to
-use substr() as an lvalue:
+To change part of a string, you can use the optional fourth
+argument which is the replacement string.
- substr($a, 0, 3) = "Tom";
+ substr( $string, 13, 4, "Perl 5.8.0" );
-Although those with a pattern matching kind of thought process will
-likely prefer
+You can also use substr() as an lvalue.
- $a =~ s/^.../Tom/;
+ substr( $string, 13, 4 ) = "Perl 5.8.0";
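+
+As a minimal sketch (the variable $old is introduced here only for
+illustration), the four-argument form also returns the part of the
+string that it replaced:
+
+ my $string = "Just another Perl Hacker";
+ my $old = substr( $string, 13, 4, "Perl 5.8.0" );
+ print "$old\n"; # Perl
+ print "$string\n"; # Just another Perl 5.8.0 Hacker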
=head2 How do I change the Nth occurrence of something?
@@ -567,6 +745,11 @@ integers:
while ($string =~ /-\d+/g) { $count++ }
print "There are $count negative numbers in the string";
+Another version uses a global match in list context, then assigns the
+result to a scalar, producing a count of the number of matches.
+
+ $count = () = $string =~ /-\d+/g;
+
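+The parentheses matter here: the empty list forces the match into list
+context, and assigning that list to a scalar yields the number of
+elements on the right-hand side. A small sketch, with sample data
+assumed for illustration:
+
+ my $string = "-9 55 48 -2 23 -76 4 14 -44";
+ my $count = () = $string =~ /-\d+/g;
+ print "There are $count negative numbers in the string\n"; # 4
+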
=head2 How do I capitalize all the words on one line?
To make the first letter of each word upper case:
@@ -575,7 +758,7 @@ To make the first letter of each word upper case:
This has the strange effect of turning "C<don't do it>" into "C<Don'T
Do It>". Sometimes you might want this. Other times you might need a
-more thorough solution (Suggested by brian d. foy):
+more thorough solution (Suggested by brian d foy):
$string =~ s/ (
(^\w) #at the beginning of the line
@@ -602,20 +785,34 @@ case", but that's not quite accurate. Consider the proper
capitalization of the movie I<Dr. Strangelove or: How I Learned to
Stop Worrying and Love the Bomb>, for example.
-=head2 How can I split a [character] delimited string except when inside
-[character]? (Comma-separated files)
+Damian Conway's L<Text::Autoformat> module provides some smart
+case transformations:
+
+ use Text::Autoformat;
+ my $x = "Dr. Strangelove or: How I Learned to Stop ".
+ "Worrying and Love the Bomb";
+
+ print $x, "\n";
+ for my $style (qw( sentence title highlight ))
+ {
+ print autoformat($x, { case => $style }), "\n";
+ }
+
+=head2 How can I split a [character] delimited string except when inside [character]?
-Take the example case of trying to split a string that is comma-separated
-into its different fields. (We'll pretend you said comma-separated, not
-comma-delimited, which is different and almost never what you mean.) You
-can't use C<split(/,/)> because you shouldn't split if the comma is inside
-quotes. For example, take a data line like this:
+Several modules can handle this sort of parsing: Text::Balanced,
+Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
+
+Take the example case of trying to split a string that is
+comma-separated into its different fields. You can't use C<split(/,/)>
+because you shouldn't split if the comma is inside quotes. For
+example, take a data line like this:
SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
Due to the restriction of the quotes, this is a fairly complex
-problem. Thankfully, we have Jeffrey Friedl, author of a highly
-recommended book on regular expressions, to handle these for us. He
+problem. Thankfully, we have Jeffrey Friedl, author of
+I<Mastering Regular Expressions>, to handle these for us. He
suggests (assuming your string is contained in $text):
@new = ();
@@ -628,8 +825,7 @@ suggests (assuming your string is contained in $text):
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
-C<"like \"this\"">. Unescaping them is a task addressed earlier in
-this section.
+C<"like \"this\"">).
Alternatively, the Text::ParseWords module (part of the standard Perl
distribution) lets you say:
@@ -660,10 +856,10 @@ Or more nicely written as:
This idiom takes advantage of the C<foreach> loop's aliasing
behavior to factor out common code. You can do this
-on several strings at once, or arrays, or even the
+on several strings at once, or arrays, or even the
values of a hash if you use a slice:
- # trim whitespace in the scalar, the array,
+ # trim whitespace in the scalar, the array,
# and all the values in the hash
foreach ($scalar, @array, @hash{keys %hash}) {
s/^\s+//;
@@ -672,9 +868,6 @@ values of a hash if you use a slice:
=head2 How do I pad a string with blanks or pad a number with zeroes?
-(This answer contributed by Uri Guttman, with kibitzing from
-Bart Lateur.)
-
In the following examples, C<$pad_len> is the length to which you wish
to pad the string, C<$text> or C<$num> contains the string to be padded,
and C<$pad_char> contains the padding character. You can use a single
@@ -689,13 +882,16 @@ right with blanks and it will truncate the result to a maximum length of
C<$pad_len>.
# Left padding a string with blanks (no truncation):
- $padded = sprintf("%${pad_len}s", $text);
+ $padded = sprintf("%${pad_len}s", $text);
+ $padded = sprintf("%*s", $pad_len, $text); # same thing
# Right padding a string with blanks (no truncation):
- $padded = sprintf("%-${pad_len}s", $text);
+ $padded = sprintf("%-${pad_len}s", $text);
+ $padded = sprintf("%-*s", $pad_len, $text); # same thing
- # Left padding a number with 0 (no truncation):
- $padded = sprintf("%0${pad_len}d", $num);
+ # Left padding a number with 0 (no truncation):
+ $padded = sprintf("%0${pad_len}d", $num);
+ $padded = sprintf("%0*d", $pad_len, $num); # same thing
# Right padding a string with blanks using pack (will truncate):
$padded = pack("A$pad_len",$text);
@@ -718,19 +914,19 @@ Left and right padding with any character, modifying C<$text> directly:
=head2 How do I extract selected columns from a string?
Use substr() or unpack(), both documented in L<perlfunc>.
-If you prefer thinking in terms of columns instead of widths,
+If you prefer thinking in terms of columns instead of widths,
you can use this kind of thing:
# determine the unpack format needed to split Linux ps output
# arguments are cut columns
my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
- sub cut2fmt {
+ sub cut2fmt {
my(@positions) = @_;
my $template = '';
my $lastpos = 1;
for my $place (@positions) {
- $template .= "A" . ($place - $lastpos) . " ";
+ $template .= "A" . ($place - $lastpos) . " ";
$lastpos = $place;
}
$template .= "A*";
@@ -768,7 +964,7 @@ be, you'd have to do this:
It's probably better in the general case to treat those
variables as entries in some special hash. For example:
- %user_defs = (
+ %user_defs = (
foo => 23,
bar => 19,
);
@@ -782,7 +978,7 @@ of the FAQ.
The problem is that those double-quotes force stringification--
coercing numbers and references into strings--even when you
don't want them to be strings. Think of it this way: double-quote
-expansion is used to produce new strings. If you already
+expansion is used to produce new strings. If you already
have a string, why do you need more?
If you get used to writing odd things like these:
@@ -813,27 +1009,27 @@ that actually do care about the difference between a string and a
number, such as the magical C<++> autoincrement operator or the
syscall() function.
-Stringification also destroys arrays.
+Stringification also destroys arrays.
@lines = `command`;
print "@lines"; # WRONG - extra blanks
print @lines; # right
-=head2 Why don't my <<HERE documents work?
+=head2 Why don't my E<lt>E<lt>HERE documents work?
Check for these three things:
=over 4
-=item 1. There must be no space after the << part.
+=item There must be no space after the E<lt>E<lt> part.
-=item 2. There (probably) should be a semicolon at the end.
+=item There (probably) should be a semicolon at the end.
-=item 3. You can't (easily) have any space in front of the tag.
+=item You can't (easily) have any space in front of the tag.
=back
-If you want to indent the text in the here document, you
+If you want to indent the text in the here document, you
can do this:
# all in one
@@ -843,7 +1039,7 @@ can do this:
HERE_TARGET
But the HERE_TARGET must still be flush against the margin.
-If you want that indented also, you'll have to quote
+If you want that indented also, you'll have to quote
in the indentation.
($quote = <<' FINIS') =~ s/^\s+//gm;
@@ -852,7 +1048,7 @@ in the indentation.
would deliver us. You are a liar, Saruman, and a corrupter
of men's hearts. --Theoden in /usr/src/perl/taint.c
FINIS
- $quote =~ s/\s*--/\n--/;
+ $quote =~ s/\s+--/\n--/;
A nice general-purpose fixer-upper function for indented here documents
follows. It expects to be called with a here document as its argument.
@@ -938,7 +1134,7 @@ with
@bad[0] = `same program that outputs several lines`;
-The C<use warnings> pragma and the B<-w> flag will warn you about these
+The C<use warnings> pragma and the B<-w> flag will warn you about these
matters.
=head2 How can I remove duplicate elements from a list or array?
@@ -994,7 +1190,7 @@ Like (d), but @in contains only small positive integers:
But perhaps you should have been using a hash all along, eh?
-=head2 How can I tell whether a list or array contains a certain element?
+=head2 How can I tell whether a certain element is contained in a list or array?
Hearing the word "in" is an I<in>dication that you probably should have
used a hash, not a list or array, to store your data. Hashes are
@@ -1002,11 +1198,11 @@ designed to answer this question quickly and efficiently. Arrays aren't.
That being said, there are several ways to approach this. If you
are going to make this query many times over arbitrary string values,
-the fastest way is probably to invert the original array and keep an
-associative array lying about whose keys are the first array's values.
+the fastest way is probably to invert the original array and maintain a
+hash whose keys are the first array's values.
@blues = qw/azure cerulean teal turquoise lapis-lazuli/;
- undef %is_blue;
+ %is_blue = ();
for (@blues) { $is_blue{$_} = 1 }
Now you can check whether $is_blue{$some_color}. It might have been a
@@ -1016,7 +1212,7 @@ If the values are all small integers, you could use a simple indexed
array. This kind of an array will take up less space:
@primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
- undef @is_tiny_prime;
+ @is_tiny_prime = ();
for (@primes) { $is_tiny_prime[$_] = 1 }
# or simply @is_tiny_prime[@primes] = (1) x @primes;
@@ -1094,8 +1290,8 @@ like this one. It uses the CPAN module FreezeThaw:
@a = @b = ( "this", "that", [ "more", "stuff" ] );
printf "a and b contain %s arrays\n",
- cmpStr(\@a, \@b) == 0
- ? "the same"
+ cmpStr(\@a, \@b) == 0
+ ? "the same"
: "different";
This approach also works for comparing hashes. Here
@@ -1105,7 +1301,7 @@ we'll demonstrate two different answers:
%a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
$a{EXTRA} = \%b;
- $b{EXTRA} = \%a;
+ $b{EXTRA} = \%a;
printf "a and b contain %s hashes\n",
cmpStr(\%a, \%b) == 0 ? "the same" : "different";
@@ -1120,16 +1316,37 @@ an exercise to the reader.
=head2 How do I find the first array element for which a condition is true?
-You can use this if you care about the index:
+To find the first array element which satisfies a condition, you can
+use the first() function in the List::Util module, which comes with
+Perl 5.8. This example finds the first element that contains "Perl".
- for ($i= 0; $i < @array; $i++) {
- if ($array[$i] eq "Waldo") {
- $found_index = $i;
- last;
- }
- }
+ use List::Util qw(first);
+
+ my $element = first { /Perl/ } @array;
+
+If you cannot use List::Util, you can make your own loop to do the
+same thing. Once you find the element, you stop the loop with last.
-Now C<$found_index> has what you want.
+ my $found;
+ foreach my $element ( @array )
+ {
+ if( $element =~ /Perl/ ) { $found = $element; last }
+ }
+
+If you want the array index, you can iterate through the indices
+and check the array element at each index until you find one
+that satisfies the condition.
+
+ my( $found, $index ) = ( undef, -1 );
+ for( my $i = 0; $i < @array; $i++ )
+ {
+ if( $array[$i] =~ /Perl/ )
+ {
+ $found = $array[$i];
+ $index = $i;
+ last;
+ }
+ }
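+
+If List::Util is available, the same first() function can be applied
+to the list of indices instead of the elements. This is only a sketch;
+the sample @array is made up for illustration:
+
+ use List::Util qw(first);
+
+ my @array = ( "awk", "sed", "Perl 5", "Perl 6" );
+ my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;
+ print defined $index ? "found at index $index\n" : "not found\n";
+
+The defined() test matters because index 0 is a valid result but is
+false in boolean context.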
=head2 How do I handle linked lists?
@@ -1190,20 +1407,33 @@ lists, or you could just do something like this with an array:
=head2 How do I shuffle an array randomly?
-Use this:
+If you have either Perl 5.8.0 or later installed, or
+Scalar-List-Utils 1.03 or later installed, you can say:
+
+ use List::Util 'shuffle';
+
+ @shuffled = shuffle(@list);
+
+If not, you can use a Fisher-Yates shuffle.
- # fisher_yates_shuffle( \@array ) :
- # generate a random permutation of @array in place
sub fisher_yates_shuffle {
- my $array = shift;
- my $i;
- for ($i = @$array; --$i; ) {
+ my $deck = shift; # $deck is a reference to an array
+ my $i = @$deck;
+ while ($i--) {
my $j = int rand ($i+1);
- @$array[$i,$j] = @$array[$j,$i];
+ @$deck[$i,$j] = @$deck[$j,$i];
}
}
- fisher_yates_shuffle( \@array ); # permutes @array in place
+ # shuffle my mpeg collection
+ #
+ my @mpeg = <audio/*/*.mp3>;
+ fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
+ print @mpeg;
+
+Note that the above implementation shuffles an array in place,
+unlike the List::Util::shuffle() which takes a list and returns
+a new shuffled list.
You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with
@@ -1236,13 +1466,25 @@ Here's another; let's compute spherical volumes:
$_ *= (4/3) * 3.14159; # this will be constant folded
}
-If you want to do the same thing to modify the values of the hash,
-you may not use the C<values> function, oddly enough. You need a slice:
+This can also be done with map(), which is made to transform
+one list into another:
- for $orbit ( @orbits{keys %orbits} ) {
- ($orbit **= 3) *= (4/3) * 3.14159;
+ @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
+
+If you want to do the same thing to modify the values of the
+hash, you can use the C<values> function. As of Perl 5.6
+the values are not copied, so if you modify $orbit (in this
+case), you modify the value.
+
+ for $orbit ( values %orbits ) {
+ ($orbit **= 3) *= (4/3) * 3.14159;
}
+Prior to perl 5.6 C<values> returned copies of the values,
+so older perl code often contains constructions such as
+C<@orbits{keys %orbits}> instead of C<values %orbits> where
+the hash is to be modified.
+
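+A short sketch of the aliasing behavior, assuming Perl 5.6 or later
+(the hash contents are invented for illustration):
+
+ my %price = ( apple => 2, pear => 3 );
+ $_ *= 10 for values %price; # modifies the hash in place
+ print join( ", ", map { "$_=$price{$_}" } sort keys %price ), "\n";
+ # apple=20, pear=30
+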
=head2 How do I select a random element from an array?
Use the rand() function (see L<perlfunc/rand>):
@@ -1255,33 +1497,53 @@ Use the rand() function (see L<perlfunc/rand>):
$element = $array[$index];
Make sure you I<only call srand once per program, if then>.
-If you are calling it more than once (such as before each
+If you are calling it more than once (such as before each
call to rand), you're almost certainly doing something wrong.
=head2 How do I permute N elements of a list?
-Here's a little program that generates all permutations
-of all the words on each line of input. The algorithm embodied
-in the permute() function should work on any list:
-
- #!/usr/bin/perl -n
- # tsc-permute: permute each word of input
- permute([split], []);
- sub permute {
- my @items = @{ $_[0] };
- my @perms = @{ $_[1] };
- unless (@items) {
- print "@perms\n";
- } else {
- my(@newitems,@newperms,$i);
- foreach $i (0 .. $#items) {
- @newitems = @items;
- @newperms = @perms;
- unshift(@newperms, splice(@newitems, $i, 1));
- permute([@newitems], [@newperms]);
- }
+Use the List::Permutor module on CPAN. If the list is
+actually an array, try the Algorithm::Permute module (also
+on CPAN). It's written in XS code and is very efficient.
+
+ use Algorithm::Permute;
+ my @array = 'a'..'d';
+ my $p_iterator = Algorithm::Permute->new ( \@array );
+ while (my @perm = $p_iterator->next) {
+ print "next permutation: (@perm)\n";
}
- }
+
+For even faster execution, you could do:
+
+ use Algorithm::Permute;
+ my @array = 'a'..'d';
+ Algorithm::Permute::permute {
+ print "next permutation: (@array)\n";
+ } @array;
+
+Here's a little program that generates all permutations of
+all the words on each line of input. The algorithm embodied
+in the permute() function is discussed in Volume 4 (still
+unpublished) of Knuth's I<The Art of Computer Programming>
+and will work on any list:
+
+ #!/usr/bin/perl -n
+ # Fischer-Krause ordered permutation generator
+
+ sub permute (&@) {
+ my $code = shift;
+ my @idx = 0..$#_;
+ while ( $code->(@_[@idx]) ) {
+ my $p = $#idx;
+ --$p while $idx[$p-1] > $idx[$p];
+ my $q = $p or return;
+ push @idx, reverse splice @idx, $p;
+ ++$q while $idx[$p-1] > $idx[$q];
+ @idx[$p-1,$q]=@idx[$q,$p-1];
+ }
+ }
+
+ permute {print"@_\n"} split;
=head2 How do I sort an array by (anything)?
@@ -1324,8 +1586,9 @@ If you need to sort on several fields, the following paradigm is useful.
This can be conveniently combined with precalculation of keys as given
above.
-See http://www.perl.com/CPAN/doc/FMTEYEWTK/sort.html for more about
-this approach.
+See the F<sort> article in the "Far More Than You Ever Wanted
+To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
+more about this approach.
See also the question below on sorting hashes.
@@ -1338,7 +1601,7 @@ For example, this sets $vec to have bit N set if $ints[N] was set:
$vec = '';
foreach(@ints) { vec($vec,$_,1) = 1 }
-And here's how, given a vector in $vec, you can
+Here's how, given a vector in $vec, you can
get those bits into your @ints array:
sub bitvec_to_list {
@@ -1373,11 +1636,27 @@ get those bits into your @ints array:
This method gets faster the more sparse the bit vector is.
(Courtesy of Tim Bunce and Winfried Koenig.)
-Here's a demo on how to use vec():
+You can make the while loop a lot shorter with this suggestion
+from Benjamin Goldberg:
+
+ while($vec =~ /[^\0]+/g ) {
+ push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
+ }
+
+Or use the CPAN module Bit::Vector:
+
+ $vector = Bit::Vector->new($num_of_bits);
+ $vector->Index_List_Store(@ints);
+ @ints = $vector->Index_List_Read();
+
+Bit::Vector provides efficient methods for bit vectors, sets of small
+integers, and "big int" math.
+
+Here's a more extensive illustration using vec():
# vec demo
$vector = "\xff\x0f\xef\xfe";
- print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
+ print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
unpack("N", $vector), "\n";
$is_set = vec($vector, 23, 1);
print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
@@ -1397,7 +1676,7 @@ Here's a demo on how to use vec():
set_vec(0,32,17);
set_vec(1,32,17);
- sub set_vec {
+ sub set_vec {
my ($offset, $width, $value) = @_;
my $vector = '';
vec($vector, $offset, $width) = $value;
@@ -1414,7 +1693,7 @@ Here's a demo on how to use vec():
print "vector length in bytes: ", length($vector), "\n";
@bytes = unpack("A8" x length($vector), $bits);
print "bits are: @bytes\n\n";
- }
+ }
=head2 Why does defined() return true on empty arrays and hashes?
@@ -1477,13 +1756,13 @@ worry you, you can always reverse the hash into a hash of arrays instead:
=head2 How can I know how many entries are in a hash?
If you mean how many keys, then all you have to do is
-take the scalar sense of the keys() function:
+use the keys() function in a scalar context:
- $num_keys = scalar keys %hash;
+ $num_keys = keys %hash;
-The keys() function also resets the iterator, which in void context is
-faster for tied hashes than would be iterating through the whole
-hash, one key-value pair at a time.
+The keys() function also resets the iterator, which means that you may
+see strange results if you use this between uses of other hash operators
+such as each().
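+
+A minimal sketch of that interaction (the hash is invented for
+illustration): calling keys() in the middle of an each() loop sends
+each() back to the start of the hash.
+
+ my %hash = ( a => 1, b => 2, c => 3 );
+ my ($first) = each %hash; # iterator now past the first pair
+ my $num_keys = keys %hash; # counts the keys and resets the iterator
+ my ($again) = each %hash; # starts over from the beginning
+ print "same key\n" if $first eq $again;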
=head2 How do I sort a hash (optionally by value instead of key)?
@@ -1517,15 +1796,17 @@ The Tie::IxHash module from CPAN might also be instructive.
=head2 What's the difference between "delete" and "undef" with hashes?
-Hashes are pairs of scalars: the first is the key, the second is the
-value. The key will be coerced to a string, although the value can be
-any kind of scalar: string, number, or reference. If a key C<$key> is
-present in the array, C<exists($key)> will return true. The value for
-a given key can be C<undef>, in which case C<$array{$key}> will be
-C<undef> while C<$exists{$key}> will return true. This corresponds to
-(C<$key>, C<undef>) being in the hash.
+Hashes contain pairs of scalars: the first is the key, the
+second is the value. The key will be coerced to a string,
+although the value can be any kind of scalar: string,
+number, or reference. If a key $key is present in
+%hash, C<exists($hash{$key})> will return true. The value
+for a given key can be C<undef>, in which case
+C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
+will return true. This corresponds to (C<$key>, C<undef>)
+being in the hash.
-Pictures help... here's the C<%ary> table:
+Pictures help... here's the %hash table:
keys values
+------+------+
@@ -1537,16 +1818,16 @@ Pictures help... here's the C<%ary> table:
And these conditions hold
- $ary{'a'} is true
- $ary{'d'} is false
- defined $ary{'d'} is true
- defined $ary{'a'} is true
- exists $ary{'a'} is true (Perl5 only)
- grep ($_ eq 'a', keys %ary) is true
+ $hash{'a'} is true
+ $hash{'d'} is false
+ defined $hash{'d'} is true
+ defined $hash{'a'} is true
+ exists $hash{'a'} is true (Perl5 only)
+ grep ($_ eq 'a', keys %hash) is true
If you now say
- undef $ary{'a'}
+ undef $hash{'a'}
your table now reads:
@@ -1561,18 +1842,18 @@ your table now reads:
and these conditions now hold; changes in caps:
- $ary{'a'} is FALSE
- $ary{'d'} is false
- defined $ary{'d'} is true
- defined $ary{'a'} is FALSE
- exists $ary{'a'} is true (Perl5 only)
- grep ($_ eq 'a', keys %ary) is true
+ $hash{'a'} is FALSE
+ $hash{'d'} is false
+ defined $hash{'d'} is true
+ defined $hash{'a'} is FALSE
+ exists $hash{'a'} is true (Perl5 only)
+ grep ($_ eq 'a', keys %hash) is true
Notice the last two: you have an undef value, but a defined key!
Now, consider this:
- delete $ary{'a'}
+ delete $hash{'a'}
your table now reads:
@@ -1585,23 +1866,22 @@ your table now reads:
and these conditions now hold; changes in caps:
- $ary{'a'} is false
- $ary{'d'} is false
- defined $ary{'d'} is true
- defined $ary{'a'} is false
- exists $ary{'a'} is FALSE (Perl5 only)
- grep ($_ eq 'a', keys %ary) is FALSE
+ $hash{'a'} is false
+ $hash{'d'} is false
+ defined $hash{'d'} is true
+ defined $hash{'a'} is false
+ exists $hash{'a'} is FALSE (Perl5 only)
+ grep ($_ eq 'a', keys %hash) is FALSE
See, the whole entry is gone!
=head2 Why don't my tied hashes make the defined/exists distinction?
-They may or may not implement the EXISTS() and DEFINED() methods
-differently. For example, there isn't the concept of undef with hashes
-that are tied to DBM* files. This means the true/false tables above
-will give different results when used on such a hash. It also means
-that exists and defined do the same thing with a DBM* file, and what
-they end up doing is not what they do with ordinary hashes.
+This depends on the tied hash's implementation of EXISTS().
+For example, there isn't the concept of undef with hashes
+that are tied to DBM* files. This means that exists() and
+defined() do the same thing with a DBM* file, and what they
+end up doing is not what they do with ordinary hashes.
=head2 How do I reset an each() operation part-way through?
@@ -1647,11 +1927,11 @@ it on top of either DB_File or GDBM_File.
Use the Tie::IxHash from CPAN.
use Tie::IxHash;
- tie(%myhash, Tie::IxHash);
- for ($i=0; $i<20; $i++) {
+ tie my %myhash, 'Tie::IxHash';
+ for (my $i=0; $i<20; $i++) {
$myhash{$i} = 2*$i;
}
- @keys = keys %myhash;
+ my @keys = keys %myhash;
# @keys = (0,1,2,3,...)
=head2 Why does passing a subroutine an undefined element in a hash create it?
@@ -1691,7 +1971,7 @@ in L<perltoot>.
=head2 How can I use a reference as a hash key?
-You can't do this directly, but you could use the standard Tie::Refhash
+You can't do this directly, but you could use the standard Tie::RefHash
module distributed with Perl.
=head1 Data: Misc
@@ -1707,9 +1987,7 @@ this works fine (assuming the files are found):
On less elegant (read: Byzantine) systems, however, you have
to play tedious games with "text" versus "binary" files. See
-L<perlfunc/"binmode"> or L<perlopentut>. Most of these ancient-thinking
-systems are curses out of Microsoft, who seem to be committed to putting
-the backward into backward compatibility.
+L<perlfunc/"binmode"> or L<perlopentut>.
If you're concerned about 8-bit ASCII data, then see L<perllocale>.
@@ -1726,11 +2004,21 @@ Assuming that you don't care about IEEE notations like "NaN" or
if (/^-?\d+$/) { print "is an integer\n" }
if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
- if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number" }
+ if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
- { print "a C float" }
-
-If you're on a POSIX system, Perl's supports the C<POSIX::strtod>
+ { print "a C float\n" }
+
+There are also some commonly used modules for the task.
+L<Scalar::Util> (distributed with 5.8) provides access to perl's
+internal function C<looks_like_number> for determining
+whether a variable looks like a number. L<Data::Types>
+exports functions that validate data types using both the
+above and other regular expressions. Thirdly, there is
+C<Regexp::Common> which has regular expressions to match
+various types of numbers. Those three modules are available
+from the CPAN.
+
+If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function. Its semantics are somewhat cumbersome, so here's a C<getnum>
wrapper function for more convenient access. This function takes
a string and returns the number it found, or C<undef> for input that
@@ -1748,37 +2036,39 @@ if you just want to say, ``Is this a float?''
return undef;
} else {
return $num;
- }
- }
+ }
+ }
- sub is_numeric { defined getnum($_[0]) }
+ sub is_numeric { defined getnum($_[0]) }
-Or you could check out the String::Scanf module on CPAN instead. The
-POSIX module (part of the standard Perl distribution) provides the
-C<strtod> and C<strtol> for converting strings to double and longs,
+Or you could check out the L<String::Scanf> module on the CPAN
+instead. The POSIX module (part of the standard Perl distribution) provides
+the C<strtod> and C<strtol> functions for converting strings to doubles and longs,
respectively.
=head2 How do I keep persistent data across program calls?
For some specific applications, you can use one of the DBM modules.
-See L<AnyDBM_File>. More generically, you should consult the FreezeThaw,
-Storable, or Class::Eroot modules from CPAN. Here's one example using
-Storable's C<store> and C<retrieve> functions:
+See L<AnyDBM_File>. More generically, you should consult the FreezeThaw
+or Storable modules from CPAN. Starting with Perl 5.8, Storable is part
+of the standard distribution. Here's one example using Storable's C<store>
+and C<retrieve> functions:
- use Storable;
+ use Storable;
store(\%hash, "filename");
- # later on...
+ # later on...
$href = retrieve("filename"); # by ref
%hash = %{ retrieve("filename") }; # direct to hash
=head2 How do I print out or copy a recursive data structure?
The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
-for printing out data structures. The Storable module, found on CPAN,
-provides a function called C<dclone> that recursively copies its argument.
+for printing out data structures. The Storable module on CPAN (or the
+5.8 release of Perl) provides a function called C<dclone> that recursively
+copies its argument.
- use Storable qw(dclone);
+ use Storable qw(dclone);
$r2 = dclone($r1);
Where $r1 can be a reference to any kind of data structure you'd like.
@@ -1804,15 +2094,11 @@ the PDL module from CPAN instead--it makes number-crunching easy.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
+Copyright (c) 1997-2002 Tom Christiansen and Nathan Torkington.
All rights reserved.
-When included as part of the Standard Version of Perl, or as part of
-its complete documentation whether printed or otherwise, this work
-may be distributed only under the terms of Perl's Artistic License.
-Any distribution of this file or derivatives thereof I<outside>
-of that package require that special arrangements be made with
-copyright holder.
+This documentation is free; you can redistribute it and/or modify it
+under the same terms as Perl itself.
Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
diff --git a/pod/perlfaq5.pod b/pod/perlfaq5.pod
index 4ae7407e96..cad896d71f 100644
--- a/pod/perlfaq5.pod
+++ b/pod/perlfaq5.pod
@@ -1,6 +1,6 @@
=head1 NAME
-perlfaq5 - Files and Formats ($Revision: 1.38 $, $Date: 1999/05/23 16:08:30 $)
+perlfaq5 - Files and Formats ($Revision: 1.28 $, $Date: 2003/01/26 17:45:46 $)
=head1 DESCRIPTION
@@ -9,152 +9,61 @@ formats, and footers.
=head2 How do I flush/unbuffer an output filehandle? Why must I do this?
-The C standard I/O library (stdio) normally buffers characters sent to
-devices. This is done for efficiency reasons so that there isn't a
-system call for each byte. Any time you use print() or write() in
-Perl, you go though this buffering. syswrite() circumvents stdio and
-buffering.
-
-In most stdio implementations, the type of output buffering and the size of
-the buffer varies according to the type of device. Disk files are block
-buffered, often with a buffer size of more than 2k. Pipes and sockets
-are often buffered with a buffer size between 1/2 and 2k. Serial devices
-(e.g. modems, terminals) are normally line-buffered, and stdio sends
-the entire line when it gets the newline.
-
-Perl does not support truly unbuffered output (except insofar as you can
-C<syswrite(OUT, $char, 1)>). What it does instead support is "command
-buffering", in which a physical write is performed after every output
-command. This isn't as hard on your system as unbuffering, but does
-get the output where you want it when you want it.
-
-If you expect characters to get to your device when you print them there,
-you'll want to autoflush its handle.
-Use select() and the C<$|> variable to control autoflushing
-(see L<perlvar/$|> and L<perlfunc/select>):
+Perl does not support truly unbuffered output (except
+insofar as you can C<syswrite(OUT, $char, 1)>), although it
+does support "command buffering", in which a physical
+write is performed after every output command.
+
+The C standard I/O library (stdio) normally buffers
+characters sent to devices so that there isn't a system call
+for each byte. In most stdio implementations, the type of
+output buffering and the size of the buffer varies according
+to the type of device. Perl's print() and write() functions
+normally buffer output, while syswrite() bypasses buffering
+altogether.
+
+If you want your output to be sent immediately when you
+execute print() or write() (for instance, for some network
+protocols), you must set the handle's autoflush flag. This
+flag is the Perl variable $| and when it is set to a true
+value, Perl will flush the handle's buffer after each
+print() or write(). Setting $| affects buffering only for
+the currently selected default file handle. You choose this
+handle with the one-argument select() call (see
+L<perlvar/$E<verbar>> and L<perlfunc/select>).
+
+Use select() to choose the desired handle, then set its
+per-filehandle variables.
$old_fh = select(OUTPUT_HANDLE);
$| = 1;
select($old_fh);
-Or using the traditional idiom:
+Some idioms can handle this in a single statement:
select((select(OUTPUT_HANDLE), $| = 1)[0]);
-Or if don't mind slowly loading several thousand lines of module code
-just because you're afraid of the C<$|> variable:
+ $| = 1, select $_ for select OUTPUT_HANDLE;
- use FileHandle;
- open(DEV, "+</dev/tty"); # ceci n'est pas une pipe
- DEV->autoflush(1);
-
-or the newer IO::* modules:
+Some modules offer object-oriented access to handles and their
+variables, although they may be overkill if this is the only
+thing you do with them. You can use IO::Handle:
use IO::Handle;
open(DEV, ">/dev/printer"); # but is this?
DEV->autoflush(1);
-or even this:
+or IO::Socket:
use IO::Socket; # this one is kinda a pipe?
- $sock = IO::Socket::INET->new(PeerAddr => 'www.perl.com',
- PeerPort => 'http(80)',
- Proto => 'tcp');
- die "$!" unless $sock;
+ my $sock = IO::Socket::INET->new( 'www.example.com:80' ) ;
$sock->autoflush();
- print $sock "GET / HTTP/1.0" . "\015\012" x 2;
- $document = join('', <$sock>);
- print "DOC IS: $document\n";
-
-Note the bizarrely hardcoded carriage return and newline in their octal
-equivalents. This is the ONLY way (currently) to assure a proper flush
-on all platforms, including Macintosh. That's the way things work in
-network programming: you really should specify the exact bit pattern
-on the network line terminator. In practice, C<"\n\n"> often works,
-but this is not portable.
-
-See L<perlfaq9> for other examples of fetching URLs over the web.
=head2 How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to the beginning of a file?
-Those are operations of a text editor. Perl is not a text editor.
-Perl is a programming language. You have to decompose the problem into
-low-level calls to read, write, open, close, and seek.
-
-Although humans have an easy time thinking of a text file as being a
-sequence of lines that operates much like a stack of playing cards--or
-punch cards--computers usually see the text file as a sequence of bytes.
-In general, there's no direct way for Perl to seek to a particular line
-of a file, insert text into a file, or remove text from a file.
-
-(There are exceptions in special circumstances. You can add or remove
-data at the very end of the file. A sequence of bytes can be replaced
-with another sequence of the same length. The C<$DB_RECNO> array
-bindings as documented in L<DB_File> also provide a direct way of
-modifying a file. Files where all lines are the same length are also
-easy to alter.)
-
-The general solution is to create a temporary copy of the text file with
-the changes you want, then copy that over the original. This assumes
-no locking.
-
- $old = $file;
- $new = "$file.tmp.$$";
- $bak = "$file.orig";
-
- open(OLD, "< $old") or die "can't open $old: $!";
- open(NEW, "> $new") or die "can't open $new: $!";
-
- # Correct typos, preserving case
- while (<OLD>) {
- s/\b(p)earl\b/${1}erl/i;
- (print NEW $_) or die "can't write to $new: $!";
- }
-
- close(OLD) or die "can't close $old: $!";
- close(NEW) or die "can't close $new: $!";
-
- rename($old, $bak) or die "can't rename $old to $bak: $!";
- rename($new, $old) or die "can't rename $new to $old: $!";
-
-Perl can do this sort of thing for you automatically with the C<-i>
-command-line switch or the closely-related C<$^I> variable (see
-L<perlrun> for more details). Note that
-C<-i> may require a suffix on some non-Unix systems; see the
-platform-specific documentation that came with your port.
-
- # Renumber a series of tests from the command line
- perl -pi -e 's/(^\s+test\s+)\d+/ $1 . ++$count /e' t/op/taint.t
-
- # form a script
- local($^I, @ARGV) = ('.orig', glob("*.c"));
- while (<>) {
- if ($. == 1) {
- print "This line should appear at the top of each file\n";
- }
- s/\b(p)earl\b/${1}erl/i; # Correct typos, preserving case
- print;
- close ARGV if eof; # Reset $.
- }
-
-If you need to seek to an arbitrary line of a file that changes
-infrequently, you could build up an index of byte positions of where
-the line ends are in the file. If the file is large, an index of
-every tenth or hundredth line end would allow you to seek and read
-fairly efficiently. If the file is sorted, try the look.pl library
-(part of the standard perl distribution).
-
-In the unique case of deleting lines at the end of a file, you
-can use tell() and truncate(). The following code snippet deletes
-the last line of a file without making a copy or reading the
-whole file into memory:
-
- open (FH, "+< $file");
- while ( <FH> ) { $addr = tell(FH) unless eof(FH) }
- truncate(FH, $addr);
-
-Error checking is left as an exercise for the reader.
+Use the Tie::File module, which has been included in the standard
+distribution since Perl 5.8.0.
=head2 How do I count the number of lines in a file?
@@ -172,34 +81,52 @@ proper text file, so this may report one fewer line than you expect.
This assumes no funny games with newline translations.
-=head2 How do I make a temporary file name?
+=head2 How can I use Perl's C<-i> option from within a program?
+
+C<-i> sets the value of Perl's C<$^I> variable, which in turn affects
+the behavior of C<< <> >>; see L<perlrun> for more details. By
+modifying the appropriate variables directly, you can get the same
+behavior within a larger program. For example:
+
+ # ...
+ {
+ local($^I, @ARGV) = ('.orig', glob("*.c"));
+ while (<>) {
+ if ($. == 1) {
+ print "This line should appear at the top of each file\n";
+ }
+ s/\b(p)earl\b/${1}erl/i; # Correct typos, preserving case
+ print;
+ close ARGV if eof; # Reset $.
+ }
+ }
+ # $^I and @ARGV return to their old values here
+
+This block modifies all the C<.c> files in the current directory,
+leaving a backup of the original data from each file in a new
+C<.c.orig> file.
-Use the C<new_tmpfile> class method from the IO::File module to get a
-filehandle opened for reading and writing. Use it if you don't
-need to know the file's name:
+=head2 How do I make a temporary file name?
- use IO::File;
- $fh = IO::File->new_tmpfile()
- or die "Unable to make new temporary file: $!";
+Use the File::Temp module; see L<File::Temp> for more information.
-If you do need to know the file's name, you can use the C<tmpnam>
-function from the POSIX module to get a filename that you then open
-yourself:
+ use File::Temp qw/ tempfile tempdir /;
+ $dir = tempdir( CLEANUP => 1 );
+ ($fh, $filename) = tempfile( DIR => $dir );
- use Fcntl;
- use POSIX qw(tmpnam);
+ # or if you don't need to know the filename
- # try new temporary filenames until we get one that didn't already
- # exist; the check should be unnecessary, but you can't be too careful
- do { $name = tmpnam() }
- until sysopen(FH, $name, O_RDWR|O_CREAT|O_EXCL);
+ $fh = tempfile( DIR => $dir );
- # install atexit-style handler so that when we exit or die,
- # we automatically delete this temporary file
- END { unlink($name) or die "Couldn't unlink $name : $!" }
+File::Temp has been a standard module since Perl 5.6.1. If you
+don't have a modern enough Perl installed, use the C<new_tmpfile>
+class method from the IO::File module to get a filehandle opened for
+reading and writing. Use it if you don't need to know the file's name:
- # now go on to use the file ...
+ use IO::File;
+ $fh = IO::File->new_tmpfile()
+ or die "Unable to make new temporary file: $!";
If you're committed to creating a temporary file by hand, use the
process ID and/or the current time-value. If you need to have many
@@ -207,7 +134,7 @@ temporary files in one process, use a counter:
BEGIN {
use Fcntl;
- my $temp_dir = -d '/tmp' ? '/tmp' : $ENV{TMP} || $ENV{TEMP};
+ my $temp_dir = -d '/tmp' ? '/tmp' : $ENV{TMPDIR} || $ENV{TEMP};
my $base_name = sprintf("%s/%d-%d-0000", $temp_dir, $$, time());
sub temp_file {
local *FH;
@@ -237,7 +164,7 @@ Berkeley-style ps:
# 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what
$PS_T = 'A6 A4 A7 A5 A*';
open(PS, "ps|");
- print scalar <PS>;
+ print scalar <PS>;
while (<PS>) {
($pid, $tt, $stat, $time, $command) = unpack($PS_T, $_);
for $var (qw!pid tt stat time command!) {
@@ -249,79 +176,36 @@ Berkeley-style ps:
We've used C<$$var> in a way that is forbidden by C<use strict 'refs'>.
That is, we've promoted a string to a scalar variable reference using
-symbolic references. This is ok in small programs, but doesn't scale
+symbolic references. This is okay in small programs, but doesn't scale
well. It also only works on global variables, not lexicals.
=head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles?
-The fastest, simplest, and most direct way is to localize the typeglob
-of the filehandle in question:
+As of perl5.6, open() autovivifies file and directory handles
+as references if you pass it an uninitialized scalar variable.
+You can then pass these references just like any other scalar,
+and use them in the place of named handles.
- local *TmpHandle;
+ open my $fh, $file_name;
-Typeglobs are fast (especially compared with the alternatives) and
-reasonably easy to use, but they also have one subtle drawback. If you
-had, for example, a function named TmpHandle(), or a variable named
-%TmpHandle, you just hid it from yourself.
+ open local $fh, $file_name;
- sub findme {
- local *HostFile;
- open(HostFile, "</etc/hosts") or die "no /etc/hosts: $!";
- local $_; # <- VERY IMPORTANT
- while (<HostFile>) {
- print if /\b127\.(0\.0\.)?1\b/;
- }
- # *HostFile automatically closes/disappears here
- }
+ print $fh "Hello World!\n";
-Here's how to use typeglobs in a loop to open and store a bunch of
-filehandles. We'll use as values of the hash an ordered
-pair to make it easy to sort the hash in insertion order.
+ process_file( $fh );
- @names = qw(motd termcap passwd hosts);
- my $i = 0;
- foreach $filename (@names) {
- local *FH;
- open(FH, "/etc/$filename") || die "$filename: $!";
- $file{$filename} = [ $i++, *FH ];
- }
+Before perl5.6, you had to deal with various typeglob idioms
+which you may see in older code.
- # Using the filehandles in the array
- foreach $name (sort { $file{$a}[0] <=> $file{$b}[0] } keys %file) {
- my $fh = $file{$name}[1];
- my $line = <$fh>;
- print "$name $. $line";
- }
+ open FILE, "> $filename";
+ process_typeglob( *FILE );
+ process_reference( \*FILE );
-For passing filehandles to functions, the easiest way is to
-preface them with a star, as in func(*STDIN).
-See L<perlfaq7/"Passing Filehandles"> for details.
+ sub process_typeglob { local *FH = shift; print FH "Typeglob!" }
+ sub process_reference { local $fh = shift; print $fh "Reference!" }
-If you want to create many anonymous handles, you should check out the
-Symbol, FileHandle, or IO::Handle (etc.) modules. Here's the equivalent
-code with Symbol::gensym, which is reasonably light-weight:
-
- foreach $filename (@names) {
- use Symbol;
- my $fh = gensym();
- open($fh, "/etc/$filename") || die "open /etc/$filename: $!";
- $file{$filename} = [ $i++, $fh ];
- }
-
-Here's using the semi-object-oriented FileHandle module, which certainly
-isn't light-weight:
-
- use FileHandle;
-
- foreach $filename (@names) {
- my $fh = FileHandle->new("/etc/$filename") or die "$filename: $!";
- $file{$filename} = [ $i++, $fh ];
- }
-
-Please understand that whether the filehandle happens to be a (probably
-localized) typeglob or an anonymous handle from one of the modules
-in no way affects the bizarre rules for managing indirect handles.
-See the next question.
+If you want to create many anonymous handles, you should
+check out the Symbol or IO::Handle modules.
=head2 How can I use a filehandle indirectly?
@@ -335,13 +219,10 @@ to get indirect filehandles:
$fh = \*SOME_FH; # ref to typeglob (bless-able)
$fh = *SOME_FH{IO}; # blessed IO::Handle from *SOME_FH typeglob
-Or, you can use the C<new> method from the FileHandle or IO modules to
+Or, you can use the C<new> method from one of the IO::* modules to
create an anonymous filehandle, store that in a scalar variable,
and use it as though it were a normal filehandle.
- use FileHandle;
- $fh = FileHandle->new();
-
use IO::Handle; # 5.004 or higher
$fh = IO::Handle->new();
@@ -349,7 +230,7 @@ Then use any of those as you would a normal filehandle. Anywhere that
Perl is expecting a filehandle, an indirect filehandle may be used
instead. An indirect filehandle is just a scalar variable that contains
a filehandle. Functions like C<print>, C<open>, C<seek>, or
-the C<< <FH> >> diamond operator will accept either a read filehandle
+the C<< <FH> >> diamond operator will accept either a named filehandle
or a scalar variable containing one:
($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR);
@@ -383,7 +264,7 @@ In the examples above, we assigned the filehandle to a scalar variable
before using it. That is because only simple scalar variables, not
expressions or subscripts of hashes or arrays, can be used with
built-ins like C<print>, C<printf>, or the diamond operator. Using
-something other than a simple scalar varaible as a filehandle is
+something other than a simple scalar variable as a filehandle is
illegal and won't even compile:
@fd = (*STDIN, *STDOUT, *STDERR);
@@ -401,17 +282,17 @@ an expression where you would place the filehandle:
That block is a proper block like any other, so you can put more
complicated code there. This sends the message out to one of two places:
- $ok = -x "/bin/cat";
+ $ok = -x "/bin/cat";
print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n";
- print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";
+ print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";
This approach of treating C<print> and C<printf> like object methods
calls doesn't work for the diamond operator. That's because it's a
real operator, not just a function with a comma-less argument. Assuming
you've been storing typeglobs in your structure as we did above, you
-can use the built-in function named C<readline> to reads a record just
+can use the built-in function named C<readline> to read a record just
as C<< <> >> does. Given the initialization shown above for @fd, this
-would work, but only because readline() require a typeglob. It doesn't
+would work, but only because readline() requires a typeglob. It doesn't
work with objects or strings, which might be a bug we haven't fixed yet.
$got = readline($fd[0]);
@@ -432,44 +313,38 @@ See L<perlform/"Accessing Formatting Internals"> for an swrite() function.
=head2 How can I output my numbers with commas added?
-This one will do it for you:
+This subroutine will add commas to your number:
- sub commify {
- local $_ = shift;
- 1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
- return $_;
- }
+ sub commify {
+ local $_ = shift;
+ 1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
+ return $_;
+ }
- $n = 23659019423.2331;
- print "GOT: ", commify($n), "\n";
+This regex from Benjamin Goldberg will add commas to numbers:
- GOT: 23,659,019,423.2331
+ s/(^[-+]?\d+?(?=(?>(?:\d{3})+)(?!\d))|\G\d{3}(?=\d))/$1,/g;
-You can't just:
+It is easier to see with comments:
- s/^([-+]?\d+)(\d{3})/$1,$2/g;
-
-because you have to put the comma in and then recalculate your
-position.
-
-Alternatively, this code commifies all numbers in a line regardless of
-whether they have decimal portions, are preceded by + or -, or
-whatever:
-
- # from Andrew Johnson <ajohnson@gpu.srv.ualberta.ca>
- sub commify {
- my $input = shift;
- $input = reverse $input;
- $input =~ s<(\d\d\d)(?=\d)(?!\d*\.)><$1,>g;
- return scalar reverse $input;
- }
+ s/(
+ ^[-+]? # beginning of number.
+ \d{1,3}? # first digits before first comma
+ (?= # followed by, (but not included in the match) :
+ (?>(?:\d{3})+) # some positive multiple of three digits.
+ (?!\d) # an *exact* multiple, not x * 3 + 1 or whatever.
+ )
+ | # or:
+ \G\d{3} # after the last group, get three digits
+ (?=\d) # but they have to have more digits after them.
+ )/$1,/xg;
=head2 How can I translate tildes (~) in a filename?
Use the <> (glob()) operator, documented in L<perlfunc>. Older
versions of Perl require that you have a shell installed that groks
tildes. Recent perl versions have this feature built in. The
-Glob::KGlob module (available from CPAN) gives more portable glob
+File::KGlob module (available from CPAN) gives more portable glob
functionality.
Within Perl, you may use this directly:
@@ -494,7 +369,7 @@ I<then> gives you read-write access:
open(FH, "+> /path/name"); # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file
-doesn't exist.
+doesn't exist.
open(FH, "+< /path/name"); # open for update
@@ -559,7 +434,7 @@ isn't as exclusive as you might wish.
See also the new L<perlopentut> if you have it (new for 5.6).
-=head2 Why do I sometimes get an "Argument list too long" when I use <*>?
+=head2 Why do I sometimes get an "Argument list too long" when I use E<lt>*E<gt>?
The C<< <> >> operator performs a globbing operation (see above).
In Perl versions earlier than v5.6.0, the internal glob() operator forks
@@ -569,7 +444,7 @@ C<Argument list too long>. People who installed tcsh as csh won't
have this problem, but their users may be surprised by it.
To get around this, either upgrade to Perl v5.6.0 or later, do the glob
-yourself with readdir() and patterns, or use a module like Glob::KGlob,
+yourself with readdir() and patterns, or use a module like File::KGlob,
one that doesn't use the shell to do globbing.
=head2 Is there a leak/bug in glob()?
@@ -583,55 +458,37 @@ best therefore to use glob() only in list context.
Normally perl ignores trailing blanks in filenames, and interprets
certain leading characters (or a trailing "|") to mean something
-special. To avoid this, you might want to use a routine like the one below.
-It turns incomplete pathnames into explicit relative ones, and tacks a
-trailing null byte on the name to make perl leave it alone:
-
- sub safe_filename {
- local $_ = shift;
- s#^([^./])#./$1#;
- $_ .= "\0";
- return $_;
- }
+special.
- $badpath = "<<<something really wicked ";
- $fn = safe_filename($badpath");
- open(FH, "> $fn") or "couldn't open $badpath: $!";
+The three argument form of open() lets you specify the mode
+separately from the filename. The open() function treats
+special mode characters and whitespace in the filename as
+literals:
-This assumes that you are using POSIX (portable operating systems
-interface) paths. If you are on a closed, non-portable, proprietary
-system, you may have to adjust the C<"./"> above.
+ open FILE, "<", " file "; # filename is " file "
+ open FILE, ">", ">file"; # filename is ">file"
-It would be a lot clearer to use sysopen(), though:
+It may be a lot clearer to use sysopen(), though:
use Fcntl;
$badpath = "<<<something really wicked ";
sysopen (FH, $badpath, O_WRONLY | O_CREAT | O_TRUNC)
or die "can't open $badpath: $!";
-For more information, see also the new L<perlopentut> if you have it
-(new for 5.6).
-
=head2 How can I reliably rename a file?
-Well, usually you just use Perl's rename() function. That may not
-work everywhere, though, particularly when renaming files across file systems.
-Some sub-Unix systems have broken ports that corrupt the semantics of
-rename()--for example, WinNT does this right, but Win95 and Win98
-are broken. (The last two parts are not surprising, but the first is. :-)
-
-If your operating system supports a proper mv(1) program or its moral
-equivalent, this works:
+If your operating system supports a proper mv(1) utility or its
+functional equivalent, this works:
rename($old, $new) or system("mv", $old, $new);
-It may be more compelling to use the File::Copy module instead. You
-just copy to the new file to the new name (checking return values),
-then delete the old one. This isn't really the same semantically as a
-real rename(), though, which preserves metainformation like
+It may be more portable to use the File::Copy module instead.
+You just copy the file to the new name (checking return
+values), then delete the old one. This isn't really the same
+semantically as a rename(), which preserves meta-information like
permissions, timestamps, inode info, etc.
-Newer versions of File::Copy exports a move() function.
+Newer versions of File::Copy export a move() function.
=head2 How can I lock a file?
@@ -675,12 +532,12 @@ for your own system's idiosyncrasies (sometimes called "features").
Slavish adherence to portability concerns shouldn't get in the way of
your getting your job done.)
-For more information on file locking, see also
+For more information on file locking, see also
L<perlopentut/"File Locking"> if you have it (new for 5.6).
=back
-=head2 Why can't I just open(FH, ">file.lock")?
+=head2 Why can't I just open(FH, "E<gt>file.lock")?
A common bit of code B<NOT TO USE> is this:
@@ -692,7 +549,7 @@ which must be done in one. That's why computer hardware provides an
atomic test-and-set instruction. In theory, this "ought" to work:
sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT)
- or die "can't open file.lock: $!":
+ or die "can't open file.lock: $!";
except that lamentably, file creation (and deletion) is not atomic
over NFS, so this won't work (at least, not every time) over the net.
@@ -723,6 +580,34 @@ Here's a much better web-page hit counter:
If the count doesn't impress your friends, then the code might. :-)
+=head2 All I want to do is append a small amount of text to the end of a file. Do I still have to use locking?
+
+If you are on a system that correctly implements flock() and you use the
+example appending code from "perldoc -f flock", everything will be OK
+even if the OS you are on doesn't implement append mode correctly (if
+such a system exists). So if you are happy to restrict yourself to OSs
+that implement flock() (and that's not really much of a restriction)
+then that is what you should do.
+
+If you know you are only going to use a system that does correctly
+implement appending (i.e. not Win32) then you can omit the seek() from
+the above code.
+
+If you know you are only writing code to run on an OS and filesystem that
+does implement append mode correctly (a local filesystem on a modern
+Unix for example), and you keep the file in block-buffered mode and you
+write less than one buffer-full of output between each manual flushing
+of the buffer then each bufferload is almost guaranteed to be written to
+the end of the file in one chunk without getting intermingled with
+anyone else's output. You can also use the syswrite() function which is
+simply a wrapper around your system's write(2) system call.
+
+There is still a small theoretical chance that a signal will interrupt
+the system level write() operation before completion. There is also a
+possibility that some STDIO implementations may call multiple system
+level write()s even if the buffer was empty to start. There may be some
+systems where this probability is reduced to zero.
+
=head2 How do I randomly update a binary file?
If you're just trying to patch a binary, in many cases something as
@@ -748,14 +633,17 @@ Don't forget them or you'll be quite sorry.
=head2 How do I get a file's timestamp in perl?
-If you want to retrieve the time at which the file was last read,
-written, or had its meta-data (owner, etc) changed, you use the B<-M>,
-B<-A>, or B<-C> filetest operations as documented in L<perlfunc>. These
-retrieve the age of the file (measured against the start-time of your
-program) in days as a floating point number. To retrieve the "raw"
-time in seconds since the epoch, you would call the stat function,
-then use localtime(), gmtime(), or POSIX::strftime() to convert this
-into human-readable form.
+If you want to retrieve the time at which the file was last
+read, written, or had its meta-data (owner, etc) changed,
+you use the B<-A>, B<-M>, or B<-C> file test operations as
+documented in L<perlfunc>. These retrieve the age of the
+file (measured against the start-time of your program) in
+days as a floating point number. Some platforms may not have
+all of these times. See L<perlport> for details. To
+retrieve the "raw" time in seconds since the epoch, you
+would call the stat function, then use localtime(),
+gmtime(), or POSIX::strftime() to convert this into
+human-readable form.
Here's an example:
@@ -798,30 +686,22 @@ utime() on those platforms.
=head2 How do I print to more than one file at once?
-If you only have to do this once, you can do this:
+To connect one filehandle to several output filehandles,
+you can use the IO::Tee or Tie::FileHandle::Multiplex modules.
- for $fh (FH1, FH2, FH3) { print $fh "whatever\n" }
+If you only have to do this once, you can print individually
+to each filehandle.
-To connect up to one filehandle to several output filehandles, it's
-easiest to use the tee(1) program if you have it, and let it take care
-of the multiplexing:
+ for $fh (FH1, FH2, FH3) { print $fh "whatever\n" }
- open (FH, "| tee file1 file2 file3");
+=head2 How can I read in an entire file all at once?
-Or even:
+You can use the File::Slurp module to do it in one step.
- # make STDOUT go to three files, plus original STDOUT
- open (STDOUT, "| tee file1 file2 file3") or die "Teeing off: $!\n";
- print "whatever\n" or die "Writing: $!\n";
- close(STDOUT) or die "Closing: $!\n";
+ use File::Slurp;
-Otherwise you'll have to write your own multiplexing print
-function--or your own tee program--or use Tom Christiansen's,
-at http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz , which is
-written in Perl and offers much greater functionality
-than the stock version.
-
-=head2 How can I read in an entire file all at once?
+ $all_of_it = read_file($filename); # entire file in scalar
+ @all_lines = read_file($filename); # one line per element
The customary Perl approach for processing all the lines in a file is to
do so one line at a time:
@@ -830,7 +710,7 @@ do so one line at a time:
while (<INPUT>) {
chomp;
# do something with $_
- }
+ }
close(INPUT) || die "can't close $file: $!";
This is tremendously more efficient than reading the entire file into
@@ -840,27 +720,14 @@ you see someone do this:
@lines = <INPUT>;
-you should think long and hard about why you need everything loaded
-at once. It's just not a scalable solution. You might also find it
-more fun to use the standard DB_File module's $DB_RECNO bindings,
-which allow you to tie an array to a file so that accessing an element
-the array actually accesses the corresponding line in the file.
-
-On very rare occasion, you may have an algorithm that demands that
-the entire file be in memory at once as one scalar. The simplest solution
-to that is
-
- $var = `cat $file`;
-
-Being in scalar context, you get the whole thing. In list context,
-you'd get a list of all the lines:
-
- @lines = `cat $file`;
+you should think long and hard about why you need everything loaded at
+once. It's just not a scalable solution. You might also find it more
+fun to use the standard Tie::File module, or the DB_File module's
+$DB_RECNO bindings, which allow you to tie an array to a file so that
+accessing an element of the array actually accesses the corresponding
+line in the file.
-This tiny but expedient solution is neat, clean, and portable to
-all systems on which decent tools have been installed. For those
-who prefer not to use the toolbox, you can of course read the file
-manually, although this makes for more complicated code.
+You can read the entire filehandle contents into a scalar.
{
local(*INPUT, $/);
@@ -868,11 +735,18 @@ manually, although this makes for more complicated code.
$var = <INPUT>;
}
-That temporarily undefs your record separator, and will automatically
+That temporarily undefs your record separator, and will automatically
close the file at block exit. If the file is already open, just use this:
$var = do { local $/; <INPUT> };
+For ordinary files you can also use the read function.
+
+ read( INPUT, $var, -s INPUT );
+
+The third argument tests the byte size of the data on the INPUT filehandle
+and reads that many bytes into the buffer $var.
+
=head2 How can I read in a file by paragraphs?
Use the C<$/> variable (see L<perlvar> for details). You can either
@@ -880,8 +754,8 @@ set it to C<""> to eliminate empty paragraphs (C<"abc\n\n\n\ndef">,
for instance, gets treated as two paragraphs and not three), or
C<"\n\n"> to accept empty paragraphs.
-Note that a blank line must have no blanks in it. Thus C<"fred\n
-\nstuff\n\n"> is one paragraph, but C<"fred\n\nstuff\n\n"> is two.
+Note that a blank line must have no blanks in it. Thus
+S<C<"fred\n \nstuff\n\n">> is one paragraph, but C<"fred\n\nstuff\n\n"> is two.
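As a rough illustration of the difference between the two settings, the sketch below reads the same text with each value of C<$/>. The helper name C<read_records> is hypothetical, and the in-memory filehandle assumes Perl 5.8 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into records, using $separator as $/.
sub read_records {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = $separator;
    my @records = <$fh>;
    close $fh;
    return @records;
}

my $text = "abc\n\n\n\ndef\n";
my @paragraph_mode = read_records($text, "");      # runs of blank lines collapse
my @double_newline = read_records($text, "\n\n");  # every "\n\n" ends a record

printf "paragraph mode: %d records\n", scalar @paragraph_mode;
printf "double newline: %d records\n", scalar @double_newline;
```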
=head2 How can I read a single character from a file? From the keyboard?
@@ -955,52 +829,6 @@ include also support for non-portable systems as well.
printf "\nYou said %s, char number %03d\n",
$key, ord $key;
-For legacy DOS systems, Dan Carson <dbc@tc.fluke.COM> reports the following:
-
-To put the PC in "raw" mode, use ioctl with some magic numbers gleaned
-from msdos.c (Perl source file) and Ralf Brown's interrupt list (comes
-across the net every so often):
-
- $old_ioctl = ioctl(STDIN,0,0); # Gets device info
- $old_ioctl &= 0xff;
- ioctl(STDIN,1,$old_ioctl | 32); # Writes it back, setting bit 5
-
-Then to read a single character:
-
- sysread(STDIN,$c,1); # Read a single character
-
-And to put the PC back to "cooked" mode:
-
- ioctl(STDIN,1,$old_ioctl); # Sets it back to cooked mode.
-
-So now you have $c. If C<ord($c) == 0>, you have a two byte code, which
-means you hit a special key. Read another byte with C<sysread(STDIN,$c,1)>,
-and that value tells you what combination it was according to this
-table:
-
- # PC 2-byte keycodes = ^@ + the following:
-
- # HEX KEYS
- # --- ----
- # 0F SHF TAB
- # 10-19 ALT QWERTYUIOP
- # 1E-26 ALT ASDFGHJKL
- # 2C-32 ALT ZXCVBNM
- # 3B-44 F1-F10
- # 47-49 HOME,UP,PgUp
- # 4B LEFT
- # 4D RIGHT
- # 4F-53 END,DOWN,PgDn,Ins,Del
- # 54-5D SHF F1-F10
- # 5E-67 CTR F1-F10
- # 68-71 ALT F1-F10
- # 73-77 CTR LEFT,RIGHT,END,PgDn,HOME
- # 78-83 ALT 1234567890-=
- # 84 CTR PgUp
-
-This is all trial and error I did a long time ago; I hope I'm reading the
-file that worked...
-
=head2 How can I tell whether there's a character waiting on a filehandle?
The very first thing you should do is look into getting the Term::ReadKey
@@ -1049,7 +877,7 @@ Or write a small C program using the editor of champions:
% ./fionread
0x4004667f
-And then hard-code it, leaving porting as an exercise to your successor.
+And then hard code it, leaving porting as an exercise to your successor.
$FIONREAD = 0x4004667f; # XXX: opsys dependent
@@ -1103,7 +931,7 @@ Or even with a literal numeric descriptor:
Note that "<&STDIN" makes a copy, but "<&=STDIN" makes
an alias. That means if you close an aliased handle, all
-aliases become inaccessible. This is not true with
+aliases become inaccessible. This is not true with
a copied one.
Error checking, as always, has been left as an exercise for the reader.
@@ -1121,13 +949,13 @@ to, you may be able to do this:
Or, just use the fdopen(3S) feature of open():
- {
- local *F;
+ {
+ local *F;
open F, "<&=$fd" or die "Cannot reopen fd=$fd: $!";
close F;
}
-=head2 Why can't I use "C:\temp\foo" in DOS paths? What doesn't `C:\temp\foo.exe` work?
+=head2 Why can't I use "C:\temp\foo" in DOS paths? Why doesn't `C:\temp\foo.exe` work?
Whoops! You just put a tab and a formfeed into that filename!
Remember that within double quoted strings ("like\this"), the
@@ -1153,9 +981,9 @@ documentation for details.
=head2 Why does Perl let me delete read-only files? Why does C<-i> clobber protected files? Isn't this a bug in Perl?
-This is elaborately and painstakingly described in the "Far More Than
-You Ever Wanted To Know" in
-http://www.perl.com/CPAN/doc/FMTEYEWTK/file-dir-perms .
+This is elaborately and painstakingly described in the
+F<file-dir-perms> article in the "Far More Than You Ever Wanted To
+Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz .
The executive summary: learn how your filesystem works. The
permissions on a file say what can happen to the data in that file.
@@ -1172,9 +1000,18 @@ Here's an algorithm from the Camel Book:
srand;
rand($.) < 1 && ($line = $_) while <>;
-This has a significant advantage in space over reading the whole
-file in. A simple proof by induction is available upon
-request if you doubt the algorithm's correctness.
+This has a significant advantage in space over reading the whole file
+in. You can find a proof of this method in I<The Art of Computer
+Programming>, Volume 2, Section 3.4.2, by Donald E. Knuth.
+
+You can use the File::Random module which provides a function
+for that algorithm:
+
+ use File::Random qw/random_line/;
+ my $line = random_line($filename);
+
+Another way is to use the Tie::File module, which treats the entire
+file as an array. Simply access a random array element.
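If neither module is available, the Camel Book one-liner can be spelled out as a subroutine. This is a sketch only; C<random_line_from> is a hypothetical name, and it keeps its own line counter rather than relying on C<$.>.

```perl
use strict;
use warnings;

# Hypothetical helper: pick one line uniformly at random from an open
# filehandle, holding only one candidate line in memory at a time.
sub random_line_from {
    my ($fh) = @_;
    my $chosen;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
        # Replace the current choice with probability 1/$count.
        $chosen = $line if rand($count) < 1;
    }
    return $chosen;    # undef if the handle had no lines
}
```

The first line is always selected initially, since rand(1) is always less than 1; each later line then replaces it with decreasing probability.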
=head2 Why do I get weird spaces when I print an array of lines?
@@ -1201,13 +1038,11 @@ If your array contains lines, just print them:
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
+Copyright (c) 1997-2002 Tom Christiansen and Nathan Torkington.
All rights reserved.
-When included as an integrated part of the Standard Distribution
-of Perl or of its documentation (printed or otherwise), this works is
-covered under Perl's Artistic License. For separate distributions of
-all or part of this FAQ outside of that, see L<perlfaq>.
+This documentation is free; you can redistribute it and/or modify it
+under the same terms as Perl itself.
Irrespective of its distribution, all code examples here are in the public
domain. You are permitted and encouraged to use this code and any
diff --git a/pod/perlfaq6.pod b/pod/perlfaq6.pod
index ed6c01b31b..168233bd1b 100644
--- a/pod/perlfaq6.pod
+++ b/pod/perlfaq6.pod
@@ -1,6 +1,6 @@
=head1 NAME
-perlfaq6 - Regexes ($Revision: 1.27 $, $Date: 1999/05/23 16:08:30 $)
+perlfaq6 - Regular Expressions ($Revision: 1.20 $, $Date: 2003/01/03 20:05:28 $)
=head1 DESCRIPTION
@@ -8,8 +8,8 @@ This section is surprisingly small because the rest of the FAQ is
littered with answers involving regular expressions. For example,
decoding a URL and checking whether something is a number are handled
with regular expressions, but those answers are found elsewhere in
-this document (in L<perlfaq9>: ``How do I decode or create those %-encodings
-on the web'' and L<perfaq4>: ``How do I determine whether a scalar is
+this document (in L<perlfaq9>: ``How do I decode or create those %-encodings
+on the web'' and L<perlfaq4>: ``How do I determine whether a scalar is
a number/whole/integer/float'', to be precise).
=head2 How can I hope to use regular expressions without creating illegible and unmaintainable code?
@@ -70,9 +70,9 @@ delimiter within the pattern:
=head2 I'm having trouble matching over more than one line. What's wrong?
-Either you don't have more than one line in the string you're looking at
-(probably), or else you aren't using the correct modifier(s) on your
-pattern (possibly).
+Either you don't have more than one line in the string you're looking
+at (probably), or else you aren't using the correct modifier(s) on
+your pattern (possibly).
There are many ways to get multiline data into a string. If you want
it to happen automatically while reading input, you'll want to set $/
@@ -115,7 +115,7 @@ Here's code that finds everything between START and END in a paragraph:
undef $/; # read in whole file, not just one line or paragraph
while ( <> ) {
- while ( /START(.*?)END/sm ) { # /s makes . cross line boundaries
+ while ( /START(.*?)END/sgm ) { # /s makes . cross line boundaries
print "$1\n";
}
}
@@ -143,38 +143,38 @@ Here's another example of using C<..>:
# now choose between them
} continue {
reset if eof(); # fix $.
- }
+ }
=head2 I put a regular expression into $/ but it didn't work. What's wrong?
-$/ must be a string, not a regular expression. Awk has to be better
-for something. :-)
-
-Actually, you could do this if you don't mind reading the whole file
-into memory:
-
- undef $/;
- @records = split /your_pattern/, <FH>;
+Up to Perl 5.8.0, $/ has to be a string. This may change in 5.10,
+but don't get your hopes up. Until then, you can use these examples
+if you really need to do this.
-The Net::Telnet module (available from CPAN) has the capability to
-wait for a pattern in the input stream, or timeout if it doesn't
-appear within a certain time.
+Use the four argument form of sysread to continually add to
+a buffer. After you add to the buffer, you check if you have a
+complete line (using your regular expression).
- ## Create a file with three lines.
- open FH, ">file";
- print FH "The first line\nThe second line\nThe third line\n";
- close FH;
+ local $_ = "";
+ while( sysread FH, $_, 8192, length ) {
+ while( s/^((?s).*?)your_pattern// ) {
+ my $record = $1;
+ # do stuff here.
+ }
+ }
- ## Get a read/write filehandle to it.
- $fh = new FileHandle "+<file";
+You can do the same thing with foreach and a match using the
+c flag and the \G anchor, if you do not mind your entire file
+being in memory at the end.
- ## Attach it to a "stream" object.
- use Net::Telnet;
- $file = new Net::Telnet (-fhopen => $fh);
+ local $_ = "";
+ while( sysread FH, $_, 8192, length ) {
+ foreach my $record ( m/\G((?s).*?)your_pattern/gc ) {
+ # do stuff here.
+ }
+ substr( $_, 0, pos ) = "" if pos;
+ }
- ## Search for the second line and print out the third.
- $file->waitfor('/second line\n/');
- print $file->getline;
=head2 How do I substitute case insensitively on the LHS while preserving case on the RHS?
@@ -194,14 +194,14 @@ properties of bitwise xor on ASCII strings.
print;
-And here it is as a subroutine, modelled after the above:
+And here it is as a subroutine, modeled after the above:
sub preserve_case($$) {
my ($old, $new) = @_;
my $mask = uc $old ^ $old;
uc $new | $mask .
- substr($mask, -1) x (length($new) - length($old))
+ substr($mask, -1) x (length($new) - length($old))
}
$a = "this is a TEsT case";
@@ -212,6 +212,21 @@ This prints:
this is a SUcCESS case
+As an alternative, to keep the case of the replacement word if it is
+longer than the original, you can use this code, by Jeff Pinyan:
+
+ sub preserve_case {
+ my ($from, $to) = @_;
+ my ($lf, $lt) = map length, @_;
+
+ if ($lt < $lf) { $from = substr $from, 0, $lt }
+ else { $from .= substr $to, $lf }
+
+ return uc $to | ($from ^ uc $from);
+ }
+
+This changes the sentence to "this is a SUcCess case."
+
Just to show that C programmers can write C in any programming language,
if you prefer a more C-like solution, the following script makes the
substitution have the same case, letter by letter, as the original.
@@ -252,13 +267,21 @@ the case of the last character is used for the rest of the substitution.
=head2 How can I make C<\w> match national character sets?
-See L<perllocale>.
+Put C<use locale;> in your script. The \w character class is taken
+from the current locale.
+
+See L<perllocale> for details.
=head2 How can I match a locale-smart version of C</[a-zA-Z]/>?
-One alphabetic character would be C</[^\W\d_]/>, no matter what locale
-you're in. Non-alphabetics would be C</[\W\d_]/> (assuming you don't
-consider an underscore a letter).
+You can use the POSIX character class syntax C</[[:alpha:]]/>
+documented in L<perlre>.
+
+No matter which locale you are in, the alphabetic characters are
+the characters in \w without the digits and the underscore.
+As a regex, that looks like C</[^\W\d_]/>. Its complement,
+the non-alphabetics, is then everything in \W along with
+the digits and the underscore, or C</[\W\d_]/>.
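A quick way to convince yourself that the two patterns agree is to run both over a few sample characters. This sketch uses plain ASCII input only, so it says nothing about locale-specific letters.

```perl
use strict;
use warnings;

# Compare the POSIX class and the negated class on single characters.
for my $char ('a', 'Z', '5', '_', ' ') {
    my $posix   = $char =~ /^[[:alpha:]]$/ ? 1 : 0;
    my $negated = $char =~ /^[^\W\d_]$/    ? 1 : 0;
    printf "%-3s [[:alpha:]]=%d [^\\W\\d_]=%d\n", "'$char'", $posix, $negated;
}
```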
=head2 How can I quote a variable to use in a regex?
@@ -269,14 +292,26 @@ a double-quoted string (see L<perlop> for more details). Remember
also that any regex special characters will be acted on unless you
precede the substitution with \Q. Here's an example:
- $string = "to die?";
- $lhs = "die?";
- $rhs = "sleep, no more";
+ $string = "Placido P. Octopus";
+ $regex = "P.";
+
+ $string =~ s/$regex/Polyp/;
+ # $string is now "Polypacido P. Octopus"
+
+Because C<.> is special in regular expressions, and can match any
+single character, the regex C<P.> here has matched the C<Pl> in the
+original string.
+
+To escape the special meaning of C<.>, we use C<\Q>:
+
+ $string = "Placido P. Octopus";
+ $regex = "P.";
- $string =~ s/\Q$lhs/$rhs/;
- # $string is now "to sleep no more"
+ $string =~ s/\Q$regex/Polyp/;
+ # $string is now "Placido Polyp Octopus"
-Without the \Q, the regex would also spuriously match "di".
+The use of C<\Q> causes the C<.> in the regex to be treated as a
+regular character, so that C<P.> matches a C<P> followed by a dot.
=head2 What is C</o> really for?
@@ -368,20 +403,30 @@ A slight modification also removes C++ comments:
=head2 Can I use Perl regular expressions to match balanced text?
-Although Perl regular expressions are more powerful than "mathematical"
-regular expressions because they feature conveniences like backreferences
-(C<\1> and its ilk), they still aren't powerful enough--with
-the possible exception of bizarre and experimental features in the
-development-track releases of Perl. You still need to use non-regex
-techniques to parse balanced text, such as the text enclosed between
-matching parentheses or braces, for example.
+Historically, Perl regular expressions were not capable of matching
+balanced text. As of more recent versions of perl, including 5.6.1,
+experimental features have been added that make it possible to do this.
+Look at the documentation for the (??{ }) construct in recent perlre manual
+pages to see an example of matching balanced parentheses. Be sure to take
+special notice of the warnings present in the manual before making use
+of this feature.
+
+CPAN contains many modules that can be useful for matching text
+depending on the context. Damian Conway provides some useful
+patterns in Regexp::Common. The module Text::Balanced provides a
+general solution to this problem.
+
+One of the common applications of balanced text matching is working
+with XML and HTML. There are many modules available that support
+these needs. Two examples are HTML::Parser and XML::Parser. There
+are many others.
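As a minimal sketch of the Text::Balanced approach, the following pulls one balanced, possibly nested, parenthesized group off the front of a string. The sample text is made up for illustration, and the module must be installed.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "(a (nested) group) and the rest";

# In list context extract_bracketed returns the bracketed substring,
# then the remainder of the text, then any skipped prefix.
my ($extracted, $remainder) = extract_bracketed($text, '()');

print "extracted: $extracted\n";
print "remainder: $remainder\n";
```

If the text does not start with a balanced group, the first return value is undefined, so check it before using it.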
An elaborate subroutine (for 7-bit ASCII only) to pull out balanced
and possibly nested single chars, like C<`> and C<'>, C<{> and C<}>,
or C<(> and C<)> can be found in
-http://www.perl.com/CPAN/authors/id/TOMC/scripts/pull_quotes.gz .
+http://www.cpan.org/authors/id/TOMC/scripts/pull_quotes.gz .
-The C::Scan module from CPAN contains such subs for internal use,
+The C::Scan module from CPAN also contains such subs for internal use,
but they are undocumented.
=head2 What does it mean that regexes are greedy? How can I get around it?
@@ -409,9 +454,9 @@ playing hot potato.
Use the split function:
while (<>) {
- foreach $word ( split ) {
+ foreach $word ( split ) {
# do something with $word here
- }
+ }
}
Note that this isn't really a word in the English sense; it's just
@@ -445,7 +490,7 @@ in the previous question:
If you wanted to do the same thing for lines, you wouldn't need a
regular expression:
- while (<>) {
+ while (<>) {
$seen{$_}++;
}
while ( ($line, $count) = each %seen ) {
@@ -467,12 +512,12 @@ The following is extremely inefficient:
@popstates = qw(CO ON MI WI MN);
while (defined($line = <>)) {
for $state (@popstates) {
- if ($line =~ /\b$state\b/i) {
+ if ($line =~ /\b$state\b/i) {
print $line;
last;
}
}
- }
+ }
That's because Perl has to recompile all those patterns for each of
the lines of the file. As of the 5.005 release, there's a much better
@@ -530,69 +575,96 @@ variable is no longer "expensive" the way the other two are.
=head2 What good is C<\G> in a regular expression?
-The notation C<\G> is used in a match or substitution in conjunction with
-the C</g> modifier to anchor the regular expression to the point just past
-where the last match occurred, i.e. the pos() point. A failed match resets
-the position of C<\G> unless the C</c> modifier is in effect. C<\G> can be
-used in a match without the C</g> modifier; it acts the same (i.e. still
-anchors at the pos() point) but of course only matches once and does not
-update pos(), as non-C</g> expressions never do. C<\G> in an expression
-applied to a target string that has never been matched against a C</g>
-expression before or has had its pos() reset is functionally equivalent to
-C<\A>, which matches at the beginning of the string.
-
-For example, suppose you had a line of text quoted in standard mail
-and Usenet notation, (that is, with leading C<< > >> characters), and
-you want change each leading C<< > >> into a corresponding C<:>. You
-could do so in this way:
-
- s/^(>+)/':' x length($1)/gem;
-
-Or, using C<\G>, the much simpler (and faster):
-
- s/\G>/:/g;
-
-A more sophisticated use might involve a tokenizer. The following
-lex-like example is courtesy of Jeffrey Friedl. It did not work in
-5.003 due to bugs in that release, but does work in 5.004 or better.
-(Note the use of C</c>, which prevents a failed match with C</g> from
-resetting the search position back to the beginning of the string.)
-
- while (<>) {
- chomp;
- PARSER: {
- m/ \G( \d+\b )/gcx && do { print "number: $1\n"; redo; };
- m/ \G( \w+ )/gcx && do { print "word: $1\n"; redo; };
- m/ \G( \s+ )/gcx && do { print "space: $1\n"; redo; };
- m/ \G( [^\w\d]+ )/gcx && do { print "other: $1\n"; redo; };
- }
- }
-
-Of course, that could have been written as
+You use the C<\G> anchor to start the next match on the same
+string where the last match left off. The regular
+expression engine cannot skip over any characters to find
+the next match with this anchor, so C<\G> is similar to the
+beginning of string anchor, C<^>. The C<\G> anchor is typically
+used with the C<g> flag. It uses the value of pos()
+as the position to start the next match. As the match
+operator makes successive matches, it updates pos() with the
+position of the next character past the last match (or the
+first character of the next match, depending on how you like
+to look at it). Each string has its own pos() value.
+
+Suppose you want to match all of the consecutive pairs of digits
+in a string like "1122a44" and stop matching when you
+encounter non-digits. You want to match C<11> and C<22> but
+the letter C<a> shows up between C<22> and C<44> and you want
+to stop at C<a>. Simply matching pairs of digits skips over
+the C<a> and still matches C<44>.
+
+ $_ = "1122a44";
+ my @pairs = m/(\d\d)/g; # qw( 11 22 44 )
+
+If you use the \G anchor, you force the match after C<22> to
+start with the C<a>. The regular expression cannot match
+there since it does not find a digit, so the next match
+fails and the match operator returns the pairs it already
+found.
+
+ $_ = "1122a44";
+ my @pairs = m/\G(\d\d)/g; # qw( 11 22 )
+
+You can also use the C<\G> anchor in scalar context. You
+still need the C<g> flag.
+
+ $_ = "1122a44";
+ while( m/\G(\d\d)/g )
+ {
+ print "Found $1\n";
+ }
+
+After the match fails at the letter C<a>, perl resets pos()
+and the next match on the same string starts at the beginning.
+
+ $_ = "1122a44";
+ while( m/\G(\d\d)/g )
+ {
+ print "Found $1\n";
+ }
+
+ print "Found $1 after while" if m/(\d\d)/g; # finds "11"
+
+You can disable pos() resets on fail with the C<c> flag.
+Subsequent matches start where the last successful match
+ended (the value of pos()) even if a match on the same
+string has failed in the meantime. In this case, the match
+after the while() loop starts at the C<a> (where the last
+match stopped), and since it does not use any anchor it can
+skip over the C<a> to find "44".
+
+ $_ = "1122a44";
+ while( m/\G(\d\d)/gc )
+ {
+ print "Found $1\n";
+ }
+
+ print "Found $1 after while" if m/(\d\d)/g; # finds "44"
+
+Typically you use the C<\G> anchor with the C<c> flag
+when you want to try a different match if one fails,
+such as in a tokenizer. Jeffrey Friedl offers this example
+which works in 5.004 or later.
while (<>) {
chomp;
PARSER: {
- if ( /\G( \d+\b )/gcx {
- print "number: $1\n";
- redo PARSER;
- }
- if ( /\G( \w+ )/gcx {
- print "word: $1\n";
- redo PARSER;
- }
- if ( /\G( \s+ )/gcx {
- print "space: $1\n";
- redo PARSER;
- }
- if ( /\G( [^\w\d]+ )/gcx {
- print "other: $1\n";
- redo PARSER;
- }
+ m/ \G( \d+\b )/gcx && do { print "number: $1\n"; redo; };
+ m/ \G( \w+ )/gcx && do { print "word: $1\n"; redo; };
+ m/ \G( \s+ )/gcx && do { print "space: $1\n"; redo; };
+ m/ \G( [^\w\d]+ )/gcx && do { print "other: $1\n"; redo; };
}
}
-but then you lose the vertical alignment of the regular expressions.
+For each line, the PARSER loop first tries to match a series
+of digits followed by a word boundary. This match has to
+start at the place the last match left off (or the beginning
+of the string on the first match). Since C<m/ \G( \d+\b
+)/gcx> uses the C<c> flag, if the string does not match that
+regular expression, perl does not reset pos() and the next
+match starts at the same position to try a different
+pattern.
=head2 Are Perl regexes DFAs or NFAs? Are they POSIX compliant?
@@ -607,20 +679,34 @@ guaranteed is slowness.) See the book "Mastering Regular Expressions"
hope to know on these matters (a full citation appears in
L<perlfaq2>).
-=head2 What's wrong with using grep or map in a void context?
+=head2 What's wrong with using grep in a void context?
+
+The problem is that grep builds a return list, regardless of the context.
+This means you're making Perl go to the trouble of building a list that
+you then just throw away. If the list is large, you waste both time and space.
+If your intent is to iterate over the list, then use a for loop for this
+purpose.
-Both grep and map build a return list, regardless of their context.
-This means you're making Perl go to the trouble of building up a
-return list that you then just ignore. That's no way to treat a
-programming language, you insensitive scoundrel!
+In perls older than 5.8.1, map suffers from this problem as well.
+But since 5.8.1, this has been fixed, and map is context aware - in void
+context, no lists are constructed.
=head2 How can I match strings with multibyte characters?
-This is hard, and there's no good way. Perl does not directly support
-wide characters. It pretends that a byte and a character are
-synonymous. The following set of approaches was offered by Jeffrey
-Friedl, whose article in issue #5 of The Perl Journal talks about this
-very matter.
+Starting with Perl 5.6, Perl has had some level of multibyte character
+support. Perl 5.8 or later is recommended. Supported multibyte
+character repertoires include Unicode, and legacy encodings
+through the Encode module. See L<perluniintro>, L<perlunicode>,
+and L<Encode>.
+
+If you are stuck with older Perls, you can do Unicode with the
+C<Unicode::String> module, and character conversions using the
+C<Unicode::Map8> and C<Unicode::Map> modules. If you are using
+Japanese encodings, you might try jperl 5.005_03.
+
+Finally, the following set of approaches was offered by Jeffrey
+Friedl, whose article in issue #5 of The Perl Journal talks about
+this very matter.
Let's suppose you have some weird Martian encoding where pairs of
ASCII uppercase letters encode single Martian letters (i.e. the two
@@ -639,8 +725,8 @@ looks like it is because "SG" is next to "XX", but there's no real
Here are a few ways, all painful, to deal with it:
- $martian =~ s/([A-Z][A-Z])/ $1 /g; # Make sure adjacent ``martian'' bytes
- # are no longer adjacent.
+ $martian =~ s/([A-Z][A-Z])/ $1 /g; # Make sure adjacent ``martian''
+ # bytes are no longer adjacent.
print "found GX!\n" if $martian =~ /GX/;
Or like this:
@@ -658,13 +744,21 @@ Or like this:
print "found GX!\n", last if $1 eq 'GX';
}
-Or like this:
+Here's another, slightly less painful, way to do it from Benjamin
+Goldberg:
+
+ $martian =~ m/
+ (?<![A-Z])
+ (?:[A-Z][A-Z])*?
+ GX
+ /x;
- die "sorry, Perl doesn't (yet) have Martian support )-:\n";
+This succeeds if the "martian" character GX is in the string, and fails
+otherwise. If you don't like using (?<!), you can replace (?<![A-Z])
+with (?:^|[^A-Z]).
-There are many double- (and multi-) byte encodings commonly used these
-days. Some versions of these have 1-, 2-, 3-, and 4-byte characters,
-all mixed.
+It does have the drawback of putting the wrong thing in $-[0] and $+[0],
+but this usually can be worked around.
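One possible workaround, sketched here under the assumption that the pattern is written with a negative lookbehind, C<(?<![A-Z])>, is to capture C<GX> in a group and read its offset from C<$-[1]> rather than C<$-[0]>. The helper name C<find_gx> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset of the Martian character GX
# in $martian, or -1 if it is not present.
sub find_gx {
    my ($martian) = @_;
    if ($martian =~ m/
            (?<![A-Z])          # start at the beginning of an uppercase run
            (?:[A-Z][A-Z])*?    # skip whole two-letter Martian characters
            (GX)                # capture the character we are looking for
        /x) {
        return $-[1];           # offset of the captured GX, not of the whole match
    }
    return -1;
}

print find_gx("I am CVSGXX!"), "\n";   # GX straddles two characters: prints -1
print find_gx("I am CVGXXX!"), "\n";   # GX is a real character: prints 7
```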
=head2 How do I match a pattern that is supplied by the user?
@@ -694,15 +788,11 @@ in L<perlre>.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
+Copyright (c) 1997-2002 Tom Christiansen and Nathan Torkington.
All rights reserved.
-When included as part of the Standard Version of Perl, or as part of
-its complete documentation whether printed or otherwise, this work
-may be distributed only under the terms of Perl's Artistic License.
-Any distribution of this file or derivatives thereof I<outside>
-of that package require that special arrangements be made with
-copyright holder.
+This documentation is free; you can redistribute it and/or modify it
+under the same terms as Perl itself.
Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
diff --git a/pod/perlfaq7.pod b/pod/perlfaq7.pod
index 0299c2d893..96d6b88d4a 100644
--- a/pod/perlfaq7.pod
+++ b/pod/perlfaq7.pod
@@ -1,6 +1,6 @@
=head1 NAME
-perlfaq7 - Perl Language Issues ($Revision: 1.28 $, $Date: 1999/05/23 20:36:18 $)
+perlfaq7 - General Perl Language Issues ($Revision: 1.15 $, $Date: 2003/07/24 02:17:21 $)
=head1 DESCRIPTION
@@ -38,7 +38,7 @@ really type specifiers:
Note that <FILE> is I<neither> the type specifier for files
nor the name of the handle. It is the C<< <> >> operator applied
to the handle FILE. It reads one line (well, record--see
-L<perlvar/$/>) from the handle FILE in scalar context, or I<all> lines
+L<perlvar/$E<sol>>) from the handle FILE in scalar context, or I<all> lines
in list context. When performing open, close, or any other operation
besides C<< <> >> on files, or even when talking about the handle, do
I<not> use the brackets. These are correct: C<eof(FH)>, C<seek(FH, 0,
@@ -82,6 +82,11 @@ Another way is to use undef as an element on the left-hand-side:
($dev, $ino, undef, undef, $uid, $gid) = stat($file);
+You can also use a list slice to select only the elements that
+you need:
+
+ ($dev, $ino, $uid, $gid) = ( stat($file) )[0,1,4,5];
+
=head2 How do I temporarily block warnings?
If you are running Perl 5.6.0 or better, the C<use warnings> pragma
@@ -167,81 +172,15 @@ details, read L<perlmod>. You'll also find L<Exporter> helpful. If
you're writing a C or mixed-language module with both C and Perl, then
you should study L<perlxstut>.
-Here's a convenient template you might wish you use when starting your
-own module. Make sure to change the names appropriately.
-
- package Some::Module; # assumes Some/Module.pm
-
- use strict;
- use warnings;
-
- BEGIN {
- use Exporter ();
- our ($VERSION, @ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS);
-
- ## set the version for version checking; uncomment to use
- ## $VERSION = 1.00;
-
- # if using RCS/CVS, this next line may be preferred,
- # but beware two-digit versions.
- $VERSION = do{my@r=q$Revision: 1.28 $=~/\d+/g;sprintf '%d.'.'%02d'x$#r,@r};
-
- @ISA = qw(Exporter);
- @EXPORT = qw(&func1 &func2 &func3);
- %EXPORT_TAGS = ( ); # eg: TAG => [ qw!name1 name2! ],
-
- # your exported package globals go here,
- # as well as any optionally exported functions
- @EXPORT_OK = qw($Var1 %Hashit);
- }
- our @EXPORT_OK;
-
- # exported package globals go here
- our $Var1;
- our %Hashit;
-
- # non-exported package globals go here
- our @more;
- our $stuff;
-
- # initialize package globals, first exported ones
- $Var1 = '';
- %Hashit = ();
-
- # then the others (which are still accessible as $Some::Module::stuff)
- $stuff = '';
- @more = ();
-
- # all file-scoped lexicals must be created before
- # the functions below that use them.
-
- # file-private lexicals go here
- my $priv_var = '';
- my %secret_hash = ();
-
- # here's a file-private function as a closure,
- # callable as &$priv_func; it cannot be prototyped.
- my $priv_func = sub {
- # stuff goes here.
- };
-
- # make all your functions, whether exported or not;
- # remember to put something interesting in the {} stubs
- sub func1 {} # no prototype
- sub func2() {} # proto'd void
- sub func3($$) {} # proto'd to 2 scalars
-
- # this one isn't exported, but could be called!
- sub func4(\%) {} # proto'd to 1 hash ref
-
- END { } # module clean-up code here (global destructor)
-
- 1; # modules must return true
-
-The h2xs program will create stubs for all the important stuff for you:
+The C<h2xs> program will create stubs for all the important stuff for you:
% h2xs -XA -n My::Module
+The C<-X> switch tells C<h2xs> that you are not using C<XS> extension
+code. The C<-A> switch tells C<h2xs> that you are not using the
+AutoLoader, and the C<-n> switch specifies the name of the module.
+See L<h2xs> for more details.
+
=head2 How do I create a class?
See L<perltoot> for an introduction to classes and objects, as well as
@@ -249,17 +188,9 @@ L<perlobj> and L<perlbot>.
=head2 How can I tell if a variable is tainted?
-See L<perlsec/"Laundering and Detecting Tainted Data">. Here's an
-example (which doesn't use any system calls, because the kill()
-is given no processes to signal):
-
- sub is_tainted {
- return ! eval { join('',@_), kill 0; 1; };
- }
-
-This is not C<-w> clean, however. There is no C<-w> clean way to
-detect taintedness--take this as a hint that you should untaint
-all possibly-tainted data.
+You can use the tainted() function of the Scalar::Util module, available
+from CPAN (or included with Perl since release 5.8.0).
+See also L<perlsec/"Laundering and Detecting Tainted Data">.
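A minimal sketch of how tainted() might be used follows. The helper name C<taint_status> is hypothetical, and the result for C<$ENV{PATH}> depends on whether the script runs under C<-T>.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: describe the taint status of a value.
sub taint_status {
    my ($value) = @_;
    return tainted($value) ? "tainted" : "not tainted";
}

# Under -T, data from outside the program (such as %ENV) is tainted.
# Without -T, tainted() returns false for everything.
print "PATH is ", taint_status($ENV{PATH}), "\n";
print "a literal is ", taint_status("hello"), "\n";
```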
=head2 What's a closure?
@@ -372,37 +303,21 @@ reference to an existing or anonymous variable or function:
=item Passing Filehandles
-To pass filehandles to subroutines, use the C<*FH> or C<\*FH> notations.
-These are "typeglobs"--see L<perldata/"Typeglobs and Filehandles">
-and especially L<perlsub/"Pass by Reference"> for more information.
-
-Here's an excerpt:
-
-If you're passing around filehandles, you could usually just use the bare
-typeglob, like *STDOUT, but typeglobs references would be better because
-they'll still work properly under C<use strict 'refs'>. For example:
+As of Perl 5.6, you can represent filehandles with scalar variables
+which you treat as any other scalar.
- splutter(\*STDOUT);
- sub splutter {
- my $fh = shift;
- print $fh "her um well a hmmm\n";
- }
+ open my $fh, $filename or die "Cannot open $filename! $!";
+ func( $fh );
- $rec = get_rec(\*STDIN);
- sub get_rec {
- my $fh = shift;
- return scalar <$fh>;
- }
+ sub func {
+ my $passed_fh = shift;
-If you're planning on generating new filehandles, you could do this:
+ my $line = <$passed_fh>;
+ }
- sub openit {
- my $path = shift;
- local *FH;
- return open (FH, $path) ? *FH : undef;
- }
- $fh = openit('< /etc/motd');
- print <$fh>;
+Before Perl 5.6, you had to use the C<*FH> or C<\*FH> notations.
+These are "typeglobs"--see L<perldata/"Typeglobs and Filehandles">
+and especially L<perlsub/"Pass by Reference"> for more information.
=item Passing Regexes
@@ -560,28 +475,38 @@ In summary, local() doesn't make what you think of as private, local
variables. It gives a global variable a temporary value. my() is
what you're looking for if you want private variables.
-See L<perlsub/"Private Variables via my()"> and
+See L<perlsub/"Private Variables via my()"> and
L<perlsub/"Temporary Values via local()"> for excruciating details.
=head2 How can I access a dynamic variable while a similarly named lexical is in scope?
-You can do this via symbolic references, provided you haven't set
-C<use strict "refs">. So instead of $var, use C<${'var'}>.
+If you know your package, you can just mention it explicitly, as in
+$Some_Pack::var. Note that the notation $::var is B<not> the dynamic $var
+in the current package, but rather the one in the "main" package, as
+though you had written $main::var.
- local $var = "global";
- my $var = "lexical";
+ use vars '$var';
+ local $var = "global";
+ my $var = "lexical";
- print "lexical is $var\n";
+ print "lexical is $var\n";
+ print "global is $main::var\n";
- no strict 'refs';
- print "global is ${'var'}\n";
+Alternatively you can use the compiler directive our() to bring a
+dynamic variable into the current lexical scope.
-If you know your package, you can just mention it explicitly, as in
-$Some_Pack::var. Note that the notation $::var is I<not> the dynamic
-$var in the current package, but rather the one in the C<main>
-package, as though you had written $main::var. Specifying the package
-directly makes you hard-code its name, but it executes faster and
-avoids running afoul of C<use strict "refs">.
+ require 5.006; # our() did not exist before 5.6
+ use vars '$var';
+
+ local $var = "global";
+ my $var = "lexical";
+
+ print "lexical is $var\n";
+
+ {
+ our $var;
+ print "global is $var\n";
+ }
=head2 What's the difference between deep and shallow binding?
@@ -594,7 +519,7 @@ However, dynamic variables (aka global, local, or package variables)
are effectively shallowly bound. Consider this just one more reason
not to use them. See the answer to L<"What's a closure?">.
-=head2 Why doesn't "my($foo) = <FILE>;" work right?
+=head2 Why doesn't "my($foo) = E<lt>FILEE<gt>;" work right?
C<my()> and C<local()> give list context to the right hand side
of C<=>. The <FH> read operation, like so many of Perl's
@@ -657,22 +582,32 @@ where they don't belong.
This is explained in more depth in the L<perlsyn>. Briefly, there's
no official case statement, because of the variety of tests possible
in Perl (numeric comparison, string comparison, glob comparison,
-regex matching, overloaded comparisons, ...). Larry couldn't decide
-how best to do this, so he left it out, even though it's been on the
-wish list since perl1.
+regex matching, overloaded comparisons, ...).
+Larry couldn't decide how best to do this, so he left it out, even
+though it's been on the wish list since perl1.
+
+Starting with Perl 5.8, one can get switch and case by using the
+Switch extension and saying:
+
+ use Switch;
-The general answer is to write a construct like this:
+after which one has switch and case. It is not as fast as it could be
+because it's not really part of the language (it's done using source
+filters) but it is available, and it's very flexible.
+
+But if one wants to use pure Perl, the general answer is to write a
+construct like this:
for ($variable_to_test) {
if (/pat1/) { } # do something
elsif (/pat2/) { } # do something else
elsif (/pat3/) { } # do something else
else { } # default
- }
+ }
Here's a simple example of a switch based on pattern matching, this
time lined up in a way to make it look more like a switch statement.
-We'll do a multi-way conditional based on the type of reference stored
+We'll do a multiway conditional based on the type of reference stored
in $whatchamacallit:
SWITCH: for (ref $whatchamacallit) {
@@ -705,7 +640,7 @@ in $whatchamacallit:
}
-See C<perlsyn/"Basic BLOCKs and Switch Statements"> for many other
+See C<perlsyn/"Basic BLOCKs and Switch Statements"> for many other
examples in this style.
Sometimes you should change the positions of the constant and the variable.
@@ -723,7 +658,7 @@ C<"STOP"> here:
elsif ("LIST" =~ /^\Q$answer/i) { print "Action is list\n" }
elsif ("EDIT" =~ /^\Q$answer/i) { print "Action is edit\n" }
-A totally different approach is to create a hash of function references.
+A totally different approach is to create a hash of function references.
my %commands = (
"happy" => \&joy,
@@ -738,33 +673,18 @@ A totally different approach is to create a hash of function references.
$commands{$string}->();
} else {
print "No such command: $string\n";
- }
+ }
-=head2 How can I catch accesses to undefined variables/functions/methods?
+=head2 How can I catch accesses to undefined variables, functions, or methods?
The AUTOLOAD method, discussed in L<perlsub/"Autoloading"> and
L<perltoot/"AUTOLOAD: Proxy Methods">, lets you capture calls to
undefined functions and methods.
When it comes to undefined variables that would trigger a warning
-under C<-w>, you can use a handler to trap the pseudo-signal
-C<__WARN__> like this:
-
- $SIG{__WARN__} = sub {
-
- for ( $_[0] ) { # voici un switch statement
-
- /Use of uninitialized value/ && do {
- # promote warning to a fatal
- die $_;
- };
-
- # other warning cases to catch could go here;
-
- warn $_;
- }
+under C<use warnings>, you can promote the warning to an error.
- };
+ use warnings FATAL => qw(uninitialized);
=head2 Why can't a method included in this same file be found?
@@ -784,7 +704,7 @@ C<< Guru->find("Samy") >>) instead. Object notation is explained in
L<perlobj>.
Make sure to read about creating modules in L<perlmod> and
-the perils of indirect objects in L<perlobj/"WARNING">.
+the perils of indirect objects in L<perlobj/"Method Invocation">.
=head2 How can I find out my current package?
@@ -805,29 +725,29 @@ not necessarily the same as the one in which you were compiled):
=head2 How can I comment out a large block of perl code?
-Use embedded POD to discard it:
+You can use embedded POD to discard it. Enclose the blocks you want
+to comment out in POD markers, for example C<=for nobody> and C<=cut>
+(which marks ends of POD blocks).
# program is here
=for nobody
- This paragraph is commented out
-
- # program continues
-
- =begin comment text
all of this stuff
here will be ignored
by everyone
- =end comment text
-
=cut
-This can't go just anywhere. You have to put a pod directive where
-the parser is expecting a new statement, not just in the middle
-of an expression or some other arbitrary yacc grammar production.
+ # program continues
+
+The pod directives cannot go just anywhere. You must put a
+pod directive where the parser is expecting a new statement,
+not just in the middle of an expression or some other
+arbitrary grammar production.
+
+See L<perlpod> for more details.
=head2 How do I clear a package?
@@ -836,7 +756,7 @@ Use this code, provided by Mark-Jason Dominus:
sub scrub_package {
no strict 'refs';
my $pack = shift;
- die "Shouldn't delete main package"
+ die "Shouldn't delete main package"
if $pack eq "" || $pack eq "main";
my $stash = *{$pack . '::'}{HASH};
my $name;
@@ -851,7 +771,7 @@ Use this code, provided by Mark-Jason Dominus:
}
}
-Or, if you're using a recent release of Perl, you can
+Or, if you're using a recent release of Perl, you can
just use the Symbol::delete_package() function instead.
=head2 How can I use a variable as a variable name?
@@ -883,7 +803,7 @@ symbolic references, you are just using the package's symbol-table hash
(like C<%main::>) instead of a user-defined hash. The solution is to
use your own hash or a real reference instead.
- $fred = 23;
+ $USER_VARS{"fred"} = 23;
$varname = "fred";
$USER_VARS{$varname}++; # not $$varname++
@@ -919,7 +839,7 @@ wanted to use another scalar variable to refer to those by name.
$name = "fred";
$$name{WIFE} = "wilma"; # set %fred
- $name = "barney";
+ $name = "barney";
$$name{WIFE} = "betty"; # set %barney
This is still a symbolic reference, and is still saddled with the
@@ -943,7 +863,7 @@ can play around with the symbol table. For example:
for my $name (@colors) {
no strict 'refs'; # renege for the block
*$name = sub { "<FONT COLOR='$name'>@_</FONT>" };
- }
+ }
All those functions (red(), blue(), green(), etc.) appear to be separate,
but the real code in the closure actually was compiled only once.
@@ -954,17 +874,38 @@ subroutines, because they are always global--you can't use my() on them.
For scalars, arrays, and hashes, though--and usually for subroutines--
you probably only want to use hard references.
+=head2 What does "bad interpreter" mean?
+
+The "bad interpreter" message comes from the shell, not perl. The
+actual message may vary depending on your platform, shell, and locale
+settings.
+
+If you see "bad interpreter - no such file or directory", the first
+line in your perl script (the "shebang" line) does not contain the
+right path to perl (or any other program capable of running scripts).
+Sometimes this happens when you move the script from one machine to
+another and each machine has a different path to perl---/usr/bin/perl
+versus /usr/local/bin/perl for instance.
+
+If you see "bad interpreter: Permission denied", you need to make your
+script executable.
+
+In either case, you should still be able to run the scripts with perl
+explicitly:
+
+ % perl script.pl
+
+If you get a message like "perl: command not found", perl is not in
+your PATH, which might also mean that the location of perl is not
+where you expect it, so you need to adjust your shebang line.
+
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
+Copyright (c) 1997-2002 Tom Christiansen and Nathan Torkington.
All rights reserved.
-When included as part of the Standard Version of Perl, or as part of
-its complete documentation whether printed or otherwise, this work
-may be distributed only under the terms of Perl's Artistic License.
-Any distribution of this file or derivatives thereof I<outside>
-of that package require that special arrangements be made with
-copyright holder.
+This documentation is free; you can redistribute it and/or modify it
+under the same terms as Perl itself.
Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
diff --git a/pod/perlfaq8.pod b/pod/perlfaq8.pod
index 1df3b6ac0a..2fceab143f 100644
--- a/pod/perlfaq8.pod
+++ b/pod/perlfaq8.pod
@@ -1,6 +1,6 @@
=head1 NAME
-perlfaq8 - System Interaction ($Revision: 1.39 $, $Date: 1999/05/23 18:37:57 $)
+perlfaq8 - System Interaction ($Revision: 1.17 $, $Date: 2003/01/26 17:44:04 $)
=head1 DESCRIPTION
@@ -77,7 +77,7 @@ Or like this:
Controlling input buffering is a remarkably system-dependent matter.
On many systems, you can just use the B<stty> command as shown in
L<perlfunc/getc>, but as you see, that's already getting you into
-portability snags.
+portability snags.
open(TTY, "+</dev/tty") or die "no tty: $!";
system "stty cbreak </dev/tty >/dev/tty 2>&1";
@@ -188,14 +188,14 @@ positions, etc, you might wish to use Term::Cap module:
=head2 How do I get the screen size?
-If you have Term::ReadKey module installed from CPAN,
+If you have Term::ReadKey module installed from CPAN,
you can use it to fetch the width and height in characters
and in pixels:
use Term::ReadKey;
($wchar, $hchar, $wpixels, $hpixels) = GetTerminalSize();
-This is more portable than the raw C<ioctl>, but not as
+This is more portable than the raw C<ioctl>, but not as
illustrative:
require 'sys/ioctl.ph';
@@ -275,7 +275,7 @@ next.
If you expect characters to get to your device when you print() them,
you'll want to autoflush that filehandle. You can use select()
-and the C<$|> variable to control autoflushing (see L<perlvar/$|>
+and the C<$|> variable to control autoflushing (see L<perlvar/$E<verbar>>
and L<perlfunc/select>, or L<perlfaq5>, ``How do I flush/unbuffer an
output filehandle? Why must I do this?''):
@@ -294,7 +294,7 @@ of code just because you're afraid of a little $| variable:
DEV->autoflush(1);
As mentioned in the previous item, this still doesn't work when using
-socket I/O between Unix and Macintosh. You'll need to hardcode your
+socket I/O between Unix and Macintosh. You'll need to hard code your
line terminators, in that case.
=item non-blocking input
@@ -346,7 +346,12 @@ passwd(1), for example).
=head2 How do I start a process in the background?
-You could use
+Several modules can start other processes that do not block
+your Perl program. You can use IPC::Open3, Parallel::Jobs,
+IPC::Run, and some of the POE modules. See CPAN for more
+details.
+
+You could also use
system("cmd &")
@@ -375,10 +380,26 @@ not an issue with C<system("cmd&")>.
=item Zombies
-You have to be prepared to "reap" the child process when it finishes
+You have to be prepared to "reap" the child process when it finishes.
$SIG{CHLD} = sub { wait };
+ $SIG{CHLD} = 'IGNORE';
+
+You can also use a double fork. You immediately wait() for your
+first child, and the init daemon will wait() for your grandchild once
+it exits.
+
+ unless ($pid = fork) {
+ unless (fork) {
+ exec "what you really wanna do";
+ die "exec failed!";
+ }
+ exit 0;
+ }
+ waitpid($pid,0);
+
+
See L<perlipc/"Signals"> for other examples of code to do this.
Zombies are not an issue with C<system("prog &")>.
@@ -424,8 +445,8 @@ If perl was installed correctly and your shadow library was written
properly, the getpw*() functions described in L<perlfunc> should in
theory provide (read-only) access to entries in the shadow password
file. To change the file, make a new shadow password file (the format
-varies from system to system--see L<passwd(5)> for specifics) and use
-pwd_mkdb(8) to install it (see L<pwd_mkdb(8)> for more details).
+varies from system to system--see L<passwd> for specifics) and use
+pwd_mkdb(8) to install it (see L<pwd_mkdb> for more details).
=head2 How do I set the time and date?
@@ -435,7 +456,7 @@ program. (There is no way to set the time and date on a per-process
basis.) This mechanism will work for Unix, MS-DOS, Windows, and NT;
the VMS equivalent is C<set time>.
-However, if all you want to do is change your timezone, you can
+However, if all you want to do is change your time zone, you can
probably get away with setting an environment variable:
$ENV{TZ} = "MST7MDT"; # unixish
@@ -447,12 +468,14 @@ probably get away with setting an environment variable:
If you want finer granularity than the 1 second that the sleep()
function provides, the easiest way is to use the select() function as
documented in L<perlfunc/"select">. Try the Time::HiRes and
-the BSD::Itimer modules (available from CPAN).
+the BSD::Itimer modules (available from CPAN, and starting from
+Perl 5.8 Time::HiRes is part of the standard distribution).
=head2 How can I measure time under a second?
In general, you may not be able to. The Time::HiRes module (available
-from CPAN) provides this functionality for some systems.
+from CPAN, and starting from Perl 5.8 part of the standard distribution)
+provides this functionality for some systems.
If your system supports both the syscall() function in Perl as well as
a system call like gettimeofday(2), then you may be able to do
@@ -488,14 +511,14 @@ something like this:
Release 5 of Perl added the END block, which can be used to simulate
atexit(). Each package's END block is called when the program or
-thread ends (see L<perlmod> manpage for more details).
+thread ends (see L<perlmod> manpage for more details).
For example, you can use this to make sure your filter program
managed to finish its output without filling up the disk:
END {
close(STDOUT) || die "stdout close failed: $!";
- }
+ }
The END block isn't called when untrapped signals kill the program,
though, so if you use END blocks you should also use
@@ -533,7 +556,10 @@ syscall(), you can use the syscall function (documented in
L<perlfunc>).
Remember to check the modules that came with your distribution, and
-CPAN as well--someone may already have written a module to do it.
+CPAN as well---someone may already have written a module to do it. On
+Windows, try Win32::API. On Macs, try Mac::Carbon. If no module
+has an interface to the C function, you can inline a bit of C in your
+Perl source with Inline::C.
=head2 Where do I get the include files to do ioctl() or syscall()?
@@ -571,8 +597,8 @@ scripts inherently insecure. Perl gives you a number of options
The IPC::Open2 module (part of the standard perl distribution) is an
easy-to-use approach that internally uses pipe(), fork(), and exec() to do
the job. Make sure you read the deadlock warnings in its documentation,
-though (see L<IPC::Open2>). See
-L<perlipc/"Bidirectional Communication with Another Process"> and
+though (see L<IPC::Open2>). See
+L<perlipc/"Bidirectional Communication with Another Process"> and
L<perlipc/"Bidirectional Communication with Yourself">
You may also use the IPC::Open3 module (part of the standard perl
@@ -602,6 +628,68 @@ With system(), both STDOUT and STDERR will go the same place as the
script's STDOUT and STDERR, unless the system() command redirects them.
Backticks and open() read B<only> the STDOUT of your command.
+You can also use the open3() function from IPC::Open3. Benjamin
+Goldberg provides some sample code:
+
+To capture a program's STDOUT, but discard its STDERR:
+
+ use IPC::Open3;
+ use File::Spec;
+ use Symbol qw(gensym);
+ open(NULL, ">", File::Spec->devnull);
+ my $pid = open3(gensym, \*PH, ">&NULL", "cmd");
+ while( <PH> ) { }
+ waitpid($pid, 0);
+
+To capture a program's STDERR, but discard its STDOUT:
+
+ use IPC::Open3;
+ use File::Spec;
+ use Symbol qw(gensym);
+ open(NULL, ">", File::Spec->devnull);
+ my $pid = open3(gensym, ">&NULL", \*PH, "cmd");
+ while( <PH> ) { }
+ waitpid($pid, 0);
+
+To capture a program's STDERR, and let its STDOUT go to our own STDERR:
+
+ use IPC::Open3;
+ use Symbol qw(gensym);
+ my $pid = open3(gensym, ">&STDERR", \*PH, "cmd");
+ while( <PH> ) { }
+ waitpid($pid, 0);
+
+To read both a command's STDOUT and its STDERR separately, you can
+redirect them to temp files, let the command run, then read the temp
+files:
+
+ use IPC::Open3;
+ use Symbol qw(gensym);
+ use IO::File;
+ local *CATCHOUT = IO::File->new_tempfile;
+ local *CATCHERR = IO::File->new_tempfile;
+ my $pid = open3(gensym, ">&CATCHOUT", ">&CATCHERR", "cmd");
+ waitpid($pid, 0);
+ seek $_, 0, 0 for \*CATCHOUT, \*CATCHERR;
+ while( <CATCHOUT> ) {}
+ while( <CATCHERR> ) {}
+
+But there's no real need for I<both> to be tempfiles... the following
+should work just as well, without deadlocking:
+
+ use IPC::Open3;
+ use Symbol qw(gensym);
+ use IO::File;
+ local *CATCHERR = IO::File->new_tempfile;
+ my $pid = open3(gensym, \*CATCHOUT, ">&CATCHERR", "cmd");
+ while( <CATCHOUT> ) {}
+ waitpid($pid, 0);
+ seek CATCHERR, 0, 0;
+ while( <CATCHERR> ) {}
+
+And it'll be faster, too, since we can begin processing the program's
+stdout immediately, rather than waiting for the program to finish.
+
With any of these, you can change file descriptors before the call:
open(STDOUT, ">logfile");
@@ -632,9 +720,10 @@ STDOUT).
Note that you I<must> use Bourne shell (sh(1)) redirection syntax in
backticks, not csh(1)! Details on why Perl's system() and backtick
-and pipe opens all use the Bourne shell are in
-http://www.perl.com/CPAN/doc/FMTEYEWTK/versus/csh.whynot .
-To capture a command's STDERR and STDOUT together:
+and pipe opens all use the Bourne shell are in the
+F<versus/csh.whynot> article in the "Far More Than You Ever Wanted To
+Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz . To
+capture a command's STDERR and STDOUT together:
$output = `cmd 2>&1`; # either with backticks
$pid = open(PH, "cmd 2>&1 |"); # or with an open pipe
@@ -677,50 +766,38 @@ there, and the old standard error shows up on the old standard out.
=head2 Why doesn't open() return an error when a pipe open fails?
-Because the pipe open takes place in two steps: first Perl calls
-fork() to start a new process, then this new process calls exec() to
-run the program you really wanted to open. The first step reports
-success or failure to your process, so open() can only tell you
-whether the fork() succeeded or not.
-
-To find out if the exec() step succeeded, you have to catch SIGCHLD
-and wait() to get the exit status. You should also catch SIGPIPE if
-you're writing to the child--you may not have found out the exec()
-failed by the time you write. This is documented in L<perlipc>.
-
-In some cases, even this won't work. If the second argument to a
-piped open() contains shell metacharacters, perl fork()s, then exec()s
-a shell to decode the metacharacters and eventually run the desired
-program. Now when you call wait(), you only learn whether or not the
-I<shell> could be successfully started...it's best to avoid shell
-metacharacters.
+If the second argument to a piped open() contains shell
+metacharacters, perl fork()s, then exec()s a shell to decode the
+metacharacters and eventually run the desired program. If the program
+couldn't be run, it's the shell that gets the message, not Perl. All
+your Perl program can find out is whether the shell itself could be
+successfully started. You can still capture the shell's STDERR and
+check it for error messages. See L<"How can I capture STDERR from an
+external command?"> elsewhere in this document, or use the
+IPC::Open3 module.
-On systems that follow the spawn() paradigm, open() I<might> do what
-you expect--unless perl uses a shell to start your command. In this
-case the fork()/exec() description still applies.
+If there are no shell metacharacters in the argument of open(), Perl
+runs the command directly, without using the shell, and can correctly
+report whether the command started.
=head2 What's wrong with using backticks in a void context?
Strictly speaking, nothing. Stylistically speaking, it's not a good
-way to write maintainable code because backticks have a (potentially
-humongous) return value, and you're ignoring it. It's may also not be very
-efficient, because you have to read in all the lines of output, allocate
-memory for them, and then throw it away. Too often people are lulled
-to writing:
+way to write maintainable code. Perl has several operators for
+running external commands. Backticks are one; they collect the output
+from the command for use in your program. The C<system> function is
+another; it doesn't do this.
- `cp file file.bak`;
-
-And now they think "Hey, I'll just always use backticks to run programs."
-Bad idea: backticks are for capturing a program's output; the system()
-function is for running programs.
+Writing backticks in your program sends a clear message to the readers
+of your code that you wanted to collect the output of the command.
+Why send a clear message that isn't true?
Consider this line:
`cat /etc/termcap`;
-You haven't assigned the output anywhere, so it just wastes memory
-(for a little while). You forgot to check C<$?> to see whether
-the program even ran correctly, too. Even if you wrote
+You forgot to check C<$?> to see whether the program even ran
+correctly. Even if you wrote
print `cat /etc/termcap`;
@@ -737,11 +814,20 @@ processing may take place, whereas backticks do not.
=head2 How can I call backticks without shell processing?
-This is a bit tricky. Instead of writing
+This is a bit tricky. You can't simply write the command
+like this:
@ok = `grep @opts '$search_string' @filenames`;
-You have to do this:
+As of Perl 5.8.0, you can use open() with multiple arguments.
+Just like the list forms of system() and exec(), no shell
+escapes happen.
+
+ open( GREP, "-|", 'grep', @opts, $search_string, @filenames );
+ chomp(@ok = <GREP>);
+ close GREP;
+
+You can also use the implicit-fork form of a piped open():
my @ok = ();
if (open(GREP, "-|")) {
@@ -757,12 +843,9 @@ You have to do this:
Just as with system(), no shell escapes happen when you exec() a list.
Further examples of this can be found in L<perlipc/"Safe Pipe Opens">.
-Note that if you're stuck on Microsoft, no solution to this vexing issue
+Note that if you're using Microsoft, no solution to this vexing issue
is even possible. Even if Perl were to emulate fork(), you'd still
-be hosed, because Microsoft gives no argc/argv-style API. Their API
-always reparses from a single string, which is fundamentally wrong,
-but you're not likely to get the Gods of Redmond to acknowledge this
-and fix it for you.
+be stuck, because Microsoft does not have an argc/argv-style API.
=head2 Why can't my script read from STDIN after I gave it EOF (^D on Unix, ^Z on MS-DOS)?
@@ -809,7 +892,7 @@ causes many inefficiencies.
=head2 Can I use perl to run a telnet or ftp session?
Try the Net::FTP, TCP::Client, and Net::Telnet modules (available from
-CPAN). http://www.perl.com/CPAN/scripts/netstuff/telnet.emul.shar
+CPAN). http://www.cpan.org/scripts/netstuff/telnet.emul.shar
will also help for emulating the telnet protocol, but Net::Telnet is
quite probably easier to use..
@@ -864,7 +947,7 @@ different process from the shell it was started from. Changes to a
process are not reflected in its parent--only in any children
created after the change. There is shell magic that may allow you to
fake it by eval()ing the script's output in your shell; check out the
-comp.unix.questions FAQ for details.
+comp.unix.questions FAQ for details.
=back
@@ -885,7 +968,7 @@ module for other solutions.
=item *
-Open /dev/tty and use the TIOCNOTTY ioctl on it. See L<tty(4)>
+Open /dev/tty and use the TIOCNOTTY ioctl on it. See L<tty>
for details. Or better yet, you can just use the POSIX::setsid()
function, so you don't have to worry about process groups.
@@ -938,6 +1021,9 @@ handler, as documented in L<perlipc/"Signals"> and the section on
``Signals'' in the Camel. You may instead use the more flexible
Sys::AlarmCall module available from CPAN.
+The alarm() function is not implemented on all versions of Windows.
+Check the documentation for your specific version of Perl.
+
=head2 How do I set CPU limits?
Use the BSD::Resource module from CPAN.
@@ -946,14 +1032,19 @@ Use the BSD::Resource module from CPAN.
Use the reaper code from L<perlipc/"Signals"> to call wait() when a
SIGCHLD is received, or else use the double-fork technique described
-in L<perlfunc/fork>.
+in L<perlfaq8/"How do I start a process in the background?">.
=head2 How do I use an SQL database?
-There are a number of excellent interfaces to SQL databases. See the
-DBD::* modules available from http://www.perl.com/CPAN/modules/DBD .
-A lot of information on this can be found at
-http://www.symbolstone.org/technology/perl/DBI/
+The DBI module provides an abstract interface to most database
+servers and types, including Oracle, DB2, Sybase, MySQL, PostgreSQL,
+ODBC, and flat files. The DBI module accesses each database type
+through a database driver, or DBD. You can see a complete list of
+available drivers on CPAN: http://www.cpan.org/modules/by-module/DBD/ .
+You can read more about DBI on http://dbi.perl.org .
+
+Other modules provide more specific access: Win32::ODBC, Alzabo, iodbc,
+and others found on CPAN Search: http://search.cpan.org .
=head2 How do I make a system() exit on control-C?
@@ -962,7 +1053,7 @@ sample code) and then have a signal handler for the INT signal that
passes the signal on to the subprocess. Or you can check for it:
$rc = system($cmd);
- if ($rc & 127) { die "signal death" }
+ if ($rc & 127) { die "signal death" }
=head2 How do I open a file without blocking?
@@ -978,9 +1069,17 @@ sysopen():
=head2 How do I install a module from CPAN?
The easiest way is to have a module also named CPAN do it for you.
-This module comes with perl version 5.004 and later. To manually install
-the CPAN module, or any well-behaved CPAN module for that matter, follow
-these steps:
+This module comes with perl version 5.004 and later.
+
+ $ perl -MCPAN -e shell
+
+ cpan shell -- CPAN exploration and modules installation (v1.59_54)
+ ReadLine support enabled
+
+ cpan> install Some::Module
+
+To manually install the CPAN module, or any well-behaved CPAN module
+for that matter, follow these steps:
=over 4
@@ -1039,20 +1138,20 @@ In general, you usually want C<use> and a proper Perl module.
=head2 How do I keep my own module/library directory?
-When you build modules, use the PREFIX option when generating
+When you build modules, use the PREFIX and LIB options when generating
Makefiles:
- perl Makefile.PL PREFIX=/u/mydir/perl
+ perl Makefile.PL PREFIX=/mydir/perl LIB=/mydir/perl/lib
then either set the PERL5LIB environment variable before you run
scripts that use the modules/libraries (see L<perlrun>) or say
- use lib '/u/mydir/perl';
+ use lib '/mydir/perl/lib';
This is almost the same as
BEGIN {
- unshift(@INC, '/u/mydir/perl');
+ unshift(@INC, '/mydir/perl/lib');
}
except that the lib module checks for machine-dependent subdirectories.
@@ -1064,7 +1163,7 @@ See Perl's L<lib> for more information.
use lib "$FindBin::Bin";
use your_own_modules;
-=head2 How do I add a directory to my include path at runtime?
+=head2 How do I add a directory to my include path (@INC) at runtime?
Here are the suggested ways of modifying your include path:
@@ -1086,15 +1185,11 @@ but other times it is not. Modern programs C<use Socket;> instead.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
+Copyright (c) 1997-2003 Tom Christiansen and Nathan Torkington.
All rights reserved.
-When included as part of the Standard Version of Perl, or as part of
-its complete documentation whether printed or otherwise, this work
-may be distributed only under the terms of Perl's Artistic License.
-Any distribution of this file or derivatives thereof I<outside>
-of that package require that special arrangements be made with
-copyright holder.
+This documentation is free; you can redistribute it and/or modify it
+under the same terms as Perl itself.
Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
diff --git a/pod/perlfaq9.pod b/pod/perlfaq9.pod
index 96763802c5..f73c619b98 100644
--- a/pod/perlfaq9.pod
+++ b/pod/perlfaq9.pod
@@ -1,45 +1,71 @@
=head1 NAME
-perlfaq9 - Networking ($Revision: 1.26 $, $Date: 1999/05/23 16:08:30 $)
+perlfaq9 - Networking ($Revision: 1.15 $, $Date: 2003/01/31 17:36:57 $)
=head1 DESCRIPTION
This section deals with questions related to networking, the internet,
and a few on the web.
-=head2 My CGI script runs from the command line but not the browser. (500 Server Error)
+=head2 What is the correct form of response from a CGI script?
-If you can demonstrate that you've read the following FAQs and that
-your problem isn't something simple that can be easily answered, you'll
-probably receive a courteous and useful reply to your question if you
-post it on comp.infosystems.www.authoring.cgi (if it's something to do
-with HTTP, HTML, or the CGI protocols). Questions that appear to be Perl
-questions but are really CGI ones that are posted to comp.lang.perl.misc
-may not be so well received.
+(Alan Flavell <flavell+www@a5.ph.gla.ac.uk> answers...)
+
+The Common Gateway Interface (CGI) specifies a software interface between
+a program ("CGI script") and a web server (HTTPD). It is not specific
+to Perl, and has its own FAQs, tutorials, and Usenet group,
+comp.infosystems.www.authoring.cgi.
+
+The original CGI specification is at: http://hoohoo.ncsa.uiuc.edu/cgi/
-The useful FAQs and related documents are:
+Current best-practice RFC draft at: http://CGI-Spec.Golux.Com/
- CGI FAQ
- http://www.webthing.com/tutorials/cgifaq.html
+Other relevant documentation listed in: http://www.perl.org/CGI_MetaFAQ.html
- Web FAQ
- http://www.boutell.com/faq/
+These Perl FAQs very selectively cover some CGI issues. However, Perl
+programmers are strongly advised to use the CGI.pm module, to take care
+of the details for them.
- WWW Security FAQ
- http://www.w3.org/Security/Faq/
+The similarity between CGI response headers (defined in the CGI
+specification) and HTTP response headers (defined in the HTTP
+specification, RFC2616) is intentional, but can sometimes be confusing.
- HTTP Spec
- http://www.w3.org/pub/WWW/Protocols/HTTP/
+The CGI specification defines two kinds of script: the "Parsed Header"
+script, and the "Non Parsed Header" (NPH) script. Check your server
+documentation to see what it supports. "Parsed Header" scripts are
+simpler in various respects. The CGI specification allows any of the
+usual newline representations in the CGI response (it's the server's
+job to create an accurate HTTP response based on it). So "\n" written in
+text mode is technically correct, and recommended. NPH scripts are
+trickier: they must put out a complete and accurate set of HTTP
+transaction response headers; the HTTP specification calls for records
+to be terminated with carriage-return and line-feed, i.e. ASCII \015\012
+written in binary mode.
+
+Using CGI.pm gives excellent platform independence, including EBCDIC
+systems. CGI.pm selects an appropriate newline representation
+($CGI::CRLF) and sets binmode as appropriate.
+
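
As a rough sketch of the two script kinds described above, the helpers below build a Parsed Header response and an NPH response as strings. The function names and the C<HTTP/1.0 200 OK> status line are assumptions made for illustration; they do not come from the FAQ or from CGI.pm.

```perl
use strict;
use warnings;

# Parsed Header response: the server turns these headers into a full
# HTTP response, so a plain "\n" written in text mode is enough.
sub parsed_header_response {
    my ($body) = @_;
    return "Content-Type: text/plain\n" . "\n" . $body;
}

# NPH response: the script writes the complete HTTP response itself, so
# each header record ends in CR LF (\015\012) and STDOUT must be in
# binary mode. The status line is an assumed example value.
sub nph_response {
    my ($body) = @_;
    my $crlf = "\015\012";
    return join($crlf,
        'HTTP/1.0 200 OK',
        'Content-Type: text/plain',
        '', '') . $body;
}

print parsed_header_response("Hello from a Parsed Header script\n");

# An NPH script would instead do:
#   binmode STDOUT;
#   print nph_response("Hello from an NPH script\n");
```

In practice CGI.pm's header() hides this difference, which is one reason the text recommends it.
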
+=head2 My CGI script runs from the command line but not the browser. (500 Server Error)
+
+Several things could be wrong. You can go through the "Troubleshooting
+Perl CGI scripts" guide at
+
+ http://www.perl.org/troubleshooting_CGI.html
+
+If, after that, you can demonstrate that you've read the FAQs and that
+your problem isn't something simple that can be easily answered, you'll
+probably receive a courteous and useful reply to your question if you
+post it on comp.infosystems.www.authoring.cgi (if it's something to do
+with HTTP or the CGI protocols). Questions that appear to be Perl
+questions but are really CGI ones that are posted to comp.lang.perl.misc
+are not so well received.
- HTML Spec
- http://www.w3.org/TR/REC-html40/
- http://www.w3.org/pub/WWW/MarkUp/
+The useful FAQs, related documents, and troubleshooting guides are
+listed in the CGI Meta FAQ:
- CGI Spec
- http://www.w3.org/CGI/
+ http://www.perl.org/CGI_MetaFAQ.html
- CGI Security FAQ
- http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt
=head2 How can I get better error messages from a CGI program?
@@ -94,7 +120,7 @@ Here's one "simple-minded" approach, that works for most files:
If you want a more complete solution, see the 3-stage striphtml
program in
-http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz
+http://www.cpan.org/authors/Tom_Christiansen/scripts/striphtml.gz
.
Here are some tricky cases that you should think about when picking
@@ -122,29 +148,44 @@ on text like this:
=head2 How do I extract URLs?
-A quick but imperfect approach is
+You can easily extract all sorts of URLs from HTML with
+C<HTML::SimpleLinkExtor> which handles anchors, images, objects,
+frames, and many other tags that can contain a URL. If you need
+anything more complex, you can create your own subclass of
+C<HTML::LinkExtor> or C<HTML::Parser>. You might even use
+C<HTML::SimpleLinkExtor> as an example for something specifically
+suited to your needs.
+
+You can use URI::Find to extract URLs from an arbitrary text document.
+
+Less complete solutions involving regular expressions can save
+you a lot of processing time if you know that the input is simple. One
+solution from Tom Christiansen runs 100 times faster than most
+module-based approaches but only extracts URLs from anchors where the first
+attribute is HREF and there are no other attributes.
+
+ #!/usr/bin/perl -n00
+ # qxurl - tchrist@perl.com
+ print "$2\n" while m{
+ < \s*
+ A \s+ HREF \s* = \s* (["']) (.*?) \1
+ \s* >
+ }gsix;
- #!/usr/bin/perl -n00
- # qxurl - tchrist@perl.com
- print "$2\n" while m{
- < \s*
- A \s+ HREF \s* = \s* (["']) (.*?) \1
- \s* >
- }gsix;
-
-This version does not adjust relative URLs, understand alternate
-bases, deal with HTML comments, deal with HREF and NAME attributes
-in the same tag, understand extra qualifiers like TARGET, or accept
-URLs themselves as arguments. It also runs about 100x faster than a
-more "complete" solution using the LWP suite of modules, such as the
-http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program.
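
To make the limitation of the qxurl pattern concrete, here is a minimal sketch that applies the same regular expression to an in-memory string instead of using C<-n00>. The sample HTML and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input: two anchors the pattern can handle and one
# it cannot, because HREF is not the first attribute.
my $html = <<'HTML';
<a href="http://www.cpan.org/">CPAN</a>
<A HREF='/faq.html'>FAQ</A>
<a class="nav" href="http://www.perl.org/">not matched</a>
HTML

# Same pattern as qxurl: $1 is the quote character, $2 the URL.
my @urls;
push @urls, $2 while $html =~ m{
    < \s*
      A \s+ HREF \s* = \s* (["']) (.*?) \1
    \s* >
}gsix;

print "$_\n" for @urls;
```

The third anchor is skipped silently, which is the trade-off for the speed: a parser-based module would find it.
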
=head2 How do I download a file from the user's machine? How do I open a file on another machine?
-In the context of an HTML form, you can use what's known as
-B<multipart/form-data> encoding. The CGI.pm module (available from
-CPAN) supports this in the start_multipart_form() method, which isn't
-the same as the startform() method.
+In this case, download means to use the file upload feature of HTML
+forms. You allow the web surfer to specify a file to send to your web
+server. To you it looks like a download, and to the user it looks
+like an upload. No matter what you call it, you do it with what's
+known as B<multipart/form-data> encoding. The CGI.pm module (which
+comes with Perl as part of the Standard Library) supports this in the
+start_multipart_form() method, which isn't the same as the startform()
+method.
+
+See the section in the CGI.pm documentation on file uploads for code
+examples and details.
=head2 How do I make a pop-up menu in HTML?
@@ -219,7 +260,7 @@ function to handle encoding.
The best source of detailed information on URI encoding is RFC 2396.
Basically, the following substitutions do it:
- s/([^\w()'*~!.-])/sprintf '%%%02x', $1/eg; # encode
+ s/([^\w()'*~!.-])/sprintf '%%%02x', ord $1/eg; # encode
s/%([A-Fa-f\d]{2})/chr hex $1/eg; # decode
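
As an illustration, the two substitutions can be wrapped in small helper subroutines and checked with a round trip. The names uri_encode and uri_decode are hypothetical, chosen for this sketch only; the substitutions themselves are the ones shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode every character outside the
# unreserved set used by the FAQ's substitution.
sub uri_encode {
    my ($text) = @_;
    $text =~ s/([^\w()'*~!.-])/sprintf '%%%02x', ord $1/eg;
    return $text;
}

# Hypothetical helper: turn each %XX escape back into its character.
sub uri_decode {
    my ($text) = @_;
    $text =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;
    return $text;
}

my $original = 'name=J. Random Hacker&lang=perl';
my $encoded  = uri_encode($original);
my $decoded  = uri_decode($encoded);

print "$encoded\n";
print $decoded eq $original ? "round trip ok\n" : "round trip failed\n";
```

Note that without C<ord>, as in the removed line, C<%02x> would be handed the character itself rather than its code point, so the encoding would be wrong.
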
@@ -233,46 +274,51 @@ regexp for breaking any arbitrary URI into components (Appendix B).
=head2 How do I redirect to another page?
-According to RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1", the
-preferred method is to send a C<Location:> header instead of a
-C<Content-Type:> header:
+Specify the complete URL of the destination (even if it is on the same
+server). This is one of the two different kinds of CGI "Location:"
+responses which are defined in the CGI specification for a Parsed Header
+script. The other kind (an absolute URLpath) is resolved internally to
+the server without any HTTP redirection. The CGI specifications do not
+allow relative URLs in either case.
- Location: http://www.domain.com/newpage
+Use of CGI.pm is strongly recommended. This example shows redirection
+with a complete URL. This redirection is handled by the web browser.
-Note that relative URLs in these headers can cause strange effects
-because of "optimizations" that servers do.
+ use CGI qw/:standard/;
- $url = "http://www.perl.com/CPAN/";
- print "Location: $url\n\n";
- exit;
+ my $url = 'http://www.cpan.org/';
+ print redirect($url);
-To target a particular frame in a frameset, include the "Window-target:"
-in the header.
- print <<EOF;
- Location: http://www.domain.com/newpage
- Window-target: <FrameName>
+This example shows a redirection with an absolute URLpath. This
+redirection is handled by the local web server.
- EOF
+ my $url = '/CPAN/index.html';
+ print redirect($url);
+
+
+But if coded directly, it could be as follows (the final "\n" is
+shown separately, for clarity), using either a complete URL or
+an absolute URLpath.
+
+ print "Location: $url\n"; # CGI response header
+ print "\n"; # end of headers
-To be correct to the spec, each of those virtual newlines should
-really be physical C<"\015\012"> sequences by the time your message is
-received by the client browser. Except for NPH scripts, though, that
-local newline should get translated by your server into standard form,
-so you shouldn't have a problem here, even if you are stuck on MacOS.
-Everybody else probably won't even notice.
=head2 How do I put a password on my web pages?
-That depends. You'll need to read the documentation for your web
-server, or perhaps check some of the other FAQs referenced above.
+Authentication is enabled through your web server's configuration,
+which differs from one server to another: Apache does it differently
+from iPlanet, which does it differently from IIS. Check your web
+server documentation for the details for your particular server.
=head2 How do I edit my .htpasswd and .htgroup files with Perl?
The HTTPD::UserAdmin and HTTPD::GroupAdmin modules provide a
consistent OO interface to these files, regardless of how they're
-stored. Databases may be text, dbm, Berkley DB or any database with a
-DBI compatible driver. HTTPD::UserAdmin supports files used by the
+stored. Databases may be text, dbm, Berkeley DB or any database with
+a DBI compatible driver. HTTPD::UserAdmin supports files used by the
`Basic' and `Digest' authentication schemes. Here's an example:
use HTTPD::UserAdmin ();
@@ -282,16 +328,9 @@ DBI compatible driver. HTTPD::UserAdmin supports files used by the
=head2 How do I make sure users can't enter values into a form that cause my CGI script to do bad things?
-Read the CGI security FAQ, at
-http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html , and the
-Perl/CGI FAQ at
-http://www.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html .
+See the security references listed in the CGI Meta FAQ:
-In brief: use tainting (see L<perlsec>), which makes sure that data
-from outside your script (eg, CGI parameters) are never used in
-C<eval> or C<system> calls. In addition to tainting, never use the
-single-argument form of system() or exec(). Instead, supply the
-command and arguments as a list, which prevents shell globbing.
+ http://www.perl.org/CGI_MetaFAQ.html
=head2 How do I parse a mail header?
@@ -350,12 +389,20 @@ can have problems, because there are deliverable addresses that aren't
RFC-822 (the mail header standard) compliant, and addresses that aren't
deliverable which are compliant.
+You can use the Email::Valid or RFC::RFC822::Address modules, which
+check the format of the address, although they cannot actually tell
+you if it is a deliverable address (i.e. that mail to the address
+will not bounce). Modules like Mail::CheckUser and Mail::EXPN
+try to interact with the domain name system or particular
+mail servers to learn even more, but their methods do not
+work everywhere, especially on servers run by security-conscious
+administrators.
+
Many are tempted to try to eliminate many frequently-invalid
mail addresses with a simple regex, such as
C</^[\w.-]+\@(?:[\w-]+\.)+\w+$/>. It's a very bad idea. However,
this also throws out many valid ones, and says nothing about
potential deliverability, so it is not suggested. Instead, see
-http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz,
+http://www.cpan.org/authors/Tom_Christiansen/scripts/ckaddr.gz ,
which actually checks against the full RFC spec (except for nested
comments), looks for addresses you may not wish to accept mail to
(say, Bill Clinton or your postmaster), and then makes sure that the
@@ -471,7 +518,7 @@ Or you might be able use the CPAN module Mail::Mailer:
The Mail::Internet module uses Net::SMTP which is less Unix-centric than
Mail::Mailer, but less reliable. Avoid raw SMTP commands. There
are many reasons to use a mail transport agent like sendmail. These
-include queueing, MX records, and security.
+include queuing, MX records, and security.
=head2 How do I use MIME to make an attachment to a mail message?
@@ -504,23 +551,23 @@ MIME::Lite also includes a method for sending these things.
$msg->send;
-This defaults to using L<sendmail(1)> but can be customized to use
+This defaults to using L<sendmail> but can be customized to use
SMTP via L<Net::SMTP>.
=head2 How do I read mail?
While you could use the Mail::Folder module from CPAN (part of the
-MailFolder package) or the Mail::Internet module from CPAN (also part
+MailFolder package) or the Mail::Internet module from CPAN (part
of the MailTools package), often a module is overkill. Here's a
mail sorter.
#!/usr/bin/perl
- # bysub1 - simple sort by subject
+
my(@msgs, @sub);
my $msgno = -1;
$/ = ''; # paragraph reads
while (<>) {
- if (/^From/m) {
+ if (/^From /m) {
/^Subject:\s*(?:Re:\s*)*(.*)/mi;
$sub[++$msgno] = lc($1) || '';
}
@@ -585,15 +632,11 @@ an RPC stub generator and includes an RPC::ONC module.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
+Copyright (c) 1997-2002 Tom Christiansen and Nathan Torkington.
All rights reserved.
-When included as part of the Standard Version of Perl, or as part of
-its complete documentation whether printed or otherwise, this work
-may be distributed only under the terms of Perl's Artistic License.
-Any distribution of this file or derivatives thereof I<outside>
-of that package require that special arrangements be made with
-copyright holder.
+This documentation is free; you can redistribute it and/or modify it
+under the same terms as Perl itself.
Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and