diff options
-rw-r--r-- | pod/perlfaq.pod | 16 | ||||
-rw-r--r-- | pod/perlfaq1.pod | 24 | ||||
-rw-r--r-- | pod/perlfaq2.pod | 81 | ||||
-rw-r--r-- | pod/perlfaq3.pod | 78 | ||||
-rw-r--r-- | pod/perlfaq4.pod | 1456 | ||||
-rw-r--r-- | pod/perlfaq5.pod | 35 | ||||
-rw-r--r-- | pod/perlfaq6.pod | 497 | ||||
-rw-r--r-- | pod/perlfaq7.pod | 20 | ||||
-rw-r--r-- | pod/perlfaq8.pod | 578 | ||||
-rw-r--r-- | pod/perlfaq9.pod | 10 |
10 files changed, 1419 insertions, 1376 deletions
diff --git a/pod/perlfaq.pod b/pod/perlfaq.pod index 7b50831f59..3ddb3e23d7 100644 --- a/pod/perlfaq.pod +++ b/pod/perlfaq.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq - frequently asked questions about Perl ($Revision: 3606 $) +perlfaq - frequently asked questions about Perl =head1 DESCRIPTION @@ -24,15 +24,15 @@ at http://faq.perl.org/ . The perlfaq-workers periodically post extracts of the latest perlfaq to comp.lang.perl.misc. You can view the source tree at -https://svn.perl.org/modules/perlfaq/trunk/ (which is outside of the -main Perl source tree). The SVN repository notes all changes to the FAQ +http://cvs.perl.org/viewcvs/cvs-public/perlfaq/ (which is outside of the +main Perl source tree). The CVS repository notes all changes to the FAQ and holds the latest version of the working documents and may vary significantly from the version distributed with the latest version of Perl. Check the repository before sending your corrections. =head2 How to contribute to the perlfaq -You can mail corrections, additions, and suggestions to +You can mail corrections, additions, and suggestions to C<< <perlfaq-workers AT perl DOT org> >>. The perlfaq volunteers use this address to coordinate their efforts and track the perlfaq development. They appreciate your contributions to the FAQ but do not have time to @@ -171,7 +171,7 @@ Where can I get a list of Larry Wall witticisms? =item * -How can I convince my sysadmin/supervisor/employees to use version 5/5.6.1/Perl instead of some other language? +How can I convince others to use Perl? =back @@ -345,10 +345,6 @@ How can I compile my Perl program into byte code or C? =item * -How can I compile Perl into Java? - -=item * - How can I get C<#!perl> to work on [MS-DOS,NT,...]? =item * @@ -956,7 +952,7 @@ How can I match strings with multibyte characters? =item * -How do I match a pattern that is supplied by the user? +How do I match a regular expression that's in a variable? =back diff --git a/pod/perlfaq1.pod b/pod/perlfaq1.pod index 1078b7ee15..72a523287f 100644 --- a/pod/perlfaq1.pod +++ b/pod/perlfaq1.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq1 - General Questions About Perl ($Revision: 3606 $) +perlfaq1 - General Questions About Perl ($Revision: 7822 $) =head1 DESCRIPTION @@ -40,7 +40,7 @@ http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/ and http://archive.develooper.com/perl5-porters@perl.org/ or the news gateway nntp://nntp.perl.org/perl.perl5.porters or its web interface at http://nntp.perl.org/group/perl.perl5.porters , -or read the faq at http://simon-cozens.org/writings/p5p-faq , +or read the faq at http://dev.perl.org/perl5/docs/p5p-faq.html , or you can subscribe to the mailing list by sending perl5-porters-request@perl.org a subscription request (an empty message with no subject is fine). @@ -60,7 +60,7 @@ users the informal support will more than suffice. See the answer to There is often a matter of opinion and taste, and there isn't any one answer that fits anyone. In general, you want to use either the current -stable release, or the stable release immediately prior to that one. +stable release, or the stable release immediately prior to that one. Currently, those are perl5.8.x and perl5.6.x, respectively. Beyond that, you have to consider several things and decide which is best @@ -275,9 +275,7 @@ device drivers or context-switching code, complex multi-threaded shared-memory applications, or extremely large applications. You'll notice that perl is not itself written in Perl. -The new, native-code compiler for Perl may eventually reduce the -limitations given in the previous statement to some degree, but understand -that Perl remains fundamentally a dynamically typed language, not +Perl remains fundamentally a dynamically typed language, not a statically typed one. You certainly won't be chastised if you don't trust nuclear-plant or brain-surgery monitoring code to it. And Larry will sleep easier, too--Wall Street programs not withstanding. :-) @@ -312,14 +310,6 @@ tell you that a I<program> has been compiled to physical machine code once and can then be run multiple times, whereas a I<script> must be translated by a program each time it's used. -Perl programs are (usually) neither strictly compiled nor strictly -interpreted. They can be compiled to a byte-code form (something of a -Perl virtual machine) or to completely different languages, like C or -assembly language. You can't tell just by looking at it whether the -source is destined for a pure interpreter, a parse-tree interpreter, -a byte-code interpreter, or a native-code compiler, so it's hard to give -a definitive answer here. - Now that "script" and "scripting" are terms that have been seized by unscrupulous or unknowing marketeers for their own nefarious purposes, they have begun to take on strange and often pejorative meanings, @@ -373,7 +363,7 @@ In general, the benefit of a language is closely related to the skill of the people using that language. If you or your team can be more faster, better, and stronger through Perl, you'll deliver more value. Remember, people often respond better to what they get out of it. If you run -into resistance, figure out what those people get out of the other +into resistance, figure out what those people get out of the other choice and how Perl might satisfy that requirement. You don't have to worry about finding or paying for Perl; it's freely @@ -400,9 +390,9 @@ You might find these links useful: =head1 REVISION -Revision: $Revision: 3606 $ +Revision: $Revision: 7822 $ -Date: $Date: 2006-03-06 12:05:47 +0100 (lun, 06 mar 2006) $ +Date: $Date: 2006-09-11 14:22:59 +0200 (lun, 11 sep 2006) $ See L<perlfaq> for source control details and availability. diff --git a/pod/perlfaq2.pod b/pod/perlfaq2.pod index 45ad2b4a86..00ab474559 100644 --- a/pod/perlfaq2.pod +++ b/pod/perlfaq2.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq2 - Obtaining and Learning about Perl ($Revision: 3606 $) +perlfaq2 - Obtaining and Learning about Perl ($Revision: 6750 $) =head1 DESCRIPTION @@ -152,8 +152,8 @@ work, try looking in /usr/local/lib/perl5/pod for documentation. If all else fails, consult http://perldoc.perl.org/ which has the complete documentation in HTML and PDF format. -Many good books have been written about Perl--see the section below -for more details. +Many good books have been written about Perl--see the section later in +L<perlfaq2> for more details. Tutorial documents are included in current or upcoming Perl releases include L<perltoot> for objects or L<perlboot> for a beginner's @@ -252,31 +252,11 @@ And for more advanced information on writing larger programs, presented in the same style as the Llama book, continue your education with the Alpaca book: - Learning Perl Objects, References, and Modules (the "Alpaca Book") - by Randal L. Schwartz, with Tom Phoenix (foreword by Damian Conway) - ISBN 0-596-00478-8 [1st edition June 2003] + Intermediate Perl (the "Alpaca Book") + by Randal L. Schwartz and brian d foy, with Tom Phoenix (foreword by Damian Conway) + ISBN 0-596-10206-2 [1st edition March 2006] http://www.oreilly.com/catalog/lrnperlorm/ -If you're not an accidental programmer, but a more serious and -possibly even degreed computer scientist who doesn't need as much -hand-holding as we try to provide in the Llama, please check out the -delightful book - - Perl: The Programmer's Companion - by Nigel Chapman - ISBN 0-471-97563-X [1997, 3rd printing Spring 1998] - http://www.wiley.com/compbooks/catalog/97563-X.htm - http://www.wiley.com/compbooks/chapman/perl/perltpc.html (errata etc) - -If you are more at home in Windows the following is available -(though unfortunately rather dated). - - Learning Perl on Win32 Systems (the "Gecko Book") - by Randal L. Schwartz, Erik Olson, and Tom Christiansen, - with foreword by Larry Wall - ISBN 1-56592-324-3 [1st edition August 1997] - http://www.oreilly.com/catalog/lperlwin/ - Addison-Wesley ( http://www.awlonline.com/ ) and Manning ( http://www.manning.com/ ) are also publishers of some fine Perl books such as I<Object Oriented Programming with Perl> by Damian Conway and @@ -314,16 +294,16 @@ Recommended books on (or mostly on) Perl follow. Elements of Programming with Perl by Andrew L. Johnson ISBN 1-884777-80-5 [1st edition October 1999] - http://www.manning.com/Johnson/ + http://www.manning.com/johnson/ Learning Perl by Randal L. Schwartz, Tom Phoenix, and brian d foy ISBN 0-596-10105-8 [4th edition July 2005] http://www.oreilly.com/catalog/learnperl4/ - Learning Perl Objects, References, and Modules - by Randal L. Schwartz, with Tom Phoenix (foreword by Damian Conway) - ISBN 0-596-00478-8 [1st edition June 2003] + Intermediate Perl (the "Alpaca Book") + by Randal L. Schwartz and brian d foy, with Tom Phoenix (foreword by Damian Conway) + ISBN 0-596-10206-2 [1st edition March 2006] http://www.oreilly.com/catalog/lrnperlorm/ =item Task-Oriented @@ -380,7 +360,7 @@ Recommended books on (or mostly on) Perl follow. Damian Conway with foreword by Randal L. Schwartz ISBN 1-884777-79-1 [1st edition August 1999] - http://www.manning.com/Conway/ + http://www.manning.com/conway/ Data Munging with Perl Dave Cross @@ -406,27 +386,32 @@ Recommended books on (or mostly on) Perl follow. =head2 Which magazines have Perl content? -The first (and for a long time, only) periodical devoted to All Things Perl, -I<The Perl Journal> contains tutorials, demonstrations, case studies, -announcements, contests, and much more. I<TPJ> has columns on web -development, databases, Win32 Perl, graphical programming, regular -expressions, and networking, and sponsors the Obfuscated Perl Contest -and the Perl Poetry Contests. Beginning in November 2002, TPJ moved to a -reader-supported monthly e-zine format in which subscribers can download -issues as PDF documents. For more details on TPJ, see http://www.tpj.com/ - -Beyond this, magazines that frequently carry quality articles on -Perl are I<The Perl Review> ( http://www.theperlreview.com ), -I<Unix Review> ( http://www.unixreview.com/ ), -I<Linux Magazine> ( http://www.linuxmagazine.com/ ), -and Usenix's newsletter/magazine to its members, I<login:> -( http://www.usenix.org/ ) +I<The Perl Review> ( http://www.theperlreview.com ) focuses on Perl +almost completely (although it sometimes sneaks in an article about +another language). + +Magazines that frequently carry quality articles on Perl include I<The +Perl Review> ( http://www.theperlreview.com ), I<Unix Review> ( +http://www.unixreview.com/ ), I<Linux Magazine> ( +http://www.linuxmagazine.com/ ), and Usenix's newsletter/magazine to +its members, I<login:> ( http://www.usenix.org/ ) The Perl columns of Randal L. Schwartz are available on the web at http://www.stonehenge.com/merlyn/WebTechniques/ , http://www.stonehenge.com/merlyn/UnixReview/ , and http://www.stonehenge.com/merlyn/LinuxMag/ . +The first (and for a long time, only) periodical devoted to All Things +Perl, I<The Perl Journal> contains tutorials, demonstrations, case +studies, announcements, contests, and much more. I<TPJ> has columns +on web development, databases, Win32 Perl, graphical programming, +regular expressions, and networking, and sponsors the Obfuscated Perl +Contest and the Perl Poetry Contests. Beginning in November 2002, TPJ +moved to a reader-supported monthly e-zine format in which subscribers +can download issues as PDF documents. In 2006, TPJ merged with Dr. +Dobbs Journal (online edition). To read old TPJ articles, see +http://www.ddj.com/ . + =head2 What mailing lists are there for Perl? Most of the major modules (Tk, CGI, libwww-perl) have their own @@ -516,9 +501,9 @@ the I<What is CPAN?> question earlier in this document. =head1 REVISION -Revision: $Revision: 3606 $ +Revision: $Revision: 6750 $ -Date: $Date: 2006-03-06 12:05:47 +0100 (lun, 06 mar 2006) $ +Date: $Date: 2006-08-06 02:30:54 +0200 (dim, 06 aoĆ» 2006) $ See L<perlfaq> for source control details and availability. diff --git a/pod/perlfaq3.pod b/pod/perlfaq3.pod index 6eea58bf62..18f1345ce0 100644 --- a/pod/perlfaq3.pod +++ b/pod/perlfaq3.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq3 - Programming Tools ($Revision: 3606 $) +perlfaq3 - Programming Tools ($Revision: 7822 $) =head1 DESCRIPTION @@ -114,14 +114,14 @@ perl finds it. Before you do anything else, you can help yourself by ensuring that you let Perl tell you about problem areas in your code. By turning -on warnings and strictures, you can head off many problems before +on warnings and strictures, you can head off many problems before they get too big. You can find out more about these in L<strict> and L<warnings>. #!/usr/bin/perl use strict; use warnings; - + Beyond that, the simplest debugger is the C<print> function. Use it to look at values as you run your program: @@ -129,9 +129,9 @@ to look at values as you run your program: The C<Data::Dumper> module can pretty-print Perl data structures: - use Data::Dumper( Dump ); - print STDERR "The hash is " . Dump( \%hash ) . "\n"; - + use Data::Dumper qw( Dumper ); + print STDERR "The hash is " . Dumper( \%hash ) . "\n"; + Perl comes with an interactive debugger, which you can start with the C<-d> switch. It's fully explained in L<perldebug>. @@ -210,8 +210,8 @@ the following settings in vi and its clones: Put that in your F<.exrc> file (replacing the caret characters with control characters) and away you go. In insert mode, ^T is -for indenting, ^D is for undenting, and ^O is for blockdenting-- -as it were. A more complete example, with comments, can be found at +for indenting, ^D is for undenting, and ^O is for blockdenting--as +it were. A more complete example, with comments, can be found at http://www.cpan.org/authors/id/TOMC/scripts/toms.exrc.gz The a2ps http://www-inf.enst.fr/%7Edemaille/a2ps/black+white.ps.gz does @@ -222,6 +222,10 @@ documents, as does enscript at http://people.ssh.fi/mtr/genscript/ . (contributed by brian d foy) +Ctags uses an index to quickly find things in source code, and many +popular editors support ctags for several different languages, +including Perl. + Exuberent ctags supports Perl: http://ctags.sourceforge.net/ You might also try pltags: http://www.mscha.com/pltags.zip @@ -423,7 +427,7 @@ http://www.primate.wisc.edu/software/csh-tcsh-book/ =item Zsh -ftp://ftp.blarg.net/users/amol/zsh/ , see also http://www.zsh.org/ +http://www.zsh.org/ =back @@ -499,10 +503,11 @@ B<rep ps axu> similar to B<top>. =head2 How can I use X or Tk with Perl? -Tk is a completely Perl-based, object-oriented interface to the Tk toolkit -that doesn't force you to use Tcl just to get at Tk. Sx is an interface -to the Athena Widget set. Both are available from CPAN. See the -directory http://www.cpan.org/modules/by-category/08_User_Interfaces/ +The Tk.pm module is a completely Perl-based, object-oriented interface +to the Tk toolkit that doesn't force you to use Tcl just to get at Tk. +Sx is an interface to the Athena Widget set. Both are available from +CPAN. See the directory +http://www.cpan.org/modules/by-category/08_User_Interfaces/ Invaluable for Perl/Tk programming are the Perl/Tk FAQ at http://phaseit.net/claird/comp.lang.perl.tk/ptkFAQ.html , the Perl/Tk Reference @@ -758,11 +763,12 @@ You can try using encryption via source filters (Starting from Perl 5.8 the Filter::Simple and Filter::Util::Call modules are included in the standard distribution), but any decent programmer will be able to decrypt it. You can try using the byte code compiler and interpreter -described below, but the curious might still be able to de-compile it. -You can try using the native-code compiler described below, but -crackers might be able to disassemble it. These pose varying degrees -of difficulty to people wanting to get at your code, but none can -definitively conceal it (true of every language, not just Perl). +described later in L<perlfaq3>, but the curious might still be able to +de-compile it. You can try using the native-code compiler described +later, but crackers might be able to disassemble it. These pose +varying degrees of difficulty to people wanting to get at your code, +but none can definitively conceal it (true of every language, not just +Perl). It is very easy to recover the source of Perl programs. You simply feed the program to the perl interpreter and use the modules in @@ -790,16 +796,10 @@ You probably won't see much of a speed increase either, since most solutions simply bundle a Perl interpreter in the final product (but see L<How can I make my Perl program run faster?>). -The Perl Archive Toolkit ( http://par.perl.org/index.cgi ) is Perl's +The Perl Archive Toolkit ( http://par.perl.org/ ) is Perl's analog to Java's JAR. It's freely available and on CPAN ( http://search.cpan.org/dist/PAR/ ). -The B::* namespace, often called "the Perl compiler", but is really a way -for Perl programs to peek at its innards rather than create pre-compiled -versions of your program. However. the B::Bytecode module can turn your -script into a bytecode format that could be loaded later by the -ByteLoader module and executed as a regular Perl script. - There are also some commercial products that may work for you, although you have to buy a license for them. @@ -811,16 +811,6 @@ Perl2Exe ( http://www.indigostar.com/perl2exe.htm ) is a command line program for converting perl scripts to executable files. It targets both Windows and unix platforms. -=head2 How can I compile Perl into Java? - -You can also integrate Java and Perl with the -Perl Resource Kit from O'Reilly Media. See -http://www.oreilly.com/catalog/prkunix/ . - -Perl 5.6 comes with Java Perl Lingo, or JPL. JPL, still in -development, allows Perl code to be called from Java. See jpl/README -in the Perl source tree. - =head2 How can I get C<#!perl> to work on [MS-DOS,NT,...]? For OS/2 just use @@ -939,9 +929,8 @@ A good place to start is L<perltoot>, and you can use L<perlobj>, L<perlboot>, L<perltoot>, L<perltooc>, and L<perlbot> for reference. A good book on OO on Perl is the "Object-Oriented Perl" -by Damian Conway from Manning Publications, or "Learning Perl -References, Objects, & Modules" by Randal Schwartz and Tom -Phoenix from O'Reilly Media. +by Damian Conway from Manning Publications, or "Intermediate Perl" +by Randal Schwartz, brian d foy, and Tom Phoenix from O'Reilly Media. =head2 Where can I learn about linking C with Perl? @@ -984,15 +973,18 @@ or =head2 What's MakeMaker? -This module (part of the standard Perl distribution) is designed to -write a Makefile for an extension module from a Makefile.PL. For more -information, see L<ExtUtils::MakeMaker>. +(contributed by brian d foy) + +The C<ExtUtils::MakeMaker> module, better known simply as "MakeMaker", +turns a Perl script, typically called C<Makefile.PL>, into a Makefile. +The unix tool C<make> uses this file to manage dependencies and actions +to process and install a Perl distribution. =head1 REVISION -Revision: $Revision: 3606 $ +Revision: $Revision: 7822 $ -Date: $Date: 2006-03-06 12:05:47 +0100 (lun, 06 mar 2006) $ +Date: $Date: 2006-09-11 14:22:59 +0200 (lun, 11 sep 2006) $ See L<perlfaq> for source control details and availability. diff --git a/pod/perlfaq4.pod b/pod/perlfaq4.pod index dc968e3b5a..e9c4ab316a 100644 --- a/pod/perlfaq4.pod +++ b/pod/perlfaq4.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq4 - Data Manipulation ($Revision: 3606 $) +perlfaq4 - Data Manipulation ($Revision: 6816 $) =head1 DESCRIPTION @@ -11,18 +11,18 @@ numbers, dates, strings, arrays, hashes, and miscellaneous data issues. =head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)? -Internally, your computer represents floating-point numbers -in binary. Digital (as in powers of two) computers cannot -store all numbers exactly. Some real numbers lose precision -in the process. This is a problem with how computers store -numbers and affects all computer languages, not just Perl. +Internally, your computer represents floating-point numbers in binary. +Digital (as in powers of two) computers cannot store all numbers +exactly. Some real numbers lose precision in the process. This is a +problem with how computers store numbers and affects all computer +languages, not just Perl. -L<perlnumber> show the gory details of number -representations and conversions. +L<perlnumber> show the gory details of number representations and +conversions. -To limit the number of decimal places in your numbers, you -can use the printf or sprintf function. See the -L<"Floating Point Arithmetic"|perlop> for more details. +To limit the number of decimal places in your numbers, you can use the +printf or sprintf function. See the L<"Floating Point +Arithmetic"|perlop> for more details. printf "%.2f", 10/3; @@ -30,16 +30,16 @@ L<"Floating Point Arithmetic"|perlop> for more details. =head2 Why is int() broken? -Your int() is most probably working just fine. It's the numbers that +Your C<int()> is most probably working just fine. It's the numbers that aren't quite what you think. -First, see the above item "Why am I getting long decimals +First, see the answer to "Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?". For example, this - print int(0.6/0.2-2), "\n"; + print int(0.6/0.2-2), "\n"; will in most computers print 0, not 1, because even such simple numbers as 0.6 and 0.2 cannot be presented exactly by floating-point @@ -50,54 +50,54 @@ numbers. What you think in the above as 'three' is really more like Perl only understands octal and hex numbers as such when they occur as literals in your program. Octal literals in perl must start with a -leading "0" and hexadecimal literals must start with a leading "0x". +leading C<0> and hexadecimal literals must start with a leading C<0x>. If they are read in from somewhere and assigned, no automatic -conversion takes place. You must explicitly use oct() or hex() if you -want the values converted to decimal. oct() interprets hex ("0x350"), -octal ("0350" or even without the leading "0", like "377") and binary -("0b1010") numbers, while hex() only converts hexadecimal ones, with -or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef". +conversion takes place. You must explicitly use C<oct()> or C<hex()> if you +want the values converted to decimal. C<oct()> interprets hexadecimal (C<0x350>), +octal (C<0350> or even without the leading C<0>, like C<377>) and binary +(C<0b1010>) numbers, while C<hex()> only converts hexadecimal ones, with +or without a leading C<0x>, such as C<0x255>, C<3A>, C<ff>, or C<deadbeef>. The inverse mapping from decimal to octal can be done with either the -"%o" or "%O" sprintf() formats. +<%o> or C<%O> C<sprintf()> formats. -This problem shows up most often when people try using chmod(), mkdir(), -umask(), or sysopen(), which by widespread tradition typically take -permissions in octal. +This problem shows up most often when people try using C<chmod()>, +C<mkdir()>, C<umask()>, or C<sysopen()>, which by widespread tradition +typically take permissions in octal. - chmod(644, $file); # WRONG - chmod(0644, $file); # right + chmod(644, $file); # WRONG + chmod(0644, $file); # right Note the mistake in the first line was specifying the decimal literal -644, rather than the intended octal literal 0644. The problem can +C<644>, rather than the intended octal literal C<0644>. The problem can be seen with: - printf("%#o",644); # prints 01204 + printf("%#o",644); # prints 01204 Surely you had not intended C<chmod(01204, $file);> - did you? If you want to use numeric literals as arguments to chmod() et al. then please try to express them as octal constants, that is with a leading zero and -with the following digits restricted to the set 0..7. +with the following digits restricted to the set C<0..7>. =head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions? -Remember that int() merely truncates toward 0. For rounding to a -certain number of digits, sprintf() or printf() is usually the easiest -route. +Remember that C<int()> merely truncates toward 0. For rounding to a +certain number of digits, C<sprintf()> or C<printf()> is usually the +easiest route. - printf("%.3f", 3.1415926535); # prints 3.142 + printf("%.3f", 3.1415926535); # prints 3.142 -The POSIX module (part of the standard Perl distribution) implements -ceil(), floor(), and a number of other mathematical and trigonometric -functions. +The C<POSIX> module (part of the standard Perl distribution) +implements C<ceil()>, C<floor()>, and a number of other mathematical +and trigonometric functions. - use POSIX; - $ceil = ceil(3.5); # 4 - $floor = floor(3.5); # 3 + use POSIX; + $ceil = ceil(3.5); # 4 + $floor = floor(3.5); # 3 -In 5.000 to 5.003 perls, trigonometry was done in the Math::Complex -module. With 5.004, the Math::Trig module (part of the standard Perl +In 5.000 to 5.003 perls, trigonometry was done in the C<Math::Complex> +module. With 5.004, the C<Math::Trig> module (part of the standard Perl distribution) implements the trigonometric functions. Internally it -uses the Math::Complex module and some functions can break out from +uses the C<Math::Complex> module and some functions can break out from the real axis into the complex plane, for example the inverse sine of 2. @@ -110,148 +110,148 @@ need yourself. To see why, notice how you'll still have an issue on half-way-point alternation: - for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i} + for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i} - 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7 - 0.8 0.8 0.9 0.9 1.0 1.0 + 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7 + 0.8 0.8 0.9 0.9 1.0 1.0 -Don't blame Perl. It's the same as in C. IEEE says we have to do this. -Perl numbers whose absolute values are integers under 2**31 (on 32 bit -machines) will work pretty much like mathematical integers. Other numbers -are not guaranteed. +Don't blame Perl. It's the same as in C. IEEE says we have to do +this. Perl numbers whose absolute values are integers under 2**31 (on +32 bit machines) will work pretty much like mathematical integers. +Other numbers are not guaranteed. =head2 How do I convert between numeric representations/bases/radixes? -As always with Perl there is more than one way to do it. Below -are a few examples of approaches to making common conversions -between number representations. This is intended to be representational -rather than exhaustive. +As always with Perl there is more than one way to do it. Below are a +few examples of approaches to making common conversions between number +representations. This is intended to be representational rather than +exhaustive. -Some of the examples below use the Bit::Vector module from CPAN. -The reason you might choose Bit::Vector over the perl built in -functions is that it works with numbers of ANY size, that it is -optimized for speed on some operations, and for at least some -programmers the notation might be familiar. +Some of the examples later in L<perlfaq4> use the C<Bit::Vector> +module from CPAN. The reason you might choose C<Bit::Vector> over the +perl built in functions is that it works with numbers of ANY size, +that it is optimized for speed on some operations, and for at least +some programmers the notation might be familiar. =over 4 =item How do I convert hexadecimal into decimal -Using perl's built in conversion of 0x notation: +Using perl's built in conversion of C<0x> notation: - $dec = 0xDEADBEEF; + $dec = 0xDEADBEEF; -Using the hex function: +Using the C<hex> function: - $dec = hex("DEADBEEF"); + $dec = hex("DEADBEEF"); -Using pack: +Using C<pack>: - $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8))); + $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8))); -Using the CPAN module Bit::Vector: +Using the CPAN module C<Bit::Vector>: - use Bit::Vector; - $vec = Bit::Vector->new_Hex(32, "DEADBEEF"); - $dec = $vec->to_Dec(); + use Bit::Vector; + $vec = Bit::Vector->new_Hex(32, "DEADBEEF"); + $dec = $vec->to_Dec(); =item How do I convert from decimal to hexadecimal -Using sprintf: +Using C<sprintf>: - $hex = sprintf("%X", 3735928559); # upper case A-F - $hex = sprintf("%x", 3735928559); # lower case a-f + $hex = sprintf("%X", 3735928559); # upper case A-F + $hex = sprintf("%x", 3735928559); # lower case a-f -Using unpack: +Using C<unpack>: - $hex = unpack("H*", pack("N", 3735928559)); + $hex = unpack("H*", pack("N", 3735928559)); -Using Bit::Vector: +Using C<Bit::Vector>: - use Bit::Vector; - $vec = Bit::Vector->new_Dec(32, -559038737); - $hex = $vec->to_Hex(); + use Bit::Vector; + $vec = Bit::Vector->new_Dec(32, -559038737); + $hex = $vec->to_Hex(); -And Bit::Vector supports odd bit counts: +And C<Bit::Vector> supports odd bit counts: - use Bit::Vector; - $vec = Bit::Vector->new_Dec(33, 3735928559); - $vec->Resize(32); # suppress leading 0 if unwanted - $hex = $vec->to_Hex(); + use Bit::Vector; + $vec = Bit::Vector->new_Dec(33, 3735928559); + $vec->Resize(32); # suppress leading 0 if unwanted + $hex = $vec->to_Hex(); =item How do I convert from octal to decimal Using Perl's built in conversion of numbers with leading zeros: - $dec = 033653337357; # note the leading 0! + $dec = 033653337357; # note the leading 0! -Using the oct function: +Using the C<oct> function: - $dec = oct("33653337357"); + $dec = oct("33653337357"); -Using Bit::Vector: +Using C<Bit::Vector>: - use Bit::Vector; - $vec = Bit::Vector->new(32); - $vec->Chunk_List_Store(3, split(//, reverse "33653337357")); - $dec = $vec->to_Dec(); + use Bit::Vector; + $vec = Bit::Vector->new(32); + $vec->Chunk_List_Store(3, split(//, reverse "33653337357")); + $dec = $vec->to_Dec(); =item How do I convert from decimal to octal -Using sprintf: +Using C<sprintf>: - $oct = sprintf("%o", 3735928559); + $oct = sprintf("%o", 3735928559); -Using Bit::Vector: +Using C<Bit::Vector>: - use Bit::Vector; - $vec = Bit::Vector->new_Dec(32, -559038737); - $oct = reverse join('', $vec->Chunk_List_Read(3)); + use Bit::Vector; + $vec = Bit::Vector->new_Dec(32, -559038737); + $oct = reverse join('', $vec->Chunk_List_Read(3)); =item How do I convert from binary to decimal Perl 5.6 lets you write binary numbers directly with -the 0b notation: +the C<0b> notation: - $number = 0b10110110; + $number = 0b10110110; -Using oct: +Using C<oct>: - my $input = "10110110"; - $decimal = oct( "0b$input" ); + my $input = "10110110"; + $decimal = oct( "0b$input" ); -Using pack and ord: +Using C<pack> and C<ord>: - $decimal = ord(pack('B8', '10110110')); + $decimal = ord(pack('B8', '10110110')); -Using pack and unpack for larger strings: +Using C<pack> and C<unpack> for larger strings: - $int = unpack("N", pack("B32", + $int = unpack("N", pack("B32", substr("0" x 32 . "11110101011011011111011101111", -32))); - $dec = sprintf("%d", $int); + $dec = sprintf("%d", $int); - # substr() is used to left pad a 32 character string with zeros. + # substr() is used to left pad a 32 character string with zeros. -Using Bit::Vector: +Using C<Bit::Vector>: - $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111"); - $dec = $vec->to_Dec(); + $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111"); + $dec = $vec->to_Dec(); =item How do I convert from decimal to binary -Using sprintf (perl 5.6+): +Using C<sprintf> (perl 5.6+): - $bin = sprintf("%b", 3735928559); + $bin = sprintf("%b", 3735928559); -Using unpack: +Using C<unpack>: - $bin = unpack("B*", pack("N", 3735928559)); + $bin = unpack("B*", pack("N", 3735928559)); -Using Bit::Vector: +Using C<Bit::Vector>: - use Bit::Vector; - $vec = Bit::Vector->new_Dec(32, -559038737); - $bin = $vec->to_Bin(); + use Bit::Vector; + $vec = Bit::Vector->new_Dec(32, -559038737); + $bin = $vec->to_Bin(); The remaining transformations (e.g. hex -> oct, bin -> hex, etc.) are left as an exercise to the inclined reader. @@ -274,16 +274,16 @@ Most problems with C<&> and C<|> arise because the programmer thinks they have a number but really it's a string. The rest arise because the programmer says: - if ("\020\020" & "\101\101") { - # ... - } + if ("\020\020" & "\101\101") { + # ... + } but a string consisting of two null bytes (the result of C<"\020\020" & "\101\101">) is not a false value in Perl. You need: - if ( ("\020\020" & "\101\101") !~ /[^\000]/) { - # ... - } + if ( ("\020\020" & "\101\101") !~ /[^\000]/) { + # ... + } =head2 How do I multiply matrices? @@ -295,38 +295,38 @@ or the PDL extension (also available from CPAN). To call a function on each element in an array, and collect the results, use: - @results = map { my_func($_) } @array; + @results = map { my_func($_) } @array; For example: - @triple = map { 3 * $_ } @single; + @triple = map { 3 * $_ } @single; To call a function on each element of an array, but ignore the results: - foreach $iterator (@array) { - some_func($iterator); - } + foreach $iterator (@array) { + some_func($iterator); + } To call a function on each integer in a (small) range, you B<can> use: - @results = map { some_func($_) } (5 .. 25); + @results = map { some_func($_) } (5 .. 25); but you should be aware that the C<..> operator creates an array of all integers in the range. This can take a lot of memory for large ranges. Instead use: - @results = (); - for ($i=5; $i < 500_005; $i++) { - push(@results, some_func($i)); - } + @results = (); + for ($i=5; $i < 500_005; $i++) { + push(@results, some_func($i)); + } This situation has been fixed in Perl5.005. Use of C<..> in a C<for> loop will iterate over the range, without creating the entire range. - for my $i (5 .. 500_005) { - push(@results, some_func($i)); - } + for my $i (5 .. 500_005) { + push(@results, some_func($i)); + } will not create a list of 500,000 integers. @@ -342,19 +342,19 @@ once at the start of your program to seed the random number generator. BEGIN { srand() if $] < 5.004 } 5.004 and later automatically call C<srand> at the beginning. Don't -call C<srand> more than once---you make your numbers less random, rather -than more. +call C<srand> more than once--you make your numbers less random, +rather than more. Computers are good at being predictable and bad at being random (despite appearances caused by bugs in your programs :-). see the F<random> article in the "Far More Than You Ever Wanted To Know" -collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy of -Tom Phoenix, talks more about this. John von Neumann said, "Anyone +collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy +of Tom Phoenix, talks more about this. John von Neumann said, "Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin." If you want numbers that are more random than C<rand> with C<srand> -provides, you should also check out the Math::TrulyRandom module from +provides, you should also check out the C<Math::TrulyRandom> module from CPAN. It uses the imperfections in your system's timer to generate random numbers, but this takes quite a while. If you want a better pseudorandom generator than comes with your operating system, look at @@ -380,7 +380,7 @@ Hence you derive the following simple function to abstract that. It selects a random integer between the two given integers (inclusive), For example: C<random_int_between(50,120)>. - sub random_int_between ($$) { + sub random_int_between { my($min, $max) = @_; # Assumes that the two arguments are integers themselves! return $min if $min == $max; @@ -397,22 +397,21 @@ argument localtime uses the current time. $day_of_year = (localtime)[7]; -The POSIX module can also format a date as the day of the year or +The C<POSIX> module can also format a date as the day of the year or week of the year. use POSIX qw/strftime/; my $day_of_year = strftime "%j", localtime; my $week_of_year = strftime "%W", localtime; -To get the day of year for any date, use the Time::Local module to get +To get the day of year for any date, use C<POSIX>'s C<mktime> to get a time in epoch seconds for the argument to localtime. - use POSIX qw/strftime/; - use Time::Local; + use POSIX qw/mktime strftime/; my $week_of_year = strftime "%W", - localtime( timelocal( 0, 0, 0, 18, 11, 1987 ) ); + localtime( mktime( 0, 0, 0, 18, 11, 87 ) ); -The Date::Calc module provides two functions to calculate these. +The C<Date::Calc> module provides two functions to calculate these. use Date::Calc; my $day_of_year = Day_of_Year( 1987, 12, 18 ); @@ -422,62 +421,62 @@ The Date::Calc module provides two functions to calculate these. Use the following simple functions: - sub get_century { - return int((((localtime(shift || time))[5] + 1999))/100); - } + sub get_century { + return int((((localtime(shift || time))[5] + 1999))/100); + } - sub get_millennium { - return 1+int((((localtime(shift || time))[5] + 1899))/1000); - } + sub get_millennium { + return 1+int((((localtime(shift || time))[5] + 1899))/1000); + } -On some systems, the POSIX module's strftime() function has -been extended in a non-standard way to use a C<%C> format, -which they sometimes claim is the "century". It isn't, -because on most such systems, this is only the first two -digits of the four-digit year, and thus cannot be used to -reliably determine the current century or millennium. +On some systems, the C<POSIX> module's C<strftime()> function has been +extended in a non-standard way to use a C<%C> format, which they +sometimes claim is the "century". It isn't, because on most such +systems, this is only the first two digits of the four-digit year, and +thus cannot be used to reliably determine the current century or +millennium. =head2 How can I compare two dates and find the difference? (contributed by brian d foy) -You could just store all your dates as a number and then subtract. Life -isn't always that simple though. If you want to work with formatted -dates, the Date::Manip, Date::Calc, or DateTime modules can help you. - +You could just store all your dates as a number and then subtract. +Life isn't always that simple though. If you want to work with +formatted dates, the C<Date::Manip>, C<Date::Calc>, or C<DateTime> +modules can help you. =head2 How can I take a string and turn it into epoch seconds? If it's a regular enough string that it always has the same format, you can split it up and pass the parts to C<timelocal> in the standard -Time::Local module. Otherwise, you should look into the Date::Calc -and Date::Manip modules from CPAN. +C<Time::Local> module. Otherwise, you should look into the C<Date::Calc> +and C<Date::Manip> modules from CPAN. =head2 How can I find the Julian Day? (contributed by brian d foy and Dave Cross) -You can use the Time::JulianDay module available on CPAN. Ensure that -you really want to find a Julian day, though, as many people have +You can use the C<Time::JulianDay> module available on CPAN. Ensure +that you really want to find a Julian day, though, as many people have different ideas about Julian days. See http://www.hermetic.ch/cal_stud/jdn.htm for instance. -You can also try the DateTime module, which can convert a date/time +You can also try the C<DateTime> module, which can convert a date/time to a Julian Day. - $ perl -MDateTime -le'print DateTime->today->jd' - 2453401.5 + $ perl -MDateTime -le'print DateTime->today->jd' + 2453401.5 Or the modified Julian Day - $ perl -MDateTime -le'print DateTime->today->mjd' - 53401 + $ perl -MDateTime -le'print DateTime->today->mjd' + 53401 Or even the day of the year (which is what some people think of as a Julian day) - $ perl -MDateTime -le'print DateTime->today->doy' - 31 + $ perl -MDateTime -le'print DateTime->today->doy' + 31 =head2 How do I find yesterday's date? @@ -506,10 +505,10 @@ dates, but that assumes that days are twenty-four hours each. For most people, there are two days a year when they aren't: the switch to and from summer time throws this off. Let the modules do the work. -=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant? +=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant? Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is -Y2K compliant (whatever that means). The programmers you've hired to +Y2K compliant (whatever that means). The programmers you've hired to use it, however, probably are not. Long answer: The question belies a true understanding of the issue. @@ -557,7 +556,7 @@ It depends just what you mean by "escape". URL escapes are dealt with in L<perlfaq9>. Shell escapes with the backslash (C<\>) character are removed with - s/\\(.)/$1/g; + s/\\(.)/$1/g; This won't expand C<"\n"> or C<"\t"> or any other special escapes. @@ -572,7 +571,7 @@ store the matched character in the back-reference C<\1> and we use that to require that the same thing immediately follow it. We replace that part of the string with the character in C<$1>. - s/(.)\1/$1/g; + s/(.)\1/$1/g; We can also use the transliteration operator, C<tr///>. In this example, the search list side of our C<tr///> contains nothing, but @@ -584,7 +583,7 @@ duplicated and consecutive characters in the string so a character does not show up next to itself my $str = 'Haarlem'; # in the Netherlands - $str =~ tr///cs; # Now Harlem, like in New York + $str =~ tr///cs; # Now Harlem, like in New York =head2 How do I expand function calls in a string? @@ -625,7 +624,7 @@ as well. In most cases, it is probably easier to simply use string concatenation, which also forces scalar context. - print "The time is " . localtime . ".\n"; + print "The time is " . localtime() . ".\n"; =head2 How do I find matching/nesting anything? @@ -633,7 +632,7 @@ This isn't something that can be done in one regular expression, no matter how complicated. To find something between two single characters, a pattern like C</x([^x]*)x/> will get the intervening bits in $1. For multiple ones, then something more like -C</alpha(.*?)omega/> would be needed. But none of these deals with +C</alpha(.*?)omega/> would be needed. But none of these deals with nested patterns. For balanced expressions using C<(>, C<{>, C<[> or C<< < >> as delimiters, use the CPAN module Regexp::Common, or see L<perlre/(??{ code })>. For other cases, you'll have to write a @@ -641,68 +640,68 @@ parser. If you are serious about writing a parser, there are a number of modules or oddities that will make your life a lot easier. There are -the CPAN modules Parse::RecDescent, Parse::Yapp, and Text::Balanced; -and the byacc program. Starting from perl 5.8 the Text::Balanced is -part of the standard distribution. +the CPAN modules C<Parse::RecDescent>, C<Parse::Yapp>, and +C<Text::Balanced>; and the C<byacc> program. Starting from perl 5.8 +the C<Text::Balanced> is part of the standard distribution. One simple destructive, inside-out approach that you might try is to pull out the smallest nesting parts one at a time: - while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) { - # do something with $1 - } + while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) { + # do something with $1 + } A more complicated and sneaky approach is to make Perl's regular expression engine do it for you. This is courtesy Dean Inada, and rather has the nature of an Obfuscated Perl Contest entry, but it really does work: - # $_ contains the string to parse - # BEGIN and END are the opening and closing markers for the - # nested text. + # $_ contains the string to parse + # BEGIN and END are the opening and closing markers for the + # nested text. - @( = ('(',''); - @) = (')',''); - ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs; - @$ = (eval{/$re/},$@!~/unmatched/i); - print join("\n",@$[0..$#$]) if( $$[-1] ); + @( = ('(',''); + @) = (')',''); + ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs; + @$ = (eval{/$re/},$@!~/unmatched/i); + print join("\n",@$[0..$#$]) if( $$[-1] ); =head2 How do I reverse a string? -Use reverse() in scalar context, as documented in +Use C<reverse()> in scalar context, as documented in L<perlfunc/reverse>. - $reversed = reverse $string; + $reversed = reverse $string; =head2 How do I expand tabs in a string? You can do it yourself: - 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e; + 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e; -Or you can just use the Text::Tabs module (part of the standard Perl +Or you can just use the C<Text::Tabs> module (part of the standard Perl distribution). - use Text::Tabs; - @expanded_lines = expand(@lines_with_tabs); + use Text::Tabs; + @expanded_lines = expand(@lines_with_tabs); =head2 How do I reformat a paragraph? -Use Text::Wrap (part of the standard Perl distribution): +Use C<Text::Wrap> (part of the standard Perl distribution): - use Text::Wrap; - print wrap("\t", ' ', @paragraphs); + use Text::Wrap; + print wrap("\t", ' ', @paragraphs); -The paragraphs you give to Text::Wrap should not contain embedded -newlines. Text::Wrap doesn't justify the lines (flush-right). +The paragraphs you give to C<Text::Wrap> should not contain embedded +newlines. C<Text::Wrap> doesn't justify the lines (flush-right). -Or use the CPAN module Text::Autoformat. Formatting files can be easily -done by making a shell alias, like so: +Or use the CPAN module C<Text::Autoformat>. Formatting files can be +easily done by making a shell alias, like so: - alias fmt="perl -i -MText::Autoformat -n0777 \ - -e 'print autoformat $_, {all=>1}' $*" + alias fmt="perl -i -MText::Autoformat -n0777 \ + -e 'print autoformat $_, {all=>1}' $*" -See the documentation for Text::Autoformat to appreciate its many +See the documentation for C<Text::Autoformat> to appreciate its many capabilities. =head2 How can I access or change N characters of a string? @@ -713,16 +712,16 @@ and grab the string of length 1. $string = "Just another Perl Hacker"; - $first_char = substr( $string, 0, 1 ); # 'J' + $first_char = substr( $string, 0, 1 ); # 'J' To change part of a string, you can use the optional fourth argument which is the replacement string. - substr( $string, 13, 4, "Perl 5.8.0" ); + substr( $string, 13, 4, "Perl 5.8.0" ); You can also use substr() as an lvalue. - substr( $string, 13, 4 ) = "Perl 5.8.0"; + substr( $string, 13, 4 ) = "Perl 5.8.0"; =head2 How do I change the Nth occurrence of something? @@ -731,29 +730,29 @@ to change the fifth occurrence of C<"whoever"> or C<"whomever"> into C<"whosoever"> or C<"whomsoever">, case insensitively. These all assume that $_ contains the string to be altered. - $count = 0; - s{((whom?)ever)}{ - ++$count == 5 # is it the 5th? - ? "${2}soever" # yes, swap - : $1 # renege and leave it there - }ige; + $count = 0; + s{((whom?)ever)}{ + ++$count == 5 # is it the 5th? + ? "${2}soever" # yes, swap + : $1 # renege and leave it there + }ige; In the more general case, you can use the C</g> modifier in a C<while> loop, keeping count of matches. - $WANT = 3; - $count = 0; - $_ = "One fish two fish red fish blue fish"; - while (/(\w+)\s+fish\b/gi) { - if (++$count == $WANT) { - print "The third fish is a $1 one.\n"; - } - } + $WANT = 3; + $count = 0; + $_ = "One fish two fish red fish blue fish"; + while (/(\w+)\s+fish\b/gi) { + if (++$count == $WANT) { + print "The third fish is a $1 one.\n"; + } + } That prints out: C<"The third fish is a red one."> You can also use a repetition count and repeated pattern like this: - /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i; + /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i; =head2 How can I count the number of occurrences of a substring within a string? @@ -761,9 +760,9 @@ There are a number of ways, with varying efficiency. If you want a count of a certain single character (X) within a string, you can use the C<tr///> function like so: - $string = "ThisXlineXhasXsomeXx'sXinXit"; - $count = ($string =~ tr/X//); - print "There are $count X characters in the string"; + $string = "ThisXlineXhasXsomeXx'sXinXit"; + $count = ($string =~ tr/X//); + print "There are $count X characters in the string"; This is fine if you are just looking for a single character. However, if you are trying to count multiple character substrings within a @@ -771,9 +770,9 @@ larger string, C<tr///> won't work. What you can do is wrap a while() loop around a global pattern match. For example, let's count negative integers: - $string = "-9 55 48 -2 23 -76 4 14 -44"; - while ($string =~ /-\d+/g) { $count++ } - print "There are $count negative numbers in the string"; + $string = "-9 55 48 -2 23 -76 4 14 -44"; + while ($string =~ /-\d+/g) { $count++ } + print "There are $count negative numbers in the string"; Another version uses a global match in list context, then assigns the result to a scalar, producing a count of the number of matches. @@ -784,27 +783,28 @@ result to a scalar, producing a count of the number of matches. To make the first letter of each word upper case: - $line =~ s/\b(\w)/\U$1/g; + $line =~ s/\b(\w)/\U$1/g; This has the strange effect of turning "C<don't do it>" into "C<Don'T Do It>". Sometimes you might want this. Other times you might need a more thorough solution (Suggested by brian d foy): - $string =~ s/ ( - (^\w) #at the beginning of the line - | # or - (\s\w) #preceded by whitespace - ) - /\U$1/xg; - $string =~ /([\w']+)/\u\L$1/g; + $string =~ s/ ( + (^\w) #at the beginning of the line + | # or + (\s\w) #preceded by whitespace + ) + /\U$1/xg; + + $string =~ s/([\w']+)/\u\L$1/g; To make the whole line upper case: - $line = uc($line); + $line = uc($line); To force each word to be lower case, with the first letter upper case: - $line =~ s/(\w+)/\u\L$1/g; + $line =~ s/(\w+)/\u\L$1/g; You can (and probably should) enable locale awareness of those characters by placing a C<use locale> pragma in your program. @@ -818,52 +818,49 @@ Stop Worrying and Love the Bomb>, for example. Damian Conway's L<Text::Autoformat> module provides some smart case transformations: - use Text::Autoformat; - my $x = "Dr. Strangelove or: How I Learned to Stop ". - "Worrying and Love the Bomb"; + use Text::Autoformat; + my $x = "Dr. Strangelove or: How I Learned to Stop ". + "Worrying and Love the Bomb"; - print $x, "\n"; - for my $style (qw( sentence title highlight )) - { - print autoformat($x, { case => $style }), "\n"; - } + print $x, "\n"; + for my $style (qw( sentence title highlight )) { + print autoformat($x, { case => $style }), "\n"; + } =head2 How can I split a [character] delimited string except when inside [character]? -Several modules can handle this sort of pasing---Text::Balanced, -Text::CSV, Text::CSV_XS, and Text::ParseWords, among others. +Several modules can handle this sort of parsing--C<Text::Balanced>, +C<Text::CSV>, C<Text::CSV_XS>, and C<Text::ParseWords>, among others. Take the example case of trying to split a string that is comma-separated into its different fields. You can't use C<split(/,/)> because you shouldn't split if the comma is inside quotes. For example, take a data line like this: - SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped" + SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped" Due to the restriction of the quotes, this is a fairly complex problem. Thankfully, we have Jeffrey Friedl, author of I<Mastering Regular Expressions>, to handle these for us. He -suggests (assuming your string is contained in $text): +suggests (assuming your string is contained in C<$text>): - @new = (); - push(@new, $+) while $text =~ m{ - "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes - | ([^,]+),? - | , - }gx; - push(@new, undef) if substr($text,-1,1) eq ','; + @new = (); + push(@new, $+) while $text =~ m{ + "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes + | ([^,]+),? + | , + }gx; + push(@new, undef) if substr($text,-1,1) eq ','; If you want to represent quotation marks inside a quotation-mark-delimited field, escape them with backslashes (eg, C<"like \"this\"">. -Alternatively, the Text::ParseWords module (part of the standard Perl -distribution) lets you say: - - use Text::ParseWords; - @new = quotewords(",", 0, $text); +Alternatively, the C<Text::ParseWords> module (part of the standard +Perl distribution) lets you say: -There's also a Text::CSV (Comma-Separated Values) module on CPAN. + use Text::ParseWords; + @new = quotewords(",", 0, $text); =head2 How do I strip blank space from the beginning/end of a string? @@ -904,7 +901,7 @@ to each logical line in the string by adding the C</m> flag (for embedded newline, so it doesn't remove it. It still removes the newline at the end of the string. - $string =~ s/^\s+|\s+$//gm; + $string =~ s/^\s+|\s+$//gm; Remember that lines consisting entirely of whitespace will disappear, since the first part of the alternation can match the entire string @@ -929,20 +926,20 @@ truncate the result. The C<pack> function can only pad strings on the right with blanks and it will truncate the result to a maximum length of C<$pad_len>. - # Left padding a string with blanks (no truncation): + # Left padding a string with blanks (no truncation): $padded = sprintf("%${pad_len}s", $text); $padded = sprintf("%*s", $pad_len, $text); # same thing - # Right padding a string with blanks (no truncation): + # Right padding a string with blanks (no truncation): $padded = sprintf("%-${pad_len}s", $text); $padded = sprintf("%-*s", $pad_len, $text); # same thing - # Left padding a number with 0 (no truncation): + # Left padding a number with 0 (no truncation): $padded = sprintf("%0${pad_len}d", $num); $padded = sprintf("%0*d", $pad_len, $num); # same thing - # Right padding a string with blanks using pack (will truncate): - $padded = pack("A$pad_len",$text); + # Right padding a string with blanks using pack (will truncate): + $padded = pack("A$pad_len",$text); If you need to pad with a character other than blank or zero you can use one of the following methods. They all generate a pad string with the @@ -951,50 +948,50 @@ not truncate C<$text>. Left and right padding with any character, creating a new string: - $padded = $pad_char x ( $pad_len - length( $text ) ) . $text; - $padded = $text . $pad_char x ( $pad_len - length( $text ) ); + $padded = $pad_char x ( $pad_len - length( $text ) ) . $text; + $padded = $text . $pad_char x ( $pad_len - length( $text ) ); Left and right padding with any character, modifying C<$text> directly: - substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) ); - $text .= $pad_char x ( $pad_len - length( $text ) ); + substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) ); + $text .= $pad_char x ( $pad_len - length( $text ) ); =head2 How do I extract selected columns from a string? -Use substr() or unpack(), both documented in L<perlfunc>. +Use C<substr()> or C<unpack()>, both documented in L<perlfunc>. If you prefer thinking in terms of columns instead of widths, you can use this kind of thing: - # determine the unpack format needed to split Linux ps output - # arguments are cut columns - my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72); - - sub cut2fmt { - my(@positions) = @_; - my $template = ''; - my $lastpos = 1; - for my $place (@positions) { - $template .= "A" . ($place - $lastpos) . " "; - $lastpos = $place; - } - $template .= "A*"; - return $template; - } + # determine the unpack format needed to split Linux ps output + # arguments are cut columns + my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72); + + sub cut2fmt { + my(@positions) = @_; + my $template = ''; + my $lastpos = 1; + for my $place (@positions) { + $template .= "A" . ($place - $lastpos) . " "; + $lastpos = $place; + } + $template .= "A*"; + return $template; + } =head2 How do I find the soundex value of a string? (contributed by brian d foy) You can use the Text::Soundex module. If you want to do fuzzy or close -matching, you might also try the String::Approx, and Text::Metaphone, -and Text::DoubleMetaphone modules. +matching, you might also try the C<String::Approx>, and +C<Text::Metaphone>, and C<Text::DoubleMetaphone> modules. =head2 How can I expand variables in text strings? Let's assume that you have a string that contains placeholder variables. - $text = 'this has a $foo in it and a $bar'; + $text = 'this has a $foo in it and a $bar'; You can use a substitution with a double evaluation. The first /e turns C<$1> into C<$foo>, and the second /e turns @@ -1002,48 +999,49 @@ C<$foo> into its value. You may want to wrap this in an C<eval>: if you try to get the value of an undeclared variable while running under C<use strict>, you get a fatal error. - eval { $text =~ s/(\$\w+)/$1/eeg }; - die if $@; + eval { $text =~ s/(\$\w+)/$1/eeg }; + die if $@; It's probably better in the general case to treat those variables as entries in some special hash. For example: - %user_defs = ( - foo => 23, - bar => 19, - ); - $text =~ s/\$(\w+)/$user_defs{$1}/g; + %user_defs = ( + foo => 23, + bar => 19, + ); + $text =~ s/\$(\w+)/$user_defs{$1}/g; =head2 What's wrong with always quoting "$vars"? -The problem is that those double-quotes force stringification-- -coercing numbers and references into strings--even when you -don't want them to be strings. Think of it this way: double-quote -expansion is used to produce new strings. If you already -have a string, why do you need more? +The problem is that those double-quotes force +stringification--coercing numbers and references into +strings--even when you don't want them to be strings. Think +of it this way: double-quote expansion is used to produce +new strings. If you already have a string, why do you need +more? If you get used to writing odd things like these: - print "$var"; # BAD - $new = "$old"; # BAD - somefunc("$var"); # BAD + print "$var"; # BAD + $new = "$old"; # BAD + somefunc("$var"); # BAD You'll be in trouble. Those should (in 99.8% of the cases) be the simpler and more direct: - print $var; - $new = $old; - somefunc($var); + print $var; + $new = $old; + somefunc($var); Otherwise, besides slowing you down, you're going to break code when the thing in the scalar is actually neither a string nor a number, but a reference: - func(\@array); - sub func { - my $aref = shift; - my $oref = "$aref"; # WRONG - } + func(\@array); + sub func { + my $aref = shift; + my $oref = "$aref"; # WRONG + } You can also get into subtle problems on those few operations in Perl that actually do care about the difference between a string and a @@ -1052,9 +1050,9 @@ syscall() function. Stringification also destroys arrays. - @lines = `command`; - print "@lines"; # WRONG - extra blanks - print @lines; # right + @lines = `command`; + print "@lines"; # WRONG - extra blanks + print @lines; # right =head2 Why don't my E<lt>E<lt>HERE documents work? @@ -1112,7 +1110,7 @@ subsequent line. This works with leading special strings, dynamically determined: - $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP'; + $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP'; @@@ int @@@ runops() { @@@ SAVEI32(runlevel); @@ -1121,12 +1119,12 @@ This works with leading special strings, dynamically determined: @@@ TAINT_NOT; @@@ return 0; @@@ } - MAIN_INTERPRETER_LOOP + MAIN_INTERPRETER_LOOP Or with a fixed amount of leading whitespace, with remaining indentation correctly preserved: - $poem = fix<<EVER_ON_AND_ON; + $poem = fix<<EVER_ON_AND_ON; Now far ahead the Road has gone, And I must follow, if I can, Pursuing it with eager feet, @@ -1134,29 +1132,29 @@ indentation correctly preserved: Where many paths and errands meet. And whither then? I cannot say. --Bilbo in /usr/src/perl/pp_ctl.c - EVER_ON_AND_ON + EVER_ON_AND_ON =head1 Data: Arrays =head2 What is the difference between a list and an array? -An array has a changeable length. A list does not. An array is something -you can push or pop, while a list is a set of values. Some people make -the distinction that a list is a value while an array is a variable. -Subroutines are passed and return lists, you put things into list -context, you initialize arrays with lists, and you foreach() across -a list. C<@> variables are arrays, anonymous arrays are arrays, arrays -in scalar context behave like the number of elements in them, subroutines -access their arguments through the array C<@_>, and push/pop/shift only work -on arrays. +An array has a changeable length. A list does not. An array is +something you can push or pop, while a list is a set of values. Some +people make the distinction that a list is a value while an array is a +variable. Subroutines are passed and return lists, you put things into +list context, you initialize arrays with lists, and you C<foreach()> +across a list. C<@> variables are arrays, anonymous arrays are +arrays, arrays in scalar context behave like the number of elements in +them, subroutines access their arguments through the array C<@_>, and +C<push>/C<pop>/C<shift> only work on arrays. As a side note, there's no such thing as a list in scalar context. When you say - $scalar = (2, 5, 7, 9); + $scalar = (2, 5, 7, 9); you're using the comma operator in scalar context, so it uses the scalar -comma operator. There never was a list there at all! This causes the +comma operator. There never was a list there at all! This causes the last value to be returned: 9. =head2 What is the difference between $array[1] and @array[1]? @@ -1169,11 +1167,11 @@ scalar value in it (very, very rarely; nearly never, in fact). Sometimes it doesn't make a difference, but sometimes it does. For example, compare: - $good[0] = `some program that outputs several lines`; + $good[0] = `some program that outputs several lines`; with - @bad[0] = `same program that outputs several lines`; + @bad[0] = `same program that outputs several lines`; The C<use warnings> pragma and the B<-w> flag will warn you about these matters. @@ -1190,11 +1188,21 @@ create the hash then extract the keys. It's not important how you create that hash: just that you use C<keys> to get the unique elements. - my %hash = map { $_, 1 } @array; - # or a hash slice: @hash{ @array } = (); - # or a foreach: $hash{$_} = 1 foreach ( @array ); + my %hash = map { $_, 1 } @array; + # or a hash slice: @hash{ @array } = (); + # or a foreach: $hash{$_} = 1 foreach ( @array ); + + my @unique = keys %hash; - my @unique = keys %hash; +If you want to use a module, try the C<uniq> function from +C<List::MoreUtils>. In list context it returns the unique elements, +preserving their order in the list. In scalar context, it returns the +number of unique elements. + + use List::MoreUtils qw(uniq); + + my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7 + my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7 You can also go through each element and skip the ones you've seen before. Use a hash to keep track. The first time the loop sees an @@ -1203,8 +1211,8 @@ creates the key and immediately uses its value, which is C<undef>, so the loop continues to the C<push> and increments the value for that key. The next time the loop sees that same element, its key exists in the hash I<and> the value for that key is true (since it's not 0 or -undef), so the next skips that iteration and the loop goes to the next -element. +C<undef>), so the next skips that iteration and the loop goes to the +next element. my @unique = (); my %seen = (); @@ -1218,8 +1226,8 @@ element. You can write this more briefly using a grep, which does the same thing. - my %seen = (); - my @unique = grep { ! $seen{ $_ }++ } @array; + my %seen = (); + my @unique = grep { ! $seen{ $_ }++ } @array; =head2 How can I tell whether a certain element is contained in a list or array? @@ -1234,29 +1242,29 @@ are going to make this query many times over arbitrary string values, the fastest way is probably to invert the original array and maintain a hash whose keys are the first array's values. - @blues = qw/azure cerulean teal turquoise lapis-lazuli/; - %is_blue = (); - for (@blues) { $is_blue{$_} = 1 } + @blues = qw/azure cerulean teal turquoise lapis-lazuli/; + %is_blue = (); + for (@blues) { $is_blue{$_} = 1 } -Now you can check whether $is_blue{$some_color}. It might have been a -good idea to keep the blues all in a hash in the first place. +Now you can check whether C<$is_blue{$some_color}>. It might have +been a good idea to keep the blues all in a hash in the first place. If the values are all small integers, you could use a simple indexed array. This kind of an array will take up less space: - @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31); - @is_tiny_prime = (); - for (@primes) { $is_tiny_prime[$_] = 1 } - # or simply @istiny_prime[@primes] = (1) x @primes; + @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31); + @is_tiny_prime = (); + for (@primes) { $is_tiny_prime[$_] = 1 } + # or simply @istiny_prime[@primes] = (1) x @primes; Now you check whether $is_tiny_prime[$some_number]. If the values in question are integers instead of strings, you can save quite a lot of space by using bit strings instead: - @articles = ( 1..10, 150..2000, 2017 ); - undef $read; - for (@articles) { vec($read,$_,1) = 1 } + @articles = ( 1..10, 150..2000, 2017 ); + undef $read; + for (@articles) { vec($read,$_,1) = 1 } Now check whether C<vec($read,$n,1)> is true for some C<$n>. @@ -1264,7 +1272,7 @@ These methods guarantee fast individual tests but require a re-organization of the original list or array. They only pay off if you have to test multiple values against the same array. -If you are testing only once, the standard module List::Util exports +If you are testing only once, the standard module C<List::Util> exports the function C<first> for this purpose. It works by stopping once it finds the element. It's written in C for speed, and its Perl equivalant looks like this subroutine: @@ -1291,62 +1299,62 @@ list context. =head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays? -Use a hash. Here's code to do both and more. It assumes that -each element is unique in a given array: +Use a hash. Here's code to do both and more. It assumes that each +element is unique in a given array: - @union = @intersection = @difference = (); - %count = (); - foreach $element (@array1, @array2) { $count{$element}++ } - foreach $element (keys %count) { - push @union, $element; - push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element; - } + @union = @intersection = @difference = (); + %count = (); + foreach $element (@array1, @array2) { $count{$element}++ } + foreach $element (keys %count) { + push @union, $element; + push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element; + } -Note that this is the I<symmetric difference>, that is, all elements in -either A or in B but not in both. Think of it as an xor operation. +Note that this is the I<symmetric difference>, that is, all elements +in either A or in B but not in both. Think of it as an xor operation. =head2 How do I test whether two arrays or hashes are equal? -The following code works for single-level arrays. It uses a stringwise -comparison, and does not distinguish defined versus undefined empty -strings. Modify if you have other needs. +The following code works for single-level arrays. It uses a +stringwise comparison, and does not distinguish defined versus +undefined empty strings. Modify if you have other needs. - $are_equal = compare_arrays(\@frogs, \@toads); + $are_equal = compare_arrays(\@frogs, \@toads); - sub compare_arrays { - my ($first, $second) = @_; - no warnings; # silence spurious -w undef complaints - return 0 unless @$first == @$second; - for (my $i = 0; $i < @$first; $i++) { - return 0 if $first->[$i] ne $second->[$i]; - } - return 1; - } + sub compare_arrays { + my ($first, $second) = @_; + no warnings; # silence spurious -w undef complaints + return 0 unless @$first == @$second; + for (my $i = 0; $i < @$first; $i++) { + return 0 if $first->[$i] ne $second->[$i]; + } + return 1; + } For multilevel structures, you may wish to use an approach more -like this one. It uses the CPAN module FreezeThaw: +like this one. It uses the CPAN module C<FreezeThaw>: - use FreezeThaw qw(cmpStr); - @a = @b = ( "this", "that", [ "more", "stuff" ] ); + use FreezeThaw qw(cmpStr); + @a = @b = ( "this", "that", [ "more", "stuff" ] ); - printf "a and b contain %s arrays\n", - cmpStr(\@a, \@b) == 0 - ? "the same" - : "different"; + printf "a and b contain %s arrays\n", + cmpStr(\@a, \@b) == 0 + ? "the same" + : "different"; -This approach also works for comparing hashes. Here -we'll demonstrate two different answers: +This approach also works for comparing hashes. Here we'll demonstrate +two different answers: - use FreezeThaw qw(cmpStr cmpStrHard); + use FreezeThaw qw(cmpStr cmpStrHard); - %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] ); - $a{EXTRA} = \%b; - $b{EXTRA} = \%a; + %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] ); + $a{EXTRA} = \%b; + $b{EXTRA} = \%a; - printf "a and b contain %s hashes\n", + printf "a and b contain %s hashes\n", cmpStr(\%a, \%b) == 0 ? "the same" : "different"; - printf "a and b contain %s hashes\n", + printf "a and b contain %s hashes\n", cmpStrHard(\%a, \%b) == 0 ? "the same" : "different"; @@ -1357,19 +1365,19 @@ an exercise to the reader. =head2 How do I find the first array element for which a condition is true? To find the first array element which satisfies a condition, you can -use the first() function in the List::Util module, which comes with -Perl 5.8. This example finds the first element that contains "Perl". +use the C<first()> function in the C<List::Util> module, which comes +with Perl 5.8. This example finds the first element that contains +"Perl". use List::Util qw(first); my $element = first { /Perl/ } @array; -If you cannot use List::Util, you can make your own loop to do the +If you cannot use C<List::Util>, you can make your own loop to do the same thing. Once you find the element, you stop the loop with last. my $found; - foreach ( @array ) - { + foreach ( @array ) { if( /Perl/ ) { $found = $_; last } } @@ -1378,10 +1386,8 @@ and check the array element at each index until you find one that satisfies the condition. my( $found, $index ) = ( undef, -1 ); - for( $i = 0; $i < @array; $i++ ) - { - if( $array[$i] =~ /Perl/ ) - { + for( $i = 0; $i < @array; $i++ ) { + if( $array[$i] =~ /Perl/ ) { $found = $array[$i]; $index = $i; last; @@ -1391,49 +1397,50 @@ that satisfies the condition. =head2 How do I handle linked lists? In general, you usually don't need a linked list in Perl, since with -regular arrays, you can push and pop or shift and unshift at either end, -or you can use splice to add and/or remove arbitrary number of elements at -arbitrary points. Both pop and shift are both O(1) operations on Perl's -dynamic arrays. In the absence of shifts and pops, push in general -needs to reallocate on the order every log(N) times, and unshift will -need to copy pointers each time. +regular arrays, you can push and pop or shift and unshift at either +end, or you can use splice to add and/or remove arbitrary number of +elements at arbitrary points. Both pop and shift are both O(1) +operations on Perl's dynamic arrays. In the absence of shifts and +pops, push in general needs to reallocate on the order every log(N) +times, and unshift will need to copy pointers each time. If you really, really wanted, you could use structures as described in -L<perldsc> or L<perltoot> and do just what the algorithm book tells you -to do. For example, imagine a list node like this: +L<perldsc> or L<perltoot> and do just what the algorithm book tells +you to do. For example, imagine a list node like this: - $node = { - VALUE => 42, - LINK => undef, - }; + $node = { + VALUE => 42, + LINK => undef, + }; You could walk the list this way: - print "List: "; - for ($node = $head; $node; $node = $node->{LINK}) { - print $node->{VALUE}, " "; - } - print "\n"; + print "List: "; + for ($node = $head; $node; $node = $node->{LINK}) { + print $node->{VALUE}, " "; + } + print "\n"; You could add to the list this way: - my ($head, $tail); - $tail = append($head, 1); # grow a new head - for $value ( 2 .. 10 ) { - $tail = append($tail, $value); - } + my ($head, $tail); + $tail = append($head, 1); # grow a new head + for $value ( 2 .. 10 ) { + $tail = append($tail, $value); + } - sub append { - my($list, $value) = @_; - my $node = { VALUE => $value }; - if ($list) { - $node->{LINK} = $list->{LINK}; - $list->{LINK} = $node; - } else { - $_[0] = $node; # replace caller's version - } - return $node; - } + sub append { + my($list, $value) = @_; + my $node = { VALUE => $value }; + if ($list) { + $node->{LINK} = $list->{LINK}; + $list->{LINK} = $node; + } + else { + $_[0] = $node; # replace caller's version + } + return $node; + } But again, Perl's built-in are virtually always good enough. @@ -1442,71 +1449,81 @@ But again, Perl's built-in are virtually always good enough. Circular lists could be handled in the traditional fashion with linked lists, or you could just do something like this with an array: - unshift(@array, pop(@array)); # the last shall be first - push(@array, shift(@array)); # and vice versa + unshift(@array, pop(@array)); # the last shall be first + push(@array, shift(@array)); # and vice versa + +You can also use C<Tie::Cycle>: + + use Tie::Cycle; + + tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ]; + + print $cycle; # FFFFFF + print $cycle; # 000000 + print $cycle; # FFFF00 =head2 How do I shuffle an array randomly? If you either have Perl 5.8.0 or later installed, or if you have Scalar-List-Utils 1.03 or later installed, you can say: - use List::Util 'shuffle'; + use List::Util 'shuffle'; @shuffled = shuffle(@list); If not, you can use a Fisher-Yates shuffle. - sub fisher_yates_shuffle { - my $deck = shift; # $deck is a reference to an array - my $i = @$deck; - while (--$i) { - my $j = int rand ($i+1); - @$deck[$i,$j] = @$deck[$j,$i]; - } - } + sub fisher_yates_shuffle { + my $deck = shift; # $deck is a reference to an array + my $i = @$deck; + while (--$i) { + my $j = int rand ($i+1); + @$deck[$i,$j] = @$deck[$j,$i]; + } + } - # shuffle my mpeg collection - # - my @mpeg = <audio/*/*.mp3>; - fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place - print @mpeg; + # shuffle my mpeg collection + # + my @mpeg = <audio/*/*.mp3>; + fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place + print @mpeg; Note that the above implementation shuffles an array in place, -unlike the List::Util::shuffle() which takes a list and returns +unlike the C<List::Util::shuffle()> which takes a list and returns a new shuffled list. You've probably seen shuffling algorithms that work using splice, randomly picking another element to swap the current element with - srand; - @new = (); - @old = 1 .. 10; # just a demo - while (@old) { - push(@new, splice(@old, rand @old, 1)); - } + srand; + @new = (); + @old = 1 .. 10; # just a demo + while (@old) { + push(@new, splice(@old, rand @old, 1)); + } -This is bad because splice is already O(N), and since you do it N times, -you just invented a quadratic algorithm; that is, O(N**2). This does -not scale, although Perl is so efficient that you probably won't notice -this until you have rather largish arrays. +This is bad because splice is already O(N), and since you do it N +times, you just invented a quadratic algorithm; that is, O(N**2). +This does not scale, although Perl is so efficient that you probably +won't notice this until you have rather largish arrays. =head2 How do I process/modify each element of an array? Use C<for>/C<foreach>: - for (@lines) { + for (@lines) { s/foo/bar/; # change that word tr/XZ/ZX/; # swap those letters - } + } Here's another; let's compute spherical volumes: - for (@volumes = @radii) { # @volumes has changed parts + for (@volumes = @radii) { # @volumes has changed parts $_ **= 3; $_ *= (4/3) * 3.14159; # this will be constant folded - } + } -which can also be done with map() which is made to transform +which can also be done with C<map()> which is made to transform one list into another: @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii; @@ -1516,9 +1533,9 @@ hash, you can use the C<values> function. As of Perl 5.6 the values are not copied, so if you modify $orbit (in this case), you modify the value. - for $orbit ( values %orbits ) { + for $orbit ( values %orbits ) { ($orbit **= 3) *= (4/3) * 3.14159; - } + } Prior to perl 5.6 C<values> returned copies of the values, so older perl code often contains constructions such as @@ -1527,38 +1544,39 @@ the hash is to be modified. =head2 How do I select a random element from an array? -Use the rand() function (see L<perlfunc/rand>): +Use the C<rand()> function (see L<perlfunc/rand>): - $index = rand @array; - $element = $array[$index]; + $index = rand @array; + $element = $array[$index]; Or, simply: - my $element = $array[ rand @array ]; + + my $element = $array[ rand @array ]; =head2 How do I permute N elements of a list? -Use the List::Permutor module on CPAN. If the list is -actually an array, try the Algorithm::Permute module (also -on CPAN). It's written in XS code and is very efficient. +Use the C<List::Permutor> module on CPAN. If the list is actually an +array, try the C<Algorithm::Permute> module (also on CPAN). It's +written in XS code and is very efficient. use Algorithm::Permute; my @array = 'a'..'d'; my $p_iterator = Algorithm::Permute->new ( \@array ); while (my @perm = $p_iterator->next) { print "next permutation: (@perm)\n"; - } + } For even faster execution, you could do: - use Algorithm::Permute; - my @array = 'a'..'d'; - Algorithm::Permute::permute { - print "next permutation: (@array)\n"; - } @array; + use Algorithm::Permute; + my @array = 'a'..'d'; + Algorithm::Permute::permute { + print "next permutation: (@array)\n"; + } @array; Here's a little program that generates all permutations of all the words on each line of input. The algorithm embodied -in the permute() function is discussed in Volume 4 (still +in the C<permute()> function is discussed in Volume 4 (still unpublished) of Knuth's I<The Art of Computer Programming> and will work on any list: @@ -1584,7 +1602,7 @@ and will work on any list: Supply a comparison function to sort() (described in L<perlfunc/sort>): - @list = sort { $a <=> $b } @list; + @list = sort { $a <=> $b } @list; The default sort function is cmp, string comparison, which would sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is @@ -1597,26 +1615,27 @@ same element. Here's an example of how to pull out the first word after the first number on each item, and then sort those words case-insensitively. - @idx = (); - for (@data) { - ($item) = /\d+\s*(\S+)/; - push @idx, uc($item); - } - @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ]; + @idx = (); + for (@data) { + ($item) = /\d+\s*(\S+)/; + push @idx, uc($item); + } + @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ]; which could also be written this way, using a trick that's come to be known as the Schwartzian Transform: - @sorted = map { $_->[0] } - sort { $a->[1] cmp $b->[1] } - map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data; + @sorted = map { $_->[0] } + sort { $a->[1] cmp $b->[1] } + map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data; If you need to sort on several fields, the following paradigm is useful. - @sorted = sort { field1($a) <=> field1($b) || - field2($a) cmp field2($b) || - field3($a) cmp field3($b) - } @data; + @sorted = sort { + field1($a) <=> field1($b) || + field2($a) cmp field2($b) || + field3($a) cmp field3($b) + } @data; This can be conveniently combined with precalculation of keys as given above. @@ -1625,48 +1644,53 @@ See the F<sort> article in the "Far More Than You Ever Wanted To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for more about this approach. -See also the question below on sorting hashes. +See also the question later in L<perlfaq4> on sorting hashes. =head2 How do I manipulate arrays of bits? -Use pack() and unpack(), or else vec() and the bitwise operations. - -For example, this sets $vec to have bit N set if $ints[N] was set: - - $vec = ''; - foreach(@ints) { vec($vec,$_,1) = 1 } - -Here's how, given a vector in $vec, you can -get those bits into your @ints array: - - sub bitvec_to_list { - my $vec = shift; - my @ints; - # Find null-byte density then select best algorithm - if ($vec =~ tr/\0// / length $vec > 0.95) { - use integer; - my $i; - # This method is faster with mostly null-bytes - while($vec =~ /[^\0]/g ) { - $i = -9 + 8 * pos $vec; - push @ints, $i if vec($vec, ++$i, 1); - push @ints, $i if vec($vec, ++$i, 1); - push @ints, $i if vec($vec, ++$i, 1); - push @ints, $i if vec($vec, ++$i, 1); - push @ints, $i if vec($vec, ++$i, 1); - push @ints, $i if vec($vec, ++$i, 1); - push @ints, $i if vec($vec, ++$i, 1); - push @ints, $i if vec($vec, ++$i, 1); - } - } else { - # This method is a fast general algorithm - use integer; - my $bits = unpack "b*", $vec; - push @ints, 0 if $bits =~ s/^(\d)// && $1; - push @ints, pos $bits while($bits =~ /1/g); - } - return \@ints; - } +Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise +operations. + +For example, this sets C<$vec> to have bit N set if C<$ints[N]> was +set: + + $vec = ''; + foreach(@ints) { vec($vec,$_,1) = 1 } + +Here's how, given a vector in C<$vec>, you can get those bits into your +C<@ints> array: + + sub bitvec_to_list { + my $vec = shift; + my @ints; + # Find null-byte density then select best algorithm + if ($vec =~ tr/\0// / length $vec > 0.95) { + use integer; + my $i; + + # This method is faster with mostly null-bytes + while($vec =~ /[^\0]/g ) { + $i = -9 + 8 * pos $vec; + push @ints, $i if vec($vec, ++$i, 1); + push @ints, $i if vec($vec, ++$i, 1); + push @ints, $i if vec($vec, ++$i, 1); + push @ints, $i if vec($vec, ++$i, 1); + push @ints, $i if vec($vec, ++$i, 1); + push @ints, $i if vec($vec, ++$i, 1); + push @ints, $i if vec($vec, ++$i, 1); + push @ints, $i if vec($vec, ++$i, 1); + } + } + else { + # This method is a fast general algorithm + use integer; + my $bits = unpack "b*", $vec; + push @ints, 0 if $bits =~ s/^(\d)// && $1; + push @ints, pos $bits while($bits =~ /1/g); + } + + return \@ints; + } This method gets faster the more sparse the bit vector is. (Courtesy of Tim Bunce and Winfried Koenig.) @@ -1675,60 +1699,60 @@ You can make the while loop a lot shorter with this suggestion from Benjamin Goldberg: while($vec =~ /[^\0]+/g ) { - push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8; - } + push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8; + } -Or use the CPAN module Bit::Vector: +Or use the CPAN module C<Bit::Vector>: - $vector = Bit::Vector->new($num_of_bits); - $vector->Index_List_Store(@ints); - @ints = $vector->Index_List_Read(); + $vector = Bit::Vector->new($num_of_bits); + $vector->Index_List_Store(@ints); + @ints = $vector->Index_List_Read(); -Bit::Vector provides efficient methods for bit vector, sets of small integers -and "big int" math. +C<Bit::Vector> provides efficient methods for bit vector, sets of +small integers and "big int" math. Here's a more extensive illustration using vec(): - # vec demo - $vector = "\xff\x0f\xef\xfe"; - print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ", + # vec demo + $vector = "\xff\x0f\xef\xfe"; + print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ", unpack("N", $vector), "\n"; - $is_set = vec($vector, 23, 1); - print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n"; - pvec($vector); - - set_vec(1,1,1); - set_vec(3,1,1); - set_vec(23,1,1); - - set_vec(3,1,3); - set_vec(3,2,3); - set_vec(3,4,3); - set_vec(3,4,7); - set_vec(3,8,3); - set_vec(3,8,7); - - set_vec(0,32,17); - set_vec(1,32,17); - - sub set_vec { - my ($offset, $width, $value) = @_; - my $vector = ''; - vec($vector, $offset, $width) = $value; - print "offset=$offset width=$width value=$value\n"; + $is_set = vec($vector, 23, 1); + print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n"; pvec($vector); - } - sub pvec { - my $vector = shift; - my $bits = unpack("b*", $vector); - my $i = 0; - my $BASE = 8; + set_vec(1,1,1); + set_vec(3,1,1); + set_vec(23,1,1); + + set_vec(3,1,3); + set_vec(3,2,3); + set_vec(3,4,3); + set_vec(3,4,7); + set_vec(3,8,3); + set_vec(3,8,7); + + set_vec(0,32,17); + set_vec(1,32,17); + + sub set_vec { + my ($offset, $width, $value) = @_; + my $vector = ''; + vec($vector, $offset, $width) = $value; + print "offset=$offset width=$width value=$value\n"; + pvec($vector); + } - print "vector length in bytes: ", length($vector), "\n"; - @bytes = unpack("A8" x length($vector), $bits); - print "bits are: @bytes\n\n"; - } + sub pvec { + my $vector = shift; + my $bits = unpack("b*", $vector); + my $i = 0; + my $BASE = 8; + + print "vector length in bytes: ", length($vector), "\n"; + @bytes = unpack("A8" x length($vector), $bits); + print "bits are: @bytes\n\n"; + } =head2 Why does defined() return true on empty arrays and hashes? @@ -1743,9 +1767,9 @@ in the 5.004 release or later of Perl for more detail. Use the each() function (see L<perlfunc/each>) if you don't care whether it's sorted: - while ( ($key, $value) = each %hash) { - print "$key = $value\n"; - } + while ( ($key, $value) = each %hash) { + print "$key = $value\n"; + } If you want it sorted, you'll have to use foreach() on the result of sorting the keys as shown in an earlier question. @@ -1766,23 +1790,23 @@ entry for C<each()> in L<perlfunc>. Create a reverse hash: - %by_value = reverse %by_key; - $key = $by_value{$value}; + %by_value = reverse %by_key; + $key = $by_value{$value}; That's not particularly efficient. It would be more space-efficient to use: - while (($key, $value) = each %by_key) { - $by_value{$value} = $key; - } + while (($key, $value) = each %by_key) { + $by_value{$value} = $key; + } If your hash could have repeated values, the methods above will only find one of the associated keys. This may or may not worry you. If it does worry you, you can always reverse the hash into a hash of arrays instead: - while (($key, $value) = each %by_key) { - push @{$key_list_by_value{$value}}, $key; - } + while (($key, $value) = each %by_key) { + push @{$key_list_by_value{$value}}, $key; + } =head2 How can I know how many entries are in a hash? @@ -1843,24 +1867,28 @@ we can provide a secondary sort on the hash key. } keys %hash; =head2 How can I always keep my hash sorted? +X<hash tie sort DB_File Tie::IxHash> -You can look into using the DB_File module and tie() using the -$DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">. -The Tie::IxHash module from CPAN might also be instructive. +You can look into using the C<DB_File> module and C<tie()> using the +C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory +Databases">. The C<Tie::IxHash> module from CPAN might also be +instructive. Although this does keep your hash sorted, you might not +like the slow down you suffer from the tie interface. Are you sure you +need to do this? :) =head2 What's the difference between "delete" and "undef" with hashes? Hashes contain pairs of scalars: the first is the key, the second is the value. The key will be coerced to a string, although the value can be any kind of scalar: string, -number, or reference. If a key $key is present in +number, or reference. If a key C<$key> is present in %hash, C<exists($hash{$key})> will return true. The value for a given key can be C<undef>, in which case C<$hash{$key}> will be C<undef> while C<exists $hash{$key}> will return true. This corresponds to (C<$key>, C<undef>) being in the hash. -Pictures help... here's the %hash table: +Pictures help... here's the C<%hash> table: keys values +------+------+ @@ -1941,34 +1969,34 @@ end up doing is not what they do with ordinary hashes. Using C<keys %hash> in scalar context returns the number of keys in the hash I<and> resets the iterator associated with the hash. You may -need to do this if you use C<last> to exit a loop early so that when you -re-enter it, the hash iterator has been reset. +need to do this if you use C<last> to exit a loop early so that when +you re-enter it, the hash iterator has been reset. =head2 How can I get the unique keys from two hashes? First you extract the keys from the hashes into lists, then solve the "removing duplicates" problem described above. For example: - %seen = (); - for $element (keys(%foo), keys(%bar)) { - $seen{$element}++; - } - @uniq = keys %seen; + %seen = (); + for $element (keys(%foo), keys(%bar)) { + $seen{$element}++; + } + @uniq = keys %seen; Or more succinctly: - @uniq = keys %{{%foo,%bar}}; + @uniq = keys %{{%foo,%bar}}; Or if you really want to save space: - %seen = (); - while (defined ($key = each %foo)) { - $seen{$key}++; - } - while (defined ($key = each %bar)) { - $seen{$key}++; - } - @uniq = keys %seen; + %seen = (); + while (defined ($key = each %foo)) { + $seen{$key}++; + } + while (defined ($key = each %bar)) { + $seen{$key}++; + } + @uniq = keys %seen; =head2 How can I store a multidimensional array in a DBM file? @@ -1978,21 +2006,24 @@ it on top of either DB_File or GDBM_File. =head2 How can I make my hash remember the order I put elements into it? -Use the Tie::IxHash from CPAN. +Use the C<Tie::IxHash> from CPAN. - use Tie::IxHash; - tie my %myhash, 'Tie::IxHash'; - for (my $i=0; $i<20; $i++) { - $myhash{$i} = 2*$i; - } - my @keys = keys %myhash; - # @keys = (0,1,2,3,...) + use Tie::IxHash; + + tie my %myhash, 'Tie::IxHash'; + + for (my $i=0; $i<20; $i++) { + $myhash{$i} = 2*$i; + } + + my @keys = keys %myhash; + # @keys = (0,1,2,3,...) =head2 Why does passing a subroutine an undefined element in a hash create it? If you say something like: - somefunc($hash{"nonesuch key here"}); + somefunc($hash{"nonesuch key here"}); Then that element "autovivifies"; that is, it springs into existence whether you store something there or not. That's because functions @@ -2009,14 +2040,14 @@ awk's behavior. Usually a hash ref, perhaps like this: - $record = { - NAME => "Jason", - EMPNO => 132, - TITLE => "deputy peon", - AGE => 23, - SALARY => 37_000, - PALS => [ "Norbert", "Rhys", "Phineas"], - }; + $record = { + NAME => "Jason", + EMPNO => 132, + TITLE => "deputy peon", + AGE => 23, + SALARY => 37_000, + PALS => [ "Norbert", "Rhys", "Phineas"], + }; References are documented in L<perlref> and the upcoming L<perlreftut>. Examples of complex data structures are given in L<perldsc> and @@ -2029,32 +2060,27 @@ in L<perltoot>. Hash keys are strings, so you can't really use a reference as the key. When you try to do that, perl turns the reference into its stringified -form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get back -the reference from the stringified form, at least without doing some -extra work on your own. Also remember that hash keys must be unique, but -two different variables can store the same reference (and those variables -can change later). +form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get +back the reference from the stringified form, at least without doing +some extra work on your own. Also remember that hash keys must be +unique, but two different variables can store the same reference (and +those variables can change later). -The Tie::RefHash module, which is distributed with perl, might be what -you want. It handles that extra work. +The C<Tie::RefHash> module, which is distributed with perl, might be +what you want. It handles that extra work. =head1 Data: Misc =head2 How do I handle binary data correctly? -Perl is binary clean, so this shouldn't be a problem. For example, -this works fine (assuming the files are found): - - if (`cat /vmunix` =~ /gzip/) { - print "Your kernel is GNU-zip enabled!\n"; - } - -On less elegant (read: Byzantine) systems, however, you have -to play tedious games with "text" versus "binary" files. See -L<perlfunc/"binmode"> or L<perlopentut>. +Perl is binary clean, so it can handle binary data just fine. +On Windows or DOS, however, you have to use C<binmode> for binary +files to avoid conversions for line endings. In general, you should +use C<binmode> any time you want to work with binary data. -If you're concerned about 8-bit ASCII data, then see L<perllocale>. +Also see L<perlfunc/"binmode"> or L<perlopentut>. +If you're concerned about 8-bit textual data then see L<perllocale>. If you want to deal with multibyte characters, however, there are some gotchas. See the section on Regular Expressions. @@ -2063,104 +2089,104 @@ some gotchas. See the section on Regular Expressions. Assuming that you don't care about IEEE notations like "NaN" or "Infinity", you probably just want to use a regular expression. - if (/\D/) { print "has nondigits\n" } - if (/^\d+$/) { print "is a whole number\n" } - if (/^-?\d+$/) { print "is an integer\n" } - if (/^[+-]?\d+$/) { print "is a +/- integer\n" } - if (/^-?\d+\.?\d*$/) { print "is a real number\n" } - if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" } - if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/) + if (/\D/) { print "has nondigits\n" } + if (/^\d+$/) { print "is a whole number\n" } + if (/^-?\d+$/) { print "is an integer\n" } + if (/^[+-]?\d+$/) { print "is a +/- integer\n" } + if (/^-?\d+\.?\d*$/) { print "is a real number\n" } + if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" } + if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/) { print "a C float\n" } There are also some commonly used modules for the task. L<Scalar::Util> (distributed with 5.8) provides access to perl's -internal function C<looks_like_number> for determining -whether a variable looks like a number. L<Data::Types> -exports functions that validate data types using both the -above and other regular expressions. Thirdly, there is -C<Regexp::Common> which has regular expressions to match -various types of numbers. Those three modules are available -from the CPAN. +internal function C<looks_like_number> for determining whether a +variable looks like a number. L<Data::Types> exports functions that +validate data types using both the above and other regular +expressions. Thirdly, there is C<Regexp::Common> which has regular +expressions to match various types of numbers. Those three modules are +available from the CPAN. If you're on a POSIX system, Perl supports the C<POSIX::strtod> -function. Its semantics are somewhat cumbersome, so here's a C<getnum> -wrapper function for more convenient access. This function takes -a string and returns the number it found, or C<undef> for input that -isn't a C float. The C<is_numeric> function is a front end to C<getnum> -if you just want to say, "Is this a float?" - - sub getnum { - use POSIX qw(strtod); - my $str = shift; - $str =~ s/^\s+//; - $str =~ s/\s+$//; - $! = 0; - my($num, $unparsed) = strtod($str); - if (($str eq '') || ($unparsed != 0) || $!) { - return undef; - } else { - return $num; - } - } +function. Its semantics are somewhat cumbersome, so here's a +C<getnum> wrapper function for more convenient access. This function +takes a string and returns the number it found, or C<undef> for input +that isn't a C float. The C<is_numeric> function is a front end to +C<getnum> if you just want to say, "Is this a float?" + + sub getnum { + use POSIX qw(strtod); + my $str = shift; + $str =~ s/^\s+//; + $str =~ s/\s+$//; + $! = 0; + my($num, $unparsed) = strtod($str); + if (($str eq '') || ($unparsed != 0) || $!) { + return undef; + } + else { + return $num; + } + } - sub is_numeric { defined getnum($_[0]) } + sub is_numeric { defined getnum($_[0]) } Or you could check out the L<String::Scanf> module on the CPAN -instead. The POSIX module (part of the standard Perl distribution) provides -the C<strtod> and C<strtol> for converting strings to double and longs, -respectively. +instead. The C<POSIX> module (part of the standard Perl distribution) +provides the C<strtod> and C<strtol> for converting strings to double +and longs, respectively. =head2 How do I keep persistent data across program calls? For some specific applications, you can use one of the DBM modules. -See L<AnyDBM_File>. More generically, you should consult the FreezeThaw -or Storable modules from CPAN. Starting from Perl 5.8 Storable is part -of the standard distribution. Here's one example using Storable's C<store> +See L<AnyDBM_File>. More generically, you should consult the C<FreezeThaw> +or C<Storable> modules from CPAN. Starting from Perl 5.8 C<Storable> is part +of the standard distribution. Here's one example using C<Storable>'s C<store> and C<retrieve> functions: - use Storable; - store(\%hash, "filename"); + use Storable; + store(\%hash, "filename"); - # later on... - $href = retrieve("filename"); # by ref - %hash = %{ retrieve("filename") }; # direct to hash + # later on... + $href = retrieve("filename"); # by ref + %hash = %{ retrieve("filename") }; # direct to hash =head2 How do I print out or copy a recursive data structure? -The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great -for printing out data structures. The Storable module on CPAN (or the +The C<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great +for printing out data structures. The C<Storable> module on CPAN (or the 5.8 release of Perl), provides a function called C<dclone> that recursively copies its argument. - use Storable qw(dclone); - $r2 = dclone($r1); + use Storable qw(dclone); + $r2 = dclone($r1); -Where $r1 can be a reference to any kind of data structure you'd like. +Where C<$r1> can be a reference to any kind of data structure you'd like. It will be deeply copied. Because C<dclone> takes and returns references, you'd have to add extra punctuation if you had a hash of arrays that you wanted to copy. - %newhash = %{ dclone(\%oldhash) }; + %newhash = %{ dclone(\%oldhash) }; =head2 How do I define methods for every class/object? -Use the UNIVERSAL class (see L<UNIVERSAL>). +Use the C<UNIVERSAL> class (see L<UNIVERSAL>). =head2 How do I verify a credit card checksum? -Get the Business::CreditCard module from CPAN. +Get the C<Business::CreditCard> module from CPAN. =head2 How do I pack arrays of doubles or floats for XS code? -The kgbpack.c code in the PGPLOT module on CPAN does just this. +The kgbpack.c code in the C<PGPLOT> module on CPAN does just this. If you're doing a lot of float or double processing, consider using -the PDL module from CPAN instead--it makes number-crunching easy. +the C<PDL> module from CPAN instead--it makes number-crunching easy. =head1 REVISION -Revision: $Revision: 3606 $ +Revision: $Revision: 6816 $ -Date: $Date: 2006-03-06 12:05:47 +0100 (lun, 06 mar 2006) $ +Date: $Date: 2006-08-20 21:20:03 +0200 (dim, 20 aoĆ» 2006) $ See L<perlfaq> for source control details and availability. diff --git a/pod/perlfaq5.pod b/pod/perlfaq5.pod index 98be1b05ce..b4d3e7553e 100644 --- a/pod/perlfaq5.pod +++ b/pod/perlfaq5.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq5 - Files and Formats ($Revision: 3606 $) +perlfaq5 - Files and Formats ($Revision: 6019 $) =head1 DESCRIPTION @@ -40,12 +40,6 @@ per-filehandle variables. $| = 1; select($old_fh); -Some idioms can handle this in a single statement: - - select((select(OUTPUT_HANDLE), $| = 1)[0]); - - $| = 1, select $_ for select OUTPUT_HANDLE; - Some modules offer object-oriented access to handles and their variables, although they may be overkill if this is the only thing you do with them. You can use IO::Handle: @@ -176,7 +170,7 @@ temporary files in one process, use a counter: if (defined(fileno(FH)) return (*FH, $base_name); - } + } else { return (); } @@ -215,7 +209,7 @@ Storing the keys in an array means it's easy to operate on them as a group or loop over them with for. It also avoids polluting the program with global variables and using symbolic references. -=head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles? +=head2 How can I make a filehandle local to a subroutine? How do I pass filehandles between subroutines? How do I make an array of filehandles? X<filehandle, local> X<filehandle, passing> X<filehandle, reference> As of perl5.6, open() autovivifies file and directory handles @@ -233,12 +227,12 @@ and use them in the place of named handles. If you like, you can store these filehandles in an array or a hash. If you access them directly, they aren't simple scalars and you -need to give C<print> a little help by placing the filehandle +need to give C<print> a little help by placing the filehandle reference in braces. Perl can only figure it out on its own when the filehandle reference is a simple scalar. my @fhs = ( $fh1, $fh2, $fh3 ); - + for( $i = 0; $i <= $#fhs; $i++ ) { print {$fhs[$i]} "just another Perl answer, \n"; } @@ -876,7 +870,7 @@ turns off echo processing as well. $term->setcc(VTIME, 1); $term->setattr($fd_stdin, TCSANOW); } - + sub cooked { $term->setlflag($oterm); $term->setcc(VTIME, 0); @@ -967,7 +961,7 @@ FIONREAD requires a filehandle connected to a stream, meaning that sockets, pipes, and tty devices work, but I<not> files. =head2 How do I do a C<tail -f> in perl? -X<tail> +X<tail> X<IO::Handle> X<File::Tail> X<clearerr> First try @@ -975,7 +969,7 @@ First try The statement C<seek(GWFILE, 0, 1)> doesn't change the current position, but it does clear the end-of-file condition on the handle, so that the -next <GWFILE> makes Perl try again to read something. +next C<< <GWFILE> >> makes Perl try again to read something. If that doesn't work (it relies on features of your stdio implementation), then you need something more like this: @@ -988,12 +982,11 @@ then you need something more like this: seek(GWFILE, $curpos, 0); # seek to where we had been } -If this still doesn't work, look into the POSIX module. POSIX defines -the clearerr() method, which can remove the end of file condition on a -filehandle. The method: read until end of file, clearerr(), read some -more. Lather, rinse, repeat. +If this still doesn't work, look into the C<clearerr> method +from C<IO::Handle>, which resets the error and end-of-file states +on the handle. -There's also a File::Tail module from CPAN. +There's also a C<File::Tail> module from CPAN. =head2 How do I dup() a filehandle in Perl? X<dup> @@ -1122,9 +1115,9 @@ If your array contains lines, just print them: =head1 REVISION -Revision: $Revision: 3606 $ +Revision: $Revision: 6019 $ -Date: $Date: 2006-03-06 12:05:47 +0100 (lun, 06 mar 2006) $ +Date: $Date: 2006-05-04 19:04:31 +0200 (jeu, 04 mai 2006) $ See L<perlfaq> for source control details and availability. diff --git a/pod/perlfaq6.pod b/pod/perlfaq6.pod index b07c522389..8fe9c2e4c5 100644 --- a/pod/perlfaq6.pod +++ b/pod/perlfaq6.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq6 - Regular Expressions ($Revision: 3606 $) +perlfaq6 - Regular Expressions ($Revision: 6479 $) =head1 DESCRIPTION @@ -26,9 +26,9 @@ understandable. Describe what you're doing and how you're doing it, using normal Perl comments. - # turn the line into the first word, a colon, and the - # number of characters on the rest of the line - s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg; + # turn the line into the first word, a colon, and the + # number of characters on the rest of the line + s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg; =item Comments Inside the Regex @@ -39,20 +39,20 @@ a lot. C</x> lets you turn this: - s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs; + s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs; into this: - s{ < # opening angle bracket - (?: # Non-backreffing grouping paren - [^>'"] * # 0 or more things that are neither > nor ' nor " - | # or else - ".*?" # a section between double quotes (stingy match) - | # or else - '.*?' # a section between single quotes (stingy match) - ) + # all occurring one or more times - > # closing angle bracket - }{}gsx; # replace with nothing, i.e. delete + s{ < # opening angle bracket + (?: # Non-backreffing grouping paren + [^>'"] * # 0 or more things that are neither > nor ' nor " + | # or else + ".*?" # a section between double quotes (stingy match) + | # or else + '.*?' # a section between single quotes (stingy match) + ) + # all occurring one or more times + > # closing angle bracket + }{}gsx; # replace with nothing, i.e. delete It's still not quite so clear as prose, but it is very useful for describing the meaning of each part of the pattern. @@ -65,8 +65,8 @@ describes this. For example, the C<s///> above uses braces as delimiters. Selecting another delimiter can avoid quoting the delimiter within the pattern: - s/\/usr\/local/\/usr\/share/g; # bad delimiter choice - s#/usr/local#/usr/share#g; # better + s/\/usr\/local/\/usr\/share/g; # bad delimiter choice + s#/usr/local#/usr/share#g; # better =back @@ -97,31 +97,31 @@ to newlines. But it's imperative that $/ be set to something other than the default, or else we won't actually ever have a multiline record read in. - $/ = ''; # read in more whole paragraph, not just one line - while ( <> ) { - while ( /\b([\w'-]+)(\s+\1)+\b/gi ) { # word starts alpha - print "Duplicate $1 at paragraph $.\n"; + $/ = ''; # read in more whole paragraph, not just one line + while ( <> ) { + while ( /\b([\w'-]+)(\s+\1)+\b/gi ) { # word starts alpha + print "Duplicate $1 at paragraph $.\n"; + } } - } Here's code that finds sentences that begin with "From " (which would be mangled by many mailers): - $/ = ''; # read in more whole paragraph, not just one line - while ( <> ) { - while ( /^From /gm ) { # /m makes ^ match next to \n - print "leading from in paragraph $.\n"; + $/ = ''; # read in more whole paragraph, not just one line + while ( <> ) { + while ( /^From /gm ) { # /m makes ^ match next to \n + print "leading from in paragraph $.\n"; + } } - } Here's code that finds everything between START and END in a paragraph: - undef $/; # read in whole file, not just one line or paragraph - while ( <> ) { - while ( /START(.*?)END/sgm ) { # /s makes . cross line boundaries - print "$1\n"; + undef $/; # read in whole file, not just one line or paragraph + while ( <> ) { + while ( /START(.*?)END/sgm ) { # /s makes . cross line boundaries + print "$1\n"; + } } - } =head2 How can I pull out lines between two patterns that are themselves on different lines? X<..> @@ -129,11 +129,11 @@ X<..> You can use Perl's somewhat exotic C<..> operator (documented in L<perlop>): - perl -ne 'print if /START/ .. /END/' file1 file2 ... + perl -ne 'print if /START/ .. /END/' file1 file2 ... If you wanted text and not lines, you would use - perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ... + perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ... But if you want nested occurrences of C<START> through C<END>, you'll run up against the problem described in the question in this section @@ -141,13 +141,13 @@ on matching balanced text. Here's another example of using C<..>: - while (<>) { - $in_header = 1 .. /^$/; - $in_body = /^$/ .. eof(); + while (<>) { + $in_header = 1 .. /^$/; + $in_body = /^$/ .. eof(); # now choose between them - } continue { - reset if eof(); # fix $. - } + } continue { + reset if eof(); # fix $. + } =head2 I put a regular expression into $/ but it didn't work. What's wrong? X<$/, regexes in> X<$INPUT_RECORD_SEPARATOR, regexes in> @@ -159,13 +159,14 @@ if you really need to do this. If you have File::Stream, this is easy. - use File::Stream; - my $stream = File::Stream->new( - $filehandle, - separator => qr/\s*,\s*/, - ); + use File::Stream; + + my $stream = File::Stream->new( + $filehandle, + separator => qr/\s*,\s*/, + ); - print "$_\n" while <$stream>; + print "$_\n" while <$stream>; If you don't have File::Stream, you have to do a little more work. @@ -173,25 +174,25 @@ You can use the four argument form of sysread to continually add to a buffer. After you add to the buffer, you check if you have a complete line (using your regular expression). - local $_ = ""; - while( sysread FH, $_, 8192, length ) { - while( s/^((?s).*?)your_pattern/ ) { - my $record = $1; - # do stuff here. - } - } + local $_ = ""; + while( sysread FH, $_, 8192, length ) { + while( s/^((?s).*?)your_pattern/ ) { + my $record = $1; + # do stuff here. + } + } You can do the same thing with foreach and a match using the c flag and the \G anchor, if you do not mind your entire file being in memory at the end. - local $_ = ""; - while( sysread FH, $_, 8192, length ) { - foreach my $record ( m/\G((?s).*?)your_pattern/gc ) { - # do stuff here. - } - substr( $_, 0, pos ) = "" if pos; - } + local $_ = ""; + while( sysread FH, $_, 8192, length ) { + foreach my $record ( m/\G((?s).*?)your_pattern/gc ) { + # do stuff here. + } + substr( $_, 0, pos ) = "" if pos; + } =head2 How do I substitute case insensitively on the LHS while preserving case on the RHS? @@ -201,49 +202,49 @@ X<substitution, case preserving> X<s, case preserving> Here's a lovely Perlish solution by Larry Rosler. It exploits properties of bitwise xor on ASCII strings. - $_= "this is a TEsT case"; + $_= "this is a TEsT case"; - $old = 'test'; - $new = 'success'; + $old = 'test'; + $new = 'success'; - s{(\Q$old\E)} - { uc $new | (uc $1 ^ $1) . - (uc(substr $1, -1) ^ substr $1, -1) x - (length($new) - length $1) - }egi; + s{(\Q$old\E)} + { uc $new | (uc $1 ^ $1) . + (uc(substr $1, -1) ^ substr $1, -1) x + (length($new) - length $1) + }egi; - print; + print; And here it is as a subroutine, modeled after the above: - sub preserve_case($$) { - my ($old, $new) = @_; - my $mask = uc $old ^ $old; + sub preserve_case($$) { + my ($old, $new) = @_; + my $mask = uc $old ^ $old; - uc $new | $mask . - substr($mask, -1) x (length($new) - length($old)) + uc $new | $mask . + substr($mask, -1) x (length($new) - length($old)) } - $a = "this is a TEsT case"; - $a =~ s/(test)/preserve_case($1, "success")/egi; - print "$a\n"; + $a = "this is a TEsT case"; + $a =~ s/(test)/preserve_case($1, "success")/egi; + print "$a\n"; This prints: - this is a SUcCESS case + this is a SUcCESS case As an alternative, to keep the case of the replacement word if it is longer than the original, you can use this code, by Jeff Pinyan: - sub preserve_case { - my ($from, $to) = @_; - my ($lf, $lt) = map length, @_; + sub preserve_case { + my ($from, $to) = @_; + my ($lf, $lt) = map length, @_; - if ($lt < $lf) { $from = substr $from, 0, $lt } - else { $from .= substr $to, $lf } + if ($lt < $lf) { $from = substr $from, 0, $lt } + else { $from .= substr $to, $lf } - return uc $to | ($from ^ uc $from); - } + return uc $to | ($from ^ uc $from); + } This changes the sentence to "this is a SUcCess case." @@ -254,36 +255,36 @@ substitution have the same case, letter by letter, as the original. If the substitution has more characters than the string being substituted, the case of the last character is used for the rest of the substitution. - # Original by Nathan Torkington, massaged by Jeffrey Friedl - # - sub preserve_case($$) - { - my ($old, $new) = @_; - my ($state) = 0; # 0 = no change; 1 = lc; 2 = uc - my ($i, $oldlen, $newlen, $c) = (0, length($old), length($new)); - my ($len) = $oldlen < $newlen ? $oldlen : $newlen; - - for ($i = 0; $i < $len; $i++) { - if ($c = substr($old, $i, 1), $c =~ /[\W\d_]/) { - $state = 0; - } elsif (lc $c eq $c) { - substr($new, $i, 1) = lc(substr($new, $i, 1)); - $state = 1; - } else { - substr($new, $i, 1) = uc(substr($new, $i, 1)); - $state = 2; - } - } - # finish up with any remaining new (for when new is longer than old) - if ($newlen > $oldlen) { - if ($state == 1) { - substr($new, $oldlen) = lc(substr($new, $oldlen)); - } elsif ($state == 2) { - substr($new, $oldlen) = uc(substr($new, $oldlen)); - } - } - return $new; - } + # Original by Nathan Torkington, massaged by Jeffrey Friedl + # + sub preserve_case($$) + { + my ($old, $new) = @_; + my ($state) = 0; # 0 = no change; 1 = lc; 2 = uc + my ($i, $oldlen, $newlen, $c) = (0, length($old), length($new)); + my ($len) = $oldlen < $newlen ? $oldlen : $newlen; + + for ($i = 0; $i < $len; $i++) { + if ($c = substr($old, $i, 1), $c =~ /[\W\d_]/) { + $state = 0; + } elsif (lc $c eq $c) { + substr($new, $i, 1) = lc(substr($new, $i, 1)); + $state = 1; + } else { + substr($new, $i, 1) = uc(substr($new, $i, 1)); + $state = 2; + } + } + # finish up with any remaining new (for when new is longer than old) + if ($newlen > $oldlen) { + if ($state == 1) { + substr($new, $oldlen) = lc(substr($new, $oldlen)); + } elsif ($state == 2) { + substr($new, $oldlen) = uc(substr($new, $oldlen)); + } + } + return $new; + } =head2 How can I make C<\w> match national character sets? X<\w> @@ -315,11 +316,11 @@ a double-quoted string (see L<perlop> for more details). Remember also that any regex special characters will be acted on unless you precede the substitution with \Q. Here's an example: - $string = "Placido P. Octopus"; - $regex = "P."; + $string = "Placido P. Octopus"; + $regex = "P."; - $string =~ s/$regex/Polyp/; - # $string is now "Polypacido P. Octopus" + $string =~ s/$regex/Polyp/; + # $string is now "Polypacido P. Octopus" Because C<.> is special in regular expressions, and can match any single character, the regex C<P.> here has matched the <Pl> in the @@ -327,11 +328,11 @@ original string. To escape the special meaning of C<.>, we use C<\Q>: - $string = "Placido P. Octopus"; - $regex = "P."; + $string = "Placido P. Octopus"; + $regex = "P."; - $string =~ s/\Q$regex/Polyp/; - # $string is now "Placido Polyp Octopus" + $string =~ s/\Q$regex/Polyp/; + # $string is now "Placido Polyp Octopus" The use of C<\Q> causes the <.> in the regex to be treated as a regular character, so that C<P.> matches a C<P> followed by a dot. @@ -358,28 +359,28 @@ you don't want the regex to notice if they do. For example, here's a "paragrep" program: - $/ = ''; # paragraph mode - $pat = shift; - while (<>) { - print if /$pat/o; - } + $/ = ''; # paragraph mode + $pat = shift; + while (<>) { + print if /$pat/o; + } =head2 How do I use a regular expression to strip C style comments from a file? While this actually can be done, it's much harder than you'd think. For example, this one-liner - perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c + perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c will work in many but not all cases. You see, it's too simple-minded for certain kinds of C programs, in particular, those with what appear to be comments in quoted strings. For that, you'd need something like this, created by Jeffrey Friedl and later modified by Fred Curtis. - $/ = undef; - $_ = <>; - s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse; - print; + $/ = undef; + $_ = <>; + s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse; + print; This could, of course, be more legibly written with the C</x> modifier, adding whitespace and comments. Here it is expanded, courtesy of Fred Curtis. @@ -423,7 +424,7 @@ whitespace and comments. Here it is expanded, courtesy of Fred Curtis. A slight modification also removes C++ comments: - s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse; + s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse; =head2 Can I use Perl regular expressions to match balanced text? X<regex, matching balanced test> X<regexp, matching balanced test> @@ -466,9 +467,9 @@ versions of the same quantifiers, use (C<??>, C<*?>, C<+?>, C<{}?>). An example: - $s1 = $s2 = "I am very very cold"; - $s1 =~ s/ve.*y //; # I am cold - $s2 =~ s/ve.*?y //; # I am very cold + $s1 = $s2 = "I am very very cold"; + $s1 =~ s/ve.*y //; # I am cold + $s2 =~ s/ve.*?y //; # I am very cold Notice how the second substitution stopped matching as soon as it encountered "y ". The C<*?> quantifier effectively tells the regular @@ -481,11 +482,11 @@ X<word> Use the split function: - while (<>) { - foreach $word ( split ) { - # do something with $word here + while (<>) { + foreach $word ( split ) { + # do something with $word here + } } - } Note that this isn't really a word in the English sense; it's just chunks of consecutive non-whitespace characters. @@ -493,11 +494,11 @@ chunks of consecutive non-whitespace characters. To work with only alphanumeric sequences (including underscores), you might consider - while (<>) { - foreach $word (m/(\w+)/g) { - # do something with $word here + while (<>) { + foreach $word (m/(\w+)/g) { + # do something with $word here + } } - } =head2 How can I print out a word-frequency or line-frequency summary? @@ -506,24 +507,26 @@ pretend that by word you mean chunk of alphabetics, hyphens, or apostrophes, rather than the non-whitespace chunk idea of a word given in the previous question: - while (<>) { - while ( /(\b[^\W_\d][\w'-]+\b)/g ) { # misses "`sheep'" - $seen{$1}++; + while (<>) { + while ( /(\b[^\W_\d][\w'-]+\b)/g ) { # misses "`sheep'" + $seen{$1}++; + } } - } - while ( ($word, $count) = each %seen ) { - print "$count $word\n"; - } + + while ( ($word, $count) = each %seen ) { + print "$count $word\n"; + } If you wanted to do the same thing for lines, you wouldn't need a regular expression: - while (<>) { - $seen{$_}++; - } - while ( ($line, $count) = each %seen ) { - print "$count $line"; - } + while (<>) { + $seen{$_}++; + } + + while ( ($line, $count) = each %seen ) { + print "$count $line"; + } If you want these output in a sorted order, see L<perlfaq4>: "How do I sort a hash (optionally by value instead of key)?". @@ -544,15 +547,18 @@ you want to match it. In this example, perl must recompile the regular expression for every iteration of the foreach() loop since it has no way to know what $pattern will be. - @patterns = qw( foo bar baz ); + @patterns = qw( foo bar baz ); - LINE: while( <> ) - { + LINE: while( <DATA> ) + { foreach $pattern ( @patterns ) { - print if /\b$pattern\b/i; - next LINE; - } + if( /\b$pattern\b/i ) + { + print; + next LINE; + } + } } The qr// operator showed up in perl 5.005. It compiles a @@ -562,15 +568,15 @@ this example, I inserted a map() to turn each pattern into its pre-compiled form. The rest of the script is the same, but faster. - @patterns = map { qr/\b$_\b/i } qw( foo bar baz ); + @patterns = map { qr/\b$_\b/i } qw( foo bar baz ); - LINE: while( <> ) - { + LINE: while( <> ) + { foreach $pattern ( @patterns ) { - print if /\b$pattern\b/i; - next LINE; - } + print if /\b$pattern\b/i; + next LINE; + } } In some cases, you may be able to make several patterns into @@ -579,8 +585,8 @@ backtracking though. $regex = join '|', qw( foo bar baz ); - LINE: while( <> ) - { + LINE: while( <> ) + { print if /\b(?:$regex)\b/i; } @@ -744,15 +750,15 @@ when you want to try a different match if one fails, such as in a tokenizer. Jeffrey Friedl offers this example which works in 5.004 or later. - while (<>) { - chomp; - PARSER: { - m/ \G( \d+\b )/gcx && do { print "number: $1\n"; redo; }; - m/ \G( \w+ )/gcx && do { print "word: $1\n"; redo; }; - m/ \G( \s+ )/gcx && do { print "space: $1\n"; redo; }; - m/ \G( [^\w\d]+ )/gcx && do { print "other: $1\n"; redo; }; - } - } + while (<>) { + chomp; + PARSER: { + m/ \G( \d+\b )/gcx && do { print "number: $1\n"; redo; }; + m/ \G( \w+ )/gcx && do { print "word: $1\n"; redo; }; + m/ \G( \s+ )/gcx && do { print "space: $1\n"; redo; }; + m/ \G( [^\w\d]+ )/gcx && do { print "other: $1\n"; redo; }; + } + } For each line, the PARSER loop first tries to match a series of digits followed by a word boundary. This match has to @@ -792,7 +798,7 @@ context, no lists are constructed. =head2 How can I match strings with multibyte characters? X<regex, and multibyte characters> X<regexp, and multibyte characters> -X<regular expression, and multibyte characters> +X<regular expression, and multibyte characters> X<martian> X<encoding, Martian> Starting from Perl 5.6 Perl has had some level of multibyte character support. Perl 5.8 or later is recommended. Supported multibyte @@ -826,32 +832,33 @@ looks like it is because "SG" is next to "XX", but there's no real Here are a few ways, all painful, to deal with it: - $martian =~ s/([A-Z][A-Z])/ $1 /g; # Make sure adjacent "martian" - # bytes are no longer adjacent. - print "found GX!\n" if $martian =~ /GX/; + # Make sure adjacent "martian" bytes are no longer adjacent. + $martian =~ s/([A-Z][A-Z])/ $1 /g; + + print "found GX!\n" if $martian =~ /GX/; Or like this: - @chars = $martian =~ m/([A-Z][A-Z]|[^A-Z])/g; - # above is conceptually similar to: @chars = $text =~ m/(.)/g; - # - foreach $char (@chars) { - print "found GX!\n", last if $char eq 'GX'; - } + @chars = $martian =~ m/([A-Z][A-Z]|[^A-Z])/g; + # above is conceptually similar to: @chars = $text =~ m/(.)/g; + # + foreach $char (@chars) { + print "found GX!\n", last if $char eq 'GX'; + } Or like this: - while ($martian =~ m/\G([A-Z][A-Z]|.)/gs) { # \G probably unneeded - print "found GX!\n", last if $1 eq 'GX'; - } + while ($martian =~ m/\G([A-Z][A-Z]|.)/gs) { # \G probably unneeded + print "found GX!\n", last if $1 eq 'GX'; + } Here's another, slightly less painful, way to do it from Benjamin Goldberg, who uses a zero-width negative look-behind assertion. print "found GX!\n" if $martian =~ m/ - (?<![A-Z]) - (?:[A-Z][A-Z])*? - GX + (?<![A-Z]) + (?:[A-Z][A-Z])*? + GX /x; This succeeds if the "martian" character GX is in the string, and fails @@ -861,37 +868,93 @@ look-behind assertion, you can replace (?<![A-Z]) with (?:^|[^A-Z]). It does have the drawback of putting the wrong thing in $-[0] and $+[0], but this usually can be worked around. -=head2 How do I match a pattern that is supplied by the user? +=head2 How do I match a regular expression that's in a variable? +X<regex, in variable> X<eval> X<regex> X<quotemeta> X<\Q, regex> +X<\E, regex>, X<qr//> -Well, if it's really a pattern, then just use +(contributed by brian d foy) - chomp($pattern = <STDIN>); - if ($line =~ /$pattern/) { } +We don't have to hard-code patterns into the match operator (or +anything else that works with regular expressions). We can put the +pattern in a variable for later use. -Alternatively, since you have no guarantee that your user entered -a valid regular expression, trap the exception this way: +The match operator is a double quote context, so you can interpolate +your variable just like a double quoted string. In this case, you +read the regular expression as user input and store it in C<$regex>. +Once you have the pattern in C<$regex>, you use that variable in the +match operator. - if (eval { $line =~ /$pattern/ }) { } + chomp( my $regex = <STDIN> ); -If all you really want is to search for a string, not a pattern, -then you should either use the index() function, which is made for -string searching, or, if you can't be disabused of using a pattern -match on a non-pattern, then be sure to use C<\Q>...C<\E>, documented -in L<perlre>. + if( $string =~ m/$regex/ ) { ... } - $pattern = <STDIN>; +Any regular expression special characters in C<$regex> are still +special, and the pattern still has to be valid or Perl will complain. +For instance, in this pattern there is an unpaired parenthesis. - open (FILE, $input) or die "Couldn't open input $input: $!; aborting"; - while (<FILE>) { - print if /\Q$pattern\E/; - } - close FILE; + my $regex = "Unmatched ( paren"; + + "Two parens to bind them all" =~ m/$regex/; + +When Perl compiles the regular expression, it treats the parenthesis +as the start of a memory match. When it doesn't find the closing +parenthesis, it complains: + + Unmatched ( in regex; marked by <-- HERE in m/Unmatched ( <-- HERE paren/ at script line 3. + +You can get around this in several ways depending on our situation. +First, if you don't want any of the characters in the string to be +special, you can escape them with C<quotemeta> before you use the string. + + chomp( my $regex = <STDIN> ); + $regex = quotemeta( $regex ); + + if( $string =~ m/$regex/ ) { ... } + +You can also do this directly in the match operator using the C<\Q> +and C<\E> sequences. The C<\Q> tells Perl where to start escaping +special characters, and the C<\E> tells it where to stop (see L<perlop> +for more details). + + chomp( my $regex = <STDIN> ); + + if( $string =~ m/\Q$regex\E/ ) { ... } + +Alternately, you can use C<qr//>, the regular expression quote operator (see +L<perlop> for more details). It quotes and perhaps compiles the pattern, +and you can apply regular expression flags to the pattern. + + chomp( my $input = <STDIN> ); + + my $regex = qr/$input/is; + + $string =~ m/$regex/ # same as m/$input/is; + +You might also want to trap any errors by wrapping an C<eval> block +around the whole thing. + + chomp( my $input = <STDIN> ); + + eval { + if( $string =~ m/\Q$input\E/ ) { ... } + }; + warn $@ if $@; + +Or... + + my $regex = eval { qr/$input/is }; + if( defined $regex ) { + $string =~ m/$regex/; + } + else { + warn $@; + } =head1 REVISION -Revision: $Revision: 3606 $ +Revision: $Revision: 6479 $ -Date: $Date: 2006-03-06 12:05:47 +0100 (lun, 06 mar 2006) $ +Date: $Date: 2006-06-07 09:48:12 +0200 (mer, 07 jun 2006) $ See L<perlfaq> for source control details and availability. diff --git a/pod/perlfaq7.pod b/pod/perlfaq7.pod index 9d9ef8d554..d8fc183e2f 100644 --- a/pod/perlfaq7.pod +++ b/pod/perlfaq7.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq7 - General Perl Language Issues ($Revision: 3606 $) +perlfaq7 - General Perl Language Issues ($Revision: 6833 $) =head1 DESCRIPTION @@ -282,7 +282,7 @@ variable's value this way, but now it's much harder. Take this code: my $f = 'foo'; sub T { - while ($i++ < 3) { my $f = $f; $f .= $i; print $f, "\n" } + while ($i++ < 3) { my $f = $f; $f .= "bar"; print $f, "\n" } } T; print "Finally $f\n"; @@ -312,8 +312,8 @@ With the exception of regexes, you need to pass references to these objects. See L<perlsub/"Pass by Reference"> for this particular question, and L<perlref> for information on references. -See "Passing Regexes", below, for information on passing regular -expressions. +See "Passing Regexes", later in L<perlfaq7>, for information on +passing regular expressions. =over 4 @@ -634,9 +634,9 @@ where they don't belong. This is explained in more depth in the L<perlsyn>. Briefly, there's no official case statement, because of the variety of tests possible in Perl (numeric comparison, string comparison, glob comparison, -regex matching, overloaded comparisons, ...). -Larry couldn't decide how best to do this, so he left it out, even -though it's been on the wish list since perl1. +regex matching, overloaded comparisons, ...). Larry couldn't decide +how best to do this, so he left it out, even though it's been on the +wish list since perl1. Starting from Perl 5.8 to get switch and case one can use the Switch extension and say: @@ -942,7 +942,7 @@ If you see "bad interpreter - no such file or directory", the first line in your perl script (the "shebang" line) does not contain the right path to perl (or any other program capable of running scripts). Sometimes this happens when you move the script from one machine to -another and each machine has a different path to perl---/usr/bin/perl +another and each machine has a different path to perl--/usr/bin/perl versus /usr/local/bin/perl for instance. It may also indicate that the source machine has CRLF line terminators and the destination machine has LF only: the shell tries to find @@ -962,9 +962,9 @@ where you expect it so you need to adjust your shebang line. =head1 REVISION -Revision: $Revision: 3606 $ +Revision: $Revision: 6833 $ -Date: $Date: 2006-03-06 12:05:47 +0100 (lun, 06 mar 2006) $ +Date: $Date: 2006-09-02 21:16:20 +0200 (sam, 02 sep 2006) $ See L<perlfaq> for source control details and availability. diff --git a/pod/perlfaq8.pod b/pod/perlfaq8.pod index d5c63daa29..89f5b51c41 100644 --- a/pod/perlfaq8.pod +++ b/pod/perlfaq8.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq8 - System Interaction ($Revision: 3606 $) +perlfaq8 - System Interaction ($Revision: 6628 $) =head1 DESCRIPTION @@ -63,15 +63,15 @@ the recipient has a color-aware display device. If you know that they have an ANSI terminal that understands color, you can use the Term::ANSIColor module from CPAN: - use Term::ANSIColor; - print color("red"), "Stop!\n", color("reset"); - print color("green"), "Go!\n", color("reset"); + use Term::ANSIColor; + print color("red"), "Stop!\n", color("reset"); + print color("green"), "Go!\n", color("reset"); Or like this: - use Term::ANSIColor qw(:constants); - print RED, "Stop!\n", RESET; - print GREEN, "Go!\n", RESET; + use Term::ANSIColor qw(:constants); + print RED, "Stop!\n", RESET; + print GREEN, "Go!\n", RESET; =head2 How do I read just one key without waiting for a return key? @@ -80,74 +80,74 @@ On many systems, you can just use the B<stty> command as shown in L<perlfunc/getc>, but as you see, that's already getting you into portability snags. - open(TTY, "+</dev/tty") or die "no tty: $!"; - system "stty cbreak </dev/tty >/dev/tty 2>&1"; - $key = getc(TTY); # perhaps this works - # OR ELSE - sysread(TTY, $key, 1); # probably this does - system "stty -cbreak </dev/tty >/dev/tty 2>&1"; + open(TTY, "+</dev/tty") or die "no tty: $!"; + system "stty cbreak </dev/tty >/dev/tty 2>&1"; + $key = getc(TTY); # perhaps this works + # OR ELSE + sysread(TTY, $key, 1); # probably this does + system "stty -cbreak </dev/tty >/dev/tty 2>&1"; The Term::ReadKey module from CPAN offers an easy-to-use interface that should be more efficient than shelling out to B<stty> for each key. It even includes limited support for Windows. - use Term::ReadKey; - ReadMode('cbreak'); - $key = ReadKey(0); - ReadMode('normal'); + use Term::ReadKey; + ReadMode('cbreak'); + $key = ReadKey(0); + ReadMode('normal'); However, using the code requires that you have a working C compiler and can use it to build and install a CPAN module. Here's a solution using the standard POSIX module, which is already on your systems (assuming your system supports POSIX). - use HotKey; - $key = readkey(); + use HotKey; + $key = readkey(); And here's the HotKey module, which hides the somewhat mystifying calls to manipulate the POSIX termios structures. - # HotKey.pm - package HotKey; + # HotKey.pm + package HotKey; - @ISA = qw(Exporter); - @EXPORT = qw(cbreak cooked readkey); + @ISA = qw(Exporter); + @EXPORT = qw(cbreak cooked readkey); - use strict; - use POSIX qw(:termios_h); - my ($term, $oterm, $echo, $noecho, $fd_stdin); + use strict; + use POSIX qw(:termios_h); + my ($term, $oterm, $echo, $noecho, $fd_stdin); - $fd_stdin = fileno(STDIN); - $term = POSIX::Termios->new(); - $term->getattr($fd_stdin); - $oterm = $term->getlflag(); + $fd_stdin = fileno(STDIN); + $term = POSIX::Termios->new(); + $term->getattr($fd_stdin); + $oterm = $term->getlflag(); - $echo = ECHO | ECHOK | ICANON; - $noecho = $oterm & ~$echo; + $echo = ECHO | ECHOK | ICANON; + $noecho = $oterm & ~$echo; - sub cbreak { - $term->setlflag($noecho); # ok, so i don't want echo either - $term->setcc(VTIME, 1); - $term->setattr($fd_stdin, TCSANOW); - } + sub cbreak { + $term->setlflag($noecho); # ok, so i don't want echo either + $term->setcc(VTIME, 1); + $term->setattr($fd_stdin, TCSANOW); + } - sub cooked { - $term->setlflag($oterm); - $term->setcc(VTIME, 0); - $term->setattr($fd_stdin, TCSANOW); - } + sub cooked { + $term->setlflag($oterm); + $term->setcc(VTIME, 0); + $term->setattr($fd_stdin, TCSANOW); + } - sub readkey { - my $key = ''; - cbreak(); - sysread(STDIN, $key, 1); - cooked(); - return $key; - } + sub readkey { + my $key = ''; + cbreak(); + sysread(STDIN, $key, 1); + cooked(); + return $key; + } - END { cooked() } + END { cooked() } - 1; + 1; =head2 How do I check whether input is ready on the keyboard? @@ -155,37 +155,37 @@ The easiest way to do this is to read a key in nonblocking mode with the Term::ReadKey module from CPAN, passing it an argument of -1 to indicate not to block: - use Term::ReadKey; + use Term::ReadKey; - ReadMode('cbreak'); + ReadMode('cbreak'); - if (defined ($char = ReadKey(-1)) ) { - # input was waiting and it was $char - } else { - # no input was waiting - } + if (defined ($char = ReadKey(-1)) ) { + # input was waiting and it was $char + } else { + # no input was waiting + } - ReadMode('normal'); # restore normal tty settings + ReadMode('normal'); # restore normal tty settings =head2 How do I clear the screen? If you only have do so infrequently, use C<system>: - system("clear"); + system("clear"); If you have to do this a lot, save the clear string so you can print it 100 times without calling a program 100 times: - $clear_string = `clear`; - print $clear_string; + $clear_string = `clear`; + print $clear_string; If you're planning on doing other screen manipulations, like cursor positions, etc, you might wish to use Term::Cap module: - use Term::Cap; - $terminal = Term::Cap->Tgetent( {OSPEED => 9600} ); - $clear_string = $terminal->Tputs('cl'); + use Term::Cap; + $terminal = Term::Cap->Tgetent( {OSPEED => 9600} ); + $clear_string = $terminal->Tputs('cl'); =head2 How do I get the screen size? @@ -193,22 +193,22 @@ If you have Term::ReadKey module installed from CPAN, you can use it to fetch the width and height in characters and in pixels: - use Term::ReadKey; - ($wchar, $hchar, $wpixels, $hpixels) = GetTerminalSize(); + use Term::ReadKey; + ($wchar, $hchar, $wpixels, $hpixels) = GetTerminalSize(); This is more portable than the raw C<ioctl>, but not as illustrative: - require 'sys/ioctl.ph'; - die "no TIOCGWINSZ " unless defined &TIOCGWINSZ; - open(TTY, "+</dev/tty") or die "No tty: $!"; - unless (ioctl(TTY, &TIOCGWINSZ, $winsize='')) { - die sprintf "$0: ioctl TIOCGWINSZ (%08x: $!)\n", &TIOCGWINSZ; - } - ($row, $col, $xpixel, $ypixel) = unpack('S4', $winsize); - print "(row,col) = ($row,$col)"; - print " (xpixel,ypixel) = ($xpixel,$ypixel)" if $xpixel || $ypixel; - print "\n"; + require 'sys/ioctl.ph'; + die "no TIOCGWINSZ " unless defined &TIOCGWINSZ; + open(TTY, "+</dev/tty") or die "No tty: $!"; + unless (ioctl(TTY, &TIOCGWINSZ, $winsize='')) { + die sprintf "$0: ioctl TIOCGWINSZ (%08x: $!)\n", &TIOCGWINSZ; + } + ($row, $col, $xpixel, $ypixel) = unpack('S4', $winsize); + print "(row,col) = ($row,$col)"; + print " (xpixel,ypixel) = ($xpixel,$ypixel)" if $xpixel || $ypixel; + print "\n"; =head2 How do I ask the user for a password? @@ -224,10 +224,10 @@ to the B<stty> program, with varying degrees of portability. You can also do this for most systems using the Term::ReadKey module from CPAN, which is easier to use and in theory more portable. - use Term::ReadKey; + use Term::ReadKey; - ReadMode('noecho'); - $password = ReadLine(0); + ReadMode('noecho'); + $password = ReadLine(0); =head2 How do I read and write the serial port? @@ -262,8 +262,8 @@ their usual (Unix) ASCII values of "\012" and "\015". You may have to give the numeric values you want directly, using octal ("\015"), hex ("0x0D"), or as a control-character specification ("\cM"). - print DEV "atv1\012"; # wrong, for some devices - print DEV "atv1\015"; # right, for some devices + print DEV "atv1\012"; # wrong, for some devices + print DEV "atv1\015"; # right, for some devices Even though with normal text files a "\n" will do the trick, there is still no unified scheme for terminating a line that is portable @@ -280,19 +280,19 @@ and the C<$|> variable to control autoflushing (see L<perlvar/$E<verbar>> and L<perlfunc/select>, or L<perlfaq5>, "How do I flush/unbuffer an output filehandle? Why must I do this?"): - $oldh = select(DEV); - $| = 1; - select($oldh); + $oldh = select(DEV); + $| = 1; + select($oldh); You'll also see code that does this without a temporary variable, as in - select((select(DEV), $| = 1)[0]); + select((select(DEV), $| = 1)[0]); Or if you don't mind pulling in a few thousand lines of code just because you're afraid of a little $| variable: - use IO::Handle; - DEV->autoflush(1); + use IO::Handle; + DEV->autoflush(1); As mentioned in the previous item, this still doesn't work when using socket I/O between Unix and Macintosh. You'll need to hard code your @@ -310,23 +310,23 @@ L<perlfunc/"select">. =back While trying to read from his caller-id box, the notorious Jamie Zawinski -<jwz@netscape.com>, after much gnashing of teeth and fighting with sysread, +C<< <jwz@netscape.com> >>, after much gnashing of teeth and fighting with sysread, sysopen, POSIX's tcgetattr business, and various other functions that go bump in the night, finally came up with this: - sub open_modem { - use IPC::Open2; - my $stty = `/bin/stty -g`; - open2( \*MODEM_IN, \*MODEM_OUT, "cu -l$modem_device -s2400 2>&1"); - # starting cu hoses /dev/tty's stty settings, even when it has - # been opened on a pipe... - system("/bin/stty $stty"); - $_ = <MODEM_IN>; - chomp; - if ( !m/^Connected/ ) { - print STDERR "$0: cu printed `$_' instead of `Connected'\n"; + sub open_modem { + use IPC::Open2; + my $stty = `/bin/stty -g`; + open2( \*MODEM_IN, \*MODEM_OUT, "cu -l$modem_device -s2400 2>&1"); + # starting cu hoses /dev/tty's stty settings, even when it has + # been opened on a pipe... + system("/bin/stty $stty"); + $_ = <MODEM_IN>; + chomp; + if ( !m/^Connected/ ) { + print STDERR "$0: cu printed `$_' instead of `Connected'\n"; + } } - } =head2 How do I decode encrypted password files? @@ -354,7 +354,7 @@ details. You could also use - system("cmd &") + system("cmd &") or you could use fork as documented in L<perlfunc/"fork">, with further examples in L<perlipc>. Some things to be aware of, if you're @@ -383,23 +383,22 @@ not an issue with C<system("cmd&")>. You have to be prepared to "reap" the child process when it finishes. - $SIG{CHLD} = sub { wait }; + $SIG{CHLD} = sub { wait }; - $SIG{CHLD} = 'IGNORE'; + $SIG{CHLD} = 'IGNORE'; You can also use a double fork. You immediately wait() for your first child, and the init daemon will wait() for your grandchild once it exits. unless ($pid = fork) { - unless (fork) { - exec "what you really wanna do"; - die "exec failed!"; - } - exit 0; - } - waitpid($pid,0); - + unless (fork) { + exec "what you really wanna do"; + die "exec failed!"; + } + exit 0; + } + waitpid($pid, 0); See L<perlipc/"Signals"> for other examples of code to do this. Zombies are not an issue with C<system("prog &")>. @@ -438,7 +437,6 @@ causing perl to dump core. Since version 5.8.0, perl looks at %SIG *after* the signal has been caught, rather than while it is being caught. Previous versions of this answer were incorrect. - =head2 How do I modify the shadow password file on a Unix system? If perl was installed correctly and your shadow library was written @@ -459,9 +457,9 @@ the VMS equivalent is C<set time>. However, if all you want to do is change your time zone, you can probably get away with setting an environment variable: - $ENV{TZ} = "MST7MDT"; # unixish - $ENV{'SYS$TIMEZONE_DIFFERENTIAL'}="-5" # vms - system "trn comp.lang.perl.misc"; + $ENV{TZ} = "MST7MDT"; # unixish + $ENV{'SYS$TIMEZONE_DIFFERENTIAL'}="-5" # vms + system "trn comp.lang.perl.misc"; =head2 How can I sleep() or alarm() for under a second? @@ -481,31 +479,31 @@ If your system supports both the syscall() function in Perl as well as a system call like gettimeofday(2), then you may be able to do something like this: - require 'sys/syscall.ph'; + require 'sys/syscall.ph'; - $TIMEVAL_T = "LL"; + $TIMEVAL_T = "LL"; - $done = $start = pack($TIMEVAL_T, ()); + $done = $start = pack($TIMEVAL_T, ()); - syscall(&SYS_gettimeofday, $start, 0) != -1 - or die "gettimeofday: $!"; + syscall(&SYS_gettimeofday, $start, 0) != -1 + or die "gettimeofday: $!"; - ########################## - # DO YOUR OPERATION HERE # - ########################## + ########################## + # DO YOUR OPERATION HERE # + ########################## - syscall( &SYS_gettimeofday, $done, 0) != -1 - or die "gettimeofday: $!"; + syscall( &SYS_gettimeofday, $done, 0) != -1 + or die "gettimeofday: $!"; - @start = unpack($TIMEVAL_T, $start); - @done = unpack($TIMEVAL_T, $done); + @start = unpack($TIMEVAL_T, $start); + @done = unpack($TIMEVAL_T, $done); - # fix microseconds - for ($done[1], $start[1]) { $_ /= 1_000_000 } + # fix microseconds + for ($done[1], $start[1]) { $_ /= 1_000_000 } - $delta_time = sprintf "%.4f", ($done[0] + $done[1] ) - - - ($start[0] + $start[1] ); + $delta_time = sprintf "%.4f", ($done[0] + $done[1] ) + - + ($start[0] + $start[1] ); =head2 How can I do an atexit() or setjmp()/longjmp()? (Exception handling) @@ -516,9 +514,9 @@ thread ends (see L<perlmod> manpage for more details). For example, you can use this to make sure your filter program managed to finish its output without filling up the disk: - END { - close(STDOUT) || die "stdout close failed: $!"; - } + END { + close(STDOUT) || die "stdout close failed: $!"; + } The END block isn't called when untrapped signals kill the program, though, so if you use END blocks you should also use @@ -556,7 +554,7 @@ syscall(), you can use the syscall function (documented in L<perlfunc>). Remember to check the modules that came with your distribution, and -CPAN as well---someone may already have written a module to do it. On +CPAN as well--someone may already have written a module to do it. On Windows, try Win32::API. On Macs, try Mac::Carbon. If no module has an interface to the C function, you can inline a bit of C in your Perl source with Inline::C. @@ -572,9 +570,9 @@ Simple files like F<errno.h>, F<syscall.h>, and F<socket.h> were fine, but the hard ones like F<ioctl.h> nearly always need to hand-edited. Here's how to install the *.ph files: - 1. become super-user - 2. cd /usr/include - 3. h2ph *.h */*.h + 1. become super-user + 2. cd /usr/include + 3. h2ph *.h */*.h If your system supports dynamic loading, for reasons of portability and sanity you probably ought to use h2xs (also part of the standard perl @@ -613,16 +611,16 @@ the low 7 bits are the signal the process died from, if any, and the high 8 bits are the actual exit value). Backticks (``) run a command and return what it sent to STDOUT. - $exit_status = system("mail-users"); - $output_string = `ls`; + $exit_status = system("mail-users"); + $output_string = `ls`; =head2 How can I capture STDERR from an external command? There are three basic ways of running external commands: - system $cmd; # using system() - $output = `$cmd`; # using backticks (``) - open (PIPE, "cmd |"); # using open() + system $cmd; # using system() + $output = `$cmd`; # using backticks (``) + open (PIPE, "cmd |"); # using open() With system(), both STDOUT and STDERR will go the same place as the script's STDOUT and STDERR, unless the system() command redirects them. @@ -633,85 +631,85 @@ Goldberg provides some sample code: To capture a program's STDOUT, but discard its STDERR: - use IPC::Open3; - use File::Spec; - use Symbol qw(gensym); - open(NULL, ">", File::Spec->devnull); - my $pid = open3(gensym, \*PH, ">&NULL", "cmd"); - while( <PH> ) { } - waitpid($pid, 0); + use IPC::Open3; + use File::Spec; + use Symbol qw(gensym); + open(NULL, ">", File::Spec->devnull); + my $pid = open3(gensym, \*PH, ">&NULL", "cmd"); + while( <PH> ) { } + waitpid($pid, 0); To capture a program's STDERR, but discard its STDOUT: - use IPC::Open3; - use File::Spec; - use Symbol qw(gensym); - open(NULL, ">", File::Spec->devnull); - my $pid = open3(gensym, ">&NULL", \*PH, "cmd"); - while( <PH> ) { } - waitpid($pid, 0); + use IPC::Open3; + use File::Spec; + use Symbol qw(gensym); + open(NULL, ">", File::Spec->devnull); + my $pid = open3(gensym, ">&NULL", \*PH, "cmd"); + while( <PH> ) { } + waitpid($pid, 0); To capture a program's STDERR, and let its STDOUT go to our own STDERR: - use IPC::Open3; - use Symbol qw(gensym); - my $pid = open3(gensym, ">&STDERR", \*PH, "cmd"); - while( <PH> ) { } - waitpid($pid, 0); + use IPC::Open3; + use Symbol qw(gensym); + my $pid = open3(gensym, ">&STDERR", \*PH, "cmd"); + while( <PH> ) { } + waitpid($pid, 0); To read both a command's STDOUT and its STDERR separately, you can redirect them to temp files, let the command run, then read the temp files: - use IPC::Open3; - use Symbol qw(gensym); - use IO::File; - local *CATCHOUT = IO::File->new_tmpfile; - local *CATCHERR = IO::File->new_tmpfile; - my $pid = open3(gensym, ">&CATCHOUT", ">&CATCHERR", "cmd"); - waitpid($pid, 0); - seek $_, 0, 0 for \*CATCHOUT, \*CATCHERR; - while( <CATCHOUT> ) {} - while( <CATCHERR> ) {} + use IPC::Open3; + use Symbol qw(gensym); + use IO::File; + local *CATCHOUT = IO::File->new_tmpfile; + local *CATCHERR = IO::File->new_tmpfile; + my $pid = open3(gensym, ">&CATCHOUT", ">&CATCHERR", "cmd"); + waitpid($pid, 0); + seek $_, 0, 0 for \*CATCHOUT, \*CATCHERR; + while( <CATCHOUT> ) {} + while( <CATCHERR> ) {} But there's no real need for *both* to be tempfiles... the following should work just as well, without deadlocking: - use IPC::Open3; - use Symbol qw(gensym); - use IO::File; - local *CATCHERR = IO::File->new_tmpfile; - my $pid = open3(gensym, \*CATCHOUT, ">&CATCHERR", "cmd"); - while( <CATCHOUT> ) {} - waitpid($pid, 0); - seek CATCHERR, 0, 0; - while( <CATCHERR> ) {} + use IPC::Open3; + use Symbol qw(gensym); + use IO::File; + local *CATCHERR = IO::File->new_tmpfile; + my $pid = open3(gensym, \*CATCHOUT, ">&CATCHERR", "cmd"); + while( <CATCHOUT> ) {} + waitpid($pid, 0); + seek CATCHERR, 0, 0; + while( <CATCHERR> ) {} And it'll be faster, too, since we can begin processing the program's stdout immediately, rather than waiting for the program to finish. With any of these, you can change file descriptors before the call: - open(STDOUT, ">logfile"); - system("ls"); + open(STDOUT, ">logfile"); + system("ls"); or you can use Bourne shell file-descriptor redirection: - $output = `$cmd 2>some_file`; - open (PIPE, "cmd 2>some_file |"); + $output = `$cmd 2>some_file`; + open (PIPE, "cmd 2>some_file |"); You can also use file-descriptor redirection to make STDERR a duplicate of STDOUT: - $output = `$cmd 2>&1`; - open (PIPE, "cmd 2>&1 |"); + $output = `$cmd 2>&1`; + open (PIPE, "cmd 2>&1 |"); Note that you I<cannot> simply open STDERR to be a dup of STDOUT in your Perl program and avoid calling the shell to do the redirection. This doesn't work: - open(STDERR, ">&STDOUT"); - $alloutput = `cmd args`; # stderr still escapes + open(STDERR, ">&STDOUT"); + $alloutput = `cmd args`; # stderr still escapes This fails because the open() makes STDERR go to where STDOUT was going at the time of the open(). The backticks then make STDOUT go to @@ -725,40 +723,40 @@ F<versus/csh.whynot> article in the "Far More Than You Ever Wanted To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz . To capture a command's STDERR and STDOUT together: - $output = `cmd 2>&1`; # either with backticks - $pid = open(PH, "cmd 2>&1 |"); # or with an open pipe - while (<PH>) { } # plus a read + $output = `cmd 2>&1`; # either with backticks + $pid = open(PH, "cmd 2>&1 |"); # or with an open pipe + while (<PH>) { } # plus a read To capture a command's STDOUT but discard its STDERR: - $output = `cmd 2>/dev/null`; # either with backticks - $pid = open(PH, "cmd 2>/dev/null |"); # or with an open pipe - while (<PH>) { } # plus a read + $output = `cmd 2>/dev/null`; # either with backticks + $pid = open(PH, "cmd 2>/dev/null |"); # or with an open pipe + while (<PH>) { } # plus a read To capture a command's STDERR but discard its STDOUT: - $output = `cmd 2>&1 1>/dev/null`; # either with backticks - $pid = open(PH, "cmd 2>&1 1>/dev/null |"); # or with an open pipe - while (<PH>) { } # plus a read + $output = `cmd 2>&1 1>/dev/null`; # either with backticks + $pid = open(PH, "cmd 2>&1 1>/dev/null |"); # or with an open pipe + while (<PH>) { } # plus a read To exchange a command's STDOUT and STDERR in order to capture the STDERR but leave its STDOUT to come out our old STDERR: - $output = `cmd 3>&1 1>&2 2>&3 3>&-`; # either with backticks - $pid = open(PH, "cmd 3>&1 1>&2 2>&3 3>&-|");# or with an open pipe - while (<PH>) { } # plus a read + $output = `cmd 3>&1 1>&2 2>&3 3>&-`; # either with backticks + $pid = open(PH, "cmd 3>&1 1>&2 2>&3 3>&-|");# or with an open pipe + while (<PH>) { } # plus a read To read both a command's STDOUT and its STDERR separately, it's easiest to redirect them separately to files, and then read from those files when the program is done: - system("program args 1>program.stdout 2>program.stderr"); + system("program args 1>program.stdout 2>program.stderr"); Ordering is important in all these examples. That's because the shell processes file descriptor redirections in strictly left to right order. - system("prog args 1>tmpfile 2>&1"); - system("prog args 2>&1 1>tmpfile"); + system("prog args 1>tmpfile 2>&1"); + system("prog args 2>&1 1>tmpfile"); The first command sends both standard out and standard error to the temporary file. The second command sends only the old standard output @@ -794,22 +792,22 @@ Why send a clear message that isn't true? Consider this line: - `cat /etc/termcap`; + `cat /etc/termcap`; You forgot to check C<$?> to see whether the program even ran correctly. Even if you wrote - print `cat /etc/termcap`; + print `cat /etc/termcap`; this code could and probably should be written as - system("cat /etc/termcap") == 0 + system("cat /etc/termcap") == 0 or die "cat program failed!"; which will get the output quickly (as it is generated, instead of only at the end) and also check the return value. -system() also provides direct control over whether shell wildcard +C<system> also provides direct control over whether shell wildcard processing may take place, whereas backticks do not. =head2 How can I call backticks without shell processing? @@ -817,28 +815,28 @@ processing may take place, whereas backticks do not. This is a bit tricky. You can't simply write the command like this: - @ok = `grep @opts '$search_string' @filenames`; + @ok = `grep @opts '$search_string' @filenames`; As of Perl 5.8.0, you can use open() with multiple arguments. Just like the list forms of system() and exec(), no shell escapes happen. - open( GREP, "-|", 'grep', @opts, $search_string, @filenames ); - chomp(@ok = <GREP>); - close GREP; + open( GREP, "-|", 'grep', @opts, $search_string, @filenames ); + chomp(@ok = <GREP>); + close GREP; You can also: - my @ok = (); - if (open(GREP, "-|")) { - while (<GREP>) { - chomp; - push(@ok, $_); - } - close GREP; - } else { - exec 'grep', @opts, $search_string, @filenames; - } + my @ok = (); + if (open(GREP, "-|")) { + while (<GREP>) { + chomp; + push(@ok, $_); + } + close GREP; + } else { + exec 'grep', @opts, $search_string, @filenames; + } Just as with system(), no shell escapes happen when you exec() a list. Further examples of this can be found in L<perlipc/"Safe Pipe Opens">. @@ -860,8 +858,8 @@ workarounds: Try keeping around the seekpointer and go there, like this: - $where = tell(LOG); - seek(LOG, $where, 0); + $where = tell(LOG); + seek(LOG, $where, 0); =item 2 @@ -900,18 +898,18 @@ If all you want to do is pretend to be telnet but don't need the initial telnet handshaking, then the standard dual-process approach will suffice: - use IO::Socket; # new in 5.004 - $handle = IO::Socket::INET->new('www.perl.com:80') - || die "can't connect to port 80 on www.perl.com: $!"; - $handle->autoflush(1); - if (fork()) { # XXX: undef means failure - select($handle); - print while <STDIN>; # everything from stdin to socket - } else { - print while <$handle>; # everything from socket to stdout - } - close $handle; - exit; + use IO::Socket; # new in 5.004 + $handle = IO::Socket::INET->new('www.perl.com:80') + or die "can't connect to port 80 on www.perl.com: $!"; + $handle->autoflush(1); + if (fork()) { # XXX: undef means failure + select($handle); + print while <STDIN>; # everything from stdin to socket + } else { + print while <$handle>; # everything from socket to stdout + } + close $handle; + exit; =head2 How can I write expect in Perl? @@ -934,7 +932,7 @@ variable $0 as documented in L<perlvar>. This won't work on all operating systems, though. Daemon programs like sendmail place their state there, as in: - $0 = "orcus [accepting connections]"; + $0 = "orcus [accepting connections]"; =head2 I {changed directory, modified my environment} in a perl script. How come the change disappeared when I exited the script? How do I get my changes to be visible? @@ -985,7 +983,7 @@ tty. Background yourself like this: - fork && exit; + fork && exit; =back @@ -994,31 +992,31 @@ perform these actions for you. =head2 How do I find out if I'm running interactively or not? -Good question. Sometimes C<-t STDIN> and C<-t STDOUT> can give clues, +Good question. Sometimes C<-t STDIN> and C<-t STDOUT> can give clues, sometimes not. - if (-t STDIN && -t STDOUT) { - print "Now what? "; - } + if (-t STDIN && -t STDOUT) { + print "Now what? "; + } On POSIX systems, you can test whether your own process group matches the current process group of your controlling terminal as follows: - use POSIX qw/getpgrp tcgetpgrp/; - - # Some POSIX systems, such as Linux, can be - # without a /dev/tty at boot time. - if (!open(TTY, "/dev/tty")) { - print "no tty\n"; - } else { - $tpgrp = tcgetpgrp(fileno(*TTY)); - $pgrp = getpgrp(); - if ($tpgrp == $pgrp) { - print "foreground\n"; - } else { - print "background\n"; - } - } + use POSIX qw/getpgrp tcgetpgrp/; + + # Some POSIX systems, such as Linux, can be + # without a /dev/tty at boot time. + if (!open(TTY, "/dev/tty")) { + print "no tty\n"; + } else { + $tpgrp = tcgetpgrp(fileno(*TTY)); + $pgrp = getpgrp(); + if ($tpgrp == $pgrp) { + print "foreground\n"; + } else { + print "background\n"; + } + } =head2 How do I timeout a slow event? @@ -1058,8 +1056,8 @@ You can't. You need to imitate the system() call (see L<perlipc> for sample code) and then have a signal handler for the INT signal that passes the signal on to the subprocess. Or you can check for it: - $rc = system($cmd); - if ($rc & 127) { die "signal death" } + $rc = system($cmd); + if ($rc & 127) { die "signal death" } =head2 How do I open a file without blocking? @@ -1068,9 +1066,9 @@ non-blocking reads (most Unixish systems do), you need only to use the O_NDELAY or O_NONBLOCK flag from the Fcntl module in conjunction with sysopen(): - use Fcntl; - sysopen(FH, "/foo/somefile", O_WRONLY|O_NDELAY|O_CREAT, 0644) - or die "can't open /foo/somefile: $!": + use Fcntl; + sysopen(FH, "/foo/somefile", O_WRONLY|O_NDELAY|O_CREAT, 0644) + or die "can't open /foo/somefile: $!": =head2 How do I tell the difference between errors from the shell and perl? @@ -1145,12 +1143,12 @@ might not be perl's message. The easiest way is to have a module also named CPAN do it for you. This module comes with perl version 5.004 and later. - $ perl -MCPAN -e shell + $ perl -MCPAN -e shell - cpan shell -- CPAN exploration and modules installation (v1.59_54) - ReadLine support enabled + cpan shell -- CPAN exploration and modules installation (v1.59_54) + ReadLine support enabled - cpan> install Some::Module + cpan> install Some::Module To manually install the CPAN module, or any well-behaved CPAN module for that matter, follow these steps: @@ -1163,19 +1161,19 @@ Unpack the source into a temporary area. =item 2 - perl Makefile.PL + perl Makefile.PL =item 3 - make + make =item 4 - make test + make test =item 5 - make install + make install =back @@ -1192,19 +1190,19 @@ and use?". Perl offers several different ways to include code from one file into another. Here are the deltas between the various inclusion constructs: - 1) do $file is like eval `cat $file`, except the former + 1) do $file is like eval `cat $file`, except the former 1.1: searches @INC and updates %INC. 1.2: bequeaths an *unrelated* lexical scope on the eval'ed code. - 2) require $file is like do $file, except the former + 2) require $file is like do $file, except the former 2.1: checks for redundant loading, skipping already loaded files. 2.2: raises an exception on failure to find, compile, or execute $file. - 3) require Module is like require "Module.pm", except the former + 3) require Module is like require "Module.pm", except the former 3.1: translates each "::" into your system's directory separator. 3.2: primes the parser to disambiguate class Module as an indirect object. - 4) use Module is like require Module, except the former + 4) use Module is like require Module, except the former 4.1: loads the module at compile time, not run-time. 4.2: imports symbols and semantics from that package to the current one. @@ -1215,37 +1213,37 @@ In general, you usually want C<use> and a proper Perl module. When you build modules, use the PREFIX and LIB options when generating Makefiles: - perl Makefile.PL PREFIX=/mydir/perl LIB=/mydir/perl/lib + perl Makefile.PL PREFIX=/mydir/perl LIB=/mydir/perl/lib then either set the PERL5LIB environment variable before you run scripts that use the modules/libraries (see L<perlrun>) or say - use lib '/mydir/perl/lib'; + use lib '/mydir/perl/lib'; This is almost the same as - BEGIN { + BEGIN { unshift(@INC, '/mydir/perl/lib'); - } + } except that the lib module checks for machine-dependent subdirectories. See Perl's L<lib> for more information. =head2 How do I add the directory my program lives in to the module/library search path? - use FindBin; - use lib "$FindBin::Bin"; - use your_own_modules; + use FindBin; + use lib "$FindBin::Bin"; + use your_own_modules; =head2 How do I add a directory to my include path (@INC) at runtime? Here are the suggested ways of modifying your include path: - the PERLLIB environment variable - the PERL5LIB environment variable - the perl -Idir command line flag - the use lib pragma, as in - use lib "$ENV{HOME}/myown_perllib"; + the PERLLIB environment variable + the PERL5LIB environment variable + the perl -Idir command line flag + the use lib pragma, as in + use lib "$ENV{HOME}/myown_perllib"; The latter is particularly useful because it knows about machine dependent architectures. The lib.pm pragmatic module was first @@ -1259,9 +1257,9 @@ but other times it is not. Modern programs C<use Socket;> instead. =head1 REVISION -Revision: $Revision: 3606 $ +Revision: $Revision: 6628 $ -Date: $Date: 2006-03-06 12:05:47 +0100 (lun, 06 mar 2006) $ +Date: $Date: 2006-07-09 14:46:14 +0200 (dim, 09 jui 2006) $ See L<perlfaq> for source control details and availability. diff --git a/pod/perlfaq9.pod b/pod/perlfaq9.pod index f017a40879..8faf2fb003 100644 --- a/pod/perlfaq9.pod +++ b/pod/perlfaq9.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq9 - Networking ($Revision: 3606 $) +perlfaq9 - Networking ($Revision: 6309 $) =head1 DESCRIPTION @@ -311,7 +311,7 @@ an absolute URLpath. To enable authentication for your web server, you need to configure your web server. The configuration is different for different sorts -of web servers---apache does it differently from iPlanet which does +of web servers--apache does it differently from iPlanet which does it differently from IIS. Check your web server documentation for the details for your particular server. @@ -404,7 +404,7 @@ if it is a deliverable address (i.e. that mail to the address will not bounce). Modules like Mail::CheckUser and Mail::EXPN try to interact with the domain name system or particular mail servers to learn even more, but their methods do not -work everywhere---especially for security conscious administrators. +work everywhere--especially for security conscious administrators. Many are tempted to try to eliminate many frequently-invalid mail addresses with a simple regex, such as @@ -649,9 +649,9 @@ http://search.cpan.org/search?query=RPC&mode=all ). =head1 REVISION -Revision: $Revision: 3606 $ +Revision: $Revision: 6309 $ -Date: $Date: 2006-03-06 12:05:47 +0100 (lun, 06 mar 2006) $ +Date: $Date: 2006-05-18 07:44:45 +0200 (jeu, 18 mai 2006) $ See L<perlfaq> for source control details and availability. |