author | Rafael Garcia-Suarez <rgarciasuarez@gmail.com> | 2002-11-26 21:06:48 +0000 |
---|---|---|
committer | Rafael Garcia-Suarez <rgarciasuarez@gmail.com> | 2002-11-26 21:06:48 +0000 |
commit | 49d635f9372392ae44fe4c5b62b06e41912ae0c9 | |
tree | 29a0e48c51466f10da69fffa12babc88587672a9 /pod/perlfaq4.pod | |
parent | ad0f383a28b730182ea06492027f82167ce7032b | |
PerlFAQ sync.
p4raw-id: //depot/perl@18185
Diffstat (limited to 'pod/perlfaq4.pod')
-rw-r--r-- | pod/perlfaq4.pod | 370 |
1 files changed, 194 insertions, 176 deletions
diff --git a/pod/perlfaq4.pod b/pod/perlfaq4.pod
index f2512059cc..7c616aca3d 100644
--- a/pod/perlfaq4.pod
+++ b/pod/perlfaq4.pod
@@ -1,6 +1,6 @@
 =head1 NAME
 
-perlfaq4 - Data Manipulation ($Revision: 1.25 $, $Date: 2002/05/30 07:04:25 $)
+perlfaq4 - Data Manipulation ($Revision: 1.37 $, $Date: 2002/11/13 06:04:00 $)
 
 =head1 DESCRIPTION
 
@@ -11,56 +11,36 @@ numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
 
 =head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
 
-The infinite set that a mathematician thinks of as the real numbers can
-only be approximated on a computer, since the computer only has a finite
-number of bits to store an infinite number of, um, numbers.
-
-Internally, your computer represents floating-point numbers in binary.
-Floating-point numbers read in from a file or appearing as literals
-in your program are converted from their decimal floating-point
-representation (eg, 19.95) to an internal binary representation.
-
-However, 19.95 can't be precisely represented as a binary
-floating-point number, just like 1/3 can't be exactly represented as a
-decimal floating-point number. The computer's binary representation
-of 19.95, therefore, isn't exactly 19.95.
-
-When a floating-point number gets printed, the binary floating-point
-representation is converted back to decimal. These decimal numbers
-are displayed in either the format you specify with printf(), or the
-current output format for numbers. (See L<perlvar/"$#"> if you use
-print. C<$#> has a different default value in Perl5 than it did in
-Perl4. Changing C<$#> yourself is deprecated.)
-
-This affects B<all> computer languages that represent decimal
-floating-point numbers in binary, not just Perl. Perl provides
-arbitrary-precision decimal numbers with the Math::BigFloat module
-(part of the standard Perl distribution), but mathematical operations
-are consequently slower.
-
-If precision is important, such as when dealing with money, it's good
-to work with integers and then divide at the last possible moment.
-For example, work in pennies (1995) instead of dollars and cents
-(19.95) and divide by 100 at the end.
-
-To get rid of the superfluous digits, just use a format (eg,
-C<printf("%.2f", 19.95)>) to get the required precision.
-See L<perlop/"Floating-point Arithmetic">.
+Internally, your computer represents floating-point numbers
+in binary. Digital (as in powers of two) computers cannot
+store all numbers exactly. Some real numbers lose precision
+in the process. This is a problem with how computers store
+numbers and affects all computer languages, not just Perl.
+L<perlnumber> show the gory details of number
+representations and conversions.
+
+To limit the number of decimal places in your numbers, you
+can use the printf or sprintf function. See the
+L<perlop|"Floating Point Arithmetic"> for more details.
+
+    printf "%.2f", 10/3;
+
+    my $number = sprintf "%.2f", 10/3;
+
 
 =head2 Why isn't my octal data interpreted correctly?
 
-Perl only understands octal and hex numbers as such when they occur
-as literals in your program. Octal literals in perl must start with
-a leading "0" and hexadecimal literals must start with a leading "0x".
-If they are read in from somewhere and assigned, no automatic
-conversion takes place. You must explicitly use oct() or hex() if you
-want the values converted to decimal. oct() interprets
-both hex ("0x350") numbers and octal ones ("0350" or even without the
-leading "0", like "377"), while hex() only converts hexadecimal ones,
-with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
+Perl only understands octal and hex numbers as such when they occur as
+literals in your program. Octal literals in perl must start with a
+leading "0" and hexadecimal literals must start with a leading "0x".
+If they are read in from somewhere and assigned, no automatic
+conversion takes place. You must explicitly use oct() or hex() if you
+want the values converted to decimal. oct() interprets hex ("0x350"),
+octal ("0350" or even without the leading "0", like "377") and binary
+("0b1010") numbers, while hex() only converts hexadecimal ones, with
+or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
 The inverse mapping from decimal to octal can be done with either the
-"%o" or "%O" sprintf() formats. To get from decimal to hex try either
-the "%x" or the "%X" formats to sprintf().
+"%o" or "%O" sprintf() formats.
 
 This problem shows up most often when people try using chmod(), mkdir(),
 umask(), or sysopen(), which by widespread tradition typically take
@@ -264,7 +244,7 @@ C<00110011>). The operators work with the binary form of a number
 (the number C<3> is treated as the bit pattern C<00000011>).
 
 So, saying C<11 & 3> performs the "and" operation on numbers (yielding
-C<1>). Saying C<"11" & "3"> performs the "and" operation on strings
+C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
 (yielding C<"1">).
 
 Most problems with C<&> and C<|> arise because the programmer thinks
@@ -335,14 +315,17 @@ Get the http://www.cpan.org/modules/by-module/Roman module.
 
 If you're using a version of Perl before 5.004, you must call C<srand>
 once at the start of your program to seed the random number generator.
+
+    BEGIN { srand() if $[ < 5.004 }
+
 5.004 and later automatically call C<srand> at the beginning. Don't
-call C<srand> more than once--you make your numbers less random, rather
+call C<srand> more than once---you make your numbers less random, rather
 than more.
 
 Computers are good at being predictable and bad at being random
 (despite appearances caused by bugs in your programs :-). see the
-F<random> artitcle in the "Far More Than You Ever Wanted To Know"
-collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz , courtesy of
+F<random> article in the "Far More Than You Ever Wanted To Know"
+collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy of
 Tom Phoenix, talks more about this. John von Neumann said, ``Anyone
 who attempts to generate random numbers by deterministic means is, of
 course, living in a state of sin.''
@@ -388,11 +371,20 @@ Use the following simple functions:
         return 1+int((((localtime(shift || time))[5] + 1899))/1000);
     }
 
-On some systems, you'll find that the POSIX module's strftime() function
-has been extended in a non-standard way to use a C<%C> format, which they
-sometimes claim is the "century". It isn't, because on most such systems,
-this is only the first two digits of the four-digit year, and thus cannot
-be used to reliably determine the current century or millennium.
+You can also use the POSIX strftime() function which may be a bit
+slower but is easier to read and maintain.
+
+    use POSIX qw/strftime/;
+
+    my $week_of_the_year = strftime "%W", localtime;
+    my $day_of_the_year = strftime "%j", localtime;
+
+On some systems, the POSIX module's strftime() function has
+been extended in a non-standard way to use a C<%C> format,
+which they sometimes claim is the "century". It isn't,
+because on most such systems, this is only the first two
+digits of the four-digit year, and thus cannot be used to
+reliably determine the current century or millennium.
 
 =head2 How can I compare two dates and find the difference?
 
@@ -438,58 +430,60 @@ modules. (Thanks to David Cassell for most of this text.)
 
 =head2 How do I find yesterday's date?
 
-The C<time()> function returns the current time in seconds since the
-epoch. Take twenty-four hours off that:
+If you only need to find the date (and not the same time), you
+can use the Date::Calc module.
 
-    $yesterday = time() - ( 24 * 60 * 60 );
+    use Date::Calc qw(Today Add_Delta_Days);
+
+    my @date = Add_Delta_Days( Today(), -1 );
+
+    print "@date\n";
 
-Then you can pass this to C<localtime()> and get the individual year,
-month, day, hour, minute, seconds values.
-
-Note very carefully that the code above assumes that your days are
-twenty-four hours each. For most people, there are two days a year
-when they aren't: the switch to and from summer time throws this off.
-A solution to this issue is offered by Russ Allbery.
+Most people try to use the time rather than the calendar to
+figure out dates, but that assumes that your days are
+twenty-four hours each. For most people, there are two days
+a year when they aren't: the switch to and from summer time
+throws this off. Russ Allbery offers this solution.
 
     sub yesterday {
-    my $now = defined $_[0] ? $_[0] : time;
-    my $then = $now - 60 * 60 * 24;
-    my $ndst = (localtime $now)[8] > 0;
-    my $tdst = (localtime $then)[8] > 0;
-    $then - ($tdst - $ndst) * 60 * 60;
-    }
-    # Should give you "this time yesterday" in seconds since epoch relative to
-    # the first argument or the current time if no argument is given and
-    # suitable for passing to localtime or whatever else you need to do with
-    # it. $ndst is whether we're currently in daylight savings time; $tdst is
-    # whether the point 24 hours ago was in daylight savings time. If $tdst
-    # and $ndst are the same, a boundary wasn't crossed, and the correction
-    # will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more
-    # from yesterday's time since we gained an extra hour while going off
-    # daylight savings time. If $tdst is 0 and $ndst is 1, subtract a
-    # negative hour (add an hour) to yesterday's time since we lost an hour.
-    #
-    # All of this is because during those days when one switches off or onto
-    # DST, a "day" isn't 24 hours long; it's either 23 or 25.
-    #
-    # The explicit settings of $ndst and $tdst are necessary because localtime
-    # only says it returns the system tm struct, and the system tm struct at
-    # least on Solaris doesn't guarantee any particular positive value (like,
-    # say, 1) for isdst, just a positive value. And that value can
-    # potentially be negative, if DST information isn't available (this sub
-    # just treats those cases like no DST).
-    #
-    # Note that between 2am and 3am on the day after the time zone switches
-    # off daylight savings time, the exact hour of "yesterday" corresponding
-    # to the current hour is not clearly defined. Note also that if used
-    # between 2am and 3am the day after the change to daylight savings time,
-    # the result will be between 3am and 4am of the previous day; it's
-    # arguable whether this is correct.
-    #
-    # This sub does not attempt to deal with leap seconds (most things don't).
-    #
-    # Copyright relinquished 1999 by Russ Allbery <rra@stanford.edu>
-    # This code is in the public domain
+        my $now = defined $_[0] ? $_[0] : time;
+        my $then = $now - 60 * 60 * 24;
+        my $ndst = (localtime $now)[8] > 0;
+        my $tdst = (localtime $then)[8] > 0;
+        $then - ($tdst - $ndst) * 60 * 60;
+        }
+
+Should give you "this time yesterday" in seconds since epoch relative to
+the first argument or the current time if no argument is given and
+suitable for passing to localtime or whatever else you need to do with
+it. $ndst is whether we're currently in daylight savings time; $tdst is
+whether the point 24 hours ago was in daylight savings time. If $tdst
+and $ndst are the same, a boundary wasn't crossed, and the correction
+will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more
+from yesterday's time since we gained an extra hour while going off
+daylight savings time. If $tdst is 0 and $ndst is 1, subtract a
+negative hour (add an hour) to yesterday's time since we lost an hour.
+
+All of this is because during those days when one switches off or onto
+DST, a "day" isn't 24 hours long; it's either 23 or 25.
+
+The explicit settings of $ndst and $tdst are necessary because localtime
+only says it returns the system tm struct, and the system tm struct at
+least on Solaris doesn't guarantee any particular positive value (like,
+say, 1) for isdst, just a positive value. And that value can
+potentially be negative, if DST information isn't available (this sub
+just treats those cases like no DST).
+
+Note that between 2am and 3am on the day after the time zone switches
+off daylight savings time, the exact hour of "yesterday" corresponding
+to the current hour is not clearly defined. Note also that if used
+between 2am and 3am the day after the change to daylight savings time,
+the result will be between 3am and 4am of the previous day; it's
+arguable whether this is correct.
+
+This sub does not attempt to deal with leap seconds (most things don't).
+
+
 
 =head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
 
@@ -557,14 +551,6 @@ a subroutine call (in list context) into a string:
 
     print "My sub returned @{[mysub(1,2,3)]} that time.\n";
 
-If you prefer scalar context, similar chicanery is also useful for
-arbitrary expressions:
-
-    print "That yields ${\($n + 5)} widgets\n";
-
-Version 5.004 of Perl had a bug that gave list context to the
-expression in C<${...}>, but this is fixed in version 5.005.
-
 See also ``How can I expand variables in text strings?'' in this
 section of the FAQ.
 
@@ -645,23 +631,25 @@ done by making a shell alias, like so:
 See the documentation for Text::Autoformat to appreciate its many
 capabilities.
 
-=head2 How can I access/change the first N letters of a string?
-
-There are many ways. If you just want to grab a copy, use
-substr():
+=head2 How can I access or change N characters of a string?
 
-    $first_byte = substr($a, 0, 1);
+You can access the first characters of a string with substr().
+To get the first character, for example, start at position 0
+and grab the string of length 1.
 
-If you want to modify part of a string, the simplest way is often to
-use substr() as an lvalue:
 
-    substr($a, 0, 3) = "Tom";
+    $string = "Just another Perl Hacker";
+    $first_char = substr( $string, 0, 1 );  # 'J'
 
-Although those with a pattern matching kind of thought process will
-likely prefer
+To change part of a string, you can use the optional fourth
+argument which is the replacement string.
 
-    $a =~ s/^.../Tom/;
+    substr( $string, 13, 4, "Perl 5.8.0" );
+
+You can also use substr() as an lvalue.
 
+    substr( $string, 13, 4 ) = "Perl 5.8.0";
+
 =head2 How do I change the Nth occurrence of something?
 
 You have to keep track of N yourself. For example, let's say you want
@@ -753,20 +741,21 @@ case", but that's not quite accurate. Consider the proper
 capitalization of the movie I<Dr. Strangelove or: How I Learned to
 Stop Worrying and Love the Bomb>, for example.
 
-=head2 How can I split a [character] delimited string except when inside
-[character]? (Comma-separated files)
+=head2 How can I split a [character] delimited string except when inside [character]?
 
-Take the example case of trying to split a string that is comma-separated
-into its different fields. (We'll pretend you said comma-separated, not
-comma-delimited, which is different and almost never what you mean.) You
-can't use C<split(/,/)> because you shouldn't split if the comma is inside
-quotes. For example, take a data line like this:
+Several modules can handle this sort of pasing---Text::Balanced,
+Text::CVS, Text::CVS_XS, and Text::ParseWords, among others.
+
+Take the example case of trying to split a string that is
+comma-separated into its different fields. You can't use C<split(/,/)>
+because you shouldn't split if the comma is inside quotes. For
+example, take a data line like this:
 
     SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
 
 Due to the restriction of the quotes, this is a fairly complex
-problem. Thankfully, we have Jeffrey Friedl, author of a highly
-recommended book on regular expressions, to handle these for us. He
+problem. Thankfully, we have Jeffrey Friedl, author of
+I<Mastering Regular Expressions>, to handle these for us. He
 suggests (assuming your string is contained in $text):
 
     @new = ();
@@ -779,8 +768,7 @@ suggests (assuming your string is contained in $text):
 
 If you want to represent quotation marks inside a
 quotation-mark-delimited field, escape them with backslashes (eg,
-C<"like \"this\"">. Unescaping them is a task addressed earlier in
-this section.
+C<"like \"this\"">.
 
 Alternatively, the Text::ParseWords module (part of the standard Perl
 distribution) lets you say:
@@ -1271,16 +1259,37 @@ an exercise to the reader.
 
 =head2 How do I find the first array element for which a condition is true?
 
-You can use this if you care about the index:
-
-    for ($i= 0; $i < @array; $i++) {
-        if ($array[$i] eq "Waldo") {
-            $found_index = $i;
-            last;
+To find the first array element which satisfies a condition, you can
+use the first() function in the List::Util module, which comes with
+Perl 5.8. This example finds the first element that contains "Perl".
+
+    use List::Util qw(first);
+
+    my $element = first { /Perl/ } @array;
+
+If you cannot use List::Util, you can make your own loop to do the
+same thing. Once you find the element, you stop the loop with last.
+
+    my $found;
+    foreach my $element ( @array )
+        {
+        if( /Perl/ ) { $found = $element; last }
+        }
+
+If you want the array index, you can iterate through the indices
+and check the array element at each index until you find one
+that satisfies the condition.
+
+    my( $found, $i ) = ( undef, -1 );
+    for( $i = 0; $i < @array; $i++ )
+        {
+        if( $array[$i] =~ /Perl/ )
+            {
+            $found = $array[$i];
+            $index = $i;
+            last;
+            }
         }
-    }
-
-Now C<$found_index> has what you want.
 
 =head2 How do I handle linked lists?
 
@@ -1399,6 +1408,11 @@ Here's another; let's compute spherical volumes:
         $_ **= 3;
         $_ *= (4/3) * 3.14159;  # this will be constant folded
     }
+
+which can also be done with map() which is made to transform
+one list into another:
+
+    @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
 
 If you want to do the same thing to modify the values of the
 hash, you can use the C<values> function. As of Perl 5.6
@@ -1431,34 +1445,40 @@ call to rand), you're almost certainly doing something wrong.
 
 =head2 How do I permute N elements of a list?
 
-Here's a little program that generates all permutations
-of all the words on each line of input. The algorithm embodied
-in the permute() function should work on any list:
-
-    #!/usr/bin/perl -n
-    # tsc-permute: permute each word of input
-    permute([split], []);
-    sub permute {
-        my @items = @{ $_[0] };
-        my @perms = @{ $_[1] };
-        unless (@items) {
-            print "@perms\n";
-        } else {
-            my(@newitems,@newperms,$i);
-            foreach $i (0 .. $#items) {
-                @newitems = @items;
-                @newperms = @perms;
-                unshift(@newperms, splice(@newitems, $i, 1));
-                permute([@newitems], [@newperms]);
-            }
+Use the List::Permutor module on CPAN. If the list is
+actually an array, try the Algorithm::Permute module (also
+on CPAN). It's written in XS code and is very efficient.
+
+    use Algorithm::Permute;
+    my @array = 'a'..'d';
+    my $p_iterator = Algorithm::Permute->new ( \@array );
+    while (my @perm = $p_iterator->next) {
+       print "next permutation: (@perm)\n";
+    }
+
+Here's a little program that generates all permutations of
+all the words on each line of input. The algorithm embodied
+in the permute() function is discussed in Volume 4 (still
+unpublished) of Knuth's I<The Art of Computer Programming>
+and will work on any list:
+
+    #!/usr/bin/perl -n
+    # Fischer-Kause ordered permutation generator
+
+    sub permute (&@) {
+        my $code = shift;
+        my @idx = 0..$#_;
+        while ( $code->(@_[@idx]) ) {
+            my $p = $#idx;
+            --$p while $idx[$p-1] > $idx[$p];
+            my $q = $p or return;
+            push @idx, reverse splice @idx, $p;
+            ++$q while $idx[$p-1] > $idx[$q];
+            @idx[$p-1,$q]=@idx[$q,$p-1];
+        }
     }
-    }
 
-Unfortunately, this algorithm is very inefficient. The Algorithm::Permute
-module from CPAN runs at least an order of magnitude faster. If you don't
-have a C compiler (or a binary distribution of Algorithm::Permute), then
-you can use List::Permutor which is written in pure Perl, and is still
-several times faster than the algorithm above.
+    permute {print"@_\n"} split;
 
 =head2 How do I sort an array by (anything)?
 
@@ -1502,7 +1522,7 @@ This can be conveniently combined with precalculation of keys as given
 above.
 
 See the F<sort> artitcle article in the "Far More Than You Ever Wanted
-To Know" collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz for
+To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
 more about this approach.
 
 See also the question below on sorting hashes.
@@ -1842,11 +1862,11 @@ it on top of either DB_File or GDBM_File.
 Use the Tie::IxHash from CPAN.
 
     use Tie::IxHash;
-    tie(%myhash, Tie::IxHash);
-    for ($i=0; $i<20; $i++) {
+    tie my %myhash, Tie::IxHash;
+    for (my $i=0; $i<20; $i++) {
         $myhash{$i} = 2*$i;
     }
-    @keys = keys %myhash;
+    my @keys = keys %myhash;
     # @keys = (0,1,2,3,...)
 
 =head2 Why does passing a subroutine an undefined element in a hash create it?
@@ -1902,9 +1922,7 @@ this works fine (assuming the files are found):
 
 On less elegant (read: Byzantine) systems, however, you have
 to play tedious games with "text" versus "binary" files. See
-L<perlfunc/"binmode"> or L<perlopentut>. Most of these ancient-thinking
-systems are curses out of Microsoft, who seem to be committed to putting
-the backward into backward compatibility.
+L<perlfunc/"binmode"> or L<perlopentut>.
 
 If you're concerned about 8-bit ASCII data, then see L<perllocale>.
 