author Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2002-11-26 21:06:48 +0000
committer Rafael Garcia-Suarez <rgarciasuarez@gmail.com> 2002-11-26 21:06:48 +0000
commit 49d635f9372392ae44fe4c5b62b06e41912ae0c9 (patch)
tree 29a0e48c51466f10da69fffa12babc88587672a9 /pod/perlfaq4.pod
parent ad0f383a28b730182ea06492027f82167ce7032b (diff)
PerlFAQ sync.
p4raw-id: //depot/perl@18185
Diffstat (limited to 'pod/perlfaq4.pod')
-rw-r--r-- pod/perlfaq4.pod 370
1 files changed, 194 insertions, 176 deletions
diff --git a/pod/perlfaq4.pod b/pod/perlfaq4.pod
index f2512059cc..7c616aca3d 100644
--- a/pod/perlfaq4.pod
+++ b/pod/perlfaq4.pod
@@ -1,6 +1,6 @@
=head1 NAME
-perlfaq4 - Data Manipulation ($Revision: 1.25 $, $Date: 2002/05/30 07:04:25 $)
+perlfaq4 - Data Manipulation ($Revision: 1.37 $, $Date: 2002/11/13 06:04:00 $)
=head1 DESCRIPTION
@@ -11,56 +11,36 @@ numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
-The infinite set that a mathematician thinks of as the real numbers can
-only be approximated on a computer, since the computer only has a finite
-number of bits to store an infinite number of, um, numbers.
-
-Internally, your computer represents floating-point numbers in binary.
-Floating-point numbers read in from a file or appearing as literals
-in your program are converted from their decimal floating-point
-representation (eg, 19.95) to an internal binary representation.
-
-However, 19.95 can't be precisely represented as a binary
-floating-point number, just like 1/3 can't be exactly represented as a
-decimal floating-point number. The computer's binary representation
-of 19.95, therefore, isn't exactly 19.95.
-
-When a floating-point number gets printed, the binary floating-point
-representation is converted back to decimal. These decimal numbers
-are displayed in either the format you specify with printf(), or the
-current output format for numbers. (See L<perlvar/"$#"> if you use
-print. C<$#> has a different default value in Perl5 than it did in
-Perl4. Changing C<$#> yourself is deprecated.)
-
-This affects B<all> computer languages that represent decimal
-floating-point numbers in binary, not just Perl. Perl provides
-arbitrary-precision decimal numbers with the Math::BigFloat module
-(part of the standard Perl distribution), but mathematical operations
-are consequently slower.
-
-If precision is important, such as when dealing with money, it's good
-to work with integers and then divide at the last possible moment.
-For example, work in pennies (1995) instead of dollars and cents
-(19.95) and divide by 100 at the end.
-
-To get rid of the superfluous digits, just use a format (eg,
-C<printf("%.2f", 19.95)>) to get the required precision.
-See L<perlop/"Floating-point Arithmetic">.
+Internally, your computer represents floating-point numbers
+in binary. Digital (as in powers of two) computers cannot
+store all numbers exactly. Some real numbers lose precision
+in the process. This is a problem with how computers store
+numbers and affects all computer languages, not just Perl.
+L<perlnumber> shows the gory details of number
+representations and conversions.
+
+To limit the number of decimal places in your numbers, you
+can use the printf or sprintf function. See
+L<perlop/"Floating-point Arithmetic"> for more details.
+
+ printf "%.2f", 10/3;
+
+ my $number = sprintf "%.2f", 10/3;
+
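As a minimal sketch of the advice above (the variable names are illustrative, not from the FAQ), the following rounds only when formatting and keeps money as whole cents until the last step. The inequality in the first comment assumes a typical build that uses IEEE 754 doubles.

```perl
use strict;
use warnings;

# On typical IEEE 754 double builds this sum is not exactly 0.3,
# because neither 0.1 nor 0.2 has an exact binary representation.
my $sum = 0.1 + 0.2;
print "sum differs from 0.3\n" if $sum != 0.3;

# Round only when formatting the value for display.
my $rounded = sprintf "%.2f", $sum;

# For money, count whole cents and divide at the last possible moment.
my $cents = 1995 + 5;                       # 19.95 + 0.05 as integers
my $total = sprintf "%.2f", $cents / 100;

print "$rounded $total\n";
```

Note that sprintf returns a string; adding 0 to it turns it back into a number, with the usual representation caveats.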
=head2 Why isn't my octal data interpreted correctly?
-Perl only understands octal and hex numbers as such when they occur
-as literals in your program. Octal literals in perl must start with
-a leading "0" and hexadecimal literals must start with a leading "0x".
-If they are read in from somewhere and assigned, no automatic
-conversion takes place. You must explicitly use oct() or hex() if you
-want the values converted to decimal. oct() interprets
-both hex ("0x350") numbers and octal ones ("0350" or even without the
-leading "0", like "377"), while hex() only converts hexadecimal ones,
-with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
+Perl only understands octal and hex numbers as such when they occur as
+literals in your program. Octal literals in perl must start with a
+leading "0" and hexadecimal literals must start with a leading "0x".
+If they are read in from somewhere and assigned, no automatic
+conversion takes place. You must explicitly use oct() or hex() if you
+want the values converted to decimal. oct() interprets hex ("0x350"),
+octal ("0350" or even without the leading "0", like "377") and binary
+("0b1010") numbers, while hex() only converts hexadecimal ones, with
+or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
The inverse mapping from decimal to octal can be done with either the
-"%o" or "%O" sprintf() formats. To get from decimal to hex try either
-the "%x" or the "%X" formats to sprintf().
+"%o" or "%O" sprintf() formats.
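A short sketch of the conversions described above, using only the built-in oct(), hex(), and sprintf(). The variable names are placeholders chosen for this example.

```perl
use strict;
use warnings;

# Strings read from input are not literals, so convert them explicitly.
my $from_octal  = oct("0350");      # 3*64 + 5*8 = 232
my $from_hex    = oct("0x350");     # 3*256 + 5*16 = 848
my $from_binary = oct("0b1010");    # 10
my $plain_hex   = hex("ff");        # 255

# Going the other way: a decimal number to an octal string.
my $octal_text = sprintf "%o", 232;

print "$from_octal $from_hex $from_binary $plain_hex $octal_text\n";
```

When the input may be in any of the three prefixed notations, oct() is the more forgiving choice, since hex() assumes hexadecimal regardless of prefix.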
This problem shows up most often when people try using chmod(), mkdir(),
umask(), or sysopen(), which by widespread tradition typically take
@@ -264,7 +244,7 @@ C<00110011>). The operators work with the binary form of a number
(the number C<3> is treated as the bit pattern C<00000011>).
So, saying C<11 & 3> performs the "and" operation on numbers (yielding
-C<1>). Saying C<"11" & "3"> performs the "and" operation on strings
+C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).
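The difference can be checked directly. This sketch assumes the default behaviour of C<&>, that is, without the newer C<bitwise> feature enabled; the variable names are illustrative.

```perl
use strict;
use warnings;

# Both operands are numbers: 0b1011 & 0b0011 gives 0b0011.
my $numeric = 11 & 3;

# Both operands are strings: the "and" is applied byte by byte,
# and the result is as long as the shorter operand.
my $stringy = "11" & "3";

# Adding 0 forces numeric treatment of values that arrived as strings.
my $forced = ( "11" + 0 ) & ( "3" + 0 );

print "$numeric $stringy $forced\n";
```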
Most problems with C<&> and C<|> arise because the programmer thinks
@@ -335,14 +315,17 @@ Get the http://www.cpan.org/modules/by-module/Roman module.
If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.
+
+ BEGIN { srand() if $] < 5.004 }
+
5.004 and later automatically call C<srand> at the beginning. Don't
-call C<srand> more than once--you make your numbers less random, rather
+call C<srand> more than once---you make your numbers less random, rather
than more.
Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
-F<random> artitcle in the "Far More Than You Ever Wanted To Know"
-collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz , courtesy of
+F<random> article in the "Far More Than You Ever Wanted To Know"
+collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy of
Tom Phoenix, talks more about this. John von Neumann said, ``Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin.''
@@ -388,11 +371,20 @@ Use the following simple functions:
return 1+int((((localtime(shift || time))[5] + 1899))/1000);
}
-On some systems, you'll find that the POSIX module's strftime() function
-has been extended in a non-standard way to use a C<%C> format, which they
-sometimes claim is the "century". It isn't, because on most such systems,
-this is only the first two digits of the four-digit year, and thus cannot
-be used to reliably determine the current century or millennium.
+You can also use the POSIX strftime() function which may be a bit
+slower but is easier to read and maintain.
+
+ use POSIX qw/strftime/;
+
+ my $week_of_the_year = strftime "%W", localtime;
+ my $day_of_the_year = strftime "%j", localtime;
+
+On some systems, the POSIX module's strftime() function has
+been extended in a non-standard way to use a C<%C> format,
+which they sometimes claim is the "century". It isn't,
+because on most such systems, this is only the first two
+digits of the four-digit year, and thus cannot be used to
+reliably determine the current century or millennium.
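A hedged sketch of a portable alternative to C<%C>: take the four-digit year from C<%Y> and derive the century arithmetically, using the same convention as the functions above (years 1901 to 2000 are the 20th century). A fixed timestamp is used here only so that the result is reproducible.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# A fixed point in time: 1970-01-01 00:00:00 UTC.
my @epoch = gmtime(0);

my $day_of_year = strftime( "%j", @epoch );    # zero-padded, 3 digits
my $year        = strftime( "%Y", @epoch );    # four-digit year

# Derive the century from the full year instead of relying on %C.
my $century = 1 + int( ( $year - 1 ) / 100 );

print "day $day_of_year of $year, century $century\n";
```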
=head2 How can I compare two dates and find the difference?
@@ -438,58 +430,60 @@ modules. (Thanks to David Cassell for most of this text.)
=head2 How do I find yesterday's date?
-The C<time()> function returns the current time in seconds since the
-epoch. Take twenty-four hours off that:
+If you only need to find the date (and not the same time), you
+can use the Date::Calc module.
- $yesterday = time() - ( 24 * 60 * 60 );
+ use Date::Calc qw(Today Add_Delta_Days);
+
+ my @date = Add_Delta_Days( Today(), -1 );
+
+ print "@date\n";
-Then you can pass this to C<localtime()> and get the individual year,
-month, day, hour, minute, seconds values.
-
-Note very carefully that the code above assumes that your days are
-twenty-four hours each. For most people, there are two days a year
-when they aren't: the switch to and from summer time throws this off.
-A solution to this issue is offered by Russ Allbery.
+Most people try to use the time rather than the calendar to
+figure out dates, but that assumes that your days are
+twenty-four hours each. For most people, there are two days
+a year when they aren't: the switch to and from summer time
+throws this off. Russ Allbery offers this solution.
sub yesterday {
- my $now = defined $_[0] ? $_[0] : time;
- my $then = $now - 60 * 60 * 24;
- my $ndst = (localtime $now)[8] > 0;
- my $tdst = (localtime $then)[8] > 0;
- $then - ($tdst - $ndst) * 60 * 60;
- }
- # Should give you "this time yesterday" in seconds since epoch relative to
- # the first argument or the current time if no argument is given and
- # suitable for passing to localtime or whatever else you need to do with
- # it. $ndst is whether we're currently in daylight savings time; $tdst is
- # whether the point 24 hours ago was in daylight savings time. If $tdst
- # and $ndst are the same, a boundary wasn't crossed, and the correction
- # will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more
- # from yesterday's time since we gained an extra hour while going off
- # daylight savings time. If $tdst is 0 and $ndst is 1, subtract a
- # negative hour (add an hour) to yesterday's time since we lost an hour.
- #
- # All of this is because during those days when one switches off or onto
- # DST, a "day" isn't 24 hours long; it's either 23 or 25.
- #
- # The explicit settings of $ndst and $tdst are necessary because localtime
- # only says it returns the system tm struct, and the system tm struct at
- # least on Solaris doesn't guarantee any particular positive value (like,
- # say, 1) for isdst, just a positive value. And that value can
- # potentially be negative, if DST information isn't available (this sub
- # just treats those cases like no DST).
- #
- # Note that between 2am and 3am on the day after the time zone switches
- # off daylight savings time, the exact hour of "yesterday" corresponding
- # to the current hour is not clearly defined. Note also that if used
- # between 2am and 3am the day after the change to daylight savings time,
- # the result will be between 3am and 4am of the previous day; it's
- # arguable whether this is correct.
- #
- # This sub does not attempt to deal with leap seconds (most things don't).
- #
- # Copyright relinquished 1999 by Russ Allbery <rra@stanford.edu>
- # This code is in the public domain
+ my $now = defined $_[0] ? $_[0] : time;
+ my $then = $now - 60 * 60 * 24;
+ my $ndst = (localtime $now)[8] > 0;
+ my $tdst = (localtime $then)[8] > 0;
+ $then - ($tdst - $ndst) * 60 * 60;
+ }
+
+This should give you "this time yesterday" in seconds since epoch relative to
+the first argument or the current time if no argument is given and
+suitable for passing to localtime or whatever else you need to do with
+it. $ndst is whether we're currently in daylight savings time; $tdst is
+whether the point 24 hours ago was in daylight savings time. If $tdst
+and $ndst are the same, a boundary wasn't crossed, and the correction
+will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more
+from yesterday's time since we gained an extra hour while going off
+daylight savings time. If $tdst is 0 and $ndst is 1, subtract a
+negative hour (add an hour) to yesterday's time since we lost an hour.
+
+All of this is because during those days when one switches off or onto
+DST, a "day" isn't 24 hours long; it's either 23 or 25.
+
+The explicit settings of $ndst and $tdst are necessary because localtime
+only says it returns the system tm struct, and the system tm struct at
+least on Solaris doesn't guarantee any particular positive value (like,
+say, 1) for isdst, just a positive value. And that value can
+potentially be negative, if DST information isn't available (this sub
+just treats those cases like no DST).
+
+Note that between 2am and 3am on the day after the time zone switches
+off daylight savings time, the exact hour of "yesterday" corresponding
+to the current hour is not clearly defined. Note also that if used
+between 2am and 3am the day after the change to daylight savings time,
+the result will be between 3am and 4am of the previous day; it's
+arguable whether this is correct.
+
+This sub does not attempt to deal with leap seconds (most things don't).
+
+
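A usage sketch for the sub above. The sub is repeated only so that the example runs on its own; the surrounding variable names are illustrative.

```perl
use strict;
use warnings;

# Russ Allbery's sub from the answer above, repeated verbatim in spirit.
sub yesterday {
    my $now  = defined $_[0] ? $_[0] : time;
    my $then = $now - 60 * 60 * 24;
    my $ndst = (localtime $now)[8] > 0;
    my $tdst = (localtime $then)[8] > 0;
    $then - ($tdst - $ndst) * 60 * 60;
}

my $now  = time;
my $then = yesterday($now);

# Across a DST switch the gap is 23 or 25 hours, otherwise 24.
my $hours = ( $now - $then ) / 3600;

print "This time yesterday: ", scalar localtime($then), "\n";
print "That was $hours hours ago\n";
```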
=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
@@ -557,14 +551,6 @@ a subroutine call (in list context) into a string:
print "My sub returned @{[mysub(1,2,3)]} that time.\n";
-If you prefer scalar context, similar chicanery is also useful for
-arbitrary expressions:
-
- print "That yields ${\($n + 5)} widgets\n";
-
-Version 5.004 of Perl had a bug that gave list context to the
-expression in C<${...}>, but this is fixed in version 5.005.
-
See also ``How can I expand variables in text strings?'' in this
section of the FAQ.
@@ -645,23 +631,25 @@ done by making a shell alias, like so:
See the documentation for Text::Autoformat to appreciate its many
capabilities.
-=head2 How can I access/change the first N letters of a string?
-
-There are many ways. If you just want to grab a copy, use
-substr():
+=head2 How can I access or change N characters of a string?
- $first_byte = substr($a, 0, 1);
+You can access the first characters of a string with substr().
+To get the first character, for example, start at position 0
+and grab the string of length 1.
-If you want to modify part of a string, the simplest way is often to
-use substr() as an lvalue:
- substr($a, 0, 3) = "Tom";
+ $string = "Just another Perl Hacker";
+ $first_char = substr( $string, 0, 1 ); # 'J'
-Although those with a pattern matching kind of thought process will
-likely prefer
+To change part of a string, you can use the optional fourth
+argument which is the replacement string.
- $a =~ s/^.../Tom/;
+ substr( $string, 13, 4, "Perl 5.8.0" );
+
+You can also use substr() as an lvalue.
+ substr( $string, 13, 4 ) = "Perl 5.8.0";
+
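The three forms can be compared side by side. This is a small sketch built on the string from the answer above; C<$copy> is a name introduced here for illustration.

```perl
use strict;
use warnings;

my $string = "Just another Perl Hacker";

# Read-only access: offset 13, length 4.
my $word = substr( $string, 13, 4 );

# Four-argument form: replaces in place and returns what was there.
my $old = substr( $string, 13, 4, "Perl 5.8.0" );

# Lvalue form: assign to the selected part of a different string.
my $copy = "Just another Perl Hacker";
substr( $copy, 0, 4 ) = "Yet";

print "$word | $old | $string | $copy\n";
```

The replacement does not have to be the same length as the part it replaces; the string grows or shrinks as needed.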
=head2 How do I change the Nth occurrence of something?
You have to keep track of N yourself. For example, let's say you want
@@ -753,20 +741,21 @@ case", but that's not quite accurate. Consider the proper
capitalization of the movie I<Dr. Strangelove or: How I Learned to
Stop Worrying and Love the Bomb>, for example.
-=head2 How can I split a [character] delimited string except when inside
-[character]? (Comma-separated files)
+=head2 How can I split a [character] delimited string except when inside [character]?
-Take the example case of trying to split a string that is comma-separated
-into its different fields. (We'll pretend you said comma-separated, not
-comma-delimited, which is different and almost never what you mean.) You
-can't use C<split(/,/)> because you shouldn't split if the comma is inside
-quotes. For example, take a data line like this:
+Several modules can handle this sort of parsing---Text::Balanced,
+Text::CSV, Text::CSV_XS, and Text::ParseWords, among others.
+
+Take the example case of trying to split a string that is
+comma-separated into its different fields. You can't use C<split(/,/)>
+because you shouldn't split if the comma is inside quotes. For
+example, take a data line like this:
SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
Due to the restriction of the quotes, this is a fairly complex
-problem. Thankfully, we have Jeffrey Friedl, author of a highly
-recommended book on regular expressions, to handle these for us. He
+problem. Thankfully, we have Jeffrey Friedl, author of
+I<Mastering Regular Expressions>, to handle these for us. He
suggests (assuming your string is contained in $text):
@new = ();
@@ -779,8 +768,7 @@ suggests (assuming your string is contained in $text):
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
-C<"like \"this\"">. Unescaping them is a task addressed earlier in
-this section.
+C<"like \"this\"">.
Alternatively, the Text::ParseWords module (part of the standard Perl
distribution) lets you say:
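The diff context stops at this point, so what follows is a hedged sketch rather than the FAQ's own listing. It uses quotewords() from Text::ParseWords on the sample line shown earlier; passing 0 as the second argument asks for the quotes to be stripped from the returned fields.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# Split on commas, but not on commas inside quoted fields.
my @new = quotewords( ',', 0, $text );

printf "%d fields, third is <%s>\n", scalar @new, $new[2];
```

Be aware that quotewords() also treats single quotes as quoting characters, which matters for data containing apostrophes.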
@@ -1271,16 +1259,37 @@ an exercise to the reader.
=head2 How do I find the first array element for which a condition is true?
-You can use this if you care about the index:
-
- for ($i= 0; $i < @array; $i++) {
- if ($array[$i] eq "Waldo") {
- $found_index = $i;
- last;
+To find the first array element which satisfies a condition, you can
+use the first() function in the List::Util module, which comes with
+Perl 5.8. This example finds the first element that contains "Perl".
+
+ use List::Util qw(first);
+
+ my $element = first { /Perl/ } @array;
+
+If you cannot use List::Util, you can make your own loop to do the
+same thing. Once you find the element, you stop the loop with last.
+
+ my $found;
+ foreach my $element ( @array )
+ {
+ if( $element =~ /Perl/ ) { $found = $element; last }
+ }
+
+If you want the array index, you can iterate through the indices
+and check the array element at each index until you find one
+that satisfies the condition.
+
+ my( $found, $index ) = ( undef, -1 );
+ for( my $i = 0; $i < @array; $i++ )
+ {
+ if( $array[$i] =~ /Perl/ )
+ {
+ $found = $array[$i];
+ $index = $i;
+ last;
+ }
}
- }
-
-Now C<$found_index> has what you want.
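A compact sketch of both variants using List::Util. The sample data and variable names are made up for the example; to get the index, first() is applied to the list of indices instead of the elements.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( "awk", "Perl 5", "sed", "Perl 6" );

# First matching element.
my $element = first { /Perl/ } @array;

# First matching index: search the indices, test the element at each.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

# first() returns undef when nothing matches.
my $missing = first { /Python/ } @array;

print "element '$element' at index $index\n";
```

Because a match at index 0 is a false value, test the index result with defined() rather than for truth.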
=head2 How do I handle linked lists?
@@ -1399,6 +1408,11 @@ Here's another; let's compute spherical volumes:
$_ **= 3;
$_ *= (4/3) * 3.14159; # this will be constant folded
}
+
+which can also be done with map() which is made to transform
+one list into another:
+
+ @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
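The practical difference between the two forms is which array changes. In this sketch, with illustrative data, map() builds a new list and leaves its input alone, while the foreach loop modifies the array it walks because C<$_> is an alias for each element.

```perl
use strict;
use warnings;

my @radii = ( 1, 2 );

# map() returns a new list; @radii is untouched.
my @volumes = map { $_**3 * ( 4 / 3 ) * 3.14159 } @radii;

# The loop form changes the array in place through the $_ alias.
my @in_place = @radii;
for (@in_place) {
    $_ **= 3;
    $_ *= ( 4 / 3 ) * 3.14159;
}

print "radii: @radii\n";
print "volumes: @volumes\n";
```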
If you want to do the same thing to modify the values of the
hash, you can use the C<values> function. As of Perl 5.6
@@ -1431,34 +1445,40 @@ call to rand), you're almost certainly doing something wrong.
=head2 How do I permute N elements of a list?
-Here's a little program that generates all permutations
-of all the words on each line of input. The algorithm embodied
-in the permute() function should work on any list:
-
- #!/usr/bin/perl -n
- # tsc-permute: permute each word of input
- permute([split], []);
- sub permute {
- my @items = @{ $_[0] };
- my @perms = @{ $_[1] };
- unless (@items) {
- print "@perms\n";
- } else {
- my(@newitems,@newperms,$i);
- foreach $i (0 .. $#items) {
- @newitems = @items;
- @newperms = @perms;
- unshift(@newperms, splice(@newitems, $i, 1));
- permute([@newitems], [@newperms]);
- }
+Use the List::Permutor module on CPAN. If the list is
+actually an array, try the Algorithm::Permute module (also
+on CPAN). It's written in XS code and is very efficient.
+
+ use Algorithm::Permute;
+ my @array = 'a'..'d';
+ my $p_iterator = Algorithm::Permute->new ( \@array );
+ while (my @perm = $p_iterator->next) {
+ print "next permutation: (@perm)\n";
+ }
+
+Here's a little program that generates all permutations of
+all the words on each line of input. The algorithm embodied
+in the permute() function is discussed in Volume 4 (still
+unpublished) of Knuth's I<The Art of Computer Programming>
+and will work on any list:
+
+ #!/usr/bin/perl -n
+ # Fischer-Krause ordered permutation generator
+
+ sub permute (&@) {
+ my $code = shift;
+ my @idx = 0..$#_;
+ while ( $code->(@_[@idx]) ) {
+ my $p = $#idx;
+ --$p while $idx[$p-1] > $idx[$p];
+ my $q = $p or return;
+ push @idx, reverse splice @idx, $p;
+ ++$q while $idx[$p-1] > $idx[$q];
+ @idx[$p-1,$q]=@idx[$q,$p-1];
+ }
}
- }
-Unfortunately, this algorithm is very inefficient. The Algorithm::Permute
-module from CPAN runs at least an order of magnitude faster. If you don't
-have a C compiler (or a binary distribution of Algorithm::Permute), then
-you can use List::Permutor which is written in pure Perl, and is still
-several times faster than the algorithm above.
+ permute {print"@_\n"} split;
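A usage sketch for the generator above, outside the C<-n> loop. The sub is repeated so the example is self-contained; the callback must return a true value to keep iterating, which is why this version ends its block with an explicit 1 (print happens to return true in the original).

```perl
use strict;
use warnings;

# Same generator as above; it must be defined before the call so
# that the (&@) prototype allows the bare-block calling syntax.
sub permute (&@) {
    my $code = shift;
    my @idx  = 0 .. $#_;
    while ( $code->( @_[@idx] ) ) {
        my $p = $#idx;
        --$p while $idx[ $p - 1 ] > $idx[$p];
        my $q = $p or return;
        push @idx, reverse splice @idx, $p;
        ++$q while $idx[ $p - 1 ] > $idx[$q];
        @idx[ $p - 1, $q ] = @idx[ $q, $p - 1 ];
    }
}

# Collect the permutations instead of printing them.
my @perms;
permute { push @perms, "@_"; 1 } qw(a b c);

print "$_\n" for @perms;
```

Returning a false value from the block stops the generator early, which is useful when searching for the first permutation that satisfies some test.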
=head2 How do I sort an array by (anything)?
@@ -1502,7 +1522,7 @@ This can be conveniently combined with precalculation of keys as given
above.
See the F<sort> article in the "Far More Than You Ever Wanted
-To Know" collection in http://www.cpan.org/olddoc/FMTEYEWTK.tgz for
+To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
more about this approach.
See also the question below on sorting hashes.
@@ -1842,11 +1862,11 @@ it on top of either DB_File or GDBM_File.
Use the Tie::IxHash from CPAN.
use Tie::IxHash;
- tie(%myhash, Tie::IxHash);
- for ($i=0; $i<20; $i++) {
+ tie my %myhash, 'Tie::IxHash';
+ for (my $i=0; $i<20; $i++) {
$myhash{$i} = 2*$i;
}
- @keys = keys %myhash;
+ my @keys = keys %myhash;
# @keys = (0,1,2,3,...)
=head2 Why does passing a subroutine an undefined element in a hash create it?
@@ -1902,9 +1922,7 @@ this works fine (assuming the files are found):
On less elegant (read: Byzantine) systems, however, you have
to play tedious games with "text" versus "binary" files. See
-L<perlfunc/"binmode"> or L<perlopentut>. Most of these ancient-thinking
-systems are curses out of Microsoft, who seem to be committed to putting
-the backward into backward compatibility.
+L<perlfunc/"binmode"> or L<perlopentut>.
If you're concerned about 8-bit ASCII data, then see L<perllocale>.