author | Gurusamy Sarathy <gsar@cpan.org> | 1999-05-24 06:26:48 +0000 |
---|---|---|
committer | Gurusamy Sarathy <gsar@cpan.org> | 1999-05-24 06:26:48 +0000 |
commit | d92eb7b0e84a41728b3fbb642691f159dbe28882 (patch) | |
tree | 157aeb98628dc7bb83a2b831cddc389c31e3c926 /pod/perlfaq4.pod | |
parent | 36263cb347dc0d66c6ed49be3e8c8a14c5d21ffb (diff) | |
download | perl-d92eb7b0e84a41728b3fbb642691f159dbe28882.tar.gz |
perlfaq update from Tom Christiansen
p4raw-id: //depot/perl@3459
Diffstat (limited to 'pod/perlfaq4.pod')
-rw-r--r-- | pod/perlfaq4.pod | 272 |
1 files changed, 202 insertions, 70 deletions
diff --git a/pod/perlfaq4.pod b/pod/perlfaq4.pod index 92aee2c7af..700c42abf8 100644 --- a/pod/perlfaq4.pod +++ b/pod/perlfaq4.pod @@ -1,6 +1,6 @@ =head1 NAME -perlfaq4 - Data Manipulation ($Revision: 1.40 $, $Date: 1999/01/08 04:26:39 $) +perlfaq4 - Data Manipulation ($Revision: 1.49 $, $Date: 1999/05/23 20:37:49 $) =head1 DESCRIPTION @@ -104,14 +104,21 @@ are not guaranteed. =head2 How do I convert bits into ints? To turn a string of 1s and 0s like C<10110110> into a scalar containing -its binary value, use the pack() function (documented in -L<perlfunc/"pack">): +its binary value, use the pack() and unpack() functions (documented in +L<perlfunc/"pack" L<perlfunc/"unpack">): - $decimal = pack('B8', '10110110'); + $decimal = unpack('c', pack('B8', '10110110')); + +This packs the string C<10110110> into an eight bit binary structure. +This is then unpack as a character, which returns its ordinal value. + +This does the same thing: + + $decimal = ord(pack('B8', '10110110')); Here's an example of going the other way: - $binary_string = join('', unpack('B*', "\x29")); + $binary_string = unpack('B*', "\x29"); =head2 Why doesn't & work the way I want it to? @@ -228,12 +235,34 @@ American businesses often consider the first week with a Monday in it to be Work Week #1, despite ISO 8601, which considers WW1 to be the first week with a Thursday in it. +=head2 How do I find the current century or millennium? + +Use the following simple functions: + + sub get_century { + return int((((localtime(shift || time))[5] + 1999))/100); + } + sub get_millennium { + return 1+int((((localtime(shift || time))[5] + 1899))/1000); + } + +On some systems, you'll find that the POSIX module's strftime() function +has been extended in a non-standard way to use a C<%C> format, which they +sometimes claim is the "century". 
It isn't, because on most such systems, +this is only the first two digits of the four-digit year, and thus cannot +be used to reliably determine the current century or millennium. + =head2 How can I compare two dates and find the difference? If you're storing your dates as epoch seconds then simply subtract one from the other. If you've got a structured date (distinct year, day, -month, hour, minute, seconds values) then use one of the Date::Manip -and Date::Calc modules from CPAN. +month, hour, minute, seconds values), then for reasons of accessibility, +simplicity, and efficiency, merely use either timelocal or timegm (from +the Time::Local module in the standard distribution) to reduce structured +dates to epoch seconds. However, if you don't know the precise format of +your dates, then you should probably use either of the Date::Manip and +Date::Calc modules from CPAN before you go hacking up your own parsing +routine to handle arbitrary date formats. =head2 How can I take a string and turn it into epoch seconds? @@ -244,22 +273,83 @@ and Date::Manip modules from CPAN. =head2 How can I find the Julian Day? -Neither Date::Manip nor Date::Calc deal with Julian days. Instead, -there is an example of Julian date calculation that should help you in -Time::JulianDay (part of the Time-modules bundle) which can be found at -http://www.perl.com/CPAN/modules/by-module/Time/. +You could use Date::Calc's Delta_Days function and calculate the number +of days from there. Assuming that's what you really want, that is. + +Before you immerse yourself too deeply in this, be sure to verify that it +is the I<Julian> Day you really want. Are they really just interested in +a way of getting serial days so that they can do date arithmetic? If you +are interested in performing date arithmetic, this can be done using +either Date::Manip or Date::Calc, without converting to Julian Day first. 
+ +There is too much confusion on this issue to cover in this FAQ, but the +term is applied (correctly) to a calendar now supplanted by the Gregorian +Calendar, with the Julian Calendar failing to adjust properly for leap +years on centennial years (among other annoyances). The term is also used +(incorrectly) to mean: [1] days in the Gregorian Calendar; and [2] days +since a particular starting time or `epoch', usually 1970 in the Unix +world and 1980 in the MS-DOS/Windows world. If you find that it is not +the first meaning that you really want, then check out the Date::Manip +and Date::Calc modules. (Thanks to David Cassell for most of this text.) +There is also an example of Julian date calculation that should help you in +http://www.perl.com/CPAN/authors/David_Muir_Sharnoff/modules/Time/JulianDay.pm.gz =head2 How do I find yesterday's date? The C<time()> function returns the current time in seconds since the -epoch. Take one day off that: +epoch. Take twenty-four hours off that: $yesterday = time() - ( 24 * 60 * 60 ); Then you can pass this to C<localtime()> and get the individual year, month, day, hour, minute, seconds values. +Note very carefully that the code above assumes that your days are +twenty-four hours each. For most people, there are two days a year +when they aren't: the switch to and from summer time throws this off. +A solution to this issue is offered by Russ Allbery. + + sub yesterday { + my $now = defined $_[0] ? $_[0] : time; + my $then = $now - 60 * 60 * 24; + my $ndst = (localtime $now)[8] > 0; + my $tdst = (localtime $then)[8] > 0; + $then - ($tdst - $ndst) * 60 * 60; + } + # Should give you "this time yesterday" in seconds since epoch relative to + # the first argument or the current time if no argument is given and + # suitable for passing to localtime or whatever else you need to do with + # it. $ndst is whether we're currently in daylight savings time; $tdst is + # whether the point 24 hours ago was in daylight savings time. 
If $tdst + # and $ndst are the same, a boundary wasn't crossed, and the correction + # will subtract 0. If $tdst is 1 and $ndst is 0, subtract an hour more + # from yesterday's time since we gained an extra hour while going off + # daylight savings time. If $tdst is 0 and $ndst is 1, subtract a + # negative hour (add an hour) to yesterday's time since we lost an hour. + # + # All of this is because during those days when one switches off or onto + # DST, a "day" isn't 24 hours long; it's either 23 or 25. + # + # The explicit settings of $ndst and $tdst are necessary because localtime + # only says it returns the system tm struct, and the system tm struct at + # least on Solaris doesn't guarantee any particuliar positive value (like, + # say, 1) for isdst, just a positive value. And that value can + # potentially be negative, if DST information isn't available (this sub + # just treats those cases like no DST). + # + # Note that between 2am and 3am on the day after the time zone switches + # off daylight savings time, the exact hour of "yesterday" corresponding + # to the current hour is not clearly defined. Note also that if used + # between 2am and 3am the day after the change to daylight savings time, + # the result will be between 3am and 4am of the previous day; it's + # arguable whether this is correct. + # + # This sub does not attempt to deal with leap seconds (most things don't). + # + # Copyright relinquished 1999 by Russ Allbery <rra@stanford.edu> + # This code is in the public domain + =head2 Does Perl have a year 2000 problem? Is Perl Y2K compliant? Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is @@ -312,7 +402,11 @@ This won't expand C<"\n"> or C<"\t"> or any other special escapes. To turn C<"abbcccd"> into C<"abccd">: - s/(.)\1/$1/g; + s/(.)\1/$1/g; # add /s to include newlines + +Here's a solution that turns "abbcccd" to "abcd": + + y///cs; # y == tr, but shorter :-) =head2 How do I expand function calls in a string? 
@@ -353,7 +447,7 @@ Dominus's excellent I<py> tool at http://www.plover.com/~mjd/perl/py/ One simple destructive, inside-out approach that you might try is to pull out the smallest nesting parts one at a time: - while (s//BEGIN((?:(?!BEGIN)(?!END).)*)END/gs) { + while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) { # do something with $1 } @@ -422,24 +516,25 @@ likely prefer: You have to keep track of N yourself. For example, let's say you want to change the fifth occurrence of C<"whoever"> or C<"whomever"> into -C<"whosoever"> or C<"whomsoever">, case insensitively. +C<"whosoever"> or C<"whomsoever">, case insensitively. These +all assume that $_ contains the string to be altered. $count = 0; s{((whom?)ever)}{ ++$count == 5 # is it the 5th? ? "${2}soever" # yes, swap : $1 # renege and leave it there - }igex; + }ige; In the more general case, you can use the C</g> modifier in a C<while> loop, keeping count of matches. $WANT = 3; $count = 0; + $_ = "One fish two fish red fish blue fish"; while (/(\w+)\s+fish\b/gi) { if (++$count == $WANT) { print "The third fish is a $1 one.\n"; - # Warning: don't `last' out of this loop } } @@ -456,7 +551,7 @@ C<tr///> function like so: $string = "ThisXlineXhasXsomeXx'sXinXit"; $count = ($string =~ tr/X//); - print "There are $count X charcters in the string"; + print "There are $count X characters in the string"; This is fine if you are just looking for a single character. However, if you are trying to count multiple character substrings within a @@ -499,7 +594,7 @@ characters by placing a C<use locale> pragma in your program. See L<perllocale> for endless details on locales. This is sometimes referred to as putting something into "title -case", but that's not quite accurate. Consdier the proper +case", but that's not quite accurate. Consider the proper capitalization of the movie I<Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb>, for example. 
@@ -546,8 +641,8 @@ Although the simplest approach would seem to be: $string =~ s/^\s*(.*?)\s*$/$1/; -This is unnecessarily slow, destructive, and fails with embedded newlines. -It is much better faster to do this in two steps: +Not only is this unnecessarily slow and destructive, it also fails with +embedded newlines. It is much faster to do this operation in two steps: $string =~ s/^\s+//; $string =~ s/\s+$//; @@ -562,7 +657,7 @@ Or more nicely written as: This idiom takes advantage of the C<foreach> loop's aliasing behavior to factor out common code. You can do this on several strings at once, or arrays, or even the -values of a hash if you use a slide: +values of a hash if you use a slice: # trim whitespace in the scalar, the array, # and all the values in the hash @@ -573,41 +668,48 @@ values of a hash if you use a slide: =head2 How do I pad a string with blanks or pad a number with zeroes? -(This answer contributed by Uri Guttman) +(This answer contributed by Uri Guttman, with kibitzing from +Bart Lateur.) In the following examples, C<$pad_len> is the length to which you wish -to pad the string, C<$text> or C<$num> contains the string to be -padded, and C<$pad_char> contains the padding character. You can use a -single character string constant instead of the C<$pad_char> variable -if you know what it is in advance. +to pad the string, C<$text> or C<$num> contains the string to be padded, +and C<$pad_char> contains the padding character. You can use a single +character string constant instead of the C<$pad_char> variable if you +know what it is in advance. And in the same way you can use an integer in +place of C<$pad_len> if you know the pad length in advance. -The simplest method use the C<sprintf> function. It can pad on the -left or right with blanks and on the left with zeroes. +The simplest method uses the C<sprintf> function. It can pad on the left +or right with blanks and on the left with zeroes and it will not +truncate the result. 
The C<pack> function can only pad strings on the +right with blanks and it will truncate the result to a maximum length of +C<$pad_len>. - # Left padding with blank: - $padded = sprintf( "%${pad_len}s", $text ) ; + # Left padding a string with blanks (no truncation): + $padded = sprintf("%${pad_len}s", $text); - # Right padding with blank: - $padded = sprintf( "%${pad_len}s", $text ) ; + # Right padding a string with blanks (no truncation): + $padded = sprintf("%-${pad_len}s", $text); - # Left padding with 0: - $padded = sprintf( "%0${pad_len}d", $num ) ; + # Left padding a number with 0 (no truncation): + $padded = sprintf("%0${pad_len}d", $num); -If you need to pad with a character other than blank or zero you can use -one of the following methods. + # Right padding a string with blanks using pack (will truncate): + $padded = pack("A$pad_len",$text); -These methods generate a pad string with the C<x> operator and -concatenate that with the original text. +If you need to pad with a character other than blank or zero you can use +one of the following methods. They all generate a pad string with the +C<x> operator and combine that with C<$text>. These methods do +not truncate C<$text>. -Left and right padding with any character: +Left and right padding with any character, creating a new string: - $padded = $pad_char x ( $pad_len - length( $text ) ) . $text ; - $padded = $text . $pad_char x ( $pad_len - length( $text ) ) ; + $padded = $pad_char x ( $pad_len - length( $text ) ) . $text; + $padded = $text . $pad_char x ( $pad_len - length( $text ) ); -Or you can left or right pad $text directly: +Left and right padding with any character, modifying C<$text> directly: - $text .= $pad_char x ( $pad_len - length( $text ) ) ; - substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) ) ; + substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) ); + $text .= $pad_char x ( $pad_len - length( $text ) ); =head2 How do I extract selected columns from a string? 
@@ -634,6 +736,13 @@ you can use this kind of thing: =head2 How do I find the soundex value of a string? Use the standard Text::Soundex module distributed with perl. +But before you do so, you may want to determine whether `soundex' is in +fact what you think it is. Knuth's soundex algorithm compresses words +into a small space, and so it does not necessarily distinguish between +two words which you might want to appear separately. For example, the +last names `Knuth' and `Kant' are both mapped to the soundex code K530. +If Text::Soundex does not do what you are looking for, you might want +to consider the String::Approx module available at CPAN. =head2 How can I expand variables in text strings? @@ -767,7 +876,7 @@ This works with leading special strings, dynamically determined: @@@ runops() { @@@ SAVEI32(runlevel); @@@ runlevel++; - @@@ while ( op = (*op->op_ppaddr)() ) ; + @@@ while ( op = (*op->op_ppaddr)() ); @@@ TAINT_NOT; @@@ return 0; @@@ } @@ -805,9 +914,9 @@ When you say $scalar = (2, 5, 7, 9); -you're using the comma operator in scalar context, so it evaluates the -left hand side, then evaluates and returns the left hand side. This -causes the last value to be returned: 9. +you're using the comma operator in scalar context, so it uses the scalar +comma operator. There never was a list there at all! This causes the +last value to be returned: 9. =head2 What is the difference between $array[1] and @array[1]? @@ -827,7 +936,7 @@ with The B<-w> flag will warn you about these matters. -=head2 How can I extract just the unique elements of an array? +=head2 How can I remove duplicate elements from a list or array? There are several possible ways, depending on whether the array is ordered and whether you wish to preserve the ordering. @@ -893,7 +1002,8 @@ array. 
This kind of an array will take up less space: @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31); undef @is_tiny_prime; - for (@primes) { $is_tiny_prime[$_] = 1; } + for (@primes) { $is_tiny_prime[$_] = 1 } + # or simply @istiny_prime[@primes] = (1) x @primes; Now you check whether $is_tiny_prime[$some_number]. @@ -916,7 +1026,7 @@ or worse yet These are slow (checks every element even if the first matches), inefficient (same reason), and potentially buggy (what if there are -regexp characters in $whatever?). If you're only testing once, then +regex characters in $whatever?). If you're only testing once, then use: $is_there = 0; @@ -941,6 +1051,9 @@ each element is unique in a given array: push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element; } +Note that this is the I<symmetric difference>, that is, all elements in +either A or in B, but not in both. Think of it as an xor operation. + =head2 How do I test whether two arrays or hashes are equal? The following code works for single-level arrays. It uses a stringwise @@ -1078,7 +1191,7 @@ Use this: fisher_yates_shuffle( \@array ); # permutes @array in place -You've probably seen shuffling algorithms that works using splice, +You've probably seen shuffling algorithms that work using splice, randomly picking another element to swap the current element with: srand; @@ -1185,7 +1298,7 @@ that's come to be known as the Schwartzian Transform: @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } - map { [ $_, uc((/\d+\s*(\S+)/ )[0] ] } @data; + map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data; If you need to sort on several fields, the following paradigm is useful. @@ -1311,7 +1424,19 @@ sorting the keys as shown in an earlier question. =head2 What happens if I add or remove keys from a hash while iterating over it? -Don't do that. +Don't do that. :-) + +[lwall] In Perl 4, you were not allowed to modify a hash at all while +interating over it. 
In Perl 5 you can delete from it, but you still +can't add to it, because that might cause a doubling of the hash table, +in which half the entries get copied up to the new top half of the +table, at which point you've totally bamboozled the interator code. +Even if the table doesn't double, there's no telling whether your new +entry will be inserted before or after the current iterator position. + +Either treasure up your changes and make them after the iterator finishes, +or use keys to fetch all the old keys at once, and iterate over the list +of keys. =head2 How do I look up a hash element by value? @@ -1327,8 +1452,13 @@ to use: $by_value{$value} = $key; } -If your hash could have repeated values, the methods above will only -find one of the associated keys. This may or may not worry you. +If your hash could have repeated values, the methods above will only find +one of the associated keys. This may or may not worry you. If it does +worry you, you can always reverse the hash into a hash of arrays instead: + + while (($key, $value) = each %by_key) { + push @{$key_list_by_value{$value}}, $key; + } =head2 How can I know how many entries are in a hash? @@ -1337,8 +1467,9 @@ take the scalar sense of the keys() function: $num_keys = scalar keys %hash; -In void context it just resets the iterator, which is faster -for tied hashes. +In void context, the keys() function just resets the iterator, which is +faster for tied hashes than would be iterating through the whole +hash, one key-value pair at a time. =head2 How do I sort a hash (optionally by value instead of key)? @@ -1467,8 +1598,8 @@ re-enter it, the hash iterator has been reset. =head2 How can I get the unique keys from two hashes? -First you extract the keys from the hashes into arrays, and then solve -the uniquifying the array problem described above. For example: +First you extract the keys from the hashes into lists, then solve +the "removing duplicates" problem described above. 
For example: %seen = (); for $element (keys(%foo), keys(%bar)) { @@ -1560,9 +1691,11 @@ this works fine (assuming the files are found): print "Your kernel is GNU-zip enabled!\n"; } -On some legacy systems, however, you have to play tedious games with -"text" versus "binary" files. See L<perlfunc/"binmode">, or the upcoming -L<perlopentut> manpage. +On less elegant (read: Byzantine) systems, however, you have +to play tedious games with "text" versus "binary" files. See +L<perlfunc/"binmode"> or L<perlopentut>. Most of these ancient-thinking +systems are curses out of Microsoft, who seem to be committed to putting +the backward into backward compatibility. If you're concerned about 8-bit ASCII data, then see L<perllocale>. @@ -1606,10 +1739,10 @@ if you just want to say, ``Is this a float?'' sub is_numeric { defined &getnum } -Or you could check out String::Scanf which can be found at -http://www.perl.com/CPAN/modules/by-module/String/. -The POSIX module (part of the standard Perl distribution) provides -the C<strtol> and C<strtod> for converting strings to double +Or you could check out +http://www.perl.com/CPAN/modules/by-module/String/String-Scanf-1.1.tar.gz +instead. The POSIX module (part of the standard Perl distribution) +provides the C<strtol> and C<strtod> for converting strings to double and longs, respectively. =head2 How do I keep persistent data across program calls? @@ -1663,7 +1796,7 @@ All rights reserved. When included as part of the Standard Version of Perl, or as part of its complete documentation whether printed or otherwise, this work -may be distributed only under the terms of Perl's Artistic Licence. +may be distributed only under the terms of Perl's Artistic License. Any distribution of this file or derivatives thereof I<outside> of that package require that special arrangements be made with copyright holder. @@ -1673,4 +1806,3 @@ are hereby placed into the public domain. 
You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required. -
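Several of the answers touched by this patch (bit-string conversion, squeezing repeated characters, padding, and reversing a hash with repeated values) can be tried together in one small script. What follows is only a sketch and is not part of the commit. The sub names `bits_to_int`, `int_to_bits`, `squeeze`, `pad_left`, `pad_right`, and `keys_by_value` are hypothetical helpers introduced here for illustration. One detail worth checking when running it: the lowercase `c` template in `unpack` reads a signed char, so for a bit string whose high bit is set it appears to give a different result from `ord`, which reads the byte as unsigned.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bit string -> integer. ord() treats the packed byte as unsigned.
sub bits_to_int { return ord(pack('B8', $_[0])) }

# Integer in 0..255 -> eight-character bit string.
sub int_to_bits { return unpack('B8', chr($_[0])) }

# Collapse each run of a repeated character to one character. The empty
# search list is complemented by /c to mean "every character", and /s
# squeezes the runs.
sub squeeze {
    my ($s) = @_;
    $s =~ tr///cs;
    return $s;
}

# Pad with an arbitrary character without truncating. The guard keeps the
# repeat count from going negative when the text is already long enough.
sub pad_left {
    my ($text, $pad_len, $pad_char) = @_;
    my $n = $pad_len - length $text;
    return ($n > 0 ? $pad_char x $n : '') . $text;
}

sub pad_right {
    my ($text, $pad_len, $pad_char) = @_;
    my $n = $pad_len - length $text;
    return $text . ($n > 0 ? $pad_char x $n : '');
}

# Reverse a hash into a hash of arrays so that repeated values keep all
# of their keys. Keys are sorted only to make the output order stable.
sub keys_by_value {
    my (%by_key) = @_;
    my %key_list_by_value;
    for my $key (sort keys %by_key) {
        push @{ $key_list_by_value{ $by_key{$key} } }, $key;
    }
    return \%key_list_by_value;
}

print bits_to_int('10110110'), "\n";                # 182
print unpack('c', pack('B8', '10110110')), "\n";    # -74 (signed char)
print int_to_bits(0x29), "\n";                      # 00101001
print squeeze('abbcccd'), "\n";                     # abcd
print pad_left('42', 5, '*'), "\n";                 # ***42
print pad_right('ab', 4, '.'), "\n";                # ab..

my $inverted = keys_by_value(a => 1, b => 2, c => 1);
print join(',', @{ $inverted->{1} }), "\n";         # a,c
```

If an unsigned result is wanted from `unpack`, the uppercase `C` template is the usual choice. The padding helpers deliberately return the text unchanged when it is already at least `$pad_len` long, matching the "no truncation" behaviour the patch describes for the `x`-operator methods.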