diff options
author | Jarkko Hietaniemi <jhi@iki.fi> | 2001-06-24 14:43:36 +0000 |
---|---|---|
committer | Jarkko Hietaniemi <jhi@iki.fi> | 2001-06-24 14:43:36 +0000 |
commit | b61a233108eaaddabed65aef1894e43894311515 (patch) | |
tree | 0232aeabbcb9582b394fb1ad645aed59c95ee018 /lib/Memoize/README | |
parent | 30beff698f5df6e3004d9892f67ed1de7f6d123c (diff) | |
download | perl-b61a233108eaaddabed65aef1894e43894311515.tar.gz |
Upgrade to Memoize 0.65.
p4raw-id: //depot/perl@10894
Diffstat (limited to 'lib/Memoize/README')
-rw-r--r-- | lib/Memoize/README | 704 |
1 files changed, 27 insertions, 677 deletions
diff --git a/lib/Memoize/README b/lib/Memoize/README index 60f9b8387b..011c4bfed1 100644 --- a/lib/Memoize/README +++ b/lib/Memoize/README @@ -1,7 +1,7 @@ Name: Memoize What: Transparently speed up functions by caching return values. -Version: 0.51 +Version: 0.65 Author: Mark-Jason Dominus (mjd-perl-memoize+@plover.com) ################################################################ @@ -25,690 +25,40 @@ If not, please send me a report that mentions which tests failed. The address is: mjd-perl-memoize+@plover.com. ################################################################ -What's new since 0.49: +What's new since 0.62: -Just a maintenance release. I made the tests a little more robust, -and I included the Memoization article that I forgot to put into 0.48. -################################################################ -What's new since 0.48: - -You can now expire data from the memoization cache according to any -expiration policy you desire. A sample policy is provided in the -Memoize::Expire module. It supports expiration of items that have -been in the cache a certain number of seconds and items that have been -accessed a certain number of times. When you call a memoized -function, and Memoize discovers that a cache item has expired, it -calls the real function and stores the result in the cache, just as if -the data had not been in the cache in the first place. - -Many people asked for a cache expiration feature, and some people even -sent patches. Thanks for the patches! But I did not accept them, -because they all added the expiration stuff into the module, and I was -sure that this was a bad way to do it. Everyone had a different idea -of what useful expiration behavior was, so I foresaw an endless series -of creeeping features and an expiration mechansim that got more and -more and more complicated and slower and slower and slower. - -The new expiration policy mechanism makes use of the TIE feature. You -write a cache policy module ( which might be very simple) and use the -TIE feature to insert it between memoize and the real cache. The -Memoize::Expire module. included in this package, is a useful example -of this that might satisfy many people. The documentation for that -module includes an even simpler module for those who would like to -implement their own expiration policies. - -Big win: If you don't use the expiration feature, you don't pay for -it. Memoize 0.49 with expiration turned off runs *exactly* as fast as -Memoize 0.48 did. Not one line of code has been changed. - -Moral of the story: Sometimes, there is a Right Way to Do Things that -really is better than the obvious way. It might not be obvious at -first, and sometimes you have to make people wait for features so that -the Right Way to Do Things can make itself known. - -Many thanks to Mike Cariaso for helping me figure out The Right Way to -Do Things. - -Also: If you try to use ODBM_File, NDBM_File, SDBM_File, GDBM_File, or -DB_File for the LIST_CACHE, you get an error right away, because those -kinds of files will only store strings. Thanks to Jonathan Roy for -suggesting this. If you want to store list values in a persistent -cache, try Memoize::Storable. - -################################################################ - -What's new since 0.46: - -Caching of function return values into NDBM files is now supported. -You can cache function return values into Memoize::AnyDBM files, which -is a pseudo-module that selects the `best' available DBM -implementation. - -Bug fix: Prototyped functions are now memoized correctly; memoizing -used to remove the prototype and issue a warning. Also new tests for -this feature. (Thanks Alex Dudkevich) - -New test suites for SDBM and NDBM caching and prototyped functions. -Various small fixes in the test suite. -Various documentation enhancements and fixes. - -################################################################ - -What's new since 0.45: - -Now has an interface to `Storable'. This wasn't formerly possible, -because the base package can only store caches via modules that -present a tied hash interface, and `Storable' doesn't. Solution: -Memoize::Storable is a tied hash interface to `Storable'. - -################################################################ - -What's new since 0.06: - -Storage of cached function return values in a static file is now -tentatively supported. `memoize' now accepts new options SCALAR_CACHE -and LIST_CACHE to specify the destination and protocol for saving -cached values to disk. - -Consider these features alpha, and please report bugs to -mjd-perl-memoize@plover.com. The beta version is awaiting a more -complete test suite. - -Much new documentation to support all this. - -################################################################ - -What's new since 0.05: - -Calling syntax is now - - memoize(function, OPTION1 => VALUE1, ...) - -instead of - - memoize(function, { OPTION1 => VALUE1, ... }) - - -Functions that return lists can now be memoized. - -New tests for list-returning functions and their normalizers. - -Various documentation changes. - -Return value from `unmemoize' is now the resulting unmemoized -function, instead of the constant `1'. It was already docmuented to -do so. - -################################################################ - - -=head1 NAME - -Memoize - Make your functions faster by trading space for time - -=head1 SYNOPSIS - - use Memoize; - memoize('slow_function'); - slow_function(arguments); # Is faster than it was before - - -This is normally all you need to know. However, many options are available: - - memoize(function, options...); - -Options include: - - NORMALIZER => function - INSTALL => new_name - - SCALAR_CACHE => 'MEMORY' - SCALAR_CACHE => ['TIE', Module, arguments...] - SCALAR_CACHE => 'FAULT' - SCALAR_CACHE => 'MERGE' - - LIST_CACHE => 'MEMORY' - LIST_CACHE => ['TIE', Module, arguments...] - LIST_CACHE => 'FAULT' - LIST_CACHE => 'MERGE' - - -=head1 DESCRIPTION - -`Memoizing' a function makes it faster by trading space for time. It -does this by caching the return values of the function in a table. -If you call the function again with the same arguments, C<memoize> -jmups in and gives you the value out of the table, instead of letting -the function compute the value all over again. - -Here is an extreme example. Consider the Fibonacci sequence, defined -by the following function: - - # Compute Fibonacci numbers - sub fib { - my $n = shift; - return $n if $n < 2; - fib($n-1) + fib($n-2); - } - -This function is very slow. Why? To compute fib(14), it first wants -to compute fib(13) and fib(12), and add the results. But to compute -fib(13), it first has to compute fib(12) and fib(11), and then it -comes back and computes fib(12) all over again even though the answer -is the same. And both of the times that it wants to compute fib(12), -it has to compute fib(11) from scratch, and then it has to do it -again each time it wants to compute fib(13). This function does so -much recomputing of old results that it takes a really long time to -run---fib(14) makes 1,200 extra recursive calls to itself, to compute -and recompute things that it already computed. - -This function is a good candidate for memoization. If you memoize the -`fib' function above, it will compute fib(14) exactly once, the first -time it needs to, and then save the result in a table. Then if you -ask for fib(14) again, it gives you the result out of the table. -While computing fib(14), instead of computing fib(12) twice, it does -it once; the second time it needs the value it gets it from the table. -It doesn't compute fib(11) four times; it computes it once, getting it -from the table the next three times. Instead of making 1,200 -recursive calls to `fib', it makes 15. This makes the function about -150 times faster. - -You could do the memoization yourself, by rewriting the function, like -this: - - # Compute Fibonacci numbers, memoized version - { my @fib; - sub fib { - my $n = shift; - return $fib[$n] if defined $fib[$n]; - return $fib[$n] = $n if $n < 2; - $fib[$n] = fib($n-1) + fib($n-2); - } - } - -Or you could use this module, like this: - - use Memoize; - memoize('fib'); - - # Rest of the fib function just like the original version. - -This makes it easy to turn memoizing on and off. - -Here's an even simpler example: I wrote a simple ray tracer; the -program would look in a certain direction, figure out what it was -looking at, and then convert the `color' value (typically a string -like `red') of that object to a red, green, and blue pixel value, like -this: - - for ($direction = 0; $direction < 300; $direction++) { - # Figure out which object is in direction $direction - $color = $object->{color}; - ($r, $g, $b) = @{&ColorToRGB($color)}; - ... - } - -Since there are relatively few objects in a picture, there are only a -few colors, which get looked up over and over again. Memoizing -C<ColorToRGB> speeded up the program by several percent. - -=head1 DETAILS - -This module exports exactly one function, C<memoize>. The rest of the -functions in this package are None of Your Business. - -You should say - - memoize(function) - -where C<function> is the name of the function you want to memoize, or -a reference to it. C<memoize> returns a reference to the new, -memoized version of the function, or C<undef> on a non-fatal error. -At present, there are no non-fatal errors, but there might be some in -the future. - -If C<function> was the name of a function, then C<memoize> hides the -old version and installs the new memoized version under the old name, -so that C<&function(...)> actually invokes the memoized version. - -=head1 OPTIONS - -There are some optional options you can pass to C<memoize> to change -the way it behaves a little. To supply options, invoke C<memoize> -like this: - - memoize(function, NORMALIZER => function, - INSTALL => newname, - SCALAR_CACHE => option, - LIST_CACHE => option - ); - -Each of these options is optional; you can include some, all, or none -of them. - -=head2 INSTALL - -If you supply a function name with C<INSTALL>, memoize will install -the new, memoized version of the function under the name you give. -For example, - - memoize('fib', INSTALL => 'fastfib') - -installs the memoized version of C<fib> as C<fastfib>; without the -C<INSTALL> option it would have replaced the old C<fib> with the -memoized version. - -To prevent C<memoize> from installing the memoized version anywhere, use -C<INSTALL =E<gt> undef>. - -=head2 NORMALIZER - -Suppose your function looks like this: - - # Typical call: f('aha!', A => 11, B => 12); - sub f { - my $a = shift; - my %hash = @_; - $hash{B} ||= 2; # B defaults to 2 - $hash{C} ||= 7; # C defaults to 7 - - # Do something with $a, %hash - } - -Now, the following calls to your function are all completely equivalent: - - f(OUCH); - f(OUCH, B => 2); - f(OUCH, C => 7); - f(OUCH, B => 2, C => 7); - f(OUCH, C => 7, B => 2); - (etc.) - -However, unless you tell C<Memoize> that these calls are equivalent, -it will not know that, and it will compute the values for these -invocations of your function separately, and store them separately. - -To prevent this, supply a C<NORMALIZER> function that turns the -program arguments into a string in a way that equivalent arguments -turn into the same string. A C<NORMALIZER> function for C<f> above -might look like this: - - sub normalize_f { - my $a = shift; - my %hash = @_; - $hash{B} ||= 2; - $hash{C} ||= 7; - - join($;, $a, map ($_ => $hash{$_}) sort keys %hash); - } - -Each of the argument lists above comes out of the C<normalize_f> -function looking exactly the same, like this: - - OUCH^\B^\2^\C^\7 - -You would tell C<Memoize> to use this normalizer this way: - - memoize('f', NORMALIZER => 'normalize_f'); - -C<memoize> knows that if the normalized version of the arguments is -the same for two argument lists, then it can safely look up the value -that it computed for one argument list and return it as the result of -calling the function with the other argument list, even if the -argument lists look different. - -The default normalizer just concatenates the arguments with C<$;> in -between. This always works correctly for functions with only one -argument, and also when the arguments never contain C<$;> (which is -normally character #28, control-\. ) However, it can confuse certain -argument lists: - - normalizer("a\034", "b") - normalizer("a", "\034b") - normalizer("a\034\034b") - -for example. - -The calling context of the function (scalar or list context) is -propagated to the normalizer. This means that if the memoized -function will treat its arguments differently in list context than it -would in scalar context, you can have the normalizer function select -its behavior based on the results of C<wantarray>. Even if called in -a list context, a normalizer should still return a single string. - -=head2 C<SCALAR_CACHE>, C<LIST_CACHE> - -Normally, C<Memoize> caches your function's return values into an -ordinary Perl hash variable. However, you might like to have the -values cached on the disk, so that they persist from one run of your -program to the next, or you might like to associate some other -interesting semantics with the cached values. - -There's a slight complication under the hood of C<Memoize>: There are -actually I<two> caches, one for scalar values and one for list values. -When your function is called in scalar context, its return value is -cached in one hash, and when your function is called in list context, -its value is cached in the other hash. You can control the caching -behavior of both contexts independently with these options. - -The argument to C<LIST_CACHE> or C<SCALAR_CACHE> must either be one of -the following four strings: - - MEMORY - TIE - FAULT - MERGE - -or else it must be a reference to a list whose first element is one of -these four strings, such as C<[TIE, arguments...]>. - -=over 4 - -=item C<MEMORY> - -C<MEMORY> means that return values from the function will be cached in -an ordinary Perl hash variable. The hash variable will not persist -after the program exits. This is the default. - -=item C<TIE> - -C<TIE> means that the function's return values will be cached in a -tied hash. A tied hash can have any semantics at all. It is -typically tied to an on-disk database, so that cached values are -stored in the database and retrieved from it again when needed, and -the disk file typically persists after your pogram has exited. - -If C<TIE> is specified as the first element of a list, the remaining -list elements are taken as arguments to the C<tie> call that sets up -the tied hash. For example, - - SCALAR_CACHE => [TIE, DB_File, $filename, O_RDWR | O_CREAT, 0666] - -says to tie the hash into the C<DB_File> package, and to pass the -C<$filename>, C<O_RDWR | O_CREAT>, and C<0666> arguments to the C<tie> -call. This has the effect of storing the cache in a C<DB_File> -database whose name is in C<$filename>. - -Other typical uses of C<TIE>: - - LIST_CACHE => [TIE, GDBM_File, $filename, O_RDWR | O_CREAT, 0666] - SCALAR_CACHE => [TIE, MLDBM, DB_File, $filename, O_RDWR|O_CREAT, 0666] - LIST_CACHE => [TIE, My_Package, $tablename, $key_field, $val_field] - -This last might tie the cache hash to a package that you wrote -yourself that stores the cache in a SQL-accessible database. -A useful use of this feature: You can construct a batch program that -runs in the background and populates the memo table, and then when you -come to run your real program the memoized function will be -screamingly fast because all its results have been precomputed. - -=item C<FAULT> - -C<FAULT> means that you never expect to call the function in scalar -(or list) context, and that if C<Memoize> detects such a call, it -should abort the program. The error message is one of - - `foo' function called in forbidden list context at line ... - `foo' function called in forbidden scalar context at line ... - -=item C<MERGE> - -C<MERGE> normally means the function does not distinguish between list -and sclar context, and that return values in both contexts should be -stored together. C<LIST_CACHE =E<gt> MERGE> means that list context -return values should be stored in the same hash that is used for -scalar context returns, and C<SCALAR_CACHE =E<gt> MERGE> means the -same, mutatis mutandis. It is an error to specify C<MERGE> for both, -but it probably does something useful. - -Consider this function: - - sub pi { 3; } - -Normally, the following code will result in two calls to C<pi>: - - $x = pi(); - ($y) = pi(); - $z = pi(); - -The first call caches the value C<3> in the scalar cache; the second -caches the list C<(3)> in the list cache. The third call doesn't call -the real C<pi> function; it gets the value from the scalar cache. - -Obviously, the second call to C<pi> is a waste of time, and storing -its return value is a waste of space. Specifying C<LIST_CACHE -=E<gt> MERGE> will make C<memoize> use the same cache for scalar and -list context return values, so that the second call uses the scalar -cache that was populated by the first call. C<pi> ends up being -cvalled only once, and both subsequent calls return C<3> from the -cache, regardless of the calling context. - -Another use for C<MERGE> is when you want both kinds of return values -stored in the same disk file; this saves you from having to deal with -two disk files instead of one. You can use a normalizer function to -keep the two sets of return values separate. For example: - - memoize 'myfunc', - NORMALIZER => 'n', - SCALAR_CACHE => [TIE, MLDBM, DB_File, $filename, ...], - LIST_CACHE => MERGE, - ; - - sub n { - my $context = wantarray() ? 'L' : 'S'; - # ... now compute the hash key from the arguments ... - $hashkey = "$context:$hashkey"; - } - -This normalizer function will store scalar context return values in -the disk file under keys that begin with C<S:>, and list context -return values under keys that begin with C<L:>. - -=back - -=head1 OTHER FUNCTION - -There's an C<unmemoize> function that you can import if you want to. -Why would you want to? Here's an example: Suppose you have your cache -tied to a DBM file, and you want to make sure that the cache is -written out to disk if someone interrupts the program. If the program -exits normally, this will happen anyway, but if someone types -control-C or something then the program will terminate immediately -without syncronizing the database. So what you can do instead is - - $SIG{INT} = sub { unmemoize 'function' }; - - -Thanks to Jonathan Roy for discovering a use for C<unmemoize>. - -C<unmemoize> accepts a reference to, or the name of a previously -memoized function, and undoes whatever it did to provide the memoized -version in the first place, including making the name refer to the -unmemoized version if appropriate. It returns a reference to the -unmemoized version of the function. - -If you ask it to unmemoize a function that was never memoized, it -croaks. - -=head1 CAVEATS - -Memoization is not a cure-all: - -=over 4 - -=item * - -Do not memoize a function whose behavior depends on program -state other than its own arguments, such as global variables, the time -of day, or file input. These functions will not produce correct -results when memoized. For a particularly easy example: - - sub f { - time; - } - -This function takes no arguments, and as far as C<Memoize> is -concerned, it always returns the same result. C<Memoize> is wrong, of -course, and the memoized version of this function will call C<time> once -to get the current time, and it will return that same time -every time you call it after that. - -=item * - -Do not memoize a function with side effects. - - sub f { - my ($a, $b) = @_; - my $s = $a + $b; - print "$a + $b = $s.\n"; - } - -This function accepts two arguments, adds them, and prints their sum. -Its return value is the numuber of characters it printed, but you -probably didn't care about that. But C<Memoize> doesn't understand -that. If you memoize this function, you will get the result you -expect the first time you ask it to print the sum of 2 and 3, but -subsequent calls will return the number 11 (the return value of -C<print>) without actually printing anything. - -=item * - -Do not memoize a function that returns a data structure that is -modified by its caller. - -Consider these functions: C<getusers> returns a list of users somehow, -and then C<main> throws away the first user on the list and prints the -rest: - - sub main { - my $userlist = getusers(); - shift @$userlist; - foreach $u (@$userlist) { - print "User $u\n"; - } - } - - sub getusers { - my @users; - # Do something to get a list of users; - \@users; # Return reference to list. - } - -If you memoize C<getusers> here, it will work right exactly once. The -reference to the users list will be stored in the memo table. C<main> -will discard the first element from the referenced list. The next -time you invoke C<main>, C<Memoize> will not call C<getusers>; it will -just return the same reference to the same list it got last time. But -this time the list has already had its head removed; C<main> will -erroneously remove another element from it. The list will get shorter -and shorter every time you call C<main>. - - -=back - -=head1 PERSISTENT CACHE SUPPORT - -You can tie the cache tables to any sort of tied hash that you want -to, as long as it supports C<TIEHASH>, C<FETCH>, C<STORE>, and -C<EXISTS>. For example, - - memoize 'function', SCALAR_CACHE => - [TIE, GDBM_File, $filename, O_RDWR|O_CREAT, 0666]; - -works just fine. For some storage methods, you need a little glue. - -C<SDBM_File> doesn't supply an C<EXISTS> method, so included in this -package is a glue module called C<Memoize::SDBM_File> which does -provide one. Use this instead of plain C<SDBM_File> to store your -cache table on disk in an C<SDBM_File> database: - - memoize 'function', - SCALAR_CACHE => - [TIE, Memoize::SDBM_File, $filename, O_RDWR|O_CREAT, 0666]; - -C<NDBM_File> has the same problem and the same solution. - -C<Storable> isn't a tied hash class at all. You can use it to store a -hash to disk and retrieve it again, but you can't modify the hash while -it's on the disk. So if you want to store your cache table in a -C<Storable> database, use C<Memoize::Storable>, which puts a hashlike -front-end onto C<Storable>. The hash table is actually kept in -memory, and is loaded from your C<Storable> file at the time you -memoize the function, and stored back at the time you unmemoize the -function (or when your program exits): - - memoize 'function', - SCALAR_CACHE => [TIE, Memoize::Storable, $filename]; - - memoize 'function', - SCALAR_CACHE => [TIE, Memoize::Storable, $filename, 'nstore']; - -Include the `nstore' option to have the C<Storable> database written -in `network order'. (See L<Storable> for more details about this.) - -=head1 EXPIRATION SUPPORT - -See Memoize::Expire, which is a plug-in module that adds expiration -functionality to Memoize. If you don't like the kinds of policies -that Memoize::Expire implements, it is easy to write your own plug-in -module to implement whatever policy you desire. - -=head1 MY BUGS - -Needs a better test suite, especially for the tied and expiration stuff. - -Also, there is some problem with the way C<goto &f> works under -threaded Perl, because of the lexical scoping of C<@_>. This is a bug -in Perl, and until it is resolved, Memoize won't work with these -Perls. To fix it, you need to chop the source code a little. Find -the comment in the source code that says C<--- THREADED PERL -COMMENT---> and comment out the active line and uncomment the -commented one. Then try it again. - -I wish I could investigate this threaded Perl problem. If someone -could lend me an account on a machine with threaded Perl for a few -hours, it would be very helpful. - -That is why the version number is 0.49 instead of 1.00. - -=head1 MAILING LIST + N O T I C E ! -To join a very low-traffic mailing list for announcements about -C<Memoize>, send an empty note to C<mjd-perl-memoize-request@plover.com>. + **************************************************************** + ** ** + ** The TIE option is now strongly deprecated. It will be ** + ** permanently removed in the NEXT release of Memoize. ** + ** Please convert all extant software to use HASH instead. ** + ** ** + ** See the manual for details. ** + ** ** + **************************************************************** -=head1 AUTHOR +I'm sorry about this. I hate making incompatible changes. But as of +v0.65, Memoize is included in the Perl core. It is about to become +much more difficult to make incompatible interface changes; if I don't +get rid of TIE now, I may not get another chance. -Mark-Jason Dominus (C<mjd-perl-memoize+@plover.com>), Plover Systems co. +TIE presented serious problems. First, it had a bizarre syntax. But +the big problem was that it was difficult and complicated for +expiration manager authors to support; evern expiration manager had to +duplicate the logic for handling TIE. HASH is much simpler to use, +more powerful, and is trivial for expiration managers to support. -See the C<Memoize.pm> Page at http://www.plover.com/~mjd/perl/Memoize/ -for news and upgrades. Near this page, at -http://www.plover.com/~mjd/perl/MiniMemoize/ there is an article about -memoization and about the internals of Memoize that appeared in The -Perl Journal, issue #13. (This article is also included in the -Memoize distribution as `article.html'.) +Many long-awaited cleanups and bug fixes. -To join a mailing list for announcements about C<Memoize>, send an -empty message to C<mjd-perl-memoize-request@plover.com>. This mailing -list is for announcements only and has extremely low traffic---about -four messages per year. +Memoize now works under threaded perl -=head1 THANK YOU +Slow tests speeded up. More test file improvements. -Many thanks to Jonathan Roy for bug reports and suggestions, to -Michael Schwern for other bug reports and patches, to Mike Cariaso for -helping me to figure out the Right Thing to Do About Expiration, to -Joshua Gerth, Joshua Chamas, Jonathan Roy, Mark D. Anderson, and -Andrew Johnson for more suggestions about expiration, to Ariel -Scolnikov for delightful messages about the Fibonacci function, to -Dion Almaer for thought-provoking suggestions about the default -normalizer, to Walt Mankowski and Kurt Starsinic for much help -investigating problems under threaded Perl, to Alex Dudkevich for -reporting the bug in prototyped functions and for checking my patch, -to Tony Bass for many helpful suggestions, to Philippe Verdret for -enlightening discussion of Hook::PrePostCall, to Nat Torkington for -advice I ignored, to Chris Nandor for portability advice, and to Jenda -Krynicky for being a light in the world. +Long-standing LIST_CACHE bug cleared up---it turns out that there +never was a bug. I put in tests for it anyway. -=cut +Manual increased. |