summaryrefslogtreecommitdiff
path: root/lib/Memoize/README
diff options
context:
space:
mode:
Diffstat (limited to 'lib/Memoize/README')
-rw-r--r--lib/Memoize/README714
1 files changed, 714 insertions, 0 deletions
diff --git a/lib/Memoize/README b/lib/Memoize/README
new file mode 100644
index 0000000000..60f9b8387b
--- /dev/null
+++ b/lib/Memoize/README
@@ -0,0 +1,714 @@
+
+Name: Memoize
+What: Transparently speed up functions by caching return values.
+Version: 0.51
+Author: Mark-Jason Dominus (mjd-perl-memoize+@plover.com)
+
+################################################################
+
+How to build me:
+
+ perl Makefile.PL
+ make
+ make test
+
+There's a very small chance that the tests in speed.t and
+expire_module_t.t might fail because of clock skew or bizarre system
+load conditions. If the tests there fail, rerun them and see if the
+problem persists.
+
+If the tests work,
+
+ make install
+
+If not, please send me a report that mentions which tests failed.
+The address is: mjd-perl-memoize+@plover.com.
+
+################################################################
+What's new since 0.49:
+
+Just a maintenance release. I made the tests a little more robust,
+and I included the Memoization article that I forgot to put into 0.48.
+
+################################################################
+What's new since 0.48:
+
+You can now expire data from the memoization cache according to any
+expiration policy you desire. A sample policy is provided in the
+Memoize::Expire module. It supports expiration of items that have
+been in the cache a certain number of seconds and items that have been
+accessed a certain number of times. When you call a memoized
+function, and Memoize discovers that a cache item has expired, it
+calls the real function and stores the result in the cache, just as if
+the data had not been in the cache in the first place.
+
+Many people asked for a cache expiration feature, and some people even
+sent patches. Thanks for the patches! But I did not accept them,
+because they all added the expiration stuff into the module, and I was
+sure that this was a bad way to do it. Everyone had a different idea
+of what useful expiration behavior was, so I foresaw an endless series
+of creeeping features and an expiration mechansim that got more and
+more and more complicated and slower and slower and slower.
+
+The new expiration policy mechanism makes use of the TIE feature. You
+write a cache policy module ( which might be very simple) and use the
+TIE feature to insert it between memoize and the real cache. The
+Memoize::Expire module. included in this package, is a useful example
+of this that might satisfy many people. The documentation for that
+module includes an even simpler module for those who would like to
+implement their own expiration policies.
+
+Big win: If you don't use the expiration feature, you don't pay for
+it. Memoize 0.49 with expiration turned off runs *exactly* as fast as
+Memoize 0.48 did. Not one line of code has been changed.
+
+Moral of the story: Sometimes, there is a Right Way to Do Things that
+really is better than the obvious way. It might not be obvious at
+first, and sometimes you have to make people wait for features so that
+the Right Way to Do Things can make itself known.
+
+Many thanks to Mike Cariaso for helping me figure out The Right Way to
+Do Things.
+
+Also: If you try to use ODBM_File, NDBM_File, SDBM_File, GDBM_File, or
+DB_File for the LIST_CACHE, you get an error right away, because those
+kinds of files will only store strings. Thanks to Jonathan Roy for
+suggesting this. If you want to store list values in a persistent
+cache, try Memoize::Storable.
+
+################################################################
+
+What's new since 0.46:
+
+Caching of function return values into NDBM files is now supported.
+You can cache function return values into Memoize::AnyDBM files, which
+is a pseudo-module that selects the `best' available DBM
+implementation.
+
+Bug fix: Prototyped functions are now memoized correctly; memoizing
+used to remove the prototype and issue a warning. Also new tests for
+this feature. (Thanks Alex Dudkevich)
+
+New test suites for SDBM and NDBM caching and prototyped functions.
+Various small fixes in the test suite.
+Various documentation enhancements and fixes.
+
+################################################################
+
+What's new since 0.45:
+
+Now has an interface to `Storable'. This wasn't formerly possible,
+because the base package can only store caches via modules that
+present a tied hash interface, and `Storable' doesn't. Solution:
+Memoize::Storable is a tied hash interface to `Storable'.
+
+################################################################
+
+What's new since 0.06:
+
+Storage of cached function return values in a static file is now
+tentatively supported. `memoize' now accepts new options SCALAR_CACHE
+and LIST_CACHE to specify the destination and protocol for saving
+cached values to disk.
+
+Consider these features alpha, and please report bugs to
+mjd-perl-memoize@plover.com. The beta version is awaiting a more
+complete test suite.
+
+Much new documentation to support all this.
+
+################################################################
+
+What's new since 0.05:
+
+Calling syntax is now
+
+ memoize(function, OPTION1 => VALUE1, ...)
+
+instead of
+
+ memoize(function, { OPTION1 => VALUE1, ... })
+
+
+Functions that return lists can now be memoized.
+
+New tests for list-returning functions and their normalizers.
+
+Various documentation changes.
+
+Return value from `unmemoize' is now the resulting unmemoized
+function, instead of the constant `1'. It was already docmuented to
+do so.
+
+################################################################
+
+
+=head1 NAME
+
+Memoize - Make your functions faster by trading space for time
+
+=head1 SYNOPSIS
+
+ use Memoize;
+ memoize('slow_function');
+ slow_function(arguments); # Is faster than it was before
+
+
+This is normally all you need to know. However, many options are available:
+
+ memoize(function, options...);
+
+Options include:
+
+ NORMALIZER => function
+ INSTALL => new_name
+
+ SCALAR_CACHE => 'MEMORY'
+ SCALAR_CACHE => ['TIE', Module, arguments...]
+ SCALAR_CACHE => 'FAULT'
+ SCALAR_CACHE => 'MERGE'
+
+ LIST_CACHE => 'MEMORY'
+ LIST_CACHE => ['TIE', Module, arguments...]
+ LIST_CACHE => 'FAULT'
+ LIST_CACHE => 'MERGE'
+
+
+=head1 DESCRIPTION
+
+`Memoizing' a function makes it faster by trading space for time. It
+does this by caching the return values of the function in a table.
+If you call the function again with the same arguments, C<memoize>
+jmups in and gives you the value out of the table, instead of letting
+the function compute the value all over again.
+
+Here is an extreme example. Consider the Fibonacci sequence, defined
+by the following function:
+
+ # Compute Fibonacci numbers
+ sub fib {
+ my $n = shift;
+ return $n if $n < 2;
+ fib($n-1) + fib($n-2);
+ }
+
+This function is very slow. Why? To compute fib(14), it first wants
+to compute fib(13) and fib(12), and add the results. But to compute
+fib(13), it first has to compute fib(12) and fib(11), and then it
+comes back and computes fib(12) all over again even though the answer
+is the same. And both of the times that it wants to compute fib(12),
+it has to compute fib(11) from scratch, and then it has to do it
+again each time it wants to compute fib(13). This function does so
+much recomputing of old results that it takes a really long time to
+run---fib(14) makes 1,200 extra recursive calls to itself, to compute
+and recompute things that it already computed.
+
+This function is a good candidate for memoization. If you memoize the
+`fib' function above, it will compute fib(14) exactly once, the first
+time it needs to, and then save the result in a table. Then if you
+ask for fib(14) again, it gives you the result out of the table.
+While computing fib(14), instead of computing fib(12) twice, it does
+it once; the second time it needs the value it gets it from the table.
+It doesn't compute fib(11) four times; it computes it once, getting it
+from the table the next three times. Instead of making 1,200
+recursive calls to `fib', it makes 15. This makes the function about
+150 times faster.
+
+You could do the memoization yourself, by rewriting the function, like
+this:
+
+ # Compute Fibonacci numbers, memoized version
+ { my @fib;
+ sub fib {
+ my $n = shift;
+ return $fib[$n] if defined $fib[$n];
+ return $fib[$n] = $n if $n < 2;
+ $fib[$n] = fib($n-1) + fib($n-2);
+ }
+ }
+
+Or you could use this module, like this:
+
+ use Memoize;
+ memoize('fib');
+
+ # Rest of the fib function just like the original version.
+
+This makes it easy to turn memoizing on and off.
+
+Here's an even simpler example: I wrote a simple ray tracer; the
+program would look in a certain direction, figure out what it was
+looking at, and then convert the `color' value (typically a string
+like `red') of that object to a red, green, and blue pixel value, like
+this:
+
+ for ($direction = 0; $direction < 300; $direction++) {
+ # Figure out which object is in direction $direction
+ $color = $object->{color};
+ ($r, $g, $b) = @{&ColorToRGB($color)};
+ ...
+ }
+
+Since there are relatively few objects in a picture, there are only a
+few colors, which get looked up over and over again. Memoizing
+C<ColorToRGB> speeded up the program by several percent.
+
+=head1 DETAILS
+
+This module exports exactly one function, C<memoize>. The rest of the
+functions in this package are None of Your Business.
+
+You should say
+
+ memoize(function)
+
+where C<function> is the name of the function you want to memoize, or
+a reference to it. C<memoize> returns a reference to the new,
+memoized version of the function, or C<undef> on a non-fatal error.
+At present, there are no non-fatal errors, but there might be some in
+the future.
+
+If C<function> was the name of a function, then C<memoize> hides the
+old version and installs the new memoized version under the old name,
+so that C<&function(...)> actually invokes the memoized version.
+
+=head1 OPTIONS
+
+There are some optional options you can pass to C<memoize> to change
+the way it behaves a little. To supply options, invoke C<memoize>
+like this:
+
+ memoize(function, NORMALIZER => function,
+ INSTALL => newname,
+ SCALAR_CACHE => option,
+ LIST_CACHE => option
+ );
+
+Each of these options is optional; you can include some, all, or none
+of them.
+
+=head2 INSTALL
+
+If you supply a function name with C<INSTALL>, memoize will install
+the new, memoized version of the function under the name you give.
+For example,
+
+ memoize('fib', INSTALL => 'fastfib')
+
+installs the memoized version of C<fib> as C<fastfib>; without the
+C<INSTALL> option it would have replaced the old C<fib> with the
+memoized version.
+
+To prevent C<memoize> from installing the memoized version anywhere, use
+C<INSTALL =E<gt> undef>.
+
+=head2 NORMALIZER
+
+Suppose your function looks like this:
+
+ # Typical call: f('aha!', A => 11, B => 12);
+ sub f {
+ my $a = shift;
+ my %hash = @_;
+ $hash{B} ||= 2; # B defaults to 2
+ $hash{C} ||= 7; # C defaults to 7
+
+ # Do something with $a, %hash
+ }
+
+Now, the following calls to your function are all completely equivalent:
+
+ f(OUCH);
+ f(OUCH, B => 2);
+ f(OUCH, C => 7);
+ f(OUCH, B => 2, C => 7);
+ f(OUCH, C => 7, B => 2);
+ (etc.)
+
+However, unless you tell C<Memoize> that these calls are equivalent,
+it will not know that, and it will compute the values for these
+invocations of your function separately, and store them separately.
+
+To prevent this, supply a C<NORMALIZER> function that turns the
+program arguments into a string in a way that equivalent arguments
+turn into the same string. A C<NORMALIZER> function for C<f> above
+might look like this:
+
+ sub normalize_f {
+ my $a = shift;
+ my %hash = @_;
+ $hash{B} ||= 2;
+ $hash{C} ||= 7;
+
+ join($;, $a, map ($_ => $hash{$_}) sort keys %hash);
+ }
+
+Each of the argument lists above comes out of the C<normalize_f>
+function looking exactly the same, like this:
+
+ OUCH^\B^\2^\C^\7
+
+You would tell C<Memoize> to use this normalizer this way:
+
+ memoize('f', NORMALIZER => 'normalize_f');
+
+C<memoize> knows that if the normalized version of the arguments is
+the same for two argument lists, then it can safely look up the value
+that it computed for one argument list and return it as the result of
+calling the function with the other argument list, even if the
+argument lists look different.
+
+The default normalizer just concatenates the arguments with C<$;> in
+between. This always works correctly for functions with only one
+argument, and also when the arguments never contain C<$;> (which is
+normally character #28, control-\. ) However, it can confuse certain
+argument lists:
+
+ normalizer("a\034", "b")
+ normalizer("a", "\034b")
+ normalizer("a\034\034b")
+
+for example.
+
+The calling context of the function (scalar or list context) is
+propagated to the normalizer. This means that if the memoized
+function will treat its arguments differently in list context than it
+would in scalar context, you can have the normalizer function select
+its behavior based on the results of C<wantarray>. Even if called in
+a list context, a normalizer should still return a single string.
+
+=head2 C<SCALAR_CACHE>, C<LIST_CACHE>
+
+Normally, C<Memoize> caches your function's return values into an
+ordinary Perl hash variable. However, you might like to have the
+values cached on the disk, so that they persist from one run of your
+program to the next, or you might like to associate some other
+interesting semantics with the cached values.
+
+There's a slight complication under the hood of C<Memoize>: There are
+actually I<two> caches, one for scalar values and one for list values.
+When your function is called in scalar context, its return value is
+cached in one hash, and when your function is called in list context,
+its value is cached in the other hash. You can control the caching
+behavior of both contexts independently with these options.
+
+The argument to C<LIST_CACHE> or C<SCALAR_CACHE> must either be one of
+the following four strings:
+
+ MEMORY
+ TIE
+ FAULT
+ MERGE
+
+or else it must be a reference to a list whose first element is one of
+these four strings, such as C<[TIE, arguments...]>.
+
+=over 4
+
+=item C<MEMORY>
+
+C<MEMORY> means that return values from the function will be cached in
+an ordinary Perl hash variable. The hash variable will not persist
+after the program exits. This is the default.
+
+=item C<TIE>
+
+C<TIE> means that the function's return values will be cached in a
+tied hash. A tied hash can have any semantics at all. It is
+typically tied to an on-disk database, so that cached values are
+stored in the database and retrieved from it again when needed, and
+the disk file typically persists after your pogram has exited.
+
+If C<TIE> is specified as the first element of a list, the remaining
+list elements are taken as arguments to the C<tie> call that sets up
+the tied hash. For example,
+
+ SCALAR_CACHE => [TIE, DB_File, $filename, O_RDWR | O_CREAT, 0666]
+
+says to tie the hash into the C<DB_File> package, and to pass the
+C<$filename>, C<O_RDWR | O_CREAT>, and C<0666> arguments to the C<tie>
+call. This has the effect of storing the cache in a C<DB_File>
+database whose name is in C<$filename>.
+
+Other typical uses of C<TIE>:
+
+ LIST_CACHE => [TIE, GDBM_File, $filename, O_RDWR | O_CREAT, 0666]
+ SCALAR_CACHE => [TIE, MLDBM, DB_File, $filename, O_RDWR|O_CREAT, 0666]
+ LIST_CACHE => [TIE, My_Package, $tablename, $key_field, $val_field]
+
+This last might tie the cache hash to a package that you wrote
+yourself that stores the cache in a SQL-accessible database.
+A useful use of this feature: You can construct a batch program that
+runs in the background and populates the memo table, and then when you
+come to run your real program the memoized function will be
+screamingly fast because all its results have been precomputed.
+
+=item C<FAULT>
+
+C<FAULT> means that you never expect to call the function in scalar
+(or list) context, and that if C<Memoize> detects such a call, it
+should abort the program. The error message is one of
+
+ `foo' function called in forbidden list context at line ...
+ `foo' function called in forbidden scalar context at line ...
+
+=item C<MERGE>
+
+C<MERGE> normally means the function does not distinguish between list
+and sclar context, and that return values in both contexts should be
+stored together. C<LIST_CACHE =E<gt> MERGE> means that list context
+return values should be stored in the same hash that is used for
+scalar context returns, and C<SCALAR_CACHE =E<gt> MERGE> means the
+same, mutatis mutandis. It is an error to specify C<MERGE> for both,
+but it probably does something useful.
+
+Consider this function:
+
+ sub pi { 3; }
+
+Normally, the following code will result in two calls to C<pi>:
+
+ $x = pi();
+ ($y) = pi();
+ $z = pi();
+
+The first call caches the value C<3> in the scalar cache; the second
+caches the list C<(3)> in the list cache. The third call doesn't call
+the real C<pi> function; it gets the value from the scalar cache.
+
+Obviously, the second call to C<pi> is a waste of time, and storing
+its return value is a waste of space. Specifying C<LIST_CACHE
+=E<gt> MERGE> will make C<memoize> use the same cache for scalar and
+list context return values, so that the second call uses the scalar
+cache that was populated by the first call. C<pi> ends up being
+cvalled only once, and both subsequent calls return C<3> from the
+cache, regardless of the calling context.
+
+Another use for C<MERGE> is when you want both kinds of return values
+stored in the same disk file; this saves you from having to deal with
+two disk files instead of one. You can use a normalizer function to
+keep the two sets of return values separate. For example:
+
+ memoize 'myfunc',
+ NORMALIZER => 'n',
+ SCALAR_CACHE => [TIE, MLDBM, DB_File, $filename, ...],
+ LIST_CACHE => MERGE,
+ ;
+
+ sub n {
+ my $context = wantarray() ? 'L' : 'S';
+ # ... now compute the hash key from the arguments ...
+ $hashkey = "$context:$hashkey";
+ }
+
+This normalizer function will store scalar context return values in
+the disk file under keys that begin with C<S:>, and list context
+return values under keys that begin with C<L:>.
+
+=back
+
+=head1 OTHER FUNCTION
+
+There's an C<unmemoize> function that you can import if you want to.
+Why would you want to? Here's an example: Suppose you have your cache
+tied to a DBM file, and you want to make sure that the cache is
+written out to disk if someone interrupts the program. If the program
+exits normally, this will happen anyway, but if someone types
+control-C or something then the program will terminate immediately
+without syncronizing the database. So what you can do instead is
+
+ $SIG{INT} = sub { unmemoize 'function' };
+
+
+Thanks to Jonathan Roy for discovering a use for C<unmemoize>.
+
+C<unmemoize> accepts a reference to, or the name of a previously
+memoized function, and undoes whatever it did to provide the memoized
+version in the first place, including making the name refer to the
+unmemoized version if appropriate. It returns a reference to the
+unmemoized version of the function.
+
+If you ask it to unmemoize a function that was never memoized, it
+croaks.
+
+=head1 CAVEATS
+
+Memoization is not a cure-all:
+
+=over 4
+
+=item *
+
+Do not memoize a function whose behavior depends on program
+state other than its own arguments, such as global variables, the time
+of day, or file input. These functions will not produce correct
+results when memoized. For a particularly easy example:
+
+ sub f {
+ time;
+ }
+
+This function takes no arguments, and as far as C<Memoize> is
+concerned, it always returns the same result. C<Memoize> is wrong, of
+course, and the memoized version of this function will call C<time> once
+to get the current time, and it will return that same time
+every time you call it after that.
+
+=item *
+
+Do not memoize a function with side effects.
+
+ sub f {
+ my ($a, $b) = @_;
+ my $s = $a + $b;
+ print "$a + $b = $s.\n";
+ }
+
+This function accepts two arguments, adds them, and prints their sum.
+Its return value is the numuber of characters it printed, but you
+probably didn't care about that. But C<Memoize> doesn't understand
+that. If you memoize this function, you will get the result you
+expect the first time you ask it to print the sum of 2 and 3, but
+subsequent calls will return the number 11 (the return value of
+C<print>) without actually printing anything.
+
+=item *
+
+Do not memoize a function that returns a data structure that is
+modified by its caller.
+
+Consider these functions: C<getusers> returns a list of users somehow,
+and then C<main> throws away the first user on the list and prints the
+rest:
+
+ sub main {
+ my $userlist = getusers();
+ shift @$userlist;
+ foreach $u (@$userlist) {
+ print "User $u\n";
+ }
+ }
+
+ sub getusers {
+ my @users;
+ # Do something to get a list of users;
+ \@users; # Return reference to list.
+ }
+
+If you memoize C<getusers> here, it will work right exactly once. The
+reference to the users list will be stored in the memo table. C<main>
+will discard the first element from the referenced list. The next
+time you invoke C<main>, C<Memoize> will not call C<getusers>; it will
+just return the same reference to the same list it got last time. But
+this time the list has already had its head removed; C<main> will
+erroneously remove another element from it. The list will get shorter
+and shorter every time you call C<main>.
+
+
+=back
+
+=head1 PERSISTENT CACHE SUPPORT
+
+You can tie the cache tables to any sort of tied hash that you want
+to, as long as it supports C<TIEHASH>, C<FETCH>, C<STORE>, and
+C<EXISTS>. For example,
+
+ memoize 'function', SCALAR_CACHE =>
+ [TIE, GDBM_File, $filename, O_RDWR|O_CREAT, 0666];
+
+works just fine. For some storage methods, you need a little glue.
+
+C<SDBM_File> doesn't supply an C<EXISTS> method, so included in this
+package is a glue module called C<Memoize::SDBM_File> which does
+provide one. Use this instead of plain C<SDBM_File> to store your
+cache table on disk in an C<SDBM_File> database:
+
+ memoize 'function',
+ SCALAR_CACHE =>
+ [TIE, Memoize::SDBM_File, $filename, O_RDWR|O_CREAT, 0666];
+
+C<NDBM_File> has the same problem and the same solution.
+
+C<Storable> isn't a tied hash class at all. You can use it to store a
+hash to disk and retrieve it again, but you can't modify the hash while
+it's on the disk. So if you want to store your cache table in a
+C<Storable> database, use C<Memoize::Storable>, which puts a hashlike
+front-end onto C<Storable>. The hash table is actually kept in
+memory, and is loaded from your C<Storable> file at the time you
+memoize the function, and stored back at the time you unmemoize the
+function (or when your program exits):
+
+ memoize 'function',
+ SCALAR_CACHE => [TIE, Memoize::Storable, $filename];
+
+ memoize 'function',
+ SCALAR_CACHE => [TIE, Memoize::Storable, $filename, 'nstore'];
+
+Include the `nstore' option to have the C<Storable> database written
+in `network order'. (See L<Storable> for more details about this.)
+
+=head1 EXPIRATION SUPPORT
+
+See Memoize::Expire, which is a plug-in module that adds expiration
+functionality to Memoize. If you don't like the kinds of policies
+that Memoize::Expire implements, it is easy to write your own plug-in
+module to implement whatever policy you desire.
+
+=head1 MY BUGS
+
+Needs a better test suite, especially for the tied and expiration stuff.
+
+Also, there is some problem with the way C<goto &f> works under
+threaded Perl, because of the lexical scoping of C<@_>. This is a bug
+in Perl, and until it is resolved, Memoize won't work with these
+Perls. To fix it, you need to chop the source code a little. Find
+the comment in the source code that says C<--- THREADED PERL
+COMMENT---> and comment out the active line and uncomment the
+commented one. Then try it again.
+
+I wish I could investigate this threaded Perl problem. If someone
+could lend me an account on a machine with threaded Perl for a few
+hours, it would be very helpful.
+
+That is why the version number is 0.49 instead of 1.00.
+
+=head1 MAILING LIST
+
+To join a very low-traffic mailing list for announcements about
+C<Memoize>, send an empty note to C<mjd-perl-memoize-request@plover.com>.
+
+=head1 AUTHOR
+
+Mark-Jason Dominus (C<mjd-perl-memoize+@plover.com>), Plover Systems co.
+
+See the C<Memoize.pm> Page at http://www.plover.com/~mjd/perl/Memoize/
+for news and upgrades. Near this page, at
+http://www.plover.com/~mjd/perl/MiniMemoize/ there is an article about
+memoization and about the internals of Memoize that appeared in The
+Perl Journal, issue #13. (This article is also included in the
+Memoize distribution as `article.html'.)
+
+To join a mailing list for announcements about C<Memoize>, send an
+empty message to C<mjd-perl-memoize-request@plover.com>. This mailing
+list is for announcements only and has extremely low traffic---about
+four messages per year.
+
+=head1 THANK YOU
+
+Many thanks to Jonathan Roy for bug reports and suggestions, to
+Michael Schwern for other bug reports and patches, to Mike Cariaso for
+helping me to figure out the Right Thing to Do About Expiration, to
+Joshua Gerth, Joshua Chamas, Jonathan Roy, Mark D. Anderson, and
+Andrew Johnson for more suggestions about expiration, to Ariel
+Scolnikov for delightful messages about the Fibonacci function, to
+Dion Almaer for thought-provoking suggestions about the default
+normalizer, to Walt Mankowski and Kurt Starsinic for much help
+investigating problems under threaded Perl, to Alex Dudkevich for
+reporting the bug in prototyped functions and for checking my patch,
+to Tony Bass for many helpful suggestions, to Philippe Verdret for
+enlightening discussion of Hook::PrePostCall, to Nat Torkington for
+advice I ignored, to Chris Nandor for portability advice, and to Jenda
+Krynicky for being a light in the world.
+
+=cut
+