diff options
-rw-r--r-- | doc/projects.html | 43 |
1 files changed, 42 insertions, 1 deletions
diff --git a/doc/projects.html b/doc/projects.html index 0aa851bf4..159b8532a 100644 --- a/doc/projects.html +++ b/doc/projects.html @@ -33,7 +33,7 @@ MA 02111-1307, USA. <hr> <!-- NB. timestamp updated automatically by emacs --> <comment> - This file current as of 18 Sep 2002. An up-to-date version is available at + This file current as of 28 Sep 2002. An up-to-date version is available at <a href="http://www.swox.com/gmp/projects.html">http://www.swox.com/gmp/projects.html</a>. Please send comments about this page to <a href="mailto:bug-gmp@gnu.org">bug-gmp@gnu.org</a>. @@ -428,6 +428,47 @@ MA 02111-1307, USA. the end. +<p> <li> <strong>Factorial</strong> + + <p> Conrad Curry found that a factorial can be efficiently calculated using + the prime factorization and a simultaneous powering (Handbook of Applied + Cryptography algorithm 14.88). The power for a prime p is of course + floor(n/p)+floor(n/p^2)+... + + <p> A difficulty with this is that quite large n can be calculated on a + system with enough memory, larger than we'd probably want for a table of + primes, so some sort of sieving would be wanted. + +<p> <li> <strong>Binomial Coefficients</strong> + + <p> Conrad Curry also reports a big speedup for binomial coefficients using + a prime powering scheme, at least for k near n/2. Of course this is + only practical for moderate size n since again it requires primes up to + n. + + <p> When k is small the current (n-k+1)...n/(1...k) will be fastest. Some + sort of rule would be needed for when to use this or when to use prime + powering. Such a rule will be a function of n and k. Some + investigation is needed to see what sort of shape the crossover line + will have, the usual parameter tuning can of course find machine + dependent constants to fill in where necessary. + + <p> An easier possibility also reported by Conrad Curry is that it may be + faster not to divide out the denominator (1...k) one-limb at a time, but + do one big division at the end. Is this because a big divisor in + <code>mpn_bdivmod</code> trades the latency of + <code>mpn_divexact_1</code> for the throughput of + <code>mpn_submul_1</code>? Overheads must hurt though. + + <p> Another reason a big divisor might help is that + <code>mpn_divexact_1</code> won't be getting a full limb in + <code>mpz_bin_uiui</code>. It's called when the n accumulator is full + but the k may not be. Perhaps the two could be decoupled so k is + applied when full. It'd be necessary to delay consideration of k terms + until the corresponding n terms had been applied though, since otherwise + the division won't be exact. + + </ul> <hr> |