summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--doc/projects.html43
1 files changed, 42 insertions, 1 deletions
diff --git a/doc/projects.html b/doc/projects.html
index 0aa851bf4..159b8532a 100644
--- a/doc/projects.html
+++ b/doc/projects.html
@@ -33,7 +33,7 @@ MA 02111-1307, USA.
<hr>
<!-- NB. timestamp updated automatically by emacs -->
<comment>
- This file current as of 18 Sep 2002. An up-to-date version is available at
+ This file current as of 28 Sep 2002. An up-to-date version is available at
<a href="http://www.swox.com/gmp/projects.html">http://www.swox.com/gmp/projects.html</a>.
Please send comments about this page to
<a href="mailto:bug-gmp@gnu.org">bug-gmp@gnu.org</a>.
@@ -428,6 +428,47 @@ MA 02111-1307, USA.
the end.
+<p> <li> <strong>Factorial</strong>
+
+ <p> Conrad Curry found that a factorial can be efficiently calculated using
+ the prime factorization and a simultaneous powering (Handbook of Applied
+ Cryptography algorithm 14.88). The power for a prime p is of course
+ floor(n/p)+floor(n/p^2)+...
+
+ <p> A difficulty with this is that quite large n can be calculated on a
+ system with enough memory, larger than we'd probably want for a table of
+ primes, so some sort of sieving would be wanted.
+
+<p> <li> <strong>Binomial Coefficients</strong>
+
+ <p> Conrad Curry also reports a big speedup for binomial coefficients using
+ a prime powering scheme, at least for k near n/2. Of course this is
+ only practical for moderate size n since again it requires primes up to
+ n.
+
+ <p> When k is small the current (n-k+1)...n/(1...k) will be fastest. Some
+ sort of rule would be needed for when to use this or when to use prime
+ powering. Such a rule will be a function of n and k. Some
+ investigation is needed to see what sort of shape the crossover line
+ will have, the usual parameter tuning can of course find machine
+ dependent constants to fill in where necessary.
+
+ <p> An easier possibility also reported by Conrad Curry is that it may be
+ faster not to divide out the denominator (1...k) one-limb at a time, but
+ do one big division at the end. Is this because a big divisor in
+ <code>mpn_bdivmod</code> trades the latency of
+ <code>mpn_divexact_1</code> for the throughput of
+ <code>mpn_submul_1</code>? Overheads must hurt though.
+
+ <p> Another reason a big divisor might help is that
+ <code>mpn_divexact_1</code> won't be getting a full limb in
+ <code>mpz_bin_uiui</code>. It's called when the n accumulator is full
+ but the k may not be. Perhaps the two could be decoupled so k is
+ applied when full. It'd be necessary to delay consideration of k terms
+ until the corresponding n terms had been applied though, since otherwise
+ the division won't be exact.
+
+
</ul>
<hr>