diff options
author | tege <tege@gmplib.org> | 2002-12-30 20:37:25 +0100 |
---|---|---|
committer | tege <tege@gmplib.org> | 2002-12-30 20:37:25 +0100 |
commit | 82a98bccb000f5e2189593e54b6cc9ef7c5ad515 (patch) | |
tree | 69a3cb17049e410ef95e2899588605579f343ea9 /doc/projects.html | |
parent | 403fddc36ac5376021c87d26a8be39f943efcebe (diff) | |
download | gmp-82a98bccb000f5e2189593e54b6cc9ef7c5ad515.tar.gz |
Add 3-prime FFT under multiplication.
Diffstat (limited to 'doc/projects.html')
-rw-r--r-- | doc/projects.html | 54 |
1 files changed, 34 insertions, 20 deletions
diff --git a/doc/projects.html b/doc/projects.html index c5f3b7f5f..412e2c731 100644 --- a/doc/projects.html +++ b/doc/projects.html @@ -36,37 +36,51 @@ MA 02111-1307, USA. This file current as of 7 Nov 2002. An up-to-date version is available at <a href="http://swox.com/gmp/projects.html">http://swox.com/gmp/projects.html</a>. Please send comments about this page to - <a href="mailto:bug-gmp@gnu.org">bug-gmp@gnu.org</a>. + <a href="mailto:gmp-devel@swox.com">gmp-devel@swox.com</a>. </comment> <p> This file lists projects suitable for volunteers. Please see the <a href="tasks.html">tasks file</a> for smaller tasks. -<p> If you want to work on any of the projects below, please let tege@swox.com - know. If you want to help with a project that already somebody else is - working on, please talk to that person too, tege@swox.com can put you in - touch. (There are no email addresses of volunteers below, due to spamming - problems.) +<p> If you want to work on any of the projects below, please let <a + href="mailto:gmp-devel@swox.com">gmp-devel@swox.com</a> know. If you want + to help with a project that already somebody else is working on, you will + get in touch through gmp-devel@swox.com. (There are no email addresses of + volunteers below, due to spamming problems.) <ul> <li> <strong>Faster multiplication</strong> - <p> The current multiplication code uses Karatsuba, 3-way Toom, - or Fermat FFT. Several new developments are desirable: + <p> The current multiplication code uses Karatsuba, 3-way Toom, and Fermat + FFT. Several new developments are desirable: <ol> - <li> Handle multiplication of operands with different digit count - better than today. We now split the operands in a very - inefficient way, see mpn/generic/mul.c. - - <li> Consider N-way Toom. See Knuth's Seminumerical - Algorithms for details on the method. Code implementing it - exists. This is asymptotically inferior to FFTs, but is finer - grained. A toom-4 might fit in between toom-3 and the FFTs - (or it might not). - - <li> It's possible CPU dependent effects like cache locality will - have a greater impact on speed than algorithmic improvements. + + <li> Handle multiplication of operands with different digit count better + than today. We now split the operands in a very inefficient way, see + mpn/generic/mul.c. + + <li> Implement an FFT variant computing the coefficients mod m different + limb size primes of the form l*2^k+1. i.e., compute m separate FFTs. + The wanted coefficients will at the end be found by lifting with CRT + (Chinese Remainder Theorem). If we let m = 3, i.e., use 3 primes, we + can split the operands into coefficients at limb boundaries, and if + our machine uses b-bit limbs, we can multiply numbers with close to + 2^b limbs without coefficient overflow. For smaller multiplication, + we might perhaps let m = 1, and instead of splitting our operands at + limb boundaries, split them in much smaller pieces. We might also use + 4 or more primes, and split operands into bigger than b-bit chunks. + By using more primes, the gain in shorter transform length, but lose + in having to do more FFTs, but that is a slight total save. We then + lose in more expensive CRT. + + <li> Perhaps consider N-way Toom. See Knuth's Seminumerical Algorithms for + details on the method. Code implementing it exists. This is + asymptotically inferior to FFTs, but is finer grained. A toom-4 might + fit in between toom-3 and the FFTs (or it might not). + + <li> It's possible CPU dependent effects like cache locality will have a + greater impact on speed than algorithmic improvements. <li> Add support for partial products, either a given number of low limbs or high limbs of the result. A high partial product can be used by |