summaryrefslogtreecommitdiff
path: root/doc/projects.html
diff options
context:
space:
mode:
authorTorbjorn Granlund <tege@gmplib.org>2008-12-15 15:14:22 +0100
committerTorbjorn Granlund <tege@gmplib.org>2008-12-15 15:14:22 +0100
commit78d95a8dec1030f71f99915c8947bb8a13a39ac4 (patch)
treefbe808bc0800a6bbfedbff798148f4c20085b84c /doc/projects.html
parent4f3ff19a70fa5be1a71e8c3827d1331c333ccec2 (diff)
downloadgmp-78d95a8dec1030f71f99915c8947bb8a13a39ac4.tar.gz
Remove GCD and division projects, update text on multiplication.
Diffstat (limited to 'doc/projects.html')
-rw-r--r--doc/projects.html152
1 files changed, 64 insertions, 88 deletions
diff --git a/doc/projects.html b/doc/projects.html
index 5d1640f25..165726be7 100644
--- a/doc/projects.html
+++ b/doc/projects.html
@@ -14,24 +14,30 @@
</center>
<font size=-1>
-Copyright 2000, 2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation,
-Inc. <br><br>
-This file is part of the GNU MP Library. <br><br>
+<pre>
+Copyright 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2008 Free Software
+Foundation, Inc.
+
+This file is part of the GNU MP Library.
+
The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published
by the Free Software Foundation; either version 3 of the License, or (at
-your option) any later version. <br><br>
+your option) any later version.
+
The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
-License for more details. <br><br>
+License for more details.
+
You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library. If not, see http://www.gnu.org/licenses/.
+</pre>
</font>
<hr>
<!-- NB. timestamp updated automatically by emacs -->
- This file current as of 21 Apr 2006. An up-to-date version is available at
+ This file current as of 15 Dec 2008. An up-to-date version is available at
<a href="http://gmplib.org/projects.html">http://gmplib.org/projects.html</a>.
Please send comments about this page to gmp-devel<font>@</font>swox.com.
@@ -47,15 +53,22 @@ along with the GNU MP Library. If not, see http://www.gnu.org/licenses/.
<ul>
<li> <strong>Faster multiplication</strong>
- <p> The current multiplication code uses Karatsuba, 3-way Toom, and Fermat
- FFT. Several new developments are desirable:
+ <p> The current multiplication code uses Karatsuba, 3-way and 4-way Toom, and
+ Fermat FFT. Several new developments are desirable:
<ol>
- <li> Handle multiplication of operands with different digit count better
- than today. We now split the operands in a very inefficient way, see
- mpn/generic/mul.c. The best operands splitting strategy depends on
- the underlying multiplication algorithm to be used.
+ <li> Write more toom multiply functions for unbalanced operands. We now have
+ toom22, toom32, toom42, toom62, toom33, toom53, and toom44. Most
+ desirable is toom43, which will require a new toom_interpolate_6pts
+ function. Writing toom52 will then be straightforward. See also
+ <a href="http://bodrato.it/software/toom.html">Marco Bodrato's
+ site</a>
+
+ <li> The mpn_mul call now (from GMP 4.3) uses toom22, toom32, and toom42
+ for unbalanced operations. We don't use any of the other new toom
+ functions currently. Write new clever code for choosing the best toom
+ function from an m-limb and an n-limb operand.
<li> Implement an FFT variant computing the coefficients mod m different
limb size primes of the form l*2^k+1. i.e., compute m separate FFTs.
@@ -71,22 +84,23 @@ along with the GNU MP Library. If not, see http://www.gnu.org/licenses/.
in having to do more FFTs, but that is a slight total save. We then
lose in more expensive CRT. <br><br>
- An nearly complete implementation has been done by Tommy Färnqvist.
+ [We now have two implementations of this algorithm, one by Tommy
+ Färnqvist and one by Niels Möller.]
- <li> Perhaps consider N-way Toom, N > 3. See Knuth's Seminumerical
+ <li> Perhaps consider N-way Toom, N > 4. See Knuth's Seminumerical
Algorithms for details on the method. Code implementing it exists.
- This is asymptotically inferior to FFTs, but is finer grained. A
- Toom-4 might fit in between Toom-3 and the FFTs (or it might not).
-
- <li> Add support for partial products, either a given number of low limbs
- or high limbs of the result. A high partial product can be used by
- <code>mpf_mul</code> and by Newton approximations, a low half partial
- product might be of use in a future sub-quadratic REDC. On small
- sizes a partial product will be faster simply through fewer
- cross-products, similar to the way squaring is faster. But work by
- Thom Mulders shows that for Karatsuba and higher order algorithms the
- advantage is progressively lost, so for large sizes partial products
- turn out to be no faster.
+ This is asymptotically inferior to FFTs, but is finer grained.
+
+ <li> Add support for short products, either a given number of low limbs, a
+ given number of high limbs, or perhaps the middle limbs of the result.
+ High short product can be used by <code>mpf_mul</code>, by
+ left-to-right Newton approximations, and for quotient approximation.
+ Low half short product can be of use in sub-quadratic REDC and for
+ right-to-left Newton approximations. On small sizes a short product
+ will be faster simply through fewer cross-products, similar to the way
+ squaring is faster. But work by Thom Mulders shows that for Karatsuba
+ and higher order algorithms the advantage is progressively lost, so
+ for large sizes shows products turn out to be no faster.
</ol>
@@ -106,9 +120,9 @@ along with the GNU MP Library. If not, see http://www.gnu.org/licenses/.
<p> Please make sure your new routines are fast for these three situations:
<ol>
- <li> Operands that fit into the cache.
- <li> Small operands of less than, say, 10 limbs.
- <li> Huge operands that does not fit into the cache.
+ <li> Operands that fit into the cache.
+ <li> Small operands of less than, say, 10 limbs.
+ <li> Huge operands that does not fit into the cache.
</ol>
<p> The most important routines are mpn_addmul_1, mpn_mul_basecase and
@@ -123,22 +137,16 @@ along with the GNU MP Library. If not, see http://www.gnu.org/licenses/.
<p> Using floating-point operations is interesting but somewhat tricky.
Since IEEE double has 53 bit of mantissa, one has to split the operands
- in small prices, so that no result is greater than 2^53. For 32-bit
+ in small pieces, so that no result is greater than 2^53. For 32-bit
computers, splitting one operand into 16-bit pieces works. For 64-bit
machines, one operand can be split into 21-bit pieces and the other into
32-bit pieces. (A 64-bit operand can be split into just three 21-bit
pieces if one allows the split operands to be negative!)
-<li> <strong>Faster GCD</strong>
-
- <p> Work on Schönhage GCD and GCDEXT for large numbers is in progress.
- Contact Niels Möller if you want to help.
-
-
<li> <strong>Math functions for the mpf layer</strong>
- <p> Implement the functions of math.h for the GMP mpf layer! Check the book
+ <p> Implement the functions of math.h for the GMP mpf layer! Check the book
"Pi and the AGM" by Borwein and Borwein for ideas how to do this. These
functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2,
cos, cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
@@ -150,17 +158,20 @@ along with the GNU MP Library. If not, see http://www.gnu.org/licenses/.
possible to use only multiplications by computing 1/sqrt(A) using this
formula:
<pre>
- 2
- x = x (3 &minus; A x )/2
- i+1 i i </pre>
+ 2
+ x = x (3 &minus; A x )/2
+ i+1 i i </pre>
The square root can then be computed like this:
<pre>
- sqrt(A) = A x
- n </pre>
- <p> That final multiply might be the full size of the input (though it might
+ sqrt(A) = A x
+ n </pre>
+ <p> That final multiply might be the full size of the input (though it might
only need the high half of that), so there may or may not be any speedup
overall.
+ <p> We should probably allow a special exponent-like parameter, to speed
+ computations of a precise square root of a small number in mpf.
+
<li> <strong>Nth root</strong>
@@ -179,47 +190,12 @@ along with the GNU MP Library. If not, see http://www.gnu.org/licenses/.
subtract, rather than forming a complete remainder. The greatest savings
are when the quotient is small compared to the dividend and divisor.
- <p> Some code along these lines can be found in the current
- <code>mpn_tdiv_qr</code>, though perhaps calculating bigger chunks of
- remainder than might be strictly necessary. That function in its current
- form actually then always goes on to calculate a full remainder.
- Burnikel and Zeigler describe a similar approach for the divide and
- conquer case.
-
-
-<li> <strong>Sub-Quadratic REDC and Exact Division</strong>
-
- <p> See also
- <a href="http://swox.com/gmp/development.html">http://swox.com/gmp/development.html</a>
- for some new code for divexact.
-
- <p> <code>mpn_bdivmod</code> and the <code>redc</code> in
- <code>mpz_powm</code> should use some sort of divide and conquer
- algorithm. This would benefit <code>mpz_divexact</code>, and
- <code>mpn_gcd</code> on large unequal size operands. See "Exact Division
- with Karatsuba Complexity" by Jebelean for a (brief) description.
-
- <p> Failing that, some sort of <code>DIVEXACT_THRESHOLD</code> could be added
- to control whether <code>mpz_divexact</code> uses
- <code>mpn_bdivmod</code> or <code>mpn_tdiv_qr</code>, since the latter is
- faster on large divisors.
-
- <p> For the REDC, basically all that's needed is Montgomery's algorithm done
- in multi-limb integers. R is multiple limbs, and the inverse and
- operands are multi-precision.
-
- <p> For exact division the time to calculate a multi-limb inverse is not
- amortized across many modular operations, but instead will probably
- create a threshold below which the current style <code>mpn_bdivmod</code>
- is best. There's also Krandick and Jebelean, "Bidirectional Exact
- Integer Division" to basically use a low to high exact division for the
- low half quotient, and a quotient-only division for the high half.
-
- <p> It will be noted that low-half and high-half multiplies, and a low-half
- square, can be used. These ought to each take as little as half the time
- of a full multiply, or square, though work by Thom Mulders shows the
- advantage is progressively lost as Karatsuba and higher algorithms are
- applied.
+ <p> Some code along these lines can be found in the
+ current <code>mpn_tdiv_qr</code>, though perhaps calculating bigger
+ chunks of remainder than might be strictly necessary. That function in
+ its current form actually then always goes on to calculate a full
+ remainder. Burnikel and Zeigler describe a similar approach for the
+ divide and conquer case.
<li> <strong>Exceptions</strong>
@@ -252,8 +228,8 @@ along with the GNU MP Library. If not, see http://www.gnu.org/licenses/.
externally, but being a plain C library we don't want to depend on that.
<p> A C++ <code>throw</code> might be a good optional extra exceptions
- mechanism, perhaps under a build option. For GCC
- <code>-fexceptions</code> will add the necessary frame information to
+ mechanism, perhaps under a build option. For
+ GCC <code>-fexceptions</code> will add the necessary frame information to
plain C code, or GMP could be compiled as C++.
<p> Out-of-memory exceptions are expected to be handled by the
@@ -343,8 +319,8 @@ along with the GNU MP Library. If not, see http://www.gnu.org/licenses/.
uses would be,
<ol>
- <li> Checking speed variations between compilers.
- <li> Checking relative performance between systems or CPUs.
+ <li> Checking speed variations between compilers.
+ <li> Checking relative performance between systems or CPUs.
</ol>
<p> A combination of measuring some fundamental routines and some