1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>
GMP Development Projects
</title>
</head>
<body bgcolor=pink>
<center>
<h1>
GMP Development Projects
</h1>
</center>
<font size=2>
Copyright 2000, 2001 Free Software Foundation, Inc.
<p> This file is part of the GNU MP Library.
<p> The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published
by the Free Software Foundation; either version 2.1 of the License, or (at
your option) any later version.
<p> The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
License for more details.
<p> You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library; see the file COPYING.LIB. If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
MA 02111-1307, USA.
</p>
</font>
<hr>
<!-- NB. timestamp updated automatically by emacs -->
<comment>
This file current as of 17 Oct 2001. An up-to-date version is available at
<a href="http://www.swox.com/gmp/projects.html">http://www.swox.com/gmp/projects.html</a>.
Please send comments about this page to
<a href="mailto:bug-gmp@gnu.org">bug-gmp@gnu.org</a>.
</comment>
<p> This file lists projects suitable for volunteers. Please see the
<a href="tasks.html">tasks file</a> for smaller tasks.
<p> If you want to work on any of the projects below, please let tege@swox.com
know. If you want to help with a project that already somebody else is
working on, please talk to that person too, tege@swox.com can put you in
touch. (There are no email addresses of volunteers below, due to spamming
problems.)
<ul>
<li> <strong>C++ wrapper</strong>
<p> A C++ wrapper for GMP is highly desirable.
<a href="http://www.sissa.it/~ballabio/gmp++.html">GMP++</a> has made
excellent progress on this.
For a wrapper to be useful, one needs to pay attention to efficiency.
<ol>
<li> Write arithmetic functions for direct applications of most
elementary C++ types. This might be necessary to avoid
ambiguities, but it is also desirable from an efficiency
standpoint.
<li> Avoid running constructors/destructors when not necessary. For
example, implement <code>a += b</code> by directly applying
<code>mpz_add</code>.
</ol>
<p> <li> <strong>Faster multiplication</strong>
<p> The current multiplication code uses Karatsuba, 3-way Toom-Cook,
or Fermat FFT. Several new developments are desirable:
<ol>
<li> Handle multiplication of operands with different digit count
better than today. We now split the operands in a very
inefficient way, see mpn/generic/mul.c.
<li> Consider N-way Toom-Cook. See Knuth's Seminumerical
Algorithms for details on the method. Code implementing it
exists. This is asymptotically inferior to FFTs, but is finer
grained. A toom-4 might fit in between toom-3 and the FFTs
(or it might not).
<li> It's possible CPU dependent effects like cache locality will
have a greater impact on speed than algorithmic improvements.
<li> Add support for partial products, either a given number of low limbs
or high limbs of the result. A high partial product can be used by
<code>mpf_mul</code> now, a low half partial product might be of use
in a future sub-quadratic REDC. On small sizes a partial product
will be faster simply through fewer cross-products, similar to the
way squaring is faster. But work by Thom Mulders shows that for
Karatsuba and higher order algorithms the advantage is progressively
lost, so for large sizes partial products turn out to be no faster.
</ol>
<p> Another possibility would be an optimized cube. In the basecase that
should definitely be able to save cross-products in a similar fashion to
squaring, but some investigation might be needed for how best to adapt
the higher-order algorithms. Not sure whether cubing or further small
powers have any particularly important uses though.
<p> <li> <strong>Assembly routines</strong>
<p> Write new and improve existing assembly routines. The tests/devel
programs and the tune/speed.c and tune/many.pl programs are useful for
testing and timing the routines you write. See the README files in those
directories for more information.
<p> Please make sure your new routines are fast for these three situations:
<ol>
<li> Operands that fit into the cache.
<li> Small operands of less than, say, 10 limbs.
<li> Huge operands that does not fit into the cache.
</ol>
<p> The most important routines are mpn_addmul_1, mpn_mul_basecase and
mpn_sqr_basecase. The latter two don't exist for all machines, while
mpn_addmul_1 exists for almost all machines.
<p> Standard techniques for these routines are unrolling, software
pipelining, and specialization for common operand values. For machines with
poor integer multiplication, it is often possible to improve the performance
using floating-point operations, or SIMD operations such as MMX or Sun's VIS.
<p> Using floating-point operations is interesting but somewhat tricky.
Since IEEE double has 53 bit of mantissa, one has to split the operands in
small prices, so that no result is greater than 2^53. For 32-bit computers,
splitting one operand into 16-bit pieces works. For 64-bit machines, one
operand can be split into 21-bit pieces and the other into 32-bit pieces. (A
64-bit operand can be split into just three 21-bit pieces if one allows the
split operands to be negative!)
<p> <li> <strong>Faster extended GCD</strong>
<p> We currently have extended GCD based on Lehmer's method.
But the binary method can quite easily be improved for bignums
in a similar way Lehmer improved Euclid's algorithm. The result should
beat Lehmer's method by about a factor of 2. Furthermore, the multiprecision
step of Lehmer's method and the binary method will be identical, so the
current code can mostly be reused. It should be possible to share code
between GCD and GCDEXT, and probably with Jacobi symbols too.
<p> <li> <strong>Math functions for the mpf layer</strong>
<p> Implement the functions of math.h for the GMP mpf layer! Check the book
"Pi and the AGM" by Borwein and Borwein for ideas how to do this. These
functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2, cos,
cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
<p> <li> <strong>Faster sqrt</strong>
<p> The current code uses divisions, which are reasonably fast, but it'd be
possible to use only multiplications by computing 1/sqrt(A) using this
formula:
<pre>
2
x = x (3 - A x )/2.
i+1 i i </pre>
And the final result
<pre>
sqrt(A) = A x .
n </pre>
That final multiply might be the full size of the input (though it might
only need the high half of that), so there may or may not be any speedup
overall.
<p> <li> <strong>Nth root</strong>
<p> Implement, at the mpn level, a routine for computing the nth root of a
number. The routine should use Newton's method, preferably without using
division.
<p> If the routine becomes fast enough, perhaps square roots could be computed
using this function.
<p> <li> <strong>Random number generators</strong>
<p> The implementation of the linear congruential generator is not
particularly fast. Perhaps a second seed areas within
<code>gmp_randstate_t</code> would save some copying, and perhaps a
special case for single (or single and double) limb moduli could avoid
lots of function calls.
<p> More generator algorithms would be good. An additive style (delayed
Fibonacci or whatever it's called) ought to be fast, and Blum-Blum-Shub
was planned.
<p> Some random functions giving various distributions (normal, geometric,
etc) might be good too.
<p> <li> <strong>Test Suite</strong>
<p> Add a test suite for old bugs. These tests wouldn't loop or use
random numbers, but just test for specific bugs found in the
past.
<p> More importantly, improve the random number controls for test
collections:
<ol>
<li> Use the new random number framework.
<li> Have every test accept a seed parameter.
<li> Allow `make check' to set the seed parameter.
<li> Add more tests for important, now untested functions.
</ol>
<p> With this in place, it should be possible to rid GMP of
practically all bugs by having some dedicated GMP test machines
running "make check" with ever increasing seeds (and
periodically updating to the latest GMP).
<p> If a few more ASSERTs were sprinkled through the code, running
some tests with --enable-assert might be useful, though not a
substitute for tests on the normal build.
<p> An important feature of any random tests will be to record the
seeds used, and perhaps to snapshot operands before performing
each test, so any problem exposed can be reproduced.
</ul>
<hr>
</body>
</html>
<!--
Local variables:
eval: (add-hook 'write-file-hooks 'time-stamp)
time-stamp-start: "This file current as of "
time-stamp-format: "%:d %3b %:y"
time-stamp-end: "\\."
time-stamp-line-limit: 50
End:
-->
|