author | tege <tege@gmplib.org> | 2000-07-25 23:39:02 +0200 |
---|---|---|
committer | tege <tege@gmplib.org> | 2000-07-25 23:39:02 +0200 |
commit | 6b01d24995ed6b4bce727db0b6215f4c4a15d667 (patch) | |
tree | 90d3b23aad28aa69757e6d635c9768da38875bf6 /doc | |
parent | 43d1e272a2036f2b9ecb49ac2f776b54df8b72b0 (diff) | |
download | gmp-6b01d24995ed6b4bce727db0b6215f4c4a15d667.tar.gz |
Delete some items that are now done, add item about running `tune', reformat.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/tasks.html | 151 |
1 files changed, 58 insertions, 93 deletions
diff --git a/doc/tasks.html b/doc/tasks.html
index 4c67e8387..6b3eaaeb0 100644
--- a/doc/tasks.html
+++ b/doc/tasks.html
@@ -101,25 +101,25 @@
      <code>mpf_get_prec</code>, <code>mpf_set_prec_raw</code>,
      <code>mpf_set_ui</code>, <code>mpf_init</code>, <code>mpf_init2</code>,
      <code>mpf_clear</code>, <code>mpf_set_si</code>.
-
 <li> <code>mpz_powm</code> and <code>mpz_powm_ui</code> aren't very fast on one
      or two limb moduli, due to a lot of function call overheads. These could
      perhaps be handled as special cases.
-
 <li> <code>mpz_powm</code> and <code>mpz_powm_ui</code> want better algorithm
      selection, and the latter should use REDC. Both could change to use an
      <code>mpn_powm</code> and <code>mpn_redc</code>.
-
 <li> <code>mpn_gcd</code> might be able to be sped up on small to moderate sizes
      by improving <code>find_a</code>, possibly just by providing an alternate
      implementation for CPUs with slowish <code>count_leading_zeros</code>.
-
 </ul>
 <h4>Machine Dependent Optimization</h4>
 <ul>
+<li> Run the `tune' utility for more compiler/CPU combinations. We would like
+     to have gmp-mparam.h files in practically every implementation specific
+     mpn subdirectory, and repeat each *_THRESHOLD for gcc and the system
+     compiler. See the `tune' top-level directory for more information.
 <li> Alpha: Rewrite <code>mpn_addmul_1</code>, <code>mpn_submul_1</code>, and
      <code>mpn_mul_1</code> for the 21264. On 21264, they should run at 4, 3,
      and 3 cycles/limb respectively, if the code is unrolled properly. (Ask
@@ -129,12 +129,6 @@
      multiplies and floating-point multiplies. For the floating-point
      operations, the single-limb multiplier should be split into three 21-bit
      chunks.
-<li> UltraSPARC: Rewrite 64-bit <code>mpn_add_n</code> and
-     <code>mpn_sub_n</code>. The current sparc64 code uses <code>MOVcc</code>
-     instructions, which take about 6 cycles on UltraSPARC. The correct
-     approach is probably to use conditional branching. That should lead to
-     loops that run at 4 cycles/limb. (Torbjörn has code that just needs to be
-     finished.)
 <li> UltraSPARC: Rewrite 64-bit <code>mpn_addmul_1</code>,
      <code>mpn_submul_1</code>, and <code>mpn_mul_1</code>. Should use
      floating-point operations, and split the invariant single-limb multiplier
@@ -144,19 +138,15 @@
 <li> UltraSPARC: Rewrite <code>mpn_lshift</code> and <code>mpn_rshift</code>.
      Should give 2 cycles/limb. (Torbjörn has code that just needs to be
      finished.)
-
 <li> SPARC32/V9: Find out why the speed of <code>mpn_addmul_1</code> and the
      other multiplies vary so much on successive sizes.
-
 <li> PA64: Improve <code>mpn_addmul_1</code>, <code>mpn_submul_1</code>, and
      <code>mpn_mul_1</code>. The current development code runs at 11
      cycles/limb, which is already very good. But it should be possible to
      saturate the cache, which will happen at 7.5 cycles/limb.
-
 <li> Sparc & SparcV8: Enable umul.asm for native cc. The generic longlong.h
      umul_ppmm is suspected to be causing sqr_basecase to be slower than
      mul_basecase.
-
 <li> UltraSPARC: Write <code>umul_ppmm</code>. Important in particular for
      <code>mpn_sqr_basecase</code>.
 <li> Implement <code>mpn_mul_basecase</code> and <code>mpn_sqr_basecase</code>
@@ -215,25 +205,19 @@
      little-endian and big-endian machines.
 <li> Handle numeric exceptions: Call an error handler, and/or set
      <code>gmp_errno</code>.
-
 <li> Implement <code>gmp_fprintf</code>, <code>gmp_sprintf</code>, and
      <code>gmp_snprintf</code>. Think about some sort of wrapper around
      <code>printf</code> so it and its several variants don't have to be
      completely reimplemented.
-
 <li> Implement some <code>mpq</code> input and output functions.
-
 <li> Implement a full precision <code>mpz_kronecker</code>, leave
      <code>mpz_jacobi</code> for compatibility.
-
 <li> Make the mpn logops and copys available in gmp.h. Since they can be
      either library functions or inlines, gmp.h would need to be generated from
      a gmp.in based on what's in the library. gmp.h would still be
      compiler-independent though.
-
 <li> Make versions of <code>mpz_set_str</code> etc taking string lengths
      rather than null-terminators.
-
 </ul>
@@ -252,76 +236,59 @@ processor and operating system.
 <ul>
-
-  <li> Find out whether there's an alloca available and how to use it.
-       AC_FUNC_ALLOCA has various system dependencies covered, but we
-       don't want its alloca.c replacement. (One thing current cpp
-       tests don't cover: HPUX 10 C compiler supports alloca, but
-       cannot find any symbol to test in order to know if we're on
-       HPUX 10. Damn.)
-
-  <li> Improve config.guess. We want to recognize the processor very
-       accurately, more accurately than other GNU packages.
-       config.guess does not currently make the distinctions we would
-       like it to do and a --target often needs to be set explicitly.
-       Remember to make sure config.sub accepts the guesses.
-
-  <li> Identify Mips processor under Irix: `hinv -c processor'.
-       config.guess should say mips2, mips3, and mips4.
-
-  <li> Identify Alpha processor under OSF: "/usr/sbin/sizer -c".
-       Unfortunately, sizer is not available before some revision of
-       Dec Unix 4.0, and it also returns some rather cryptic names for
-       processors. Perhaps the <code>implver</code> and
-       <code>amask</code> assembly instructions are better, but that
-       doesn't differentiate between ev5 and ev56.
-
-  <li> Identify Sparc processors. config.guess should say supersparc,
-       microsparc, ultrasparc1, ultrasparc2, etc.
-
-  <li> Identify HPPA processors similarly.
-
-  <li> Get lots of information about a Solaris system: prtconf -vp
-
-  <li> For some target machines and some compilers, specific options
-       are needed (sparcv8/gcc needs -mv8, sparcv8/cc needs -cg92,
-       Irix64/cc needs -64, Irix32/cc might need -n32, etc). Some are
-       set already, add more, see configure.in.
-
-  <li> Options to be passed to the assembler (via the compiler, using
-       whatever syntax the compiler uses for passing options to the
-       assembler).
-
-  <li> On Solaris 7, check if gcc supports native v9 64-bit
-       arithmetic. If not compile using "cc -fast -xarch=v9".
-       (Problem: -fast requires that we link with -fast too, which
-       might not be very good. Pass "-xO4 -xtarget=native" instead?)
-
-  <li> Extend the "optional" compiler arguments to choose the first
-       that works from from a set, so when gcc gets athlon support it
-       can try -mcpu=athlon, -mcpu=pentiumpro, or -mcpu=i486,
-       whichever works.
-
-  <li> Detect gcc >=2.96 and enable -march=pentiumpro for relevant
-       x86s. (A bug in gcc 2.95.2 prevents it being used
-       unconditionally.)
-
-  <li> Build multiple variants of the library under certain systems.
-       An example is -n32, -o32, and -64 on Irix.
-
-  <li> Check name conflicts under DOS 8.3 filenames and DJGPP, with a
-       view to avoiding at least the simplest ones. Similarly old
-       SysV 14 char names.
-
-  <li> Enable support for FORTRAN versions of mpn files (eg. for
-       mpn/cray/mulww.f). Add "f" to the mpn path searching, run
-       AC_PROG_F77 if such a file is found, . Hopefully automake will
-       generate everything needed in the makefiles.
-
-  <li> Only run GMP_PROG_M4 if it's needed, ie. if there's .asm files
-       selected from the mpn path. This might help say a generic C
-       build on weird systems.
-
+<li> Find out whether there's an alloca available and how to use it.
+     AC_FUNC_ALLOCA has various system dependencies covered, but we
+     don't want its alloca.c replacement. (One thing current cpp
+     tests don't cover: HPUX 10 C compiler supports alloca, but
+     cannot find any symbol to test in order to know if we're on
+     HPUX 10. Damn.)
+<li> Improve config.guess. We want to recognize the processor very
+     accurately, more accurately than other GNU packages.
+     config.guess does not currently make the distinctions we would
+     like it to do and a --target often needs to be set explicitly.
+     Remember to make sure config.sub accepts the guesses.
+<li> Identify Mips processor under Irix: `hinv -c processor'.
+     config.guess should say mips2, mips3, and mips4.
+<li> Identify Alpha processor under OSF: "/usr/sbin/sizer -c".
+     Unfortunately, sizer is not available before some revision of
+     Dec Unix 4.0, and it also returns some rather cryptic names for
+     processors. Perhaps the <code>implver</code> and
+     <code>amask</code> assembly instructions are better, but that
+     doesn't differentiate between ev5 and ev56.
+<li> Identify Sparc processors. config.guess should say supersparc,
+     microsparc, ultrasparc1, ultrasparc2, etc.
+<li> Identify HPPA processors similarly.
+<li> Get lots of information about a Solaris system: prtconf -vp
+<li> For some target machines and some compilers, specific options
+     are needed (sparcv8/gcc needs -mv8, sparcv8/cc needs -cg92,
+     Irix64/cc needs -64, Irix32/cc might need -n32, etc). Some are
+     set already, add more, see configure.in.
+<li> Options to be passed to the assembler (via the compiler, using
+     whatever syntax the compiler uses for passing options to the
+     assembler).
+<li> On Solaris 7, check if gcc supports native v9 64-bit
+     arithmetic. If not compile using "cc -fast -xarch=v9".
+     (Problem: -fast requires that we link with -fast too, which
+     might not be very good. Pass "-xO4 -xtarget=native" instead?)
+<li> Extend the "optional" compiler arguments to choose the first
+     that works from from a set, so when gcc gets athlon support it
+     can try -mcpu=athlon, -mcpu=pentiumpro, or -mcpu=i486,
+     whichever works.
+<li> Detect gcc >=2.96 and enable -march=pentiumpro for relevant
+     x86s. (A bug in gcc 2.95.2 prevents it being used
+     unconditionally.)
+<li> Build multiple variants of the library under certain systems.
+     An example is -n32, -o32, and -64 on Irix.
+<li> Check name conflicts under DOS 8.3 filenames and DJGPP, with a
+     view to avoiding at least the simplest ones. Similarly old
+     SysV 14 char names.
+<li> Enable support for FORTRAN versions of mpn files (eg. for
+     mpn/cray/mulww.f). Add "f" to the mpn path searching, run
+     AC_PROG_F77 if such a file is found, . Hopefully automake will
+     generate everything needed in the makefiles.
+<li> Only run GMP_PROG_M4 if it's needed, ie. if there's .asm files
+     selected from the mpn path. This might help say a generic C
+     build on weird systems.
 </ul>
 <p>
 In general, getting the exact right configuration, passing the
@@ -333,7 +300,7 @@ target machines: (1) Both gcc and cc (and c89). (2) Both 32-bit mode
 and 64-bit mode (such as -n32 vs -64 under Irix). (3) Both the system
 `make' and GNU `make'. (4) With and without GNU binutils.
-
+
 <h4>Miscellaneous</h4>
 <ul>
@@ -349,7 +316,6 @@ and 64-bit mode (such as -n32 vs -64 under Irix). (3) Both the system
 <li> Maybe make mpz_pow_ui.c more like mpz/ui_pow_ui.c, or write new
      mpn/generic/pow_ui.
 <li> Make mpz_invert call mpn_gcdext directly.
-
 <li> Make a build option to enable execution profiling with gprof. In
      particular look at getting the right <code>mcount</code> call at the start
      of each assembler subroutine (for important targets at
@@ -362,7 +328,6 @@ and 64-bit mode (such as -n32 vs -64 under Irix). (3) Both the system
 <li> Make an option for stack-alloc.c to call <code>malloc</code> separately
      for each <code>TMP_ALLOC</code> block, so a redzoning malloc debugger could
      be used during development.
-
 <li> Add <code>ASSERT</code>s at the start of each user-visible mpz/mpq/mpf
      function to check the validity of each <code>mp?_t</code> parameter, in
      particular to check they've been