From d9c394bf83e2be9e03db34e4f52c8fc02fb61ae5 Mon Sep 17 00:00:00 2001 From: Kevin Ryde Date: Tue, 21 May 2002 01:29:15 +0200 Subject: Remove _mpz_realloc truncates to invalid, done. Remove P55 mmx mpn_divexact_1, not as promising as first thought. Update VAX floats idea for new configure float detection. Add gmp_printf %b for binary. Amend mpz_togglebit to be called mpz_combit. Update float format detection task partly done. Remove powerpc 750 gmp-mparam.h, done. Remove sparc exact cpu detections, done (as much as desired for now). Remove sparclet and sparclite, too old to be worth bothering with. Add sparc configure enhancements for exact cpus. Remove 55-element fibonacci random generator, decided against. Add mersenne twister random generator. Remove demos/factorize.c use gmp rand functions, done. More on powerpc mftb for speed measuring. Misc other rewordings. --- doc/tasks.html | 93 ++++++++++++++++++++++++---------------------------------- 1 file changed, 38 insertions(+), 55 deletions(-) diff --git a/doc/tasks.html b/doc/tasks.html index f8877f752..839cdfd89 100644 --- a/doc/tasks.html +++ b/doc/tasks.html @@ -33,7 +33,7 @@ MA 02111-1307, USA.
- This file current as of 8 May 2002. An up-to-date version is available at + This file current as of 20 May 2002. An up-to-date version is available at http://www.swox.com/gmp/tasks.html. Please send comments about this page to bug-gmp@gnu.org. @@ -66,11 +66,6 @@ MA 02111-1307, USA. situation; put similar code in mpf_eq.
  • mpf_eq doesn't implement what gmp.texi specifies. It should not use just whole limbs, but partial limbs. -
  • _mpz_realloc will reallocate to less than the current size - of an mpz. This is ok internally when a destination size is yet to be - updated, but from user code it should probably refuse to go below the - size of the current value. Could switch internal uses across to a new - function, and tighten up _mpz_realloc.
  • mpf_set_str doesn't validate its exponent, for instance garbage 123.456eX789X is accepted (and an exponent 0 used), and overflow of a long is not detected. @@ -322,7 +317,7 @@ MA 02111-1307, USA. might only be an advantage if A and B are about the same size.
  • mpn_toom3_mul_n, mpn_toom3_sqr_n: Temporaries B and D are adjacent in memory and at the final - coefficient addition look like they could use a single + coefficient additions look like they could use a single mpn_add_n of l4 limbs rather than two of l2 limbs. @@ -457,9 +452,6 @@ MA 02111-1307, USA.
  • Pentium P54: mpn_lshift and mpn_rshift can come down from 6.0 c/l to 5.5 or 5.375 by paying attention to pairing after shrdl and shldl, see mpn/x86/pentium/README. -
  • Pentium P55 MMX: mpn_divexact_1 and - mpn_modexact_1_odd on 16-bit divisors could use MMX - multiplies and run at around 16 cycles (down from 23).
  • Pentium P55 MMX: mpn_lshift and mpn_rshift might benefit from some destination prefetching.
  • PentiumPro: mpn_divrem_1 might be able to use a @@ -511,9 +503,9 @@ MA 02111-1307, USA. generic C versions of mpn_popcount and mpn_hamdist suffice for Cray (if it vectorizes, or can be given a hint to do so). -
  • 68000: mpn_mul_1 could check for a 16-bit multiplier and use - two multiplies per limb, not four. Ditto mpn_addmul_1 and - mpn_submul_1. +
  • 68000: mpn_mul_1, mpn_addmul_1, + mpn_submul_1: Check for a 16-bit multiplier and use two + multiplies per limb, not four.
  • 68000: mpn_lshift and mpn_rshift could use a roll and mask instead of lsrl and lsll. This promises to be a speedup, effectively trading a @@ -530,8 +522,7 @@ MA 02111-1307, USA.
  • VAX D and G format double floats are straightforward and could perhaps be handled directly in __gmp_extract_double and maybe in mpz_get_d, rather than falling back on the - generic code. GCC defines __GFLOAT when -mg has selected G - format (which would be possible via a user CFLAGS). + generic code. (Both formats are detected by configure.)
  • mpn_get_str final divisions by the base with udiv_qrnd_unnorm could use some sort of multiply-by-inverse on suitable machines. This ends up happening for decimal by presenting @@ -627,13 +618,16 @@ MA 02111-1307, USA. recognise the Intel 80-bit format on i386, and IEEE 128-bit quad on sparc, hppa and power. Might like an ABI sub-option or something when it's a compiler option for 64-bit or 128-bit long double. -
  • gmp_printf could usefully accept an arbitrary base, for both - integer and float conversions. Either a number in the format string or a - * to take a parameter should be allowed. Maybe +
  • gmp_printf could accept %b for binary output. + It'd be nice if it worked for plain int etc too, not just + mpz_t etc. +
  • gmp_printf in fact could usefully accept an arbitrary base, + for both integer and float conversions. A base either in the format + string or as a parameter with * should be allowed. Maybe &13b (b for base) or something like that.
  • gmp_printf could perhaps have a type code for an mp_limb_t. That would save an application from having to - worry whether it's a long or a long long. + worry whether it's a long or a long long.
  • gmp_printf could perhaps accept mpq_t for float conversions, eg. "%.4Qf". This would be merely for convenience, but still might be useful. Rounding would be the same as @@ -647,9 +641,9 @@ MA 02111-1307, USA. supported in the future, or perhaps for mpq_t. Something like &*r (r for rounding, and mpfr style GMP_RND parameter). -
  • mpz_togglebit or mpz_chgbit or some such might - be a good companion to mpz_setbit and - mpz_clrbit. Suggested by Niels Möller. +
  • mpz_combit to toggle a bit would be a good companion for + mpz_setbit and mpz_clrbit. Suggested by Niels + Möller (and has done some work towards it).
  • mpz_scan0_reverse or mpz_scan0low or some such searching towards the low end of an integer might match mpz_scan0 nicely. Likewise for scan1. @@ -715,11 +709,9 @@ MA 02111-1307, USA.

    Configuration

      -
    • Floating-point format: Determine this with a feature test. Get rid of - the #ifdef mess in gmp-impl.h. This is simple when doing a - native compile, but needs a reliable way to examine object files when - cross-compiling. Falling back on a run-time test would be reasonable, if - build time tests fail. +
    • Floating-point format: GMP_C_DOUBLE_FORMAT seems to work + well. Get rid of the #ifdef mess in gmp-impl.h and use the + results of the test instead.
    • a29k: umul.s and udiv.s exist but don't get used.
    • ARM: umul_ppmm in longlong.h always uses umull, but is that available only for M series chips or some such? Perhaps it @@ -733,28 +725,19 @@ MA 02111-1307, USA. ia64-*-hpux*. Does GMP need to know anything about that?
    • Mips: config.guess should say mipsr3000, mipsr4000, mipsr10000, etc. "hinv -c processor" gives lots of information on Irix. Standard - config.guess appends "el" to indicate endianness. GMP currently only - cares about that for a small mpz_inp_raw and - mpz_out_raw optimization. It's hoped - AC_C_BIGENDIAN can be relied on to interrogate the compiler. -
    • PowerPC-32: gmp-mparam.h comes out quite different for a 750 than a 604e, - it'd be good to select the right one, probably by having CPU types - powerpc604, powerpc750 etc. -
    • PowerPC: The crazy explicit TOC setups for AIX are currently driven by + config.guess appends "el" to indicate endianness, but + AC_C_BIGENDIAN seems the best way to handle that for GMP. +
    • PowerPC: The function descriptor nonsense for AIX is currently driven by *-*-aix*. It might be more reliable to do some sort of - feature test or to examine the compiler output. It might also be nice to - merge the aix.m4 files into powerpc-defs.m4. -
    • Sparc: config.guess should say supersparc, microsparc, ultrasparc1, - ultrasparc2, etc. "prtconf -vp" gives lots of information about a - Solaris system. -
    • Sparc: recognise sparclite and sparclet (which configfsf.sub accepts). - These have umul but not udiv, or something like - that. Check the mpn/sparc32/v8 code is suitable, and add -mcpu= options - for gcc. + feature test, examining the compiler output perhaps. It might also be + nice to merge the aix.m4 files into powerpc-defs.m4. +
    • Sparc: config.guess recognises various exact sparcs, make + use of that information in configure (work on this is in + progress).
    • Sparc32: floating point or integer udiv should be selected according to the CPU target. Currently floating point ends up being used on all sparcs, which is probably not right for generic V7 and V8. -
      Sparc: The use of -xtarget=native with cc is +
    • Sparc: The use of -xtarget=native with cc is incorrect when cross-compiling, the target should be set according to the configured $host CPU.
    • m68k: config.guess can detect 68000, 68010, CPU32 and 68020, but relies @@ -803,7 +786,7 @@ MA 02111-1307, USA. someone with a dual cygwin/mingw setup to test.
    • Automake: Latest automake has a CCAS, CCASFLAGS scheme. Though we probably wouldn't be using its assembler support we - could do use those variables in compatible ways. + could try to use those variables in compatible ways.
    @@ -823,13 +806,14 @@ MA 02111-1307, USA.
  • Perhaps the 2exp and general LC cases should be split, for clarity (if the general case is retained). -
  • gmp_randinit_mm (named after Mitchell and Moore) for the - 55-element delayed Fibonacci generator from Knuth vol 2. Being additive - it should be fast, and might be random enough for GMP test program - purposes, if nothing else. Niels Möller has started on this. +
  • gmp_randinit_mers for a Mersenne Twister generator. It's + likely to be more random and about the same speed as Knuth's 55-element + Fibonacci generator, and can probably become the default. Pedro Gimeno + has started on this.
  • gmp_randinit_lc: Finish or remove. Doing a division for every step won't be very fast, so check whether the usefulness of - this algorithm can be justified. + this algorithm can be justified. (Consensus is that it's not useful and + can be removed.)
  • Blum-Blum-Shub: Finish or remove. A separate gmp_randinit_bbs would be wanted, not the currently commented out case in gmp_randinit. @@ -862,8 +846,6 @@ MA 02111-1307, USA. |x|<|b| and |y|<|a| or something like that, but there's probably two solutions under just those restrictions.
  • mpz_invert should call mpn_gcdext directly. -
  • demos/factorize.c should use the GMP random functions when restarting - Pollard-Rho, not random / mrand48.
  • demos/factorize.c: use mpz_divisible_ui_p rather than mpz_tdiv_qr_ui. (Of course dividing multiple primes at a time would be better still.) @@ -882,8 +864,9 @@ MA 02111-1307, USA. wouldn't need to watch out for overlaps).
  • PowerPC: The cpu time base registers (per mftb and mftbu) could be used for the speed and tune programs. Would - need to know its frequency though, for instance it seemed to be 25 MHz on - a couple of Apples (compared to the CPU speed of 350 or 450 MHz). + need to know its frequency of course. Usually it's 1/4 of bus speed + (eg. 25 MHz) but some chips drive it from an external input. Probably + have to measure to be sure.
  • MUL_FFT_THRESHOLD etc: the FFT thresholds should allow a return to a previous k at certain sizes. This arises basically due to the step effect caused by size multiples effectively used for each k. -- cgit v1.2.1