Add cray inline mpn_popcount and mpn_hamdist.

Add bright idea for inner product based remainders.
author: Kevin Ryde <user42@zip.com.au> 2001-11-15 23:08:22 +0100
committer: Kevin Ryde <user42@zip.com.au> 2001-11-15 23:08:22 +0100
commit: 157cbdbfdedd9f05c29cbe704151490be0a93319 (patch)
tree: f1849c36c81084fef3e5a9fa0ab1a9e256a5e222 /doc
parent: 39afc22107404102e672a5bd541165513eead033 (diff)
download: gmp-157cbdbfdedd9f05c29cbe704151490be0a93319.tar.gz
1 files changed, 12 insertions, 3 deletions
diff --git a/doc/tasks.html b/doc/tasks.html
index ff49e4fc7..f43c80f73 100644
--- a/doc/tasks.html
+++ b/doc/tasks.html
@@ -34,7 +34,7 @@ Copyright 2000, 2001 Free Software Foundation, Inc.
 <hr>
 <!-- NB. timestamp updated automatically by emacs -->
 <comment>
-  This file current as of 8 Nov 2001.  An up-to-date version is available at
+  This file current as of 16 Nov 2001.  An up-to-date version is available at
   <a href="http://www.swox.com/gmp/tasks.html">http://www.swox.com/gmp/tasks.html</a>.
   Please send comments about this page to
   <a href="mailto:bug-gmp@gnu.org">bug-gmp@gnu.org</a>.
@@ -457,8 +457,9 @@ Copyright 2000, 2001 Free Software Foundation, Inc.
      -hpipeline3 seems promising.  We should at least up -O to -O2 or -O3.
 <li> Cray: <code>mpn_com_n</code> and <code>mpn_and_n</code> etc very probably
      wants a pragma like <code>MPN_COPY_INCR</code>.
-<li> Cray vector systems: <code>mpn_lshift</code> and <code>mpn_rshift</code>
-     are nice and small and could be inlined to avoid function calls.
+<li> Cray vector systems: <code>mpn_lshift</code>, <code>mpn_rshift</code>,
+     <code>mpn_popcount</code> and <code>mpn_hamdist</code> are nice and small
+     and could be inlined to avoid function calls.
 <li> Cray: Variable length arrays seem to be faster than the tal-notreent.c
      scheme.  Not sure why, maybe they merely give the compiler more
      information about aliasing (or the lack thereof).  Would like to modify
@@ -821,6 +822,14 @@ near future, but are at least worth thinking about.
      future scheme for allowing out-of-memory or divide-by-zero exceptions.
      Though such things may or may not be feasible, it seems wisest not to
      close the door on them yet.
+<li> Nx1 remainders can be taken at multiplier throughput speed by
+     pre-calculating an array "p[i] = 2^(i*<code>BITS_PER_MP_LIMB</code>) mod
+     m", then for the input limbs x calculating an inner product "sum
+     p[i]*x[i]", and a final 3x1 limb remainder mod m.  If those powers take
+     roughly N divide steps to calculate then there'd be an advantage any time
+     the same m is used three or more times.  Suggested by Victor Shoup in
+     connection with chinese-remainder style decompositions, but perhaps with
+     other uses.
 </ul>
 <hr>
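
The inner product remainder idea added in the last hunk can be sketched as follows. This is an illustration only, not GMP code: it assumes 32-bit limbs held in uint32_t (so each product fits in uint64_t), limbs stored least significant first, and a nonzero single-limb m. The names limb_powers_mod, rem_inner_product and rem_schoolbook are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Precompute p[i] = 2^(32*i) mod m for i = 0 .. n-1.
   This costs one divide step per entry, so roughly n divides in total. */
static void limb_powers_mod(uint32_t *p, size_t n, uint32_t m)
{
    assert(m != 0);
    uint64_t r = 1 % m;
    for (size_t i = 0; i < n; i++) {
        p[i] = (uint32_t)r;
        r = (r << 32) % m;      /* r < m < 2^32, so the shift cannot overflow */
    }
}

/* Remainder of the n-limb number x mod m using the precomputed powers.
   The loop contains only multiplies and adds; the divides are the three
   steps of the final 3x1 limb remainder. */
static uint32_t rem_inner_product(const uint32_t *x, const uint32_t *p,
                                  size_t n, uint32_t m)
{
    assert(m != 0);
    uint64_t lo = 0;            /* low two limbs of the running sum */
    uint32_t hi = 0;            /* third limb: counts carries out of lo */
    for (size_t i = 0; i < n; i++) {
        uint64_t prod = (uint64_t)p[i] * x[i];
        lo += prod;
        hi += (lo < prod);      /* unsigned wraparound signals a carry */
    }
    /* Final 3x1 remainder, most significant limb first. */
    uint64_t r = hi % m;
    r = ((r << 32) | (lo >> 32)) % m;
    r = ((r << 32) | (lo & 0xffffffffu)) % m;
    return (uint32_t)r;
}

/* Plain Nx1 remainder with one divide step per limb, for comparison. */
static uint32_t rem_schoolbook(const uint32_t *x, size_t n, uint32_t m)
{
    assert(m != 0);
    uint64_t r = 0;
    for (size_t i = n; i-- > 0; )
        r = ((r << 32) | x[i]) % m;
    return (uint32_t)r;
}
```

The table from limb_powers_mod depends only on m, so it can be computed once and reused for every input reduced by the same m. That reuse is where the saving described in the task entry would come from. A real implementation would use the native limb type and a double-limb multiply rather than uint64_t arithmetic.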