summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--mpn/cray/README12
1 files changed, 11 insertions, 1 deletions
diff --git a/mpn/cray/README b/mpn/cray/README
index 145a2af72..14d7a006e 100644
--- a/mpn/cray/README
+++ b/mpn/cray/README
@@ -63,7 +63,7 @@ max allowed vn 2097152
number of multiplies 16
-IDEAS:
+IDEA:
* Rewrite mpn_add_n:
short cy[n + 1];
#pragma _CRI ivdep
@@ -91,3 +91,13 @@ IDEAS:
and 2, and generate cy[]. Then add operand 3 to the partial result,
and accumulate carry into cy[]. Finally propagate carry just like
in the new mpn_add_n.
+
+IDEA:
+
+Store fewer bits, perhaps 62, per limb. That brings mpn_add_n time
+down to 2.5 cycles/limb and mpn_addmul_1 times to 4 cycles/limb. By
+storing even fewer bits per limb, perhaps 56, it would be possible to
+write a mul_mul_basecase that would run at effectively 1 cycle/limb.
+(Use VM here to better handle the romb-shaped multiply area, perhaps
+rouding operand sizes up to the next power of 2.)
+