[mulders.c] fixed bug in mpfr_divhigh_n (this routine was not used yet)

[div.c] now use Mulders' short product for large division.
It remains to do the automatic tuning of MPFR_DIV_THRESHOLD.
The speedup is nice (x/y takes about 23% less time at 3322 bits).
For example, on my Core 2 Duo laptop we got with MPFR 3.0.0:

[zimmerma@coing tests]$ ./timings-mpfr 1000
Using MPFR-3.0.0 with GMP-5.0.2
[precision is 3322 bits]
x*y        took 0.004543 ms (262143 eval in 1191 ms)
x*x        took 0.003616 ms (524287 eval in 1896 ms)
x/y        took 0.009087 ms (131071 eval in 1191 ms)
sqrt(x)    took 0.007004 ms (262143 eval in 1836 ms)
exp(x)     took 0.293040 ms (4095 eval in 1200 ms)
log(x)     took 0.253724 ms (4095 eval in 1039 ms)
sin(x)     took 0.306960 ms (4095 eval in 1257 ms)
cos(x)     took 0.290842 ms (4095 eval in 1191 ms)
arccos(x)  took 0.590620 ms (2047 eval in 1209 ms)
arctan(x)  took 0.560332 ms (2047 eval in 1147 ms)

and now we get:

[zimmerma@coing tests]$ ./timings-mpfr 1000
Using MPFR-3.1.0-dev with GMP-5.0.2
[precision is 3322 bits]
x*y        took 0.004444 ms (262143 eval in 1165 ms)
x*x        took 0.002686 ms (524287 eval in 1408 ms)
x/y        took 0.006989 ms (262143 eval in 1832 ms)
sqrt(x)    took 0.007084 ms (262143 eval in 1857 ms)
exp(x)     took 0.292063 ms (4095 eval in 1196 ms)
log(x)     took 0.246886 ms (4095 eval in 1011 ms)
sin(x)     took 0.259096 ms (4095 eval in 1061 ms)
cos(x)     took 0.244933 ms (4095 eval in 1003 ms)
arccos(x)  took 0.556424 ms (2047 eval in 1139 ms)
arctan(x)  took 0.526624 ms (2047 eval in 1078 ms)

We see that other routines also benefit from the speedup in mpfr_sqr
and mpfr_div (log, sin, cos, arccos, arctan).

git-svn-id: svn://scm.gforge.inria.fr/svn/mpfr/trunk@7765 280ebfd0-de03-0410-8827-d642c229c3f4
author: zimmerma <zimmerma@280ebfd0-de03-0410-8827-d642c229c3f4> 2011-07-29 20:15:02 +0000
committer: zimmerma <zimmerma@280ebfd0-de03-0410-8827-d642c229c3f4> 2011-07-29 20:15:02 +0000
commit: 0539f0130741bf6b8c6147ac4a8e443af584c784 (patch)
tree: 804e897dec006d82fba5812a8f2f1d9e3b0f88fc /src/mulders.c
parent: 08a16d3f27470b0cbe21f9699caa54ecd0aba696 (diff)
download: mpfr-0539f0130741bf6b8c6147ac4a8e443af584c784.tar.gz
1 files changed, 7 insertions, 3 deletions
diff --git a/src/mulders.c b/src/mulders.c
index 8225448cd..efa694026 100644
--- a/src/mulders.c
+++ b/src/mulders.c
@@ -205,14 +205,15 @@ mpfr_divhigh_n (mpfr_limb_ptr qp, mpfr_limb_ptr np, mpfr_limb_ptr dp,
   mpfr_limb_ptr tp;
   MPFR_TMP_DECL(marker);
 
-  k = divhigh_ktab[n];
-  MPFR_ASSERTD ((n+4)/2 <= k && k <= n); /* bounds from [1] */
+  MPFR_ASSERTN (MPFR_MULHIGH_TAB_SIZE >= 15); /* so that 2*(n/3) >= (n+4)/2 */
+  k = MPFR_LIKELY (n < MPFR_DIVHIGH_TAB_SIZE) ? divhigh_ktab[n] : 2*(n/3);
 
   /* for k=n, we use a full division (mpn_divrem) */
 
   if (k == n)
     return mpn_divrem (qp, 0, np, 2 * n, dp, n);
 
+  MPFR_ASSERTD ((n+4)/2 <= k && k < n); /* bounds from [1] */
   MPFR_TMP_MARK (marker);
   l = n - k;
   /* first divide the most significant 2k limbs from N by the most significant
@@ -221,11 +222,14 @@ mpfr_divhigh_n (mpfr_limb_ptr qp, mpfr_limb_ptr np, mpfr_limb_ptr dp,
 
   /* it remains {np,2l+k} = {np,n+l} as remainder */
 
-  /* now we have to subtract high(Q1)*D0 where Q1={qp+l,k} and D0={dp,l} */
+  /* now we have to subtract high(Q1)*D0 where Q1=qh*B^k+{qp+l,k} and
+     D0={dp,l} */
   tp = MPFR_TMP_LIMBS_ALLOC (2 * l);
   mpfr_mulhigh_n (tp, qp + k, dp, l);
   /* we are only interested in the upper l limbs from {tp,2l} */
   cy = mpn_sub_n (np + n, np + n, tp + l, l);
+  if (qh)
+    cy += mpn_sub_n (np + n, np + n, dp, l);
   while (cy > 0) /* Q1 was too large: subtract 1 to Q1 and add D to np+l */
     {
       qh -= mpn_sub_1 (qp + l, qp + l, k, MPFR_LIMB_ONE);