author     vlefevre <vlefevre@280ebfd0-de03-0410-8827-d642c229c3f4>  2016-09-02 07:13:38 +0000
committer  vlefevre <vlefevre@280ebfd0-de03-0410-8827-d642c229c3f4>  2016-09-02 07:13:38 +0000
commit     b4df47ef18e9e516e90a97d2e6626e4dc493d813 (patch)
tree       a39018105250ca3eda936cdd57b32fbfe1fef45e
parent     61816b69da7f34b565572ec787a6302205d3256d (diff)
download   mpfr-b4df47ef18e9e516e90a97d2e6626e4dc493d813.tar.gz
Merged the following changesets from the trunk:
  r10664-10686,10689-10690,10692,10695-10696,10699-10775
i.e. all the latest changes except for src/{add1sp.c,sub1sp.c} to avoid
build failures (to be solved later). Currently only tcan_round fails,
due to MPFR_RNDF.

git-svn-id: svn://scm.gforge.inria.fr/svn/mpfr/branches/faithful@10776 280ebfd0-de03-0410-8827-d642c229c3f4
-rw-r--r--   INSTALL                     17
-rw-r--r--   NEWS                         3
-rw-r--r--   TODO                        61
-rw-r--r--   doc/README.dev              20
-rw-r--r--   doc/mpfr.texi               17
-rw-r--r--   src/atan.c                  20
-rw-r--r--   src/atan2.c                  2
-rw-r--r--   src/cos.c                    2
-rw-r--r--   src/div.c                  525
-rw-r--r--   src/div_ui.c                 5
-rw-r--r--   src/lngamma.c                2
-rw-r--r--   src/mpfr-gmp.h              21
-rw-r--r--   src/mpfr-impl.h              8
-rw-r--r--   src/mpfr-longlong.h          4
-rw-r--r--   src/mpfr.h                  11
-rw-r--r--   src/mul.c                    3
-rw-r--r--   src/powerof2.c              13
-rw-r--r--   src/rec_sqrt.c              12
-rw-r--r--   src/round_p.c               10
-rw-r--r--   src/round_prec.c           308
-rw-r--r--   src/sqrt.c                 559
-rw-r--r--   tests/mpfr-test.h            3
-rw-r--r--   tests/reuse.c              229
-rw-r--r--   tests/tadd1sp.c             14
-rw-r--r--   tests/talloc-cache.c        93
-rw-r--r--   tests/talloc.c               6
-rw-r--r--   tests/taway.c               11
-rw-r--r--   tests/tcan_round.c         123
-rw-r--r--   tests/tdiv.c                54
-rw-r--r--   tests/tests.c               24
-rw-r--r--   tests/tgeneric.c             2
-rw-r--r--   tests/tmul.c                82
-rw-r--r--   tests/tpow_z.c               2
-rw-r--r--   tests/tset_si.c             35
-rw-r--r--   tests/tset_sj.c             24
-rw-r--r--   tests/tsi_op.c              18
-rw-r--r--   tests/tsqrt.c               10
-rw-r--r--   tests/tversion.c            75
-rw-r--r--   tests/tzeta.c                4
-rwxr-xr-x   tools/coverage               7
-rw-r--r--   tools/mbench/mfv5-mpfr.cc    8
-rw-r--r--   tools/mbench/mfv5.cc         2
-rwxr-xr-x   tools/mpfrlint              30
43 files changed, 1988 insertions, 491 deletions
diff --git a/INSTALL b/INSTALL
index 66dff6275..97abc121b 100644
--- a/INSTALL
+++ b/INSTALL
@@ -516,6 +516,19 @@ you should set the OBJECT_MODE environment variable to 64, e.g., with:
line: AR="ar -X64" NM="nm -B -X64".
+Notes on Solaris
+================
+
+Do not put a -R<dir> option in the LD_OPTIONS environment variable, at least
+if the directory <dir> can contain an MPFR library. Otherwise this MPFR
+library may be chosen for the tests (make check) instead of the one that has
+just been built, in which case either you will get errors due to mismatched
+versions or this problem may remain undetected. The reason is that this
+option will appear before the -R options added by libtool, such as the one
+to the src/.libs directory containing the MPFR library that has just been
+built, and will take precedence.
+
+
MPFR for use with Windows Applications
======================================
@@ -531,7 +544,9 @@ a. Using MinGW
it also provides native Windows code.
2 - If you just want to make a binary with gcc, there is nothing to do:
- GMP, MPFR and the program compile exactly as under Linux.
+ GMP, MPFR and the program compile exactly as under Linux. (It is
+ recommended to pass --build=xxx-yyy-mingw64 to the GMP configure command,
+ or --build=xxx with xxx containing mingw.)
3 - To avoid using the Microsoft runtime (which might not conform to ISO C),
you can use the MinGW runtime package (which is an integral part of MinGW).
diff --git a/NEWS b/NEWS
index c2ac55e23..9dc89dd71 100644
--- a/NEWS
+++ b/NEWS
@@ -72,6 +72,9 @@ Changes from versions 3.1.* to version 4.0.0:
- Major speedup in mpfr_add, mpfr_sub, mpfr_mul, mpfr_div and mpfr_sqrt when
all operands have the same precision and this precision is less than the
number of bits per word, e.g., less than 64 on a 64-bit computer.
+- Major speedup in mpfr_add and mpfr_sub when all operands have the same
+ precision and this precision is less than two words, e.g., at most
+ 127 bits on a 64-bit computer.
- Speedup by a factor of almost 2 in the double <--> mpfr conversions
(mpfr_set_d and mpfr_get_d).
- Speedup in the mpfr_const_euler function (contributed by Fredrik Johansson),
diff --git a/TODO b/TODO
index dc72031c6..b5b97ca7a 100644
--- a/TODO
+++ b/TODO
@@ -529,11 +529,12 @@ Table of contents:
- use symbol versioning.
-- check whether mpz_t caching is necessary. Timings with -static with
- details about the C / C library implementation should be put somewhere
- as a comment in the source or in the doc. Using -static is important
- because otherwise the cache saves the dynamic call to mpz_init and
- mpz_clear; so, what we're measuring is not clear. See thread:
+- check whether mpz_t caching (pool) is necessary. Timings with -static
+ with details about the C / C library implementation should be put
+ somewhere as a comment in the source or in the doc. Using -static
+ is important because otherwise the cache saves the dynamic call to
+ mpz_init and mpz_clear; so, what we're measuring is not clear.
+ See thread:
https://gmplib.org/list-archives/gmp-devel/2015-September/004147.html
Summary: It will not be integrated in GMP because 1) This yields
problems with threading (in MPFR, we have TLS variables, but this is
@@ -549,13 +550,37 @@ Table of contents:
arccos(x) took 0.043580 ms (32767 eval in 1428 ms)
arctan(x) took 0.035401 ms (32767 eval in 1160 ms)
mpfr_acos doesn't use mpz, but calls mpfr_atan, so that the issue comes
- from mpfr_atan, which uses mpz a lot. But there are very few mpz_init
- and mpz_clear (just at the end). So, the differences in the timings are
- very surprising, even with a bad malloc implementation. Or is the reason
- due to that due to mpz_t caching, the mpz_t's are preallocated with a
- size that is large enough, thus avoiding internal reallocations in GMP
- (with memory copies)? In such a case, mpz_t caching would not be the
- solution, but mpz_init2 (see GMP manual, 3.11 "Efficiency").
+ from mpfr_atan, which uses mpz a lot. The problem mainly comes from the
+ reallocations in GMP because mpz_init is used instead of mpz_init2 with
+ the estimated maximum size. Other places in the code that use mpz_init
+ may be affected.
+ Issues with mpz_t caching:
+ * The pool can take much memory, which may no longer be useful.
+ For instance:
+ mpfr_init2 (x, 10000000);
+ mpfr_log_ui (x, 17, MPFR_RNDN);
+ /* ... */
+ mpfr_clear (x);
+ /* followed by code using only small precision */
+ whereas, contrary to real caches, they contain no data. This is not
+ valuable memory: freeing/allocating a large block of memory is
+ much faster than the actual computations, so that mpz_t caching
+ has no impact on the performance in such cases. A pool with large
+ blocks also potentially destroys the data locality.
+ * It assumes that the real GMP functions are __gmpz_init and
+ __gmpz_clear, which are not part of the official GMP API; thus it
+ relies on GMP internals, which may change in the future or
+ may differ in forks / compatible libraries / etc. This
+ can be solved if MPFR code calls mpfr_mpz_init / mpfr_mpz_clear
+ directly, avoiding the #define's.
+ If it is decided to keep some form of mpz_t caching, a possible solution
+ for both issues: define mpfr_mpz_init2 and mpfr_mpz_clear2, which both
+ take 2 arguments like mpz_init2, where mpfr_mpz_init2 behaves in a way
+ similar to mpz_init2, and mpfr_mpz_clear2 behaves in a way similar to
+ mpz_clear but where the size argument is a hint for the pool; if it is
+ too large, then the mpz_t should not be pushed back to the pool. The
+ size argument of mpfr_mpz_init2 could also be a hint to decide which
+ element to pull from the pool.
- in tsum, add testcases for mpfr_sum triggering the bug fixed in r9722,
that is, with a large error during the computation of the secondary term
@@ -586,6 +611,18 @@ Table of contents:
(The one done in r10573 allowed us to find a bug even without
assertion checking.)
+- after the fix of mpfr_can_round in r9883, tzeta has a different trace
+ from r9882, which may be expected as mpfr_can_round has an influence
+ in tgeneric.c, but a closer look is needed: check only the first
+ difference (with various GMP_CHECK_RANDOMIZE values), since after it,
+ the random numbers do not match.
+ With the 32-bit ABI, tzeta r9955 is even 2.17 times as slow as r9954;
+ the corresponding change:
+ - test_generic (2, 70, 5);
+ + test_generic (MPFR_PREC_MIN, 70, 5);
+ Find the real cause, whether a wrong result could be obtained before
+ the fix r9883, and whether the slowdown can be avoided.
+
##############################################################################
7. Portability
diff --git a/doc/README.dev b/doc/README.dev
index 955fcaaab..0476892fd 100644
--- a/doc/README.dev
+++ b/doc/README.dev
@@ -229,6 +229,9 @@ To make a release (for the MPFR team):
platforms where the endianness is unknown (or can't be specified
without AC_CONFIG_HEADERS).
Check also without mpz_t caching (-DMPFR_MY_MPZ_INIT=0).
+ Check with -DMPFR_GENERIC_ABI to test the generic code, not tied to
+ a particular ABI; this is useful when there is specific MPFR code
+ for both GMP_NUMB_BITS == 32 and GMP_NUMB_BITS == 64.
Check that make and make check pass with a C++ compiler, for example:
./configure CC=g++ (MPFR 2.3.2 did not).
@@ -304,7 +307,7 @@ To make a release (for the MPFR team):
MPFR_SUSPICIOUS_OVERFLOW).
Check there is no branch misprediction due to wrong MPFR_LIKELY or
- MPFR_UNLIKELY statements. For that test, configure with
+ MPFR_UNLIKELY statements. For that test, configure with
--enable-debug-prediction, run "timings-mpfr 100", and check that
the output contains no WARNING.
@@ -433,6 +436,9 @@ Format of long double.
+ MPFR_HAVE_BUILTIN_UNREACHABLE: Define if the __builtin_unreachable
GCC built-in is supported.
++ MPFR_GENERIC_ABI: Define to disable code that is tied to a specific
+ ABI (e.g. GMP_NUMB_BITS value).
+
List of macros used for checking MPFR:
+ MPFR_HAVE_FESETROUND: Define if the function fesetround() is available
@@ -560,6 +566,13 @@ Quoted from <http://www.gnu.org/software/gcc/codingconventions.html>:
or field names. Explicit casts should be used to convert between
void* and other pointer types.
+When a string literal ("...") is followed by a macro name, there
+must be white space between them, otherwise this is parsed as a
+user-defined string literal in C++11:
+
+ http://en.cppreference.com/w/cpp/language/user_literal
+ http://stackoverflow.com/a/6402166/3782797
+
=====================================================================
C-Reduce may be useful to try to identify whether a bug comes from the
@@ -666,6 +679,11 @@ Do not use TMP_DECL / TMP_ALLOC, ... but MPFR_TMP_DECL, MPFR_TMP_ALLOC, ...
In the tests, use only tests_allocate, tests_reallocate and tests_free
(there may be some rare exceptions, such as in tabort_defalloc*.c).
+Avoid code that would yield unnecessary reallocations, which can be very
+expensive. In particular, for code that is based on the mpz layer of GMP,
+do not use mpz_init, but mpz_init2 with the estimated maximum size; it is
+better to overestimate this size a bit than underestimating it.
+
===========================================================================
Do not use C99-only features, such as empty macro arguments or C++-style
diff --git a/doc/mpfr.texi b/doc/mpfr.texi
index 5758bd95a..61027be05 100644
--- a/doc/mpfr.texi
+++ b/doc/mpfr.texi
@@ -3,7 +3,7 @@
@setfilename mpfr.info
@documentencoding UTF-8
@set VERSION 4.0.0-dev
-@set UPDATED-MONTH June 2016
+@set UPDATED-MONTH August 2016
@settitle GNU MPFR @value{VERSION}
@synindex tp fn
@iftex
@@ -2272,10 +2272,23 @@ calculations if a lower or equal precision is requested. To free these caches,
use @code{mpfr_free_cache} or @code{mpfr_free_cache2}.
@end deftypefun
+@c FIXME: It would be better to have a dedicated function concerning
+@c mp_set_memory_functions (mpfr_free_cache is there mainly to free some
+@c memory, not to solve issues with mp_set_memory_functions, even if
+@c these issues are currently solved exactly by calling mpfr_free_cache),
+@c in case we want to change things in the future (new features, etc.).
+@c Moreover, a documentation of memory allocation in MPFR should be given
+@c (the user should know how memory is allocated if this can affect his
+@c programs). Recommendations for libraries using MPFR should also be
+@c given: if they keep MPFR objects on their side, a clean-up is needed,
+@c while mpfr_free_cache (or the dedicated function) would not touch
+@c them.
@deftypefun void mpfr_free_cache (void)
Free various caches used by MPFR internally (those local to the
current thread and those shared by all threads).
-You should call this function before terminating a thread, even if you did
+In addition to clearing all MPFR variables,
+you should call this function before terminating a thread, or before calling
+@code{mp_set_memory_functions}, even if you did
not call @code{mpfr_const_*} functions directly (they could have been called
internally).
@end deftypefun
diff --git a/src/atan.c b/src/atan.c
index f4c602b04..335144481 100644
--- a/src/atan.c
+++ b/src/atan.c
@@ -274,7 +274,7 @@ mpfr_atan (mpfr_ptr atan, mpfr_srcptr x, mpfr_rnd_t rnd_mode)
prec = realprec + GMP_NUMB_BITS;
/* Initialisation */
- mpz_init (ukz);
+ mpz_init2 (ukz, prec); /* ukz will need 'prec' bits below */
MPFR_GROUP_INIT_4 (group, prec, sk, tmp, tmp2, arctgt);
oldn0 = 0;
@@ -305,18 +305,12 @@ mpfr_atan (mpfr_ptr atan, mpfr_srcptr x, mpfr_rnd_t rnd_mode)
/* Initialisation */
MPFR_GROUP_REPREC_4 (group, prec, sk, tmp, tmp2, arctgt);
MPFR_ASSERTD (n0 <= MPFR_PREC_BITS);
- if (MPFR_LIKELY (oldn0 == 0))
- {
- oldn0 = 3 * (n0 + 1);
- for (i = 0; i < oldn0; i++)
- mpz_init (tabz[i]);
- }
- else if (oldn0 < 3 * (n0 + 1))
- {
- for (i = oldn0; i < 3 * (n0 + 1); i++)
- mpz_init (tabz[i]);
- oldn0 = 3 * (n0 + 1);
- }
+ /* Note: the tabz[] entries are used to get a rational approximation
+ of atan(x) to precision 'prec', thus allocating them to 'prec' bits
+ should be good enough. */
+ for (i = oldn0; i < 3 * (n0 + 1); i++)
+ mpz_init2 (tabz[i], prec);
+ oldn0 = 3 * (n0 + 1);
/* The mpfr_ui_div below mustn't underflow. This is guaranteed by
MPFR_SAVE_EXPO_MARK, but let's check that for maintainability. */
diff --git a/src/atan2.c b/src/atan2.c
index 1f0b298a1..93c566e2a 100644
--- a/src/atan2.c
+++ b/src/atan2.c
@@ -154,7 +154,7 @@ mpfr_atan2 (mpfr_ptr dest, mpfr_srcptr y, mpfr_srcptr x, mpfr_rnd_t rnd_mode)
/* When x is a power of two, we call directly atan(y/x) since y/x is
exact. */
- if (MPFR_UNLIKELY (MPFR_IS_POWER_OF_2 (x)))
+ if (MPFR_UNLIKELY (MPFR_IS_POS (x) && mpfr_powerof2_raw (x)))
{
int r;
mpfr_t yoverx;
diff --git a/src/cos.c b/src/cos.c
index b2b864345..36d2b0dfc 100644
--- a/src/cos.c
+++ b/src/cos.c
@@ -49,7 +49,7 @@ mpfr_cos2_aux (mpfr_ptr f, mpfr_srcptr r)
/* compute minimal i such that i*(i+1) does not fit in an unsigned long,
assuming that there are no padding bits. */
- maxi = 1UL << (CHAR_BIT * sizeof(unsigned long) / 2);
+ maxi = 1UL << (sizeof(unsigned long) * CHAR_BIT / 2);
if (maxi * (maxi / 2) == 0) /* test checked at compile time */
{
/* can occur only when there are padding bits. */
diff --git a/src/div.c b/src/div.c
index baf840fa5..8b8e4a686 100644
--- a/src/div.c
+++ b/src/div.c
@@ -29,34 +29,59 @@ http://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
#define MPFR_NEED_LONGLONG_H
#include "mpfr-impl.h"
+#if !defined(MPFR_GENERIC_ABI)
+
#if defined(WANT_GMP_INTERNALS) && defined(HAVE___GMPN_INVERT_LIMB)
/* The following macro is copied from GMP-6.1.1, file gmp-impl.h,
- macro udiv_qrnnd_preinv, and specialized to the case nl=0.
- It computes q and r such that nh*2^GMP_NUMB_BITS = q*d + r,
- with 0 <= r < d. */
-#define __udiv_qrnnd_preinv(q, r, nh, d) \
+ macro udiv_qrnnd_preinv.
+ It computes q and r such that nh*2^GMP_NUMB_BITS + nl = q*d + r,
+ with 0 <= r < d, assuming di = __gmpn_invert_limb (d). */
+#define __udiv_qrnnd_preinv(q, r, nh, nl, d, di) \
+ do { \
+ mp_limb_t _qh, _ql, _r, _mask; \
+ umul_ppmm (_qh, _ql, (nh), (di)); \
+ if (__builtin_constant_p (nl) && (nl) == 0) \
+ { \
+ _qh += (nh) + 1; \
+ _r = - _qh * (d); \
+ _mask = -(mp_limb_t) (_r > _ql); /* both > and >= are OK */ \
+ _qh += _mask; \
+ _r += _mask & (d); \
+ } \
+ else \
+ { \
+ add_ssaaaa (_qh, _ql, _qh, _ql, (nh) + 1, (nl)); \
+ _r = (nl) - _qh * (d); \
+ _mask = -(mp_limb_t) (_r > _ql); /* both > and >= are OK */ \
+ _qh += _mask; \
+ _r += _mask & (d); \
+ if (MPFR_UNLIKELY (_r >= (d))) \
+ { \
+ _r -= (d); \
+ _qh++; \
+ } \
+ } \
+ (r) = _r; \
+ (q) = _qh; \
+ } while (0)
+
+/* specialized version for nl = 0, with di computed inside */
+#define __udiv_qrnd_preinv(q, r, nh, d) \
do { \
- mp_limb_t _qh, _ql, _r, _mask, _di; \
+ mp_limb_t _di; \
\
MPFR_ASSERTD ((d) != 0); \
MPFR_ASSERTD ((nh) < (d)); \
MPFR_ASSERTD ((d) & MPFR_LIMB_HIGHBIT); \
\
_di = __gmpn_invert_limb (d); \
- umul_ppmm (_qh, _ql, (nh), _di); \
- _qh += (nh) + 1; \
- _r = - _qh * (d); \
- _mask = -(mp_limb_t) (_r > _ql); /* both > and >= are OK */ \
- _qh += _mask; \
- _r += _mask & (d); \
- (r) = _r; \
- (q) = _qh; \
+ __udiv_qrnnd_preinv (q, r, nh, 0, d, _di); \
} while (0)
#else
/* Same as __udiv_qrnnd_c from longlong.h, but using a single UWtype/UWtype
division instead of two, and with n0=0. The analysis below assumes a limb
has 64 bits for simplicity. */
-#define __udiv_qrnnd_preinv(q, r, n1, d) \
+#define __udiv_qrnd_preinv(q, r, n1, d) \
do { \
UWtype __d1, __d0, __q1, __q0, __r1, __r0, __i; \
\
@@ -119,6 +144,341 @@ http://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
} while (0)
#endif
+/* special code for p=PREC(q) < GMP_NUMB_BITS,
+ and PREC(u), PREC(v) <= GMP_NUMB_BITS */
+static int
+mpfr_divsp1 (mpfr_ptr q, mpfr_srcptr u, mpfr_srcptr v, mpfr_rnd_t rnd_mode)
+{
+ mpfr_prec_t p = MPFR_GET_PREC(q);
+ mpfr_limb_ptr up = MPFR_MANT(u);
+ mpfr_limb_ptr vp = MPFR_MANT(v);
+ mpfr_limb_ptr qp = MPFR_MANT(q);
+ mpfr_exp_t qx = MPFR_GET_EXP(u) - MPFR_GET_EXP(v);
+ mpfr_prec_t sh = GMP_NUMB_BITS - p;
+ mp_limb_t h, rb, sb, mask = MPFR_LIMB_MASK(sh);
+
+ if (up[0] >= vp[0])
+ {
+ if (p < GMP_NUMB_BITS / 2 && MPFR_PREC(v) <= GMP_NUMB_BITS / 2)
+ {
+ mp_limb_t v = vp[0] >> (GMP_NUMB_BITS / 2);
+ h = (up[0] - vp[0]) / v;
+ sb = (up[0] - vp[0]) % v;
+ h <<= GMP_NUMB_BITS / 2;
+ sb <<= GMP_NUMB_BITS / 2;
+ }
+ else
+ __udiv_qrnd_preinv (h, sb, up[0] - vp[0], vp[0]);
+ /* Noting W = 2^GMP_NUMB_BITS, we have up[0]*W = (W + h) * vp[0] + sb,
+ thus up[0]/vp[0] = 1 + h/W + sb/vp[0]/W, with 0 <= sb < vp[0]. */
+ qx ++;
+ sb |= h & 1;
+ h = MPFR_LIMB_HIGHBIT | (h >> 1);
+ rb = h & (MPFR_LIMB_ONE << (sh - 1));
+ sb |= (h & mask) ^ rb;
+ qp[0] = h & ~mask;
+ }
+ else
+ {
+ if (p < GMP_NUMB_BITS / 2 && MPFR_PREC(v) <= GMP_NUMB_BITS / 2)
+ {
+ mp_limb_t v = vp[0] >> (GMP_NUMB_BITS / 2);
+ h = up[0] / v;
+ sb = up[0] % v;
+ h <<= GMP_NUMB_BITS / 2;
+ sb <<= GMP_NUMB_BITS / 2;
+ }
+ else
+ __udiv_qrnd_preinv (h, sb, up[0], vp[0]);
+ /* now up[0]*2^GMP_NUMB_BITS = h*vp[0] + sb */
+ rb = h & (MPFR_LIMB_ONE << (sh - 1));
+ sb |= (h & mask) ^ rb;
+ qp[0] = h & ~mask;
+ }
+
+ MPFR_SIGN(q) = MPFR_MULT_SIGN (MPFR_SIGN (u), MPFR_SIGN (v));
+
+ /* rounding */
+ if (qx > __gmpfr_emax)
+ return mpfr_overflow (q, rnd_mode, MPFR_SIGN(q));
+
+ /* Warning: underflow should be checked *after* rounding, thus when rounding
+ away and when q > 0.111...111*2^(emin-1), or when rounding to nearest and
+ q >= 0.111...111[1]*2^(emin-1), there is no underflow. */
+ if (qx < __gmpfr_emin)
+ {
+ /* for RNDN, mpfr_underflow always rounds away, thus for |q|<=2^(emin-2)
+ we have to change to RNDZ */
+ if (rnd_mode == MPFR_RNDN)
+ {
+ if ((qx == __gmpfr_emin - 1) && (qp[0] == ~mask) && rb)
+ goto rounding; /* no underflow */
+ if (qx < __gmpfr_emin - 1 || (qp[0] == MPFR_LIMB_HIGHBIT && sb == 0))
+ rnd_mode = MPFR_RNDZ;
+ }
+ else if (!MPFR_IS_LIKE_RNDZ(rnd_mode, MPFR_IS_NEG (q)))
+ {
+ if ((qx == __gmpfr_emin - 1) && (qp[0] == ~mask) && (rb | sb))
+ goto rounding; /* no underflow */
+ }
+ return mpfr_underflow (q, rnd_mode, MPFR_SIGN(q));
+ }
+
+ rounding:
+ MPFR_EXP (q) = qx; /* Don't use MPFR_SET_EXP since qx might be < __gmpfr_emin
+ in the cases "goto rounding" above. */
+ if (rb == 0 && sb == 0)
+ {
+ MPFR_ASSERTD(qx >= __gmpfr_emin);
+ return 0; /* same as MPFR_RET(0), but faster */
+ }
+ else if (rnd_mode == MPFR_RNDN)
+ {
+ if (rb == 0 || (rb && sb == 0 &&
+ (qp[0] & (MPFR_LIMB_ONE << sh)) == 0))
+ goto truncate;
+ else
+ goto add_one_ulp;
+ }
+ else if (MPFR_IS_LIKE_RNDZ(rnd_mode, MPFR_IS_NEG(q)))
+ {
+ truncate:
+ MPFR_ASSERTD(qx >= __gmpfr_emin);
+ MPFR_RET(-MPFR_SIGN(q));
+ }
+ else /* round away from zero */
+ {
+ add_one_ulp:
+ qp[0] += MPFR_LIMB_ONE << sh;
+ if (qp[0] == 0)
+ {
+ qp[0] = MPFR_LIMB_HIGHBIT;
+ if (MPFR_UNLIKELY(qx + 1 > __gmpfr_emax))
+ return mpfr_overflow (q, rnd_mode, MPFR_SIGN(q));
+ MPFR_ASSERTD(qx + 1 <= __gmpfr_emax);
+ MPFR_ASSERTD(qx + 1 >= __gmpfr_emin);
+ MPFR_SET_EXP (q, qx + 1);
+ }
+ MPFR_RET(MPFR_SIGN(q));
+ }
+}
+
+#if defined(WANT_GMP_INTERNALS) && defined(HAVE___GMPN_INVERT_LIMB)
+/* special code for GMP_NUMB_BITS < PREC(q) < 2*GMP_NUMB_BITS and
+ GMP_NUMB_BITS < PREC(u), PREC(v) <= 2*GMP_NUMB_BITS */
+static int
+mpfr_divsp2 (mpfr_ptr q, mpfr_srcptr u, mpfr_srcptr v, mpfr_rnd_t rnd_mode)
+{
+ mpfr_prec_t p = MPFR_GET_PREC(q);
+ mpfr_limb_ptr up = MPFR_MANT(u);
+ mpfr_limb_ptr vp = MPFR_MANT(v);
+ mpfr_limb_ptr qp = MPFR_MANT(q);
+ mpfr_exp_t qx = MPFR_GET_EXP(u) - MPFR_GET_EXP(v);
+ mpfr_prec_t sh = 2*GMP_NUMB_BITS - p;
+ mp_limb_t inv, h, rb, sb, mask = MPFR_LIMB_MASK(sh);
+ mp_limb_t q1, q0, r3, r2, r1, r0, l, t;
+ int extra;
+
+ inv = __gmpn_invert_limb (vp[1]);
+ extra = up[1] > vp[1] || (up[1] == vp[1] && up[0] >= vp[0]);
+ if (extra)
+ sub_ddmmss (r3, r2, up[1], up[0], vp[1], vp[0]);
+ else
+ r3 = up[1], r2 = up[0];
+
+ MPFR_ASSERTD(r3 < vp[1] || (r3 == vp[1] && r2 < vp[0]));
+
+ if (MPFR_UNLIKELY(r3 == vp[1])) /* can occur in some rare cases */
+ {
+ /* This can only occur in case extra=0, since otherwise we would have
+ u_old >= u_new + v >= B^2/2 + B^2/2 = B^2. In this case we have
+ r3 = u1 and r2 = u0, thus the remainder u*B-q1*v is
+ v1*B^2+u0*B-(B-1)*(v1*B+v0) = (u0-v0+v1)*B+v0.
+ Warning: in this case q1 = B-1 can be too large, for example with
+ u = B^2/2 and v = B^2/2 + B - 1, then u*B-(B-1)*u = -1/2*B^2+2*B-1. */
+ MPFR_ASSERTD(extra == 0);
+ q1 = ~MPFR_LIMB_ZERO;
+ r1 = vp[0];
+ t = vp[0] - up[0]; /* t > 0 since u < v */
+ r2 = vp[1] - t;
+ if (t > vp[1]) /* q1 = B-1 is too large, we need q1 = B-2, which is ok
+ since u*B - q1*v >= v1*B^2-(B-2)*(v1*B+B-1) =
+ -B^2 + 2*B*v1 + 3*B - 2 >= 0 since v1>=B/2 and B>=2 */
+ {
+ q1 --;
+ /* add v to r2:r1 */
+ r1 += vp[0];
+ r2 += vp[1] + (r1 < vp[0]);
+ }
+ }
+ else
+ {
+ __udiv_qrnnd_preinv (q1, r2, r3, r2, vp[1], inv);
+ /* u-extra*v = q1 * v1 + r2 */
+
+ /* now subtract q1*v0 to r2:0 */
+ umul_ppmm (h, l, q1, vp[0]);
+ t = r2; /* save old value of r2 */
+ r1 = -l;
+ r2 -= h + (l != 0);
+ /* Note: h + (l != 0) < 2^GMP_NUMB_BITS. */
+
+ /* we have r2:r1 = oldr2:0 - q1*v0 mod 2^(2*GMP_NUMB_BITS)
+ thus (u-extra*v)*B = q1 * v + r2:r1 mod 2^(2*GMP_NUMB_BITS) */
+
+ while (r2 > t) /* borrow when subtracting h + (l != 0), q1 too large */
+ {
+ q1 --;
+ /* add v1:v0 to r2:r1 */
+ t = r2;
+ r1 += vp[0];
+ r2 += vp[1] + (r1 < vp[0]);
+ /* note: since 2^(GMP_NUMB_BITS-1) <= vp[1] + (r1 < vp[0])
+ <= 2^GMP_NUMB_BITS, it suffices to check if r2 <= t to see
+ if there was a carry or not. */
+ }
+ }
+
+ /* now (u-extra*v)*B = q1 * v + r2:r1 with 0 <= r2:r1 < v */
+
+ MPFR_ASSERTD(r2 < vp[1] || (r2 == vp[1] && r1 < vp[0]));
+
+ if (MPFR_UNLIKELY(r2 == vp[1]))
+ {
+ q0 = ~MPFR_LIMB_ZERO;
+ /* r2:r1:0 - q0*(v1:v0) = v1:r1:0 - (B-1)*(v1:v0)
+ = r1:0 - v0:0 + v1:v0 */
+ r0 = vp[0];
+ t = vp[0] - r1; /* t > 0 since r2:r1 < v1:v0 */
+ r1 = vp[1] - t;
+ if (t > vp[1])
+ {
+ q0 --;
+ /* add v to r1:r0 */
+ r0 += vp[0];
+ r1 += vp[1] + (r0 < vp[0]);
+ }
+ }
+ else
+ {
+ /* divide r2:r1 by v1 */
+ __udiv_qrnnd_preinv (q0, r1, r2, r1, vp[1], inv);
+
+ /* r2:r1 = q0*v1 + r1 */
+
+ /* subtract q0*v0 to r1:0 */
+ umul_ppmm (h, l, q0, vp[0]);
+ t = r1;
+ r0 = -l;
+ r1 -= h + (l != 0);
+
+ while (r1 > t) /* borrow when subtracting h + (l != 0),
+ q0 was too large */
+ {
+ q0 --;
+ /* add v1:v0 to r1:r0 */
+ t = r1;
+ r0 += vp[0];
+ r1 += vp[1] + (r0 < vp[0]);
+ /* note: since 2^(GMP_NUMB_BITS-1) <= vp[1] + (r0 < vp[0])
+ <= 2^GMP_NUMB_BITS, it suffices to check if r1 <= t to see
+ if there was a carry or not. */
+ }
+ }
+
+ MPFR_ASSERTD(r1 < vp[1] || (r1 == vp[1] && r0 < vp[0]));
+
+ /* now (u-extra*v)*B^2 = (q1:q0) * v + r1:r0 */
+
+ sb = r1 | r0;
+
+ if (extra)
+ {
+ qx ++;
+ sb |= q0 & 1;
+ q0 = (q1 << (GMP_NUMB_BITS - 1)) | (q0 >> 1);
+ q1 = MPFR_LIMB_HIGHBIT | (q1 >> 1);
+ }
+ rb = q0 & (MPFR_LIMB_ONE << (sh - 1));
+ sb |= (q0 & mask) ^ rb;
+ qp[1] = q1;
+ qp[0] = q0 & ~mask;
+
+ MPFR_SIGN(q) = MPFR_MULT_SIGN (MPFR_SIGN (u), MPFR_SIGN (v));
+
+ /* rounding */
+ if (qx > __gmpfr_emax)
+ return mpfr_overflow (q, rnd_mode, MPFR_SIGN(q));
+
+ /* Warning: underflow should be checked *after* rounding, thus when rounding
+ away and when q > 0.111...111*2^(emin-1), or when rounding to nearest and
+ q >= 0.111...111[1]*2^(emin-1), there is no underflow. */
+ if (qx < __gmpfr_emin)
+ {
+ /* for RNDN, mpfr_underflow always rounds away, thus for |q|<=2^(emin-2)
+ we have to change to RNDZ */
+ if (rnd_mode == MPFR_RNDN)
+ {
+ if ((qx == __gmpfr_emin - 1) &&
+ (qp[1] == ~MPFR_LIMB_ZERO && qp[0] == ~mask) && rb)
+ goto rounding; /* no underflow */
+ if (qx < __gmpfr_emin - 1 ||
+ (qp[1] == MPFR_LIMB_HIGHBIT &&
+ qp[0] == MPFR_LIMB_ZERO && sb == 0))
+ rnd_mode = MPFR_RNDZ;
+ }
+ else if (!MPFR_IS_LIKE_RNDZ(rnd_mode, MPFR_IS_NEG (q)))
+ {
+ if ((qx == __gmpfr_emin - 1) &&
+ (qp[1] == ~MPFR_LIMB_ZERO && qp[0] == ~mask) && (rb | sb))
+ goto rounding; /* no underflow */
+ }
+ return mpfr_underflow (q, rnd_mode, MPFR_SIGN(q));
+ }
+
+ rounding:
+ MPFR_EXP (q) = qx; /* Don't use MPFR_SET_EXP since qx might be < __gmpfr_emin
+ in the cases "goto rounding" above. */
+ if (rb == 0 && sb == 0)
+ {
+ MPFR_ASSERTD(qx >= __gmpfr_emin);
+ return 0; /* same as MPFR_RET(0), but faster */
+ }
+ else if (rnd_mode == MPFR_RNDN)
+ {
+ if (rb == 0 || (rb && sb == 0 &&
+ (qp[0] & (MPFR_LIMB_ONE << sh)) == 0))
+ goto truncate;
+ else
+ goto add_one_ulp;
+ }
+ else if (MPFR_IS_LIKE_RNDZ(rnd_mode, MPFR_IS_NEG(q)))
+ {
+ truncate:
+ MPFR_ASSERTD(qx >= __gmpfr_emin);
+ MPFR_RET(-MPFR_SIGN(q));
+ }
+ else /* round away from zero */
+ {
+ add_one_ulp:
+ qp[0] += MPFR_LIMB_ONE << sh;
+ qp[1] += (qp[0] == 0);
+ if (qp[1] == 0)
+ {
+ qp[1] = MPFR_LIMB_HIGHBIT;
+ if (MPFR_UNLIKELY(qx + 1 > __gmpfr_emax))
+ return mpfr_overflow (q, rnd_mode, MPFR_SIGN(q));
+ MPFR_ASSERTD(qx + 1 <= __gmpfr_emax);
+ MPFR_ASSERTD(qx + 1 >= __gmpfr_emin);
+ MPFR_SET_EXP (q, qx + 1);
+ }
+ MPFR_RET(MPFR_SIGN(q));
+ }
+}
+#endif
+
+#endif /* !defined(MPFR_GENERIC_ABI) */
+
#ifdef DEBUG2
#define mpfr_mpn_print(ap,n) mpfr_mpn_print3 (ap,n,MPFR_LIMB_ZERO)
static void
@@ -340,124 +700,6 @@ mpfr_div_with_mpz_tdiv_q (mpfr_ptr q, mpfr_srcptr u, mpfr_srcptr v,
return ok;
}
-/* special code for p=PREC(q) < GMP_NUMB_BITS, PREC(u), PREC(v) <= GMP_NUMB_BITS */
-static int
-mpfr_divsp1 (mpfr_ptr q, mpfr_srcptr u, mpfr_srcptr v, mpfr_rnd_t rnd_mode)
-{
- mpfr_prec_t p = MPFR_GET_PREC(q);
- mpfr_limb_ptr up = MPFR_MANT(u);
- mpfr_limb_ptr vp = MPFR_MANT(v);
- mpfr_limb_ptr qp = MPFR_MANT(q);
- mpfr_exp_t qx = MPFR_GET_EXP(u) - MPFR_GET_EXP(v);
- mpfr_prec_t sh = GMP_NUMB_BITS - p;
- mp_limb_t h, rb, sb, mask = MPFR_LIMB_MASK(sh);
-
- if (up[0] >= vp[0])
- {
- if (p < GMP_NUMB_BITS / 2 && MPFR_PREC(v) <= GMP_NUMB_BITS / 2)
- {
- mp_limb_t v = vp[0] >> (GMP_NUMB_BITS / 2);
- h = (up[0] - vp[0]) / v;
- sb = (up[0] - vp[0]) % v;
- h <<= GMP_NUMB_BITS / 2;
- sb <<= GMP_NUMB_BITS / 2;
- }
- else
- __udiv_qrnnd_preinv (h, sb, up[0] - vp[0], vp[0]);
- /* Noting W = 2^GMP_NUMB_BITS, we have up[0]*W = (W + h) * vp[0] + sb,
- thus up[0]/vp[0] = 1 + h/W + sb/vp[0]/W, with 0 <= sb < vp[0]. */
- qx ++;
- sb |= h & 1;
- h = MPFR_LIMB_HIGHBIT | (h >> 1);
- rb = h & (MPFR_LIMB_ONE << (sh - 1));
- sb |= (h & mask) ^ rb;
- qp[0] = h & ~mask;
- }
- else
- {
- if (p < GMP_NUMB_BITS / 2 && MPFR_PREC(v) <= GMP_NUMB_BITS / 2)
- {
- mp_limb_t v = vp[0] >> (GMP_NUMB_BITS / 2);
- h = up[0] / v;
- sb = up[0] % v;
- h <<= GMP_NUMB_BITS / 2;
- sb <<= GMP_NUMB_BITS / 2;
- }
- else
- __udiv_qrnnd_preinv (h, sb, up[0], vp[0]);
- /* now up[0]*2^GMP_NUMB_BITS = h*vp[0] + sb */
- rb = h & (MPFR_LIMB_ONE << (sh - 1));
- sb |= (h & mask) ^ rb;
- qp[0] = h & ~mask;
- }
-
- MPFR_SIGN(q) = MPFR_MULT_SIGN (MPFR_SIGN (u), MPFR_SIGN (v));
-
- /* rounding */
- if (qx > __gmpfr_emax)
- return mpfr_overflow (q, rnd_mode, MPFR_SIGN(q));
-
- /* Warning: underflow should be checked *after* rounding, thus when rounding
- away and when q > 0.111...111*2^(emin-1), or when rounding to nearest and
- q >= 0.111...111[1]*2^(emin-1), there is no underflow. */
- if (qx < __gmpfr_emin)
- {
- /* for RNDN, mpfr_underflow always rounds away, thus for |q|<=2^(emin-2)
- we have to change to RNDZ */
- if (rnd_mode == MPFR_RNDN)
- {
- if ((qx == __gmpfr_emin - 1) && (qp[0] == ~mask) && rb)
- goto rounding; /* no underflow */
- if (qx < __gmpfr_emin - 1 || (qp[0] == MPFR_LIMB_HIGHBIT && sb == 0))
- rnd_mode = MPFR_RNDZ;
- }
- else if (!MPFR_IS_LIKE_RNDZ(rnd_mode, MPFR_IS_NEG (q)))
- {
- if ((qx == __gmpfr_emin - 1) && (qp[0] == ~mask) && (rb | sb))
- goto rounding; /* no underflow */
- }
- return mpfr_underflow (q, rnd_mode, MPFR_SIGN(q));
- }
-
- rounding:
- MPFR_EXP (q) = qx; /* Don't use MPFR_SET_EXP since qx might be < __gmpfr_emin
- in the cases "goto rounding" above. */
- if (rb == 0 && sb == 0)
- {
- MPFR_ASSERTD(qx >= __gmpfr_emin);
- return 0; /* idem than MPFR_RET(0) but faster */
- }
- else if (rnd_mode == MPFR_RNDN)
- {
- if (rb == 0 || (rb && sb == 0 &&
- (qp[0] & (MPFR_LIMB_ONE << sh)) == 0))
- goto truncate;
- else
- goto add_one_ulp;
- }
- else if (MPFR_IS_LIKE_RNDZ(rnd_mode, MPFR_IS_NEG(q)))
- {
- truncate:
- MPFR_ASSERTD(qx >= __gmpfr_emin);
- MPFR_RET(-MPFR_SIGN(q));
- }
- else /* round away from zero */
- {
- add_one_ulp:
- qp[0] += MPFR_LIMB_ONE << sh;
- if (qp[0] == 0)
- {
- qp[0] = MPFR_LIMB_HIGHBIT;
- if (MPFR_UNLIKELY(qx + 1 > __gmpfr_emax))
- return mpfr_overflow (q, rnd_mode, MPFR_SIGN(q));
- MPFR_ASSERTD(qx + 1 <= __gmpfr_emax);
- MPFR_ASSERTD(qx + 1 >= __gmpfr_emin);
- MPFR_SET_EXP (q, qx + 1);
- }
- MPFR_RET(MPFR_SIGN(q));
- }
-}
-
MPFR_HOT_FUNCTION_ATTR int
mpfr_div (mpfr_ptr q, mpfr_srcptr u, mpfr_srcptr v, mpfr_rnd_t rnd_mode)
{
@@ -545,13 +787,24 @@ mpfr_div (mpfr_ptr q, mpfr_srcptr u, mpfr_srcptr v, mpfr_rnd_t rnd_mode)
}
}
- if (MPFR_GET_PREC(q) < GMP_NUMB_BITS && MPFR_GET_PREC(u) <= GMP_NUMB_BITS
- && MPFR_GET_PREC(v) <= GMP_NUMB_BITS)
+ usize = MPFR_LIMB_SIZE(u);
+ vsize = MPFR_LIMB_SIZE(v);
+
+ /* When MPFR_GENERIC_ABI is defined, we don't use special code. */
+#if !defined(MPFR_GENERIC_ABI)
+
+ if (MPFR_GET_PREC(q) < GMP_NUMB_BITS && usize == 1 && vsize == 1)
return mpfr_divsp1 (q, u, v, rnd_mode);
+#if defined(WANT_GMP_INTERNALS) && defined(HAVE___GMPN_INVERT_LIMB)
+ if (GMP_NUMB_BITS < MPFR_GET_PREC(q) && MPFR_GET_PREC(q) < 2 * GMP_NUMB_BITS
+ && usize == 2 && vsize == 2)
+ return mpfr_divsp2 (q, u, v, rnd_mode);
+#endif
+
+#endif /* !defined(MPFR_GENERIC_ABI) */
+
q0size = MPFR_LIMB_SIZE(q); /* number of limbs of destination */
- usize = MPFR_LIMB_SIZE(u);
- vsize = MPFR_LIMB_SIZE(v);
q0p = MPFR_MANT(q);
up = MPFR_MANT(u);
vp = MPFR_MANT(v);
diff --git a/src/div_ui.c b/src/div_ui.c
index 215c56b1d..e80cca3d1 100644
--- a/src/div_ui.c
+++ b/src/div_ui.c
@@ -174,10 +174,9 @@ mpfr_div_ui (mpfr_ptr y, mpfr_srcptr x, unsigned long int u, mpfr_rnd_t rnd_mode
}
else
{ /* this happens only if u == 1 and xp[xn-1] >=
- 1<<(GMP_NUMB_BITS-1). It might be better to handle the
- u == 1 case separately?
+ MPFR_LIMB_ONE << (GMP_NUMB_BITS-1). It might be better to
+ handle the u == 1 case separately?
*/
-
MPN_COPY (yp, tmp + 1, yn);
}
}
diff --git a/src/lngamma.c b/src/lngamma.c
index 682888601..d07ea033e 100644
--- a/src/lngamma.c
+++ b/src/lngamma.c
@@ -490,7 +490,7 @@ GAMMA_FUNC (mpfr_ptr y, mpfr_srcptr z0, mpfr_rnd_t rnd)
mpfr_mul (u, u, u, MPFR_RNDN); /* 1/z^2 * (1+u)^3 */
/* m <= maxm ensures that 2*m*(2*m+1) <= ULONG_MAX */
- maxm = 1UL << (GMP_NUMB_BITS / 2 - 1);
+ maxm = 1UL << (sizeof(unsigned long) * CHAR_BIT / 2 - 1);
/* s:(1+u)^15, t:(1+u)^2, t <= 3/128 */
diff --git a/src/mpfr-gmp.h b/src/mpfr-gmp.h
index 628891a62..3088338fa 100644
--- a/src/mpfr-gmp.h
+++ b/src/mpfr-gmp.h
@@ -252,8 +252,19 @@ __MPFR_DECLSPEC extern const struct bases mpfr_bases[257];
#define MAX(h,i) ((h) > (i) ? (h) : (i))
#define numberof(x) (sizeof (x) / sizeof ((x)[0]))
-/* Allocate func are defined in gmp-impl.h */
-
+/* Allocation functions are defined in gmp-impl.h.
+   Warning: the code below fetches the GMP memory allocation functions the
+   first time an mpfr_t is allocated, and then always uses those initial
+   functions, even if the user later changes the GMP memory allocation
+   functions with mp_set_memory_functions(). This is fine as long as a user
+   who wants different memory allocation functions calls
+   mp_set_memory_functions() before any call to mpfr_init or mpfr_init2.
+   For more complex usage, change #if 1 into #if 0. Warning! In this case,
+   the user must make sure that no data internal to MPFR remain allocated
+   with the previous allocator. Freeing all the caches may be necessary,
+   but this is not guaranteed to be sufficient. */
+
+#if 1
#undef __gmp_allocate_func
#undef __gmp_reallocate_func
#undef __gmp_free_func
@@ -265,6 +276,12 @@ __MPFR_DECLSPEC extern const struct bases mpfr_bases[257];
#define __gmp_allocate_func (MPFR_GET_MEMFUNC, mpfr_allocate_func)
#define __gmp_reallocate_func (MPFR_GET_MEMFUNC, mpfr_reallocate_func)
#define __gmp_free_func (MPFR_GET_MEMFUNC, mpfr_free_func)
+#else
+extern void * (*__gmp_allocate_func) (size_t);
+extern void * (*__gmp_reallocate_func) (void *, size_t, size_t);
+extern void (*__gmp_free_func) (void *, size_t);
+#endif
+
__MPFR_DECLSPEC extern MPFR_THREAD_ATTR void * (*mpfr_allocate_func) (size_t);
__MPFR_DECLSPEC extern MPFR_THREAD_ATTR void * (*mpfr_reallocate_func) (void *, size_t, size_t);
__MPFR_DECLSPEC extern MPFR_THREAD_ATTR void (*mpfr_free_func) (void *, size_t);
diff --git a/src/mpfr-impl.h b/src/mpfr-impl.h
index d62ae437e..7e9fd60bd 100644
--- a/src/mpfr-impl.h
+++ b/src/mpfr-impl.h
@@ -495,9 +495,9 @@ __MPFR_DECLSPEC extern const mpfr_t __gmpfr_const_log2_RNDU;
"-Werror=return-type".
WARNING: It doesn't use do { } while (0) for Insure++ */
#if defined(HAVE_BUILTIN_UNREACHABLE)
-# define MPFR_RET_NEVER_GO_HERE() { __builtin_unreachable(); }
+# define MPFR_RET_NEVER_GO_HERE() do { __builtin_unreachable(); } while (0)
#else
-# define MPFR_RET_NEVER_GO_HERE() { MPFR_ASSERTN(0); return 0; }
+# define MPFR_RET_NEVER_GO_HERE() do { MPFR_ASSERTN(0); return 0; } while (0)
#endif
@@ -942,9 +942,6 @@ typedef uintmax_t mpfr_ueexp_t;
(MPFR_UNLIKELY(MPFR_IS_SINGULAR_OR_UBF(x)) || \
MPFR_UNLIKELY(MPFR_IS_SINGULAR_OR_UBF(y)))
-#define MPFR_IS_POWER_OF_2(x) \
- (mpfr_cmp_ui_2exp ((x), 1, MPFR_GET_EXP (x) - 1) == 0)
-
/******************************************************
******************** Sign macros *******************
@@ -2086,6 +2083,7 @@ __MPFR_DECLSPEC mpfr_exp_t mpfr_ceil_mul (mpfr_exp_t, int, int);
__MPFR_DECLSPEC int mpfr_exp_2 (mpfr_ptr, mpfr_srcptr,mpfr_rnd_t);
__MPFR_DECLSPEC int mpfr_exp_3 (mpfr_ptr, mpfr_srcptr,mpfr_rnd_t);
__MPFR_DECLSPEC int mpfr_powerof2_raw (mpfr_srcptr);
+__MPFR_DECLSPEC int mpfr_powerof2_raw2 (const mp_limb_t *, mp_size_t);
__MPFR_DECLSPEC int mpfr_pow_general (mpfr_ptr, mpfr_srcptr,
mpfr_srcptr, mpfr_rnd_t, int, mpfr_save_expo_t *);
diff --git a/src/mpfr-longlong.h b/src/mpfr-longlong.h
index d3ffbecb8..c29c688fb 100644
--- a/src/mpfr-longlong.h
+++ b/src/mpfr-longlong.h
@@ -1887,11 +1887,11 @@ extern UWtype __MPN(udiv_qrnnd) (UWtype *, UWtype, UWtype, UWtype);
/* FIXME: "sidi" here is highly doubtful, should sometimes be "diti". */
#if !defined (umul_ppmm) && defined (__umulsidi3)
#define umul_ppmm(ph, pl, m0, m1) \
- { \
+ do { \
UDWtype __ll = __umulsidi3 (m0, m1); \
ph = (UWtype) (__ll >> W_TYPE_SIZE); \
pl = (UWtype) __ll; \
- }
+ } while (0)
#endif
#if !defined (__umulsidi3)
diff --git a/src/mpfr.h b/src/mpfr.h
index bc5802bdb..82468617d 100644
--- a/src/mpfr.h
+++ b/src/mpfr.h
@@ -301,7 +301,10 @@ typedef enum {
#endif
/* Use MPFR_DEPRECATED to mark MPFR functions, types or variables as
- deprecated. Code inspired by Apache Subversion's svn_types.h file. */
+ deprecated. Code inspired by Apache Subversion's svn_types.h file.
+ For compatibility with MSVC, MPFR_DEPRECATED must be put before
+ __MPFR_DECLSPEC (not at the end of the function declaration as
+ documented in the GCC manual); GCC does not seem to care. */
#if defined(__GNUC__) && \
(__GNUC__ >= 4 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1))
# define MPFR_DEPRECATED __attribute__ ((deprecated))
@@ -310,6 +313,11 @@ typedef enum {
#else
# define MPFR_DEPRECATED
#endif
+/* TODO: Also define MPFR_EXPERIMENTAL for experimental functions?
+ See SVN_EXPERIMENTAL in Subversion 1.9+ as an example:
+ __attribute__((warning("..."))) can be used with GCC 4.3.1+ but
+ not __llvm__, and __declspec(deprecated("...")) can be used with
+ MSC as above. */
/* Note: In order to be declared, some functions need a specific
system header to be included *before* "mpfr.h". If the user
@@ -503,6 +511,7 @@ __MPFR_DECLSPEC void mpfr_free_str (char *);
__MPFR_DECLSPEC int mpfr_urandom (mpfr_ptr, gmp_randstate_t,
mpfr_rnd_t);
+MPFR_DEPRECATED
__MPFR_DECLSPEC int mpfr_grandom (mpfr_ptr, mpfr_ptr, gmp_randstate_t,
mpfr_rnd_t);
__MPFR_DECLSPEC int mpfr_nrandom (mpfr_ptr, gmp_randstate_t,
diff --git a/src/mul.c b/src/mul.c
index 64af72a68..3c2c486ae 100644
--- a/src/mul.c
+++ b/src/mul.c
@@ -228,13 +228,12 @@ mpfr_mulsp1 (mpfr_ptr a, mpfr_srcptr b, mpfr_srcptr c, mpfr_rnd_t rnd_mode,
sb = 0;
*/
umul_ppmm (h, sb, MPFR_MANT(b)[0], MPFR_MANT(c)[0]);
- if ((h & MPFR_LIMB_HIGHBIT) == 0)
+ if (h < MPFR_LIMB_HIGHBIT)
{
ax --;
h = (h << 1) | (sb >> (GMP_NUMB_BITS - 1));
sb = sb << 1;
}
- ap[0] = h;
rb = h & (MPFR_LIMB_ONE << (sh - 1));
sb |= (h & mask) ^ rb;
ap[0] = h & ~mask;
diff --git a/src/powerof2.c b/src/powerof2.c
index f10ace0ad..53f0551cd 100644
--- a/src/powerof2.c
+++ b/src/powerof2.c
@@ -31,16 +31,17 @@ http://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
int
mpfr_powerof2_raw (mpfr_srcptr x)
{
- mp_limb_t *xp;
- mp_size_t xn;
-
/* This is an internal function, and we may call it with some
wrong numbers (ie good mantissa but wrong flags or exp)
So we don't want to test if it is a pure FP.
MPFR_ASSERTN(MPFR_IS_PURE_FP(x)); */
- xp = MPFR_MANT(x);
- xn = (MPFR_PREC(x) - 1) / GMP_NUMB_BITS;
- if (xp[xn] != MPFR_LIMB_HIGHBIT)
+ return mpfr_powerof2_raw2 (MPFR_MANT(x), MPFR_LIMB_SIZE(x));
+}
+
+int
+mpfr_powerof2_raw2 (const mp_limb_t *xp, mp_size_t xn)
+{
+ if (xp[--xn] != MPFR_LIMB_HIGHBIT)
return 0;
while (xn > 0)
if (xp[--xn] != 0)
diff --git a/src/rec_sqrt.c b/src/rec_sqrt.c
index 1797e3f77..fc87aa49f 100644
--- a/src/rec_sqrt.c
+++ b/src/rec_sqrt.c
@@ -26,11 +26,13 @@ http://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
#define LIMB_SIZE(x) ((((x)-1)>>MPFR_LOG2_GMP_NUMB_BITS) + 1)
#define MPFR_COM_N(x,y,n) \
- { \
- mp_size_t i; \
- for (i = 0; i < n; i++) \
- *((x)+i) = ~*((y)+i); \
- }
+ do \
+ { \
+ mp_size_t i; \
+ for (i = 0; i < n; i++) \
+ *((x)+i) = ~*((y)+i); \
+ } \
+ while (0)
/* Put in X a p-bit approximation of 1/sqrt(A),
where X = {x, n}/B^n, n = ceil(p/GMP_NUMB_BITS),
diff --git a/src/round_p.c b/src/round_p.c
index 9f9a77cd7..e201a1051 100644
--- a/src/round_p.c
+++ b/src/round_p.c
@@ -34,10 +34,16 @@ mpfr_round_p (mp_limb_t *bp, mp_size_t bn, mpfr_exp_t err0, mpfr_prec_t prec)
i1 = mpfr_round_p_2 (bp, bn, err0, prec);
- /* compare with mpfr_can_round_raw */
+ /* Note: since revision 10747, mpfr_can_round_raw is supposed to be always
+ correct, whereas mpfr_round_p_2 might return 0 in some cases where one
+ could round, for example with err0=67 and prec=54:
+ b = 1111101101010001100011111011100010100011101111011011101111111111
+     thus we cannot compare i1 and i2; we can only check that we do not
+     have i1 != 0 and i2 == 0.
+ */
i2 = mpfr_can_round_raw (bp, bn, MPFR_SIGN_POS, err0,
MPFR_RNDN, MPFR_RNDZ, prec);
- if (i1 != i2)
+ if (i1 && (i2 == 0))
{
fprintf (stderr, "mpfr_round_p(%d) != mpfr_can_round(%d)!\n"
"bn = %lu, err0 = %ld, prec = %lu\nbp = ", i1, i2,
diff --git a/src/round_prec.c b/src/round_prec.c
index d4a5cb4a9..11f0a3ec6 100644
--- a/src/round_prec.c
+++ b/src/round_prec.c
@@ -150,26 +150,29 @@ mpfr_can_round (mpfr_srcptr b, mpfr_exp_t err, mpfr_rnd_t rnd1,
power of 2. */
int
-mpfr_can_round_raw (const mp_limb_t *bp, mp_size_t bn, int neg, mpfr_exp_t err0,
+mpfr_can_round_raw (const mp_limb_t *bp, mp_size_t bn, int neg, mpfr_exp_t err,
mpfr_rnd_t rnd1, mpfr_rnd_t rnd2, mpfr_prec_t prec)
{
- mpfr_prec_t err, prec0 = prec;
+ mpfr_prec_t prec2;
mp_size_t k, k1, tn;
int s, s1;
mp_limb_t cc, cc2;
mp_limb_t *tmp;
+ mp_limb_t cy = 0, tmp_hi;
+ int res;
MPFR_TMP_DECL(marker);
- MPFR_ASSERTD(bp[bn - 1] & MPFR_LIMB_HIGHBIT);
+ /* Since mpfr_can_round is a function in the API, use MPFR_ASSERTN.
+ The specification makes sense only for prec >= 1. */
+ MPFR_ASSERTN (prec >= 1);
- if (MPFR_UNLIKELY(err0 < 0 || (mpfr_uexp_t) err0 <= prec))
- return 0; /* can't round */
+ MPFR_ASSERTD(bp[bn - 1] & MPFR_LIMB_HIGHBIT);
MPFR_ASSERT_SIGN(neg);
neg = MPFR_IS_NEG_SIGN(neg);
/* Transform RNDD and RNDU to Zero / Away */
- MPFR_ASSERTD((neg == 0) || (neg == 1));
+ MPFR_ASSERTD (neg == 0 || neg == 1);
/* transform RNDF to RNDN */
if (rnd1 == MPFR_RNDF)
rnd1 = MPFR_RNDN;
@@ -178,22 +181,145 @@ mpfr_can_round_raw (const mp_limb_t *bp, mp_size_t bn, int neg, mpfr_exp_t err0,
if (rnd2 != MPFR_RNDN)
rnd2 = MPFR_IS_LIKE_RNDZ(rnd2, neg) ? MPFR_RNDZ : MPFR_RNDA;
+ /* For err < prec (+1 for rnd1=RNDN), we can never round correctly, since
+ the error is at least 2*ulp(b) >= ulp(round(b)).
+ However for err = prec (+1 for rnd1=RNDN), we can round correctly in some
+ rare cases where ulp(b) = 1/2*ulp(U) [see below for the definition of U],
+ which implies rnd1 = RNDZ or RNDN, and rnd2 = RNDA or RNDN. */
+
+ if (MPFR_UNLIKELY (err < prec + (rnd1 == MPFR_RNDN) ||
+ (err == prec + (rnd1 == MPFR_RNDN) &&
+ (rnd1 == MPFR_RNDA ||
+ rnd2 == MPFR_RNDZ))))
+ return 0; /* can't round */
+
+ /* As a consequence... */
+ MPFR_ASSERTD (err >= prec);
+
+ /* The bound c on the error |x-b| is: c = 2^(MPFR_EXP(b)-err) <= b/2.
+ * So, we now know that x and b have the same sign. By symmetry,
+ * assume x > 0 and b > 0. We have: L <= x <= U, where, depending
+ * on rnd1:
+ * MPFR_RNDN: L = b-c, U = b+c
+ * MPFR_RNDZ: L = b, U = b+c
+ * MPFR_RNDA: L = b-c, U = b
+ *
+ * We can round x iff round(L,prec,rnd2) = round(U,prec,rnd2).
+ */
+
if (MPFR_UNLIKELY (prec > (mpfr_prec_t) bn * GMP_NUMB_BITS))
- { /* Then prec < PREC(b): we can round:
- (i) in rounding to the nearest iff err0 >= prec + 2
+ { /* Then prec > PREC(b): we can round:
+ (i) in rounding to the nearest as long as err >= prec + 2.
+ When err = prec + 1 and b is not a power
+ of two (so that a change of binade cannot occur), then one
+ can round to nearest thanks to the even rounding rule (in the
+ target precision prec, the significand of b ends with a 0).
+ When err = prec + 1 and b is a power of two, when rnd1 = RNDZ one
+ can round too.
(ii) in directed rounding mode iff rnd1 is compatible with rnd2
- and err0 >= prec + 1, unless b = 2^k and rnd1=rnd2=RNDA in
- which case we need err0 >= prec + 2. */
+ and err >= prec + 1, unless b = 2^k and rnd1 = RNDA or RNDN in
+ which case we need err >= prec + 2.
+ */
+ if ((rnd1 == rnd2 || rnd2 == MPFR_RNDN) && err >= prec + 1)
+ {
+ if (rnd1 != MPFR_RNDZ &&
+ err == prec + 1 &&
+ mpfr_powerof2_raw2 (bp, bn))
+ return 0;
+ else
+ return 1;
+ }
+ return 0;
+ }
+
+ /* now prec <= bn * GMP_NUMB_BITS */
+
+ if (MPFR_UNLIKELY (err > (mpfr_prec_t) bn * GMP_NUMB_BITS))
+ {
+ /* we distinguish the case where b is a power of two:
+ rnd1 rnd2 can round?
+ RNDZ RNDZ ok
+ RNDZ RNDA no
+ RNDZ RNDN ok
+ RNDA RNDZ no
+ RNDA RNDA ok except when err = prec + 1
+ RNDA RNDN ok except when err = prec + 1
+ RNDN RNDZ no
+ RNDN RNDA no
+ RNDN RNDN ok except when err = prec + 1 */
+ if (mpfr_powerof2_raw2 (bp, bn))
+ {
+ if ((rnd2 == MPFR_RNDZ || rnd2 == MPFR_RNDA) && rnd1 != rnd2)
+ return 0;
+ else if (rnd1 == MPFR_RNDZ)
+ return 1; /* RNDZ RNDZ and RNDZ RNDN */
+ else
+ return err > prec + 1;
+ }
+
+ /* now the general case where b is not a power of two:
+ rnd1 rnd2 can round?
+ RNDZ RNDZ ok
+ RNDZ RNDA except when b is representable in precision 'prec'
+ RNDZ RNDN except when b is the middle of two representable numbers in
+ precision 'prec' and b ends with 'xxx0[1]',
+ or b is representable in precision 'prec'
+ and err = prec + 1 and b ends with '1'.
+ RNDA RNDZ except when b is representable in precision 'prec'
+ RNDA RNDA ok
+ RNDA RNDN except when b is the middle of two representable numbers in
+ precision 'prec' and b ends with 'xxx1[1]',
+ or b is representable in precision 'prec'
+ and err = prec + 1 and b ends with '1'.
+ RNDN RNDZ except when b is representable in precision 'prec'
+ RNDN RNDA except when b is representable in precision 'prec'
+ RNDN RNDN except when b is the middle of two representable numbers in
+ precision 'prec', or b is representable in precision 'prec'
+ and err = prec + 1 and b ends with '1'. */
if (rnd2 == MPFR_RNDN)
- return (mpfr_uexp_t) err0 - 2 >= prec;
+ {
+ if (err == prec + 1 && (bp[0] & 1))
+ return 0; /* err == prec + 1 implies prec = bn * GMP_NUMB_BITS */
+ if (prec < (mpfr_prec_t) bn * GMP_NUMB_BITS)
+ {
+ k1 = MPFR_PREC2LIMBS (prec + 1);
+ MPFR_UNSIGNED_MINUS_MODULO(s1, prec + 1);
+ if (((bp[bn - k1] >> s1) & 1) &&
+ mpfr_round_raw2 (bp, bn, neg, MPFR_RNDA, prec + 1) == 0)
+ { /* b is the middle of two representable numbers */
+ if (rnd1 == MPFR_RNDN)
+ return 0;
+ k1 = MPFR_PREC2LIMBS (prec);
+ MPFR_UNSIGNED_MINUS_MODULO(s1, prec);
+ return (rnd1 == MPFR_RNDZ) ^
+ (((bp[bn - k1] >> s1) & 1) == 0);
+ }
+ }
+ return 1;
+ }
+ else if (rnd1 == rnd2)
+ {
+ if (rnd1 == MPFR_RNDN && prec < (mpfr_prec_t) bn * GMP_NUMB_BITS)
+ {
+ /* then rnd2 = RNDN, and for prec = bn * GMP_NUMB_BITS we cannot
+ have b the middle of two representable numbers */
+ k1 = MPFR_PREC2LIMBS (prec + 1);
+ MPFR_UNSIGNED_MINUS_MODULO(s1, prec + 1);
+ if (((bp[bn - k1] >> s1) & 1) &&
+ mpfr_round_raw2 (bp, bn, neg, MPFR_RNDA, prec + 1) == 0)
+ /* b is representable in precision prec+1 and ends with a 1 */
+ return 0;
+ else
+ return 1;
+ }
+ else
+ return 1;
+ }
else
- return (rnd1 == rnd2) && (mpfr_uexp_t) err0 - 2 >= prec;
+ return mpfr_round_raw2 (bp, bn, neg, MPFR_RNDA, prec) != 0;
}
- /* if the error is smaller than ulp(b), then anyway it will propagate
- up to ulp(b) */
- err = ((mpfr_uexp_t) err0 > (mpfr_prec_t) bn * GMP_NUMB_BITS) ?
- (mpfr_prec_t) bn * GMP_NUMB_BITS : (mpfr_prec_t) err0;
+ /* now err <= bn * GMP_NUMB_BITS */
/* warning: if k = m*GMP_NUMB_BITS, consider limb m-1 and not m */
k = (err - 1) / GMP_NUMB_BITS;
@@ -211,7 +337,7 @@ mpfr_can_round_raw (const mp_limb_t *bp, mp_size_t bn, int neg, mpfr_exp_t err0,
Warning! The number with updated bn may no longer be normalized. */
k -= k1;
bn -= k1;
- prec -= (mpfr_prec_t) k1 * GMP_NUMB_BITS;
+ prec2 = prec - (mpfr_prec_t) k1 * GMP_NUMB_BITS;
/* We can decide of the correct rounding if rnd2(b-eps) and rnd2(b+eps)
give the same result to the target precision 'prec', i.e., if when
@@ -230,64 +356,150 @@ mpfr_can_round_raw (const mp_limb_t *bp, mp_size_t bn, int neg, mpfr_exp_t err0,
switch (rnd1)
{
case MPFR_RNDZ:
- /* Round to Zero */
+ /* rnd1 = Round to Zero */
cc = (bp[bn - 1] >> s1) & 1;
/* mpfr_round_raw2 returns 1 if one should add 1 at ulp(b,prec),
and 0 otherwise */
- cc ^= mpfr_round_raw2 (bp, bn, neg, rnd2, prec);
+ cc ^= mpfr_round_raw2 (bp, bn, neg, rnd2, prec2);
/* cc is the new value of bit s1 in bp[bn-1] after rounding 'rnd2' */
/* now round b + 2^(MPFR_EXP(b)-err) */
- mpn_add_1 (tmp + bn - k, bp + bn - k, k, MPFR_LIMB_ONE << s);
- /* if there was a carry here, then necessarily bit s1 of bp[bn-1]
- changed, thus we surely cannot round for directed rounding, but this
- will be detected below, with cc2 != cc */
+ cy = mpn_add_1 (tmp + bn - k, bp + bn - k, k, MPFR_LIMB_ONE << s);
+ /* propagate carry up to most significant limb */
+ for (tn = 0; tn + 1 < k1 && cy != 0; tn ++)
+ cy = ~bp[bn + tn] == 0;
+ if (cy == 0 && err == prec)
+ {
+ res = 0;
+ goto end;
+ }
+ if (MPFR_UNLIKELY(cy))
+ {
+ /* when a carry occurs, we have b < 2^h <= b+c, we can round iff:
+ rnd2 = RNDZ: never, since b and b+c round to different values;
+ rnd2 = RNDA: when b+c is an exact power of two, and err > prec
+ (since for err = prec, b = 2^h - 1/2*ulp(2^h) is
+ exactly representable and thus rounds to itself);
+ rnd2 = RNDN: whenever cc = 0, since err >= prec implies
+ c <= ulp(b) = 1/2*ulp(2^h), thus b+c rounds to 2^h,
+ and b+c >= 2^h implies that bit 'prec' of b is 1,
+ thus cc = 0 means that b is rounded to 2^h too. */
+ res = (rnd2 == MPFR_RNDZ) ? 0
+ : (rnd2 == MPFR_RNDA) ? (err > prec && k == bn && tmp[0] == 0)
+ : cc == 0;
+ goto end;
+ }
break;
case MPFR_RNDN:
- /* Round to nearest */
+ /* rnd1 = Round to nearest */
/* first round b+2^(MPFR_EXP(b)-err) */
- mpn_add_1 (tmp + bn - k, bp + bn - k, k, MPFR_LIMB_ONE << s);
- /* same remark as above in case a carry occurs in mpn_add_1() */
+ cy = mpn_add_1 (tmp + bn - k, bp + bn - k, k, MPFR_LIMB_ONE << s);
+ /* propagate carry up to most significant limb */
+ for (tn = 0; tn + 1 < k1 && cy != 0; tn ++)
+ cy = ~bp[bn + tn] == 0;
cc = (tmp[bn - 1] >> s1) & 1; /* gives 0 when cc=1 */
- cc ^= mpfr_round_raw2 (tmp, bn, neg, rnd2, prec);
+ cc ^= mpfr_round_raw2 (tmp, bn, neg, rnd2, prec2);
/* cc is the new value of bit s1 in bp[bn-1]+eps after rounding 'rnd2' */
-
+ if (MPFR_UNLIKELY (cy != 0))
+ {
+ /* when a carry occurs, we have b-c < b < 2^h <= b+c, we can round
+ iff:
+ rnd2 = RNDZ: never, since b-c and b+c round to different values;
+ rnd2 = RNDA: when b+c is an exact power of two, and
+ err > prec + 1 (since for err <= prec + 1,
+ b-c <= 2^h - 1/2*ulp(2^h) is exactly representable
+ and thus rounds to itself);
+ rnd2 = RNDN: whenever err > prec + 1, since for err = prec + 1,
+ b+c rounds to 2^h, and b-c rounds to nextbelow(2^h).
+ For err > prec + 1, c <= 1/4*ulp(b) <= 1/8*ulp(2^h),
+ thus
+ 2^h - 1/4*ulp(b) <= b-c < b+c <= 2^h + 1/8*ulp(2^h),
+ therefore both b-c and b+c round to 2^h. */
+ res = (rnd2 == MPFR_RNDZ) ? 0
+ : (rnd2 == MPFR_RNDA) ? (err > prec + 1 && k == bn && tmp[0] == 0)
+ : err > prec + 1;
+ goto end;
+ }
subtract_eps:
- /* now round b-2^(MPFR_EXP(b)-err) */
- cc2 = mpn_sub_1 (tmp + bn - k, bp + bn - k, k, MPFR_LIMB_ONE << s);
+ /* now round b-2^(MPFR_EXP(b)-err), this happens for
+ rnd1 = RNDN or RNDA */
+ MPFR_ASSERTD(rnd1 == MPFR_RNDN || rnd1 == MPFR_RNDA);
+ cy = mpn_sub_1 (tmp + bn - k, bp + bn - k, k, MPFR_LIMB_ONE << s);
/* propagate the potential borrow up to the most significant limb
(it cannot propagate further since the most significant limb is
- at least MPFR_LIMB_HIGHBIT) */
- for (tn = 0; tn + 1 < k1 && (cc2 != 0); tn ++)
- cc2 = bp[bn + tn] == 0;
- /* We have an exponent decrease when either:
- (i) k1 = 0 and tmp[bn-1] < MPFR_LIMB_HIGHBIT
- (ii) k1 > 0 and cc <> 0 and bp[bn + tn] = MPFR_LIMB_HIGHBIT
- (then necessarily tn = k1-1).
- Then for directed rounding we cannot round,
- and for rounding to nearest we cannot round when err = prec + 1.
+ at least MPFR_LIMB_HIGHBIT).
+ Note: we use the same limb tmp[bn-1] to subtract. */
+ tmp_hi = tmp[bn - 1];
+ for (tn = 0; tn < k1 && cy != 0; tn ++)
+ cy = mpn_sub_1 (&tmp_hi, bp + bn + tn, 1, cy);
+ /* We have an exponent decrease when tn = k1 and
+ tmp[bn-1] < MPFR_LIMB_HIGHBIT:
+ b-c < 2^h <= b (for RNDA) or b+c (for RNDN).
+ Then we surely cannot round when rnd2 = RNDZ, since b or b+c round to
+ a value >= 2^h, and b-c rounds to a value < 2^h.
+ We also surely cannot round when (rnd1,rnd2) = (RNDN,RNDA), since
+ b-c rounds to a value <= 2^h, and b+c > 2^h rounds to a value > 2^h.
+ It thus remains:
+ (rnd1,rnd2) = (RNDA,RNDA), (RNDA,RNDN) and (RNDN,RNDN).
+ For (RNDA,RNDA) we can round only when b-c and b round to 2^h, which
+ implies b = 2^h and err > prec (which is true in that case):
+ a necessary condition is that cc = 0.
+ For (RNDA,RNDN) we can round only when b-c and b round to 2^h, which
+ implies b-c >= 2^h - 1/4*ulp(2^h), and b <= 2^h + 1/2*ulp(2^h);
+ since ulp(2^h) = ulp(b), this implies c <= 3/4*ulp(b), thus
+ err > prec.
+ For (RNDN,RNDN) we can round only when b-c and b+c round to 2^h,
+ which implies b-c >= 2^h - 1/4*ulp(2^h), and
+ b+c <= 2^h + 1/2*ulp(2^h);
+ since ulp(2^h) = ulp(b), this implies 2*c <= 3/4*ulp(b), thus
+ err > prec+1.
*/
- if (((k1 == 0 && tmp[bn - 1] < MPFR_LIMB_HIGHBIT) ||
- (k1 != 0 && cc2 != 0 && bp[bn + tn] == MPFR_LIMB_HIGHBIT)) &&
- (rnd2 != MPFR_RNDN || err0 == prec0 + 1))
+ if (tn == k1 && tmp_hi < MPFR_LIMB_HIGHBIT) /* exponent decrease */
+ {
+ if (rnd2 == MPFR_RNDZ || (rnd1 == MPFR_RNDN && rnd2 == MPFR_RNDA) ||
+ cc != 0 /* b or b+c does not round to 2^h */)
+ {
+ res = 0;
+ goto end;
+ }
+ /* in that case since the most significant bit of tmp is 0, we
+ should consider one more bit; res = 0 when b-c does not round
+ to 2^h. */
+ res = mpfr_round_raw2 (tmp, bn, neg, rnd2, prec2 + 1) != 0;
+ goto end;
+ }
+ if (err == prec + (rnd1 == MPFR_RNDN))
{
- MPFR_TMP_FREE(marker);
- return 0;
+ /* No exponent increase nor decrease, thus we have |U-L| = ulp(b).
+ For rnd2 = RNDZ or RNDA, either [L,U] contains one representable
+ number in the target precision, and then L and U round
+ differently; or both L and U are representable: they round
+ differently too; thus in all cases we cannot round.
+ For rnd2 = RNDN, the only case where we can round is when the
+ middle of [L,U] (i.e. b) is representable, and ends with a 0. */
+ res = (rnd2 == MPFR_RNDN && (((bp[bn - 1] >> s1) & 1) == 0) &&
+ mpfr_round_raw2 (bp, bn, neg, MPFR_RNDZ, prec2) ==
+ mpfr_round_raw2 (bp, bn, neg, MPFR_RNDA, prec2));
+ goto end;
}
break;
default:
- /* Round away */
+ /* rnd1 = Round away */
+ MPFR_ASSERTD (rnd1 == MPFR_RNDA);
cc = (bp[bn - 1] >> s1) & 1;
- cc ^= mpfr_round_raw2 (bp, bn, neg, rnd2, prec);
+ /* the mpfr_round_raw2() call below returns whether one should add 1 or
+ not for rounding */
+ cc ^= mpfr_round_raw2 (bp, bn, neg, rnd2, prec2);
/* cc is the new value of bit s1 in bp[bn-1]+eps after rounding 'rnd2' */
goto subtract_eps;
}
cc2 = (tmp[bn - 1] >> s1) & 1;
- cc2 ^= mpfr_round_raw2 (tmp, bn, neg, rnd2, prec);
+ res = cc == (cc2 ^ mpfr_round_raw2 (tmp, bn, neg, rnd2, prec2));
+ end:
MPFR_TMP_FREE(marker);
- return cc == cc2;
+ return res;
}
diff --git a/src/sqrt.c b/src/sqrt.c
index af993fa64..332720e3f 100644
--- a/src/sqrt.c
+++ b/src/sqrt.c
@@ -20,14 +20,115 @@ along with the GNU MPFR Library; see the file COPYING.LESSER. If not, see
http://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. */
+/*
+*** Note ***
+
+ The code of mpfr_sqrt1 and/or functions it calls depends on
+ implementation-defined features of the C standard. See lines with a
+ cast to mp_limb_signed_t, associated with a shift to the right '>>'.
+ Such features are known to behave as the code below expects with GCC,
+ according to the GCC manual (excerpt from the GCC 4.9.3 manual):
+
+ 4 C Implementation-defined behavior
+ ***********************************
+
+ 4.5 Integers
+ ============
+
+ * 'The result of, or the signal raised by, converting an integer to
+ a signed integer type when the value cannot be represented in an
+ object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).'
+
+ For conversion to a type of width N, the value is reduced modulo
+ 2^N to be within range of the type; no signal is raised.
+
+ * 'The results of some bitwise operations on signed integers (C90
+ 6.3, C99 and C11 6.5).'
+
+ Bitwise operators act on the representation of the value including
+ both the sign and value bits, where the sign bit is considered
+ immediately above the highest-value value bit. Signed '>>' acts
+ on negative numbers by sign extension.
+
+ It is not known whether it works with other compilers. Thus this code
+ is currently enabled only when __GNUC__ is defined (which includes
+ compilers that declare a compatibility with GCC). A configure test
+ might be an alternative solution (but without any guarantee, in case
+ the result may also depend on the context).
+
+ Warning! The right shift of a negative value corresponds to an integer
+ division by a power of two, with rounding toward negative.
+
+ TODO: Complete the comments when a right shift of a negative value
+ may be involved, so that the rounding toward negative appears in the
+  proof. There has been at least one error in a proof of a bound!
+*/
+
#define MPFR_NEED_LONGLONG_H
#include "mpfr-impl.h"
+#if !defined(MPFR_GENERIC_ABI) && (GMP_NUMB_BITS == 32 || GMP_NUMB_BITS == 64)
+
+/* The tables T1[] and T2[] below were generated using the Sage code below,
+ with T1,T2 = bipartite(4,4,4,12,16). Note: we would get a slightly smaller
+ error using an approximation of the form T1[a,b] * (1 + T2[a,b]), but this
+ would make the code more complex, and the approximation would not always fit
+ on p2 bits (assuming p2 >= p1).
+# bi-partite method, using a table T1 of p1 bits, and T2 of p2 bits
+# approximate sqrt(a:b:c) with T1[a,b] + T2[a,c]
+def bipartite(pa,pb,pc,p1,p2):
+ T1 = dict()
+ T2 = dict()
+ maxerr = maxnum = 0
+ for a in range(2^(pa-2),2^pa):
+ for b in range(2^pb):
+ A1 = (a*2^pb+b)/2^(pa+pb)
+ A2 = (a*2^pb+b+1)/2^(pa+pb)
+ X = (1/sqrt(A1) + 1/sqrt(A2))/2
+ X = round(X*2^p1)/2^p1
+ T1[a,b] = X*2^p1
+ maxnum = max(maxnum, abs(T1[a,b]))
+ # print "a=", a, "b=", b, "T1=", x
+ maxerr = max(maxerr, n(abs(1/sqrt(A1) - X)))
+ maxerr = max(maxerr, n(abs(1/sqrt(A2) - X)))
+ print "maxerr1 =", maxerr, log(maxerr)/log(2.0), "maxnum=", maxnum
+ maxerr = maxnum = 0
+ for a in range(2^(pa-2),2^pa):
+ for c in range(2^pc):
+ Xmin = infinity
+ Xmax = -infinity
+ for b in range(2^pb):
+ A = (a*2^(pb+pc)+b*2^pc+c)/2^(pa+pb+pc)
+ X = 1/sqrt(A) - T1[a,b]/2^p1
+ X = round(X*2^p2)/2^p2
+ Xmin = min (Xmin, X)
+ Xmax = max (Xmax, X)
+ A = (a*2^(pb+pc)+b*2^pc+c+1)/2^(pa+pb+pc)
+ X = 1/sqrt(A) - T1[a,b]/2^p1
+ X = round(X*2^p2)/2^p2
+ Xmin = min (Xmin, X)
+ Xmax = max (Xmax, X)
+ T2[a,c] = round((Xmin + Xmax)/2*2^p2)
+ maxnum = max(maxnum, abs(T2[a,c]))
+ # print "a=", a, "c=", c, "T2=", T2[a,c]
+ for b in range(2^pb):
+ A = (a*2^(pb+pc)+b*2^pc+c)/2^(pa+pb+pc)
+ X = 1/sqrt(A)
+ maxerr = max(maxerr, n(abs(X - (T1[a,b]/2^p1 + T2[a,c]/2^p2))))
+ A = (a*2^(pb+pc)+b*2^pc+c+1)/2^(pa+pb+pc)
+ X = 1/sqrt(A)
+ maxerr = max(maxerr, n(abs(X - (T1[a,b]/2^p1 + T2[a,c]/2^p2))))
+ print "maxerr2 =", maxerr, log(maxerr)/log(2.0), "maxnum=", maxnum
+ return [T1[a,b] for a in range(2^(pa-2),2^pa) for b in range(2^pb)], \
+ [T2[a,c] for a in range(2^(pa-2),2^pa) for c in range(2^pc)]
+*/
+
static const unsigned short T1[] = {8160, 8098, 8037, 7977, 7919, 7861, 7805, 7751, 7697, 7644, 7593, 7542, 7493, 7445, 7397, 7350, 7304, 7260, 7215, 7172, 7129, 7088, 7047, 7006, 6966, 6927, 6889, 6851, 6814, 6778, 6742, 6706, 6671, 6637, 6603, 6570, 6537, 6505, 6473, 6442, 6411, 6381, 6351, 6321, 6292, 6263, 6235, 6206, 6179, 6152, 6125, 6098, 6072, 6046, 6020, 5995, 5970, 5946, 5921, 5897, 5874, 5850, 5827, 5804, 5781, 5759, 5737, 5715, 5693, 5672, 5651, 5630, 5609, 5589, 5569, 5549, 5529, 5509, 5490, 5471, 5452, 5433, 5415, 5396, 5378, 5360, 5342, 5324, 5307, 5290, 5273, 5256, 5239, 5222, 5206, 5189, 5173, 5157, 5141, 5125, 5110, 5094, 5079, 5064, 5049, 5034, 5019, 5004, 4990, 4975, 4961, 4947, 4933, 4919, 4905, 4892, 4878, 4865, 4851, 4838, 4825, 4812, 4799, 4786, 4773, 4761, 4748, 4736, 4724, 4711, 4699, 4687, 4675, 4663, 4652, 4640, 4628, 4617, 4605, 4594, 4583, 4572, 4561, 4550, 4539, 4528, 4517, 4506, 4496, 4485, 4475, 4464, 4454, 4444, 4434, 4423, 4413, 4403, 4394, 4384, 4374, 4364, 4355, 4345, 4335, 4326, 4317, 4307, 4298, 4289, 4280, 4271, 4262, 4253, 4244, 4235, 4226, 4217, 4208, 4200, 4191, 4183, 4174, 4166, 4157, 4149, 4141, 4132, 4124, 4116, 4108, 4100};
static const short T2[] = {420, 364, 308, 252, 196, 140, 84, 28, -31, -87, -141, -194, -248, -302, -356, -410, 307, 267, 226, 185, 145, 104, 63, 21, -24, -65, -105, -145, -185, -224, -263, -303, 237, 205, 174, 142, 111, 79, 48, 16, -16, -45, -75, -105, -136, -167, -198, -229, 187, 161, 136, 110, 85, 61, 36, 12, -15, -41, -67, -92, -117, -142, -167, -192, 159, 138, 117, 96, 76, 54, 33, 12, -10, -30, -51, -71, -92, -113, -134, -154, 130, 112, 95, 77, 60, 43, 26, 9, -10, -28, -46, -64, -81, -98, -116, -133, 115, 100, 85, 70, 54, 39, 23, 8, -7, -22, -37, -52, -66, -82, -96, -111, 99, 86, 73, 60, 46, 32, 19, 6, -8, -21, -34, -47, -60, -73, -86, -100, 86, 75, 63, 52, 40, 28, 17, 5, -7, -19, -31, -43, -54, -66, -78, -90, 77, 66, 56, 46, 35, 25, 16, 6, -5, -15, -26, -36, -47, -57, -67, -77, 70, 60, 51, 42, 33, 23, 14, 5, -5, -14, -24, -33, -43, -52, -62, -72, 65, 57, 49, 40, 31, 23, 14, 6, -3, -12, -20, -28, -37, -45, -54, -62};
-#if GMP_NUMB_BITS == 64
+/* return x0 and write rp[0] such that a0 = x0^2 + rp[0]
+ with x0^2 <= a0 < (x0+1)^2 */
static mp_limb_t
mpn_sqrtrem1 (mpfr_limb_ptr rp, mp_limb_t a0)
{
@@ -36,21 +137,41 @@ mpn_sqrtrem1 (mpfr_limb_ptr rp, mp_limb_t a0)
mp_limb_t c = (a0 >> (GMP_NUMB_BITS - 12)) & 0xf;
mp_limb_t x0, a1, t, y, x2;
- x0 = (T1[(a-4)*16+b] << 4) + T2[(a-4)*16+c];
- /* now x0 is a 16-bit approximation, with maximal error 2^(-9.46) */
-
+ x0 = ((mp_limb_t) T1[(a-4)*16+b] << 4) + T2[(a-4)*16+c];
+ /* now x0/2^16 is a (1+16)-bit approximation of 2^6/sqrt(a*2^8+b*2^4+c),
+ thus of 2^(GMP_NUMB_BITS/2)/sqrt(a0), with maximal error 2^(-9.46) */
+
+#if GMP_NUMB_BITS == 32
+  x0 -= 89; /* x0 -= 93 ensures that x0/2^16 <= 2^16/sqrt(a0) (proof by
+               exhaustive search), which ensures that t = a0-y^2 >= 0 below,
+               and thus that the truncated Karp-Markstein trick gives
+               x0 <= sqrt(a0) at the end. However (still by exhaustive search)
+               x0 -= 89 is enough to guarantee x0^2 <= a0 at the end, and at
+               most one correction is needed. With x0 -= 89 the probability
+               of correction is 0.097802, with x0 -= 93 it is 0.106486. */
+ a1 = a0 >> (GMP_NUMB_BITS - 16); /* a1 has 16 bits */
+ y = (a1 * (x0 >> 1)) >> 15; /* y is near 2^32/x0, with 16 bits, and should be
+ an approximation of sqrt(a0) */
+ /* |a0 - y^2| <= 13697110 < 2^24 (by exhaustive search) */
+ /* a0 >= y^2 (proof by exhaustive search) */
+ t = (a0 - y * y) >> 8;
+ /* t/2^24 approximates a0/2^32 - (y/2^16)^2, with |t| < 2^16 */
+ /* x0*t/2^41 approximates (x0/2^16)/2*(a0/2^32 - (y/2^16)^2) */
+ /* x0 * t < 2^32 (proof by exhaustive search) */
+ x0 = y + ((x0 * t) >> 25);
+#else /* GMP_NUMB_BITS = 64 */
a1 = a0 >> (GMP_NUMB_BITS - 32);
/* a1 has 32 bits, thus a1*x0^2 has 64 bits */
- t = (mp_limb_signed_t) (a1 * (x0 * x0)) >> 32;
- /* t has 32 bits now */
- x0 = (x0 << 15) - ((mp_limb_signed_t) (x0 * t) >> (17+1));
+  /* a1*x0^2 might exceed 2^64, but we are only interested in
+     a1*x0^2 - 2^64, which is small */
+ t = (mp_limb_signed_t) (a1 * (x0 * x0)) >> 9;
+ /* |t| < 2^46 (proof by exhaustive search on all possible values of a1,
+ since x0 depends on a1 only) */
+ x0 = (x0 << 16) - ((mp_limb_signed_t) (x0 * t) >> (39+1)) - 1;
- /* now x0 is a 31-bit approximation (32 bits, 1 <= x0/2^31 <= 2),
- with maximal error 2^(-19.19). More precisely by exhaustive search
- on all 32-bit values of a1:
- -1.665416e-06 <= x0/2^31 - 1/sqrt(a1/2^32) <= 6.959122e-10,
- with the left worst case for a1=1326448622, x0=3864240766,
- and the right worst case for a1=1081715548, x0=4279108124. */
+ /* now x0 is a (1+32)-bit approximation such that (by exhaustive search on all
+ 32-bit values of a1):
+ -1.67e-06 <= x0/2^32 - 2^16/sqrt(a1) <= 0 */
/* we now use Karp-Markstein's trick to get a 32-bit approximation of the
square root of a0:
@@ -58,16 +179,23 @@ mpn_sqrtrem1 (mpfr_limb_ptr rp, mp_limb_t a0)
t = a - y^2 [target accuracy, high 19 bits are zero]
y = y + x0/2 * t [target accuracy] */
- /* a1*x0^2 is near 2^94, thus a1*x0 is near 2^94/x0 */
- y = (a1 * x0) >> 31; /* y is near 2^63/x0, with 32 bits */
- t = (mp_limb_signed_t) (a0 - y * y) >> 14;
- /* t should have at most 64-14-19=31 significant bits */
- x0 = y + ((mp_limb_signed_t) (x0 * t) >> (49+1));
+ /* a1*x0^2 is near 2^96, thus a1*x0 is near 2^96/x0, thus near from
+ 2^48*sqrt(a1), thus near from 2^32*sqrt(a0) */
+ y = (a1 * x0) >> 32; /* y is near sqrt(a0), with 32 bits */
+ /* now a0 >= y^2 */
+ t = (a0 - y * y) >> 13;
+ /* t < 2^31 (by exhaustive search on all possible values of a1, with
+ a0 = 2^32*a1+(2^32-1)) */
+ /* since x0 < 2^33 and t < 2^31, x0*t does not overflow */
+ x0 = y + ((x0 * t) >> (64-13+1));
+#endif
- /* x0 is now a 32-bit approximation of sqrt(a0) */
+ /* x0 is now a (GMP_NUMB_BITS/2)-bit approximation of sqrt(a0),
+ with x0 <= sqrt(a0) */
x2 = x0 * x0;
- while (x2 + 2*x0 < a0)
+ if (x2 + 2*x0 < a0) /* x0 is too small: probability of correction is 0.097802
+ for GMP_NUMB_BITS=32, 0.000017 for GMP_NUMB_BITS=64 */
{
x2 += 2*x0 + 1;
x0++;
@@ -77,10 +205,11 @@ mpn_sqrtrem1 (mpfr_limb_ptr rp, mp_limb_t a0)
return x0;
}
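The `if` above applies at most one correction so that x0 is exactly floor(sqrt(a0)), i.e. x0^2 <= a0 < (x0+1)^2. For comparison, here is a minimal portable routine meeting the same contract with the classic digit-by-digit binary method (isqrt64 is an illustrative name, not MPFR code, and is much slower than the Newton/Karp-Markstein scheme above):

```c
#include <assert.h>
#include <stdint.h>

/* Portable floor(sqrt(a)) for 64-bit a, by the restoring binary
   (digit-by-digit) method. Satisfies the same contract as the fast
   code above: x*x <= a < (x+1)*(x+1). */
static uint64_t
isqrt64 (uint64_t a)
{
  uint64_t x = 0, bit = UINT64_C(1) << 62; /* highest even-position bit */

  while (bit > a)
    bit >>= 2;
  while (bit != 0)
    {
      if (a >= x + bit)
        {
          a -= x + bit;           /* accept this bit of the root */
          x = (x >> 1) + bit;
        }
      else
        x >>= 1;                  /* reject this bit of the root */
      bit >>= 2;
    }
  return x;                       /* a now holds the remainder */
}
```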
-/* Return a (1+40)-bit approximation x0 of 2^72/sqrt(a0).
- Assume a0 >= 2^(GMP_NUMB_BITS-2), thus x0 has 41 bits.
- Assume GMP_NUMB_BITS=64.
-*/
+/* For GMP_NUMB_BITS=32: return a (1+20)-bit approximation x0 of 2^36/sqrt(a0).
+ For GMP_NUMB_BITS=64: return a (1+40)-bit approximation x0 of 2^72/sqrt(a0).
+ Assume a0 >= 2^(GMP_NUMB_BITS-2), and GMP_NUMB_BITS = 32 or 64.
+ Must ensure: x0 <= 2^36/sqrt(a0) for GMP_NUMB_BITS=32,
+ x0 <= 2^72/sqrt(a0) for GMP_NUMB_BITS=64. */
static mp_limb_t
mpn_rsqrtrem1 (mp_limb_t a0)
{
@@ -89,12 +218,31 @@ mpn_rsqrtrem1 (mp_limb_t a0)
mp_limb_t c = (a0 >> (GMP_NUMB_BITS - 12)) & 0xf;
mp_limb_t x0, a1, t;
- MPFR_STAT_STATIC_ASSERT (GMP_NUMB_BITS == 64);
+ MPFR_STAT_STATIC_ASSERT (GMP_NUMB_BITS == 32 || GMP_NUMB_BITS == 64);
- x0 = (T1[(a-4)*16+b] << 4) + T2[(a-4)*16+c];
+ x0 = ((mp_limb_t) T1[(a-4)*16+b] << 4) + T2[(a-4)*16+c];
/* now x0 is a 16-bit approximation, with maximal error 2^(-9.46):
-2^(-9.46) <= x0/2^16 - 1/sqrt(a/2^4) <= 2^(-9.46) */
+#if GMP_NUMB_BITS == 32
+ x0 >>= 1; /* reduce approximation to 1+15 bits */
+ /* In principle, a0>>10 can have up to 22 bits, and (x0^2)>>12 can have up to
+ 20 bits, thus the product can have up to 42 bits; but since a0*x0^2 is very
+ near 2^62, (a0>>10)*((x0^2)>>12) is very near 2^40, and when we reduce
+ it mod 2^32 and interpret it as a signed number in [-2^31, 2^31-1], we get
+ the correct remainder (a0>>10)*((x0^2)>>12) - 2^40 */
+ t = (mp_limb_signed_t) ((a0 >> 10) * ((x0 * x0) >> 12)) >> 8;
+ /* |t| < 6742843 <= 2^23 - 256 (by exhaustive search) */
+ t = (mp_limb_signed_t) t >> 8; /* now |t| < 2^15, thus |x0*t| < 2^31 */
+ x0 = (x0 << 5) - ((mp_limb_signed_t) (x0 * t) >> 20);
+
+ /* by exhaustive search on all possible values of a0, we get:
+ -1.61 <= x0 - 2^36/sqrt(a0) <= 3.11 thus
+ -2^(-19.3) < -1.54e-6 <= x0/2^20 - 2^16/sqrt(a0) <= 2.97e-6 < 2^(-18.3)
+ */
+
+ return x0 - 4; /* ensures x0 <= 2^36/sqrt(a0) */
+#else /* GMP_NUMB_BITS = 64 */
a1 = a0 >> (GMP_NUMB_BITS - 32);
/* a1 has 32 bits, thus a1*x0^2 has 64 bits */
t = (mp_limb_signed_t) (-a1 * (x0 * x0)) >> 32;
@@ -112,13 +260,21 @@ mpn_rsqrtrem1 (mp_limb_t a0)
/* now x0 is a 1+40-bit approximation,
more precisely we have (experimentally):
- -3.157551e-12 <= x0/2^40 - 2^32/sqrt(a0) <= 3.834848e-12
+ -2^(-38.2) < -3.16e-12 <= x0/2^40 - 2^32/sqrt(a0) <= 3.84e-12 < 2^(-37.9)
*/
-
- return x0;
+ return x0 - 5; /* ensures x0 <= 2^72/sqrt(a0) */
+#endif
}
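mpn_rsqrtrem1 exists to feed Karp-Markstein's trick, used throughout this file: starting from an approximation x of 1/sqrt(a), one multiplication gives y = a*x ~ sqrt(a), and the correction y + (x/2)*(a - y^2) roughly doubles the number of correct bits without a single division. A floating-point sketch of the step (illustrative only; the limb code above performs the same computation in fixed point):

```c
#include <assert.h>
#include <math.h>

/* One Karp-Markstein step: given x ~ 1/sqrt(a), return an improved
   approximation of sqrt(a) using only multiplications:
     y  = a * x            (y ~ sqrt(a))
     t  = a - y * y        (small residual)
     y' = y + (x / 2) * t  (accuracy roughly doubles)                */
static double
karp_markstein_sqrt (double a, double x)
{
  double y = a * x;
  double t = a - y * y;
  return y + 0.5 * x * t;
}
```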
-/* Comments below are for GMP_NUMB_BITS=64 for simplicity, but the code is valid
+/* Given as input np[0] and np[1], with B/4 <= np[1] (where B = 2^GMP_NUMB_BITS),
+ mpn_sqrtrem2 returns a value x, 0 <= x <= 1, and stores values s in sp[0] and
+ r in rp[0] such that:
+
+ n := np[1]*B + np[0] = s^2 + x*B + r, with n < (s+1)^2
+
+ or equivalently x*B + r <= 2*s.
+
+ This comment is for GMP_NUMB_BITS=64 for simplicity, but the code is valid
for any even value of GMP_NUMB_BITS.
The algorithm used is the following, and uses Karp-Markstein's trick:
- start from x, a 33-bit approximation of 2^64/sqrt(n1), with x <= 2^64/sqrt(n1)
@@ -126,6 +282,7 @@ mpn_rsqrtrem1 (mp_limb_t a0)
- t = n1 - y^2
- u = (x * t) >> 33
- y = (y << 32) + u
+
Proof:
* we know that Newton's iteration for the reciprocal square root,
@@ -159,7 +316,32 @@ mpn_sqrtrem2 (mpfr_limb_ptr sp, mpfr_limb_ptr rp, mpfr_limb_srcptr np)
mp_limb_t x, y, t, high, low;
x = mpn_rsqrtrem1 (np[1]);
- /* x is an approximation of 2^72/sqrt(n1), x has 41 bits */
+ /* we must have x^2*n1 <= 2^72 for GMP_NUMB_BITS=32
+ <= 2^144 for GMP_NUMB_BITS=64 */
+
+#if GMP_NUMB_BITS == 32
+ MPFR_ASSERTD ((double) x * (double) x * (double) np[1]
+ < 4722366482869645213696.0);
+ /* x is an approximation of 2^36/sqrt(n1), x has 1+20 bits */
+
+ /* We know x/2^20 <= 2^16/sqrt(n1) + 2^(-18)
+ thus n1*x/2^36 <= sqrt(n1) + 2^(-18)*n1/2^16 <= sqrt(n1) + 2^(-2). */
+
+ /* compute y = floor(np[1]*x/2^36), cutting the upper 23 bits of n1 into two
+ parts of 12 and 11 bits, which can be multiplied by x without overflow
+ (warning: if we take 12 bits from low, it might overflow with x) */
+ high = np[1] >> 20; /* upper 12 bits from n1 */
+ MPFR_ASSERTD((double) high * (double) x < 4294967296.0);
+ low = (np[1] >> 9) & 0x7ff; /* next 11 bits */
+ MPFR_ASSERTD((double) low * (double) x < 4294967296.0);
+ y = high * x + ((low * x) >> 11); /* y approximates n1*x/2^20 */
+ y = (y - 0x4000) >> 16; /* the constant 0x4000 = 2^(36-2-20) takes
+ into account the 2^(-2) error above, to ensure
+ y <= sqrt(n1) */
+#else /* GMP_NUMB_BITS = 64 */
+ MPFR_ASSERTD ((double) x * (double) x * (double) np[1]
+ < 2.2300745198530623e43);
+ /* x is an approximation of 2^72/sqrt(n1), x has 1+40 bits */
/* We know x/2^40 <= 2^32/sqrt(n1) + 3.9e-12 <= 2^32/sqrt(n1) + 2^(-37)
thus n1*x/2^72 <= sqrt(n1) + 2^(-37)*n1/2^32 <= sqrt(n1) + 2^(-5). */
@@ -172,16 +354,26 @@ mpn_sqrtrem2 (mpfr_limb_ptr sp, mpfr_limb_ptr rp, mpfr_limb_srcptr np)
low = (np[1] >> 17) & 0x7fffff; /* next 23 bits */
MPFR_ASSERTD((double) low * (double) x < 18446744073709551616.0);
y = high * x + ((low * x) >> 23); /* y approximates n1*x/2^40 */
- /* y-1 takes into account the 2^(-31) error term, to ensure y <= sqrt(n1)
- after the right shift below */
y = (y - 0x8000000) >> 32; /* the constant 0x8000000 = 2^(72-5-40) takes
into account the 2^(-5) error above, to ensure
y <= sqrt(n1) */
+#endif
/* y is an approximation of sqrt(n1), with y <= sqrt(n1) */
t = np[1] - y * y;
MPFR_ASSERTD((mp_limb_signed_t) t >= 0);
+#if GMP_NUMB_BITS == 32
+ /* we now compute t = (x * t) >> (20 + 1), but x * t might have
+ more than 32 bits thus we cut t in two parts of 12 and 11 bits */
+ high = t >> 11;
+ low = t & 0x7ff;
+ MPFR_ASSERTD((double) high * (double) x < 4294967296.0);
+ MPFR_ASSERTD((double) low * (double) x < 4294967296.0);
+ t = high * x + ((low * x) >> 11); /* approximates t*x/2^11 */
+
+ y = (y << (GMP_NUMB_BITS / 2)) + (t >> 10);
+#else
/* we now compute t = (x * t) >> (40 + 1), but x * t might have
more than 64 bits thus we cut t in two parts of 24 and 23 bits */
high = t >> 23;
@@ -191,8 +383,11 @@ mpn_sqrtrem2 (mpfr_limb_ptr sp, mpfr_limb_ptr rp, mpfr_limb_srcptr np)
t = high * x + ((low * x) >> 23); /* approximates t*x/2^23 */
y = (y << (GMP_NUMB_BITS / 2)) + (t >> 18);
- if (y < (1UL << 63))
- y = 1UL << 63; /* the correction code below assumes y >= 2^63 */
+#endif
+
+ /* the correction code below assumes y >= 2^(GMP_NUMB_BITS - 1) */
+ if (y < (MPFR_LIMB_ONE << (GMP_NUMB_BITS - 1)))
+ y = MPFR_LIMB_ONE << (GMP_NUMB_BITS - 1);
umul_ppmm (x, t, y, y);
MPFR_ASSERTD(x < np[1] || (x == np[1] && t <= np[0])); /* y should not be too large */
@@ -206,8 +401,11 @@ mpn_sqrtrem2 (mpfr_limb_ptr sp, mpfr_limb_ptr rp, mpfr_limb_srcptr np)
x -= 1 + (t < (2 * y + 1));
t -= 2 * y + 1;
y ++;
- /* maximal number of loops observed is 3, for n1=4651405965214438987,
- n0=18443926066120424952, average is 0.576 */
+ /* GMP_NUMB_BITS=32: average number of loops observed is 0.982,
+ max is 3 (for n1=1277869843, n0=3530774227);
+ GMP_NUMB_BITS=64: average number of loops observed is 0.593,
+ max is 3 (for n1=4651405965214438987, n0=18443926066120424952).
+ */
}
sp[0] = y;
@@ -225,12 +423,13 @@ mpfr_sqrt1 (mpfr_ptr r, mpfr_srcptr u, mpfr_rnd_t rnd_mode)
mpfr_limb_ptr rp = MPFR_MANT(r);
u0 = MPFR_MANT(u)[0];
- if (exp_u & 1)
+ if (((unsigned int) exp_u & 1) != 0)
{
u0 >>= 1;
exp_u ++;
}
- exp_r = exp_u >> 1;
+ MPFR_ASSERTD (((unsigned int) exp_u & 1) == 0);
+ exp_r = exp_u / 2;
if (p < GMP_NUMB_BITS / 2)
r0 = mpn_sqrtrem1 (&sb, u0) << (GMP_NUMB_BITS / 2);
@@ -309,7 +508,275 @@ mpfr_sqrt1 (mpfr_ptr r, mpfr_srcptr u, mpfr_rnd_t rnd_mode)
MPFR_RET(1);
}
}
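Both mpfr_sqrt1 and mpfr_sqrt2 first make the exponent even, shifting the mantissa right by one when exp_u is odd, since sqrt(m * 2^e) = sqrt(m) * 2^(e/2) requires an even e. A sketch of that normalization (halve_even_exp is an illustrative helper, not MPFR API; the shift is exact here because in these code paths the mantissa has at least one free low bit):

```c
#include <assert.h>
#include <stdint.h>

/* Make the exponent even before a square root: for odd e, rewrite
   m * 2^e as (m >> 1) * 2^(e+1) -- exact when the low bit of *m is 0,
   as for MPFR mantissas with precision below the limb size -- then
   return the result exponent e/2. Mirrors the exp_u handling above. */
static int64_t
halve_even_exp (uint64_t *m, int64_t e)
{
  if ((e & 1) != 0)
    {
      *m >>= 1;
      e++;
    }
  /* e is now even, so truncating division is exact */
  return e / 2;
}
```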
-#endif
+
+#if GMP_NUMB_BITS == 64
+/* For GMP_NUMB_BITS=64: return a (1+80)-bit approximation x = xp[1]*B+xp[0]
+ of 2^144/sqrt(ap[1]*B+ap[0]).
+ Assume ap[1] >= B/4, thus sqrt(ap[1]*B+ap[0]) >= B/2, thus x <= 2^81. */
+static void
+mpn_rsqrtrem2 (mpfr_limb_ptr xp, mpfr_limb_srcptr ap)
+{
+ mp_limb_t t1, t0, u1, u0, r0, r1, r2;
+
+ MPFR_STAT_STATIC_ASSERT (GMP_NUMB_BITS == 64);
+
+ xp[1] = mpn_rsqrtrem1 (ap[1]);
+
+ /* now we should compute x + (x/2) * (1 - a*x^2), where the upper ~40 bits of
+ a*x^2 should cancel with 1, thus we only need the following 40 bits */
+
+ /* xp[1] has 1+40 bits, with xp[1] <= 2^72/sqrt(ap[1]) */
+ umul_ppmm (t1, t0, xp[1], xp[1]);
+
+ /* now t1 has at most 18 bits, with its least significant 16 bits being the fractional part */
+
+ u1 = ap[1] >> 48;
+ u0 = (ap[1] << 16) | (ap[0] >> 48);
+
+ /* u1 has the 16 most significant bits of a, and u0 the next 64 bits */
+
+ /* we want t1*u1 << 48 + (t1*u0+t0*u1) >> 16 + t0*u0 >> 80
+ [32 bits] [64 bits] [48 bits]
+ but since we know the upper ~40 bits cancel with 1, we can ignore t1*u1. */
+ umul_ppmm (r2, r1, t0, u0);
+ r0 = t1 * u0; /* we ignore the upper 16 bits of the product */
+ r1 = t0 * u1; /* we ignore the upper 16 bits of the product */
+ r0 = r0 + r1 + r2;
+
+ /* the upper ~8 bits of r0 should cancel with 1, we are interested in the next
+ 40 bits */
+
+ umul_ppmm (t1, t0, xp[1], -r0);
+
+ /* we should now add t1 >> 33 at xp[1] */
+ xp[1] += t1 >> 33;
+ xp[0] = t1 << 31;
+ /* shift by 24 bits to the right, since xp[1] has 24 leading zeros,
+ and we now expect 48 */
+ xp[0] = (xp[1] << 40) | (xp[0] >> 24);
+ xp[1] = xp[1] >> 24;
+}
+
+/* Given as input ap[0-3], with B/4 <= ap[3] (where B = 2^GMP_NUMB_BITS),
+ mpn_sqrtrem4 returns a value x, 0 <= x <= 1, and stores values s in sp[0-1] and
+ r in rp[0-1] such that:
+
+ n := ap[3]*B^3 + ap[2]*B^2 + ap[1]*B + ap[0] = s^2 + x*B + r, with n < (s+1)^2
+
+ or equivalently x*B + r <= 2*s.
+
+ This code currently assumes GMP_NUMB_BITS = 64. */
+static mp_limb_t
+mpn_sqrtrem4 (mpfr_limb_ptr sp, mpfr_limb_ptr rp, mpfr_limb_srcptr ap)
+{
+ mp_limb_t x[2], t1, t0, r2, r1, h, l, u2, u1, b[4];
+
+ MPFR_STAT_STATIC_ASSERT(GMP_NUMB_BITS == 64);
+
+ mpn_rsqrtrem2 (x, ap + 2);
+
+ /* x[1]*B+x[0] is an 80-bit approximation of 2^144/sqrt(ap[3]*B+ap[2]),
+ and should not exceed it */
+
+ /* first compute y0 = a*x with at least 80 bits of precision */
+
+ t1 = ap[3] >> 48;
+ t0 = (ap[3] << 16) | (ap[2] >> 48);
+
+ /* now t1:t0 is a (16+64)-bit approximation of a,
+ (x1*B+x0) * (t1*B+t0) = (x1*t1)*B^2 + (x1*t0+x0*t1)*B + x0*t0 */
+ r2 = x[1] * t1; /* r2 has 32 bits */
+ umul_ppmm (h, r1, x[1], t0);
+ r2 += h;
+ umul_ppmm (h, l, x[0], t1);
+ r1 += l;
+ r2 += h + (r1 < l);
+ umul_ppmm (h, l, x[0], t0);
+ r1 += h;
+ r2 += (r1 < h);
+
+ /* r2 has 32 bits, r1 has 64 bits, thus we have 96 bits in total, we put 64
+ bits in r2 and 16 bits in r1 */
+ r2 = (r2 << 32) | (r1 >> 32);
+ r1 = (r1 << 32) >> 48;
+
+ /* we consider y0 = r2*2^16 + r1, which has 80 bits, and should be smaller than
+ 2^16*sqrt(ap[3]*B+ap[2]) */
+
+ /* Now compute y0 + (x/2)*(a - y0^2), which should give ~160 correct bits.
+ Since the ~80 most significant bits of a - y0^2 cancel, only ~48 bits
+ remain. */
+
+ /* now r2:r1 represents y0, with r2 of 64 bits and r1 of 16 bits,
+ and we compute y0^2, whose upper ~80 bits should cancel with a:
+ y0^2 = r2^2*2^32 + 2*r2*r1*2^16 + r1^2. */
+ t1 = r2 * r2; /* we can simply ignore the upper 64 bits of r2^2 */
+ umul_ppmm (h, l, r2, r1);
+ t0 = l << 49; /* takes into account the factor 2 in 2*r2*r1 */
+ u1 = (r1 * r1) << 32; /* temporary variable */
+ t0 += u1;
+ t1 += ((h << 49) | (l >> 15)) + (t0 < u1); /* takes into account the factor 2 */
+
+ /* now t1:t0 >> 32 equals y0^2 mod 2^96; since y0^2 has 160 bits, we should
+ shift t1:t0 by 64 bits to the right */
+ t0 = ap[2] - t1 - (t0 != 0); /* we round downwards to get a lower approximation
+ of sqrt(a) at the end */
+
+ /* now t0 equals ap[3]*B+ap[2] - ceil(y0^2/2^32) */
+
+ umul_ppmm (u2, u1, x[1], t0);
+ umul_ppmm (h, l, x[0], t0);
+ u1 += h;
+ u2 += (u1 < h);
+
+ /* divide by 2 to take into account the factor 1/2 in (x/2)*(a - y0^2) */
+ u1 = (u2 << 63) | (u1 >> 1);
+ u2 = u2 >> 1;
+
+ /* u2:u1 approximates (x/2)*(ap[3]*B+ap[2] - y0^2/2^32) / 2^64,
+ and should be smaller */
+
+ r1 <<= 48; /* put back the most significant bits of r1 in place */
+
+ /* add u2:u1 >> 16 to y0 */
+ sp[0] = r1 + ((u2 << 48) | (u1 >> 16));
+ sp[1] = r2 + (u2 >> 16) + (sp[0] < r1);
+
+ mpn_mul_n (b, sp, sp, 2);
+ b[2] = ap[2] - b[2] - mpn_sub_n (rp, ap, b, 2);
+
+ /* invariant: the remainder {ap, 4} - {sp, 2}^2 is b[2]*B^2 + {rp, 2} */
+
+ t0 = mpn_lshift (b, sp, 2, 1);
+
+ /* Warning: the initial {sp, 2} might be < 2^127, thus t0 might be 0. */
+
+ /* invariant: 2*{sp,2} = t0*B + {b, 2} */
+
+ /* While the remainder is greater than 2*s, we subtract 2*s+1 from the
+ remainder and add 1 to the square root. This loop seems to be executed
+ at most twice. */
+ while (b[2] > t0 || (b[2] == t0 &&
+ (rp[1] > b[1] || (rp[1] == b[1] && rp[0] > b[0]))))
+ {
+ /* subtract 2*s from b[2]*B^2 + {rp, 2} */
+ b[2] -= t0 + mpn_sub_n (rp, rp, b, 2);
+ /* subtract 1 from b[2]*B^2 + {rp, 2}: b[2] -= mpn_sub_1 (rp, rp, 2, 1) */
+ if (rp[0]-- == 0)
+ b[2] -= (rp[1]-- == 0);
+ /* add 1 to s */
+ mpn_add_1 (sp, sp, 2, 1);
+ /* add 2 to t0*B + {b, 2}: t0 += mpn_add_1 (b, b, 2, 2) */
+ b[0] += 2;
+ if (b[0] < 2)
+ t0 += (b[1]++ == 0);
+ }
+
+ return b[2];
+}
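The loop above compares the remainder b[2]*B^2 + {rp, 2} against 2*{sp, 2} = t0*B + {b, 2}, where the doubling comes from mpn_lshift. A sketch of that two-word shift-by-one under the assumption GMP_NUMB_BITS = 64 (lshift2_by1 is an illustrative name, not the GMP API):

```c
#include <assert.h>
#include <stdint.h>

/* Double the two-word value s[1]*2^64 + s[0] (s[0] is the low word):
   store the low 128 bits of 2*s in b and return the bit shifted out,
   as mpn_lshift (b, s, 2, 1) does for the 2*s computation above. */
static uint64_t
lshift2_by1 (uint64_t b[2], const uint64_t s[2])
{
  uint64_t out = s[1] >> 63;          /* bit shifted out of the top word */
  b[1] = (s[1] << 1) | (s[0] >> 63);  /* carry from low into high word */
  b[0] = s[0] << 1;
  return out;
}
```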
+
+/* Special code for GMP_NUMB_BITS < prec(r) < 2*GMP_NUMB_BITS,
+ and GMP_NUMB_BITS < prec(u) <= 2*GMP_NUMB_BITS.
+ This code should work for any value of GMP_NUMB_BITS, but since mpn_sqrtrem4
+ currently assumes GMP_NUMB_BITS=64, it only works for GMP_NUMB_BITS=64. */
+static int
+mpfr_sqrt2 (mpfr_ptr r, mpfr_srcptr u, mpfr_rnd_t rnd_mode)
+{
+ mpfr_prec_t p = MPFR_GET_PREC(r);
+ mpfr_limb_ptr up = MPFR_MANT(u), rp = MPFR_MANT(r);
+ mp_limb_t np[4], tp[2], rb, sb, mask;
+ mpfr_prec_t exp_u = MPFR_EXP(u), exp_r, sh = 2 * GMP_NUMB_BITS - p;
+
+ if (((unsigned int) exp_u & 1) != 0)
+ {
+ np[3] = up[1] >> 1;
+ np[2] = (up[1] << (GMP_NUMB_BITS-1)) | (up[0] >> 1);
+ exp_u ++;
+ }
+ else
+ {
+ np[3] = up[1];
+ np[2] = up[0];
+ }
+ MPFR_ASSERTD (((unsigned int) exp_u & 1) == 0);
+ exp_r = exp_u / 2;
+
+ np[1] = np[0] = 0;
+ sb = mpn_sqrtrem4 (rp, tp, np);
+ sb |= tp[0] | tp[1];
+ rb = rp[0] & (MPFR_LIMB_ONE << (sh - 1));
+ mask = MPFR_LIMB_MASK(sh);
+ sb |= (rp[0] & mask) ^ rb;
+ rp[0] = rp[0] & ~mask;
+
+ /* rounding */
+ if (exp_r > __gmpfr_emax)
+ return mpfr_overflow (r, rnd_mode, 1);
+
+ /* See comments in mpfr_divsp1 */
+ if (exp_r < __gmpfr_emin)
+ {
+ if (rnd_mode == MPFR_RNDN)
+ {
+ if ((exp_r == __gmpfr_emin - 1) && (rp[1] == ~MPFR_LIMB_ZERO &&
+ rp[0] == ~mask) && rb)
+ goto rounding; /* no underflow */
+ if (exp_r < __gmpfr_emin - 1 || (rp[1] == MPFR_LIMB_HIGHBIT &&
+ rp[0] == MPFR_LIMB_ZERO && sb == 0))
+ rnd_mode = MPFR_RNDZ;
+ }
+ else if (!MPFR_IS_LIKE_RNDZ(rnd_mode, 0))
+ {
+ if ((exp_r == __gmpfr_emin - 1) && (rp[1] == ~MPFR_LIMB_ZERO &&
+ rp[0] == ~mask) && (rb | sb))
+ goto rounding; /* no underflow */
+ }
+ return mpfr_underflow (r, rnd_mode, 1);
+ }
+
+ rounding:
+ MPFR_EXP (r) = exp_r;
+ if (rb == 0 && sb == 0)
+ {
+ MPFR_ASSERTD(exp_r >= __gmpfr_emin);
+ MPFR_ASSERTD(exp_r <= __gmpfr_emax);
+ return 0; /* same as MPFR_RET(0), but faster */
+ }
+ else if (rnd_mode == MPFR_RNDN)
+ {
+ if (rb == 0 || (rb && sb == 0 &&
+ (rp[0] & (MPFR_LIMB_ONE << sh)) == 0))
+ goto truncate;
+ else
+ goto add_one_ulp;
+ }
+ else if (MPFR_IS_LIKE_RNDZ(rnd_mode, 0))
+ {
+ truncate:
+ MPFR_ASSERTD(exp_r >= __gmpfr_emin);
+ MPFR_ASSERTD(exp_r <= __gmpfr_emax);
+ MPFR_RET(-1);
+ }
+ else /* round away from zero */
+ {
+ add_one_ulp:
+ rp[0] += MPFR_LIMB_ONE << sh;
+ rp[1] += rp[0] == 0;
+ if (rp[1] == 0)
+ {
+ rp[1] = MPFR_LIMB_HIGHBIT;
+ if (MPFR_UNLIKELY(exp_r + 1 > __gmpfr_emax))
+ return mpfr_overflow (r, rnd_mode, 1);
+ MPFR_ASSERTD(exp_r + 1 <= __gmpfr_emax);
+ MPFR_ASSERTD(exp_r + 1 >= __gmpfr_emin);
+ MPFR_SET_EXP (r, exp_r + 1);
+ }
+ MPFR_RET(1);
+ }
+}
+#endif /* GMP_NUMB_BITS == 64 */
+
+#endif /* !defined(MPFR_GENERIC_ABI) && (GMP_NUMB_BITS == 32 || GMP_NUMB_BITS == 64) */
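The rounding tail of mpfr_sqrt2 classifies the result with a round bit rb (first discarded bit) and a sticky bit sb (OR of all later discarded bits); for MPFR_RNDN it truncates when rb is clear, or on an exact tie whose last kept bit is even, and otherwise adds one ulp. A minimal sketch of that decision rule (round_nearest_even is an illustrative helper, not MPFR API):

```c
#include <assert.h>
#include <stdint.h>

/* Round-to-nearest-even given a truncated mantissa plus round bit rb and
   sticky bit sb. Mirrors the MPFR_RNDN branch above, with the "even"
   test done on bit 0 instead of bit sh, and ignoring mantissa overflow
   and exponent handling. */
static uint64_t
round_nearest_even (uint64_t mant, int rb, int sb)
{
  if (rb == 0 || (rb && sb == 0 && (mant & 1) == 0))
    return mant;      /* below halfway, or exact halfway tie to even */
  return mant + 1;    /* above halfway, or halfway with odd mantissa */
}
```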
int
mpfr_sqrt (mpfr_ptr r, mpfr_srcptr u, mpfr_rnd_t rnd_mode)
@@ -372,10 +839,18 @@ mpfr_sqrt (mpfr_ptr r, mpfr_srcptr u, mpfr_rnd_t rnd_mode)
}
MPFR_SET_POS(r);
-#if GMP_NUMB_BITS == 64
+ /* See the note at the beginning of this file about __GNUC__. */
+#if !defined(MPFR_GENERIC_ABI) && defined(__GNUC__) && \
+ (GMP_NUMB_BITS == 32 || GMP_NUMB_BITS == 64)
if (MPFR_GET_PREC (r) < GMP_NUMB_BITS && MPFR_GET_PREC (u) < GMP_NUMB_BITS)
return mpfr_sqrt1 (r, u, rnd_mode);
-#endif
+#endif
+
+#if !defined(MPFR_GENERIC_ABI) && GMP_NUMB_BITS == 64
+ if (GMP_NUMB_BITS < MPFR_GET_PREC (r) && MPFR_GET_PREC (r) < 2*GMP_NUMB_BITS
+ && MPFR_LIMB_SIZE(u) == 2)
+ return mpfr_sqrt2 (r, u, rnd_mode);
+#endif
MPFR_TMP_MARK (marker);
MPFR_UNSIGNED_MINUS_MODULO (sh, MPFR_GET_PREC (r));
diff --git a/tests/mpfr-test.h b/tests/mpfr-test.h
index c7d5f1057..0c94254be 100644
--- a/tests/mpfr-test.h
+++ b/tests/mpfr-test.h
@@ -214,7 +214,8 @@ extern gmp_randstate_t mpfr_rands;
typedef __gmp_randstate_struct *mpfr_gmp_randstate_ptr;
-/* Allocation */
+/* Memory Allocation */
+extern int tests_memory_disabled;
void *tests_allocate (size_t);
void *tests_reallocate (void *, size_t, size_t);
void tests_free (void *, size_t);
diff --git a/tests/reuse.c b/tests/reuse.c
index d10b56b34..c05c5354f 100644
--- a/tests/reuse.c
+++ b/tests/reuse.c
@@ -22,8 +22,15 @@ http://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
#include "mpfr-test.h"
-#define DISP(s, t) {printf(s); mpfr_out_str(stdout, 2, 0, t, MPFR_RNDN); }
-#define DISP2(s,t) {DISP(s,t); putchar('\n');}
+#define DISP(s,t) \
+ do \
+ { \
+ printf (s); \
+ mpfr_out_str (stdout, 2, 0, t, MPFR_RNDN); \
+ } \
+ while (0)
+
+#define DISP2(s,t) do { DISP(s,t); putchar ('\n'); } while (0)
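The rewritten DISP/DISP2 use the standard do { ... } while (0) idiom: the macro body becomes a single statement that pairs correctly with if/else and requires a trailing semicolon, unlike the old brace-only form. A small sketch of the difference (HIT2/hit are illustrative names):

```c
#include <assert.h>

static int calls = 0;
static void hit (void) { calls++; }

/* do { ... } while (0) turns a multi-statement macro into one statement,
   used exactly like a function call. A brace-only version such as
       #define HIT2() { hit (); hit (); }
   would break "if (c) HIT2 (); else ...": the ';' after the closing
   brace terminates the if before the else, a syntax error. */
#define HIT2() do { hit (); hit (); } while (0)
```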
#define SPECIAL_MAX 12
@@ -569,117 +576,125 @@ pow_int (mpfr_rnd_t rnd)
int
main (void)
{
- int rnd;
+ int i, rnd;
mpfr_prec_t p;
+
tests_start_mpfr ();
- p = (randlimb () % 200) + MPFR_PREC_MIN;
- RND_LOOP (rnd)
- {
- test2a (mpfr_round, "mpfr_round", p);
- test2a (mpfr_ceil, "mpfr_ceil", p);
- test2a (mpfr_floor, "mpfr_floor", p);
- test2a (mpfr_trunc, "mpfr_trunc", p);
-
- test2ui (mpfr_add_ui, "mpfr_add_ui", p, (mpfr_rnd_t) rnd);
- test2ui (mpfr_div_2exp, "mpfr_div_2exp", p, (mpfr_rnd_t) rnd);
- test2ui (mpfr_div_ui, "mpfr_div_ui", p, (mpfr_rnd_t) rnd);
- test2ui (mpfr_mul_2exp, "mpfr_mul_2exp", p, (mpfr_rnd_t) rnd);
- test2ui (mpfr_mul_ui, "mpfr_mul_ui", p, (mpfr_rnd_t) rnd);
- test2ui (mpfr_pow_ui, "mpfr_pow_ui", p, (mpfr_rnd_t) rnd);
- test2ui (mpfr_sub_ui, "mpfr_sub_ui", p, (mpfr_rnd_t) rnd);
-
- testui2 (mpfr_ui_div, "mpfr_ui_div", p, (mpfr_rnd_t) rnd);
- testui2 (mpfr_ui_sub, "mpfr_ui_sub", p, (mpfr_rnd_t) rnd);
- testui2 (mpfr_ui_pow, "mpfr_ui_pow", p, (mpfr_rnd_t) rnd);
-
- test2 (mpfr_sqr, "mpfr_sqr", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_sqrt, "mpfr_sqrt", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_abs, "mpfr_abs", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_neg, "mpfr_neg", p, (mpfr_rnd_t) rnd);
-
- test2 (mpfr_log, "mpfr_log", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_log2, "mpfr_log2", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_log10, "mpfr_log10", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_log1p, "mpfr_log1p", p, (mpfr_rnd_t) rnd);
-
- test2 (mpfr_exp, "mpfr_exp", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_exp2, "mpfr_exp2", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_exp10, "mpfr_exp10", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_expm1, "mpfr_expm1", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_eint, "mpfr_eint", p, (mpfr_rnd_t) rnd);
-
- test2 (mpfr_sinh, "mpfr_sinh", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_cosh, "mpfr_cosh", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_tanh, "mpfr_tanh", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_asinh, "mpfr_asinh", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_acosh, "mpfr_acosh", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_atanh, "mpfr_atanh", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_sech, "mpfr_sech", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_csch, "mpfr_csch", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_coth, "mpfr_coth", p, (mpfr_rnd_t) rnd);
-
- test2 (mpfr_asin, "mpfr_asin", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_acos, "mpfr_acos", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_atan, "mpfr_atan", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_cos, "mpfr_cos", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_sin, "mpfr_sin", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_tan, "mpfr_tan", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_sec, "mpfr_sec", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_csc, "mpfr_csc", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_cot, "mpfr_cot", p, (mpfr_rnd_t) rnd);
-
- test2 (mpfr_erf, "mpfr_erf", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_erfc, "mpfr_erfc", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_j0, "mpfr_j0", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_j1, "mpfr_j1", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_y0, "mpfr_y0", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_y1, "mpfr_y1", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_zeta, "mpfr_zeta", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_gamma, "mpfr_gamma", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_lngamma, "mpfr_lngamma", p, (mpfr_rnd_t) rnd);
-
- test2 (mpfr_rint, "mpfr_rint", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_rint_ceil, "mpfr_rint_ceil", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_rint_floor, "mpfr_rint_floor", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_rint_round, "mpfr_rint_round", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_rint_trunc, "mpfr_rint_trunc", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_frac, "mpfr_frac", p, (mpfr_rnd_t) rnd);
-
- test3 (mpfr_add, "mpfr_add", p, (mpfr_rnd_t) rnd);
- test3 (mpfr_sub, "mpfr_sub", p, (mpfr_rnd_t) rnd);
- test3 (mpfr_mul, "mpfr_mul", p, (mpfr_rnd_t) rnd);
- test3 (mpfr_div, "mpfr_div", p, (mpfr_rnd_t) rnd);
-
- test3 (mpfr_agm, "mpfr_agm", p, (mpfr_rnd_t) rnd);
- test3 (mpfr_min, "mpfr_min", p, (mpfr_rnd_t) rnd);
- test3 (mpfr_max, "mpfr_max", p, (mpfr_rnd_t) rnd);
-
- test3 (reldiff_wrapper, "mpfr_reldiff", p, (mpfr_rnd_t) rnd);
- test3 (mpfr_dim, "mpfr_dim", p, (mpfr_rnd_t) rnd);
-
- test3 (mpfr_remainder, "mpfr_remainder", p, (mpfr_rnd_t) rnd);
- test3 (mpfr_pow, "mpfr_pow", p, (mpfr_rnd_t) rnd);
- pow_int ((mpfr_rnd_t) rnd);
- test3 (mpfr_atan2, "mpfr_atan2", p, (mpfr_rnd_t) rnd);
- test3 (mpfr_hypot, "mpfr_hypot", p, (mpfr_rnd_t) rnd);
-
- test3a (mpfr_sin_cos, "mpfr_sin_cos", p, (mpfr_rnd_t) rnd);
-
- test4 (mpfr_fma, "mpfr_fma", p, (mpfr_rnd_t) rnd);
- test4 (mpfr_fms, "mpfr_fms", p, (mpfr_rnd_t) rnd);
-
- test2 (mpfr_li2, "mpfr_li2", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_rec_sqrt, "mpfr_rec_sqrt", p, (mpfr_rnd_t) rnd);
- test3 (mpfr_fmod, "mpfr_fmod", p, (mpfr_rnd_t) rnd);
- test3a (mpfr_modf, "mpfr_modf", p, (mpfr_rnd_t) rnd);
- test3a (mpfr_sinh_cosh, "mpfr_sinh_cosh", p, (mpfr_rnd_t) rnd);
+ for (i = 1; i <= 5; i++)
+ {
+ /* Test on i limb(s), with a random number of trailing bits. */
+ p = GMP_NUMB_BITS * i - (randlimb () % GMP_NUMB_BITS);
+ if (p < MPFR_PREC_MIN)
+ p = MPFR_PREC_MIN;
+
+ RND_LOOP (rnd)
+ {
+ test2a (mpfr_round, "mpfr_round", p);
+ test2a (mpfr_ceil, "mpfr_ceil", p);
+ test2a (mpfr_floor, "mpfr_floor", p);
+ test2a (mpfr_trunc, "mpfr_trunc", p);
+
+ test2ui (mpfr_add_ui, "mpfr_add_ui", p, (mpfr_rnd_t) rnd);
+ test2ui (mpfr_div_2exp, "mpfr_div_2exp", p, (mpfr_rnd_t) rnd);
+ test2ui (mpfr_div_ui, "mpfr_div_ui", p, (mpfr_rnd_t) rnd);
+ test2ui (mpfr_mul_2exp, "mpfr_mul_2exp", p, (mpfr_rnd_t) rnd);
+ test2ui (mpfr_mul_ui, "mpfr_mul_ui", p, (mpfr_rnd_t) rnd);
+ test2ui (mpfr_pow_ui, "mpfr_pow_ui", p, (mpfr_rnd_t) rnd);
+ test2ui (mpfr_sub_ui, "mpfr_sub_ui", p, (mpfr_rnd_t) rnd);
+
+ testui2 (mpfr_ui_div, "mpfr_ui_div", p, (mpfr_rnd_t) rnd);
+ testui2 (mpfr_ui_sub, "mpfr_ui_sub", p, (mpfr_rnd_t) rnd);
+ testui2 (mpfr_ui_pow, "mpfr_ui_pow", p, (mpfr_rnd_t) rnd);
+
+ test2 (mpfr_sqr, "mpfr_sqr", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_sqrt, "mpfr_sqrt", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_abs, "mpfr_abs", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_neg, "mpfr_neg", p, (mpfr_rnd_t) rnd);
+
+ test2 (mpfr_log, "mpfr_log", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_log2, "mpfr_log2", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_log10, "mpfr_log10", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_log1p, "mpfr_log1p", p, (mpfr_rnd_t) rnd);
+
+ test2 (mpfr_exp, "mpfr_exp", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_exp2, "mpfr_exp2", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_exp10, "mpfr_exp10", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_expm1, "mpfr_expm1", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_eint, "mpfr_eint", p, (mpfr_rnd_t) rnd);
+
+ test2 (mpfr_sinh, "mpfr_sinh", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_cosh, "mpfr_cosh", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_tanh, "mpfr_tanh", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_asinh, "mpfr_asinh", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_acosh, "mpfr_acosh", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_atanh, "mpfr_atanh", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_sech, "mpfr_sech", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_csch, "mpfr_csch", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_coth, "mpfr_coth", p, (mpfr_rnd_t) rnd);
+
+ test2 (mpfr_asin, "mpfr_asin", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_acos, "mpfr_acos", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_atan, "mpfr_atan", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_cos, "mpfr_cos", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_sin, "mpfr_sin", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_tan, "mpfr_tan", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_sec, "mpfr_sec", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_csc, "mpfr_csc", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_cot, "mpfr_cot", p, (mpfr_rnd_t) rnd);
+
+ test2 (mpfr_erf, "mpfr_erf", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_erfc, "mpfr_erfc", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_j0, "mpfr_j0", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_j1, "mpfr_j1", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_y0, "mpfr_y0", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_y1, "mpfr_y1", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_zeta, "mpfr_zeta", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_gamma, "mpfr_gamma", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_lngamma, "mpfr_lngamma", p, (mpfr_rnd_t) rnd);
+
+ test2 (mpfr_rint, "mpfr_rint", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_rint_ceil, "mpfr_rint_ceil", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_rint_floor, "mpfr_rint_floor", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_rint_round, "mpfr_rint_round", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_rint_trunc, "mpfr_rint_trunc", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_frac, "mpfr_frac", p, (mpfr_rnd_t) rnd);
+
+ test3 (mpfr_add, "mpfr_add", p, (mpfr_rnd_t) rnd);
+ test3 (mpfr_sub, "mpfr_sub", p, (mpfr_rnd_t) rnd);
+ test3 (mpfr_mul, "mpfr_mul", p, (mpfr_rnd_t) rnd);
+ test3 (mpfr_div, "mpfr_div", p, (mpfr_rnd_t) rnd);
+
+ test3 (mpfr_agm, "mpfr_agm", p, (mpfr_rnd_t) rnd);
+ test3 (mpfr_min, "mpfr_min", p, (mpfr_rnd_t) rnd);
+ test3 (mpfr_max, "mpfr_max", p, (mpfr_rnd_t) rnd);
+
+ test3 (reldiff_wrapper, "mpfr_reldiff", p, (mpfr_rnd_t) rnd);
+ test3 (mpfr_dim, "mpfr_dim", p, (mpfr_rnd_t) rnd);
+
+ test3 (mpfr_remainder, "mpfr_remainder", p, (mpfr_rnd_t) rnd);
+ test3 (mpfr_pow, "mpfr_pow", p, (mpfr_rnd_t) rnd);
+ pow_int ((mpfr_rnd_t) rnd);
+ test3 (mpfr_atan2, "mpfr_atan2", p, (mpfr_rnd_t) rnd);
+ test3 (mpfr_hypot, "mpfr_hypot", p, (mpfr_rnd_t) rnd);
+
+ test3a (mpfr_sin_cos, "mpfr_sin_cos", p, (mpfr_rnd_t) rnd);
+
+ test4 (mpfr_fma, "mpfr_fma", p, (mpfr_rnd_t) rnd);
+ test4 (mpfr_fms, "mpfr_fms", p, (mpfr_rnd_t) rnd);
+
+ test2 (mpfr_li2, "mpfr_li2", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_rec_sqrt, "mpfr_rec_sqrt", p, (mpfr_rnd_t) rnd);
+ test3 (mpfr_fmod, "mpfr_fmod", p, (mpfr_rnd_t) rnd);
+ test3a (mpfr_modf, "mpfr_modf", p, (mpfr_rnd_t) rnd);
+ test3a (mpfr_sinh_cosh, "mpfr_sinh_cosh", p, (mpfr_rnd_t) rnd);
#if MPFR_VERSION >= MPFR_VERSION_NUM(3,0,0)
- test2 (mpfr_ai, "mpfr_ai", p, (mpfr_rnd_t) rnd);
- test2 (mpfr_digamma, "mpfr_digamma", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_ai, "mpfr_ai", p, (mpfr_rnd_t) rnd);
+ test2 (mpfr_digamma, "mpfr_digamma", p, (mpfr_rnd_t) rnd);
#endif
- }
+ }
+ }
tests_end_mpfr ();
return 0;
diff --git a/tests/tadd1sp.c b/tests/tadd1sp.c
index 43ea7fd66..00ebfbcc3 100644
--- a/tests/tadd1sp.c
+++ b/tests/tadd1sp.c
@@ -96,12 +96,14 @@ main (void)
} \
while (0)
-#define SET_PREC(_p) \
- { \
- p = _p; \
- mpfr_set_prec(a1, _p); mpfr_set_prec(a2, _p); \
- mpfr_set_prec(b, _p); mpfr_set_prec(c, _p); \
- }
+#define SET_PREC(_p) \
+ do \
+ { \
+ p = _p; \
+ mpfr_set_prec(a1, _p); mpfr_set_prec(a2, _p); \
+ mpfr_set_prec(b, _p); mpfr_set_prec(c, _p); \
+ } \
+ while (0)
static void
check_random (mpfr_prec_t p)
diff --git a/tests/talloc-cache.c b/tests/talloc-cache.c
new file mode 100644
index 000000000..f75b2df09
--- /dev/null
+++ b/tests/talloc-cache.c
@@ -0,0 +1,93 @@
+/* talloc-cache -- test file concerning memory allocation and cache
+
+Copyright 2016 Free Software Foundation, Inc.
+Contributed by the AriC and Caramba projects, INRIA.
+
+This file is part of the GNU MPFR Library.
+
+The GNU MPFR Library is free software; you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as published by
+the Free Software Foundation; either version 3 of the License, or (at your
+option) any later version.
+
+The GNU MPFR Library is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+License for more details.
+
+You should have received a copy of the GNU Lesser General Public License
+along with the GNU MPFR Library; see the file COPYING.LESSER. If not, see
+http://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
+51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. */
+
+/* This test program is for temporary testing only. It is not in the
+ usual test suite (not in check_PROGRAMS from Makefile.am).
+ In the 3.1 branch, it crashes before r9467, and r9467 fixes the crash
+ (among other issues).
+ In the trunk, it crashes, which is not expected with the current code.
+*/
+
+#include <stdlib.h>
+
+#include "mpfr-test.h"
+
+#define A 4096
+#define I 10000
+
+static void *
+my_alloc1 (size_t s)
+{
+  return malloc (s);
+}
+
+static void *
+my_realloc1 (void *p, size_t t, size_t s)
+{
+  return realloc (p, s);
+}
+
+static void
+my_free1 (void *p, size_t t)
+{
+  free (p);
+}
+
+static void *
+my_alloc2 (size_t s)
+{
+ return (void *) ((char *) malloc (s + A) + A);
+}
+
+static void *
+my_realloc2 (void *p, size_t t, size_t s)
+{
+ return (void *) ((char *) realloc ((char *) p - A, s + A) + A);
+}
+
+static void
+my_free2 (void *p, size_t t)
+{
+  free ((char *) p - A);
+}
+
+int
+main (void)
+{
+ mpfr_t x;
+
+ mp_set_memory_functions (my_alloc1, my_realloc1, my_free1);
+
+ mpfr_init2 (x, 53);
+ mpfr_set_ui (x, I, MPFR_RNDN);
+ mpfr_sin (x, x, MPFR_RNDN);
+ mpfr_clear (x);
+
+ mp_set_memory_functions (my_alloc2, my_realloc2, my_free2);
+
+ mpfr_init2 (x, 1000);
+ mpfr_set_ui (x, I, MPFR_RNDN);
+ mpfr_sin (x, x, MPFR_RNDN);
+ mpfr_clear (x);
+
+ return 0;
+}
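my_alloc2/my_realloc2/my_free2 hand MPFR pointers that sit A bytes past the start of the real malloc'd block, so that any code path freeing or reallocating through a non-matching allocator operates on an invalid pointer and fails visibly. The same offset pattern as a tiny self-contained sketch (pad_alloc/pad_free are illustrative names, not MPFR or test-suite API):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAD 64

/* Offset allocator: return a pointer PAD bytes into the real block, as
   my_alloc2 does above. A free() or realloc() that bypasses the matching
   wrapper would then act on an interior pointer and fail loudly. */
static void *
pad_alloc (size_t s)
{
  char *p = malloc (s + PAD);
  return p == NULL ? NULL : (void *) (p + PAD);
}

static void
pad_free (void *p)
{
  if (p != NULL)
    free ((char *) p - PAD);  /* undo the offset before freeing */
}
```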
diff --git a/tests/talloc.c b/tests/talloc.c
index ee0e0d4aa..4909599ac 100644
--- a/tests/talloc.c
+++ b/tests/talloc.c
@@ -47,7 +47,11 @@ main (void)
of memory allocation. In r9454, this crashes because the memory
allocation function used by this macro (under the condition that
the size is > MPFR_ALLOCA_MAX) isn't defined yet. This bug comes
- from r8813. */
+ from r8813.
+ WARNING! Do not add MPFR calls before this test. Otherwise the
+ mentioned problem may no longer be actually tested, making this
+ test useless. For other allocation tests, it is better to use a
+ different test file. */
mpfr_set_ui (x, 1, MPFR_RNDN);
mpfr_set_ui (y, 2, MPFR_RNDN);
mpfr_add (x, x, y, MPFR_RNDN);
diff --git a/tests/taway.c b/tests/taway.c
index ce1fd135f..a321fd442 100644
--- a/tests/taway.c
+++ b/tests/taway.c
@@ -22,8 +22,15 @@ http://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
#include "mpfr-test.h"
-#define DISP(s, t) {printf(s); mpfr_out_str(stdout, 2, 0, t, MPFR_RNDN); }
-#define DISP2(s,t) {DISP(s,t); putchar('\n');}
+#define DISP(s,t) \
+ do \
+ { \
+ printf (s); \
+ mpfr_out_str (stdout, 2, 0, t, MPFR_RNDN); \
+ } \
+ while (0)
+
+#define DISP2(s,t) do { DISP(s,t); putchar ('\n'); } while (0)
#define SPECIAL_MAX 12
diff --git a/tests/tcan_round.c b/tests/tcan_round.c
index 18544f6b0..c4193e222 100644
--- a/tests/tcan_round.c
+++ b/tests/tcan_round.c
@@ -22,6 +22,44 @@ http://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
#include "mpfr-test.h"
+/* Simple cases where err - prec is large enough.
+ One can get failures as a consequence of r9883. */
+static void
+test_simple (void)
+{
+ int t[4] = { 2, 3, -2, -3 }; /* test powers of 2 and non-powers of 2 */
+ int i;
+ int r1, r2;
+
+ for (i = 0; i < 4; i++)
+ RND_LOOP (r1)
+ RND_LOOP (r2)
+ {
+ mpfr_t b;
+ int p, err, prec, inex, c;
+
+ p = 12 + (randlimb() % (2 * GMP_NUMB_BITS));
+ err = p - 3;
+ prec = 4;
+ mpfr_init2 (b, p);
+ inex = mpfr_set_si (b, t[i], MPFR_RNDN);
+ MPFR_ASSERTN (inex == 0);
+ c = mpfr_can_round (b, err, (mpfr_rnd_t) r1, (mpfr_rnd_t) r2, prec);
+ /* If r1 == r2, we can round.
+ TODO: complete this test for r1 != r2. */
+ if (r1 == r2 && !c)
+ {
+ printf ("Error in test_simple for i=%d,"
+ " err=%d r1=%s, r2=%s, p=%d\n", i, err,
+ mpfr_print_rnd_mode ((mpfr_rnd_t) r1),
+ mpfr_print_rnd_mode ((mpfr_rnd_t) r2), p);
+ printf ("b="); mpfr_dump (b);
+ exit (1);
+ }
+ mpfr_clear (b);
+ }
+}
+
#define MAX_LIMB_SIZE 100
static void
@@ -88,12 +126,7 @@ test_pow2 (mpfr_exp_t i, mpfr_prec_t px, mpfr_rnd_t r1, mpfr_rnd_t r2,
MPFR_IS_LIKE_RNDU (r1, MPFR_SIGN_POS) ?
(MPFR_IS_LIKE_RNDD (r2, MPFR_SIGN_POS) ? 0 : prec < i) :
(r2 != MPFR_RNDN ? 0 : prec < i);
- /* We only require mpfr_can_round to return 1 when we can really
- round, it is allowed to return 0 in some rare boundary cases,
- for example when x = 2^k and the error is 0.25 ulp.
- Note: if this changes in the future, the test could be improved by
- removing the "&& expected_b == 0" below. */
- if (b != expected_b && expected_b == 0)
+ if (b != expected_b)
{
printf ("Error for x=2^%d, px=%lu, err=%d, r1=%s, r2=%s, prec=%d\n",
(int) i, (unsigned long) px, (int) i + 1,
@@ -119,6 +152,80 @@ test_pow2 (mpfr_exp_t i, mpfr_prec_t px, mpfr_rnd_t r1, mpfr_rnd_t r2,
mpfr_clear (x);
}
+static void
+check_can_round (void)
+{
+ mpfr_t x, xinf, xsup, yinf, ysup;
+ int precx, precy, err;
+ int rnd1, rnd2;
+ int i, u[3] = { 0, 1, 256 };
+ int inex;
+ int expected, got;
+
+ mpfr_inits2 (4 * GMP_NUMB_BITS, x, xinf, xsup, yinf, ysup, (mpfr_ptr) 0);
+
+ for (precx = 3 * GMP_NUMB_BITS - 3; precx <= 3 * GMP_NUMB_BITS + 3; precx++)
+ {
+ mpfr_set_prec (x, precx);
+ for (precy = precx - 4; precy <= precx + 4; precy++)
+ {
+ mpfr_set_prec (yinf, precy);
+ mpfr_set_prec (ysup, precy);
+
+ for (i = 0; i <= 3; i++)
+ {
+ if (i <= 2)
+ {
+ /* Test x = 2^(precx - 1) + u */
+ mpfr_set_ui_2exp (x, 1, precx - 1, MPFR_RNDN);
+ mpfr_add_ui (x, x, u[i], MPFR_RNDN);
+ }
+ else
+ {
+ /* Test x = 2^precx - 1 */
+ mpfr_set_ui_2exp (x, 1, precx, MPFR_RNDN);
+ mpfr_sub_ui (x, x, 1, MPFR_RNDN);
+ }
+ MPFR_ASSERTN (mpfr_get_exp (x) == precx);
+
+ for (err = precy; err <= precy + 3; err++)
+ {
+ mpfr_set_ui_2exp (xinf, 1, precx - err, MPFR_RNDN);
+ inex = mpfr_add (xsup, x, xinf, MPFR_RNDN);
+ MPFR_ASSERTN (inex == 0);
+ inex = mpfr_sub (xinf, x, xinf, MPFR_RNDN);
+ MPFR_ASSERTN (inex == 0);
+ RND_LOOP (rnd1)
+ RND_LOOP (rnd2)
+ {
+ mpfr_set (yinf, MPFR_IS_LIKE_RNDD (rnd1, 1) ?
+ x : xinf, (mpfr_rnd_t) rnd2);
+ mpfr_set (ysup, MPFR_IS_LIKE_RNDU (rnd1, 1) ?
+ x : xsup, (mpfr_rnd_t) rnd2);
+ expected = !! mpfr_equal_p (yinf, ysup);
+ got = !! mpfr_can_round (x, err, (mpfr_rnd_t) rnd1,
+ (mpfr_rnd_t) rnd2, precy);
+ if (got != expected)
+ {
+ printf ("Error in check_can_round on:\n"
+ "precx=%d, precy=%d, i=%d, err=%d, "
+ "rnd1=%s, rnd2=%s: got %d\n",
+ precx, precy, i, err,
+ mpfr_print_rnd_mode ((mpfr_rnd_t) rnd1),
+ mpfr_print_rnd_mode ((mpfr_rnd_t) rnd2),
+ got);
+ printf ("x="); mpfr_dump (x);
+ exit (1);
+ }
+ }
+ }
+ }
+ }
+ }
+
+ mpfr_clears (x, xinf, xsup, yinf, ysup, (mpfr_ptr) 0);
+}
+
int
main (void)
{
@@ -129,6 +236,8 @@ main (void)
tests_start_mpfr ();
+ test_simple ();
+
/* checks that rounds to nearest sets the last
bit to zero in case of equal distance */
mpfr_init2 (x, 59);
@@ -202,6 +311,8 @@ main (void)
mpfr_clear (x);
+ check_can_round ();
+
check_round_p ();
tests_end_mpfr ();
diff --git a/tests/tdiv.c b/tests/tdiv.c
index b1f184e11..39cab385f 100644
--- a/tests/tdiv.c
+++ b/tests/tdiv.c
@@ -1008,6 +1008,9 @@ consistency (void)
{
printf ("Consistency error for i = %d (rnd = %s)\n", i,
mpfr_print_rnd_mode (rnd));
+ printf ("inex1=%d inex2=%d\n", inex1, inex2);
+ printf ("z1="); mpfr_dump (z1);
+ printf ("z2="); mpfr_dump (z2);
exit (1);
}
}
@@ -1361,6 +1364,55 @@ testall_rndf (mpfr_prec_t pmax)
}
}
+static void
+test_mpfr_divsp2 (void)
+{
+ mpfr_t u, v, q;
+
+ /* test to exercise r2 = v1 in mpfr_divsp2 */
+ mpfr_init2 (u, 128);
+ mpfr_init2 (v, 128);
+ mpfr_init2 (q, 83);
+
+ mpfr_set_str (u, "286677858044426991425771529092412636160", 10, MPFR_RNDN);
+ mpfr_set_str (v, "241810647971575979588130185988987264768", 10, MPFR_RNDN);
+ mpfr_div (q, u, v, MPFR_RNDN);
+ mpfr_set_str (u, "5732952910203749289426944", 10, MPFR_RNDN);
+ mpfr_div_2exp (u, u, 82, MPFR_RNDN);
+ MPFR_ASSERTN(mpfr_equal_p (q, u));
+
+ mpfr_clear (u);
+ mpfr_clear (v);
+ mpfr_clear (q);
+}
+
+/* Assertion failure in r10769 with --enable-assert --enable-gmp-internals
+ (same failure in tatan on a similar example). */
+static void
+test_20160831 (void)
+{
+ mpfr_t u, v, q;
+
+ mpfr_inits2 (124, u, v, q, (mpfr_ptr) 0);
+
+ mpfr_set_ui (u, 1, MPFR_RNDN);
+ mpfr_set_str (v, "0x40000000000000005", 16, MPFR_RNDN);
+ mpfr_div (q, u, v, MPFR_RNDN);
+ mpfr_set_str (u, "0xfffffffffffffffecp-134", 16, MPFR_RNDN);
+ MPFR_ASSERTN (mpfr_equal_p (q, u));
+
+ mpfr_set_prec (u, 128);
+ mpfr_set_prec (v, 128);
+ mpfr_set_str (u, "186127091671619245460026015088243485690", 10, MPFR_RNDN);
+ mpfr_set_str (v, "205987256581218236405412302590110119580", 10, MPFR_RNDN);
+ mpfr_div (q, u, v, MPFR_RNDN);
+ mpfr_set_str (u, "19217137613667309953639458782352244736", 10, MPFR_RNDN);
+ mpfr_div_2exp (u, u, 124, MPFR_RNDN);
+ MPFR_ASSERTN (mpfr_equal_p (q, u));
+
+ mpfr_clears (u, v, q, (mpfr_ptr) 0);
+}
+
int
main (int argc, char *argv[])
{
@@ -1388,9 +1440,11 @@ main (int argc, char *argv[])
test_20070603 ();
test_20070628 ();
test_20151023 ();
+ test_20160831 ();
test_generic (MPFR_PREC_MIN, 800, 50);
test_bad ();
test_extreme ();
+ test_mpfr_divsp2 ();
tests_end_mpfr ();
return 0;
diff --git a/tests/tests.c b/tests/tests.c
index 817a09bea..3696f405d 100644
--- a/tests/tests.c
+++ b/tests/tests.c
@@ -104,6 +104,15 @@ gmp_randstate_t mpfr_rands;
char *locale = NULL;
+/* Programs that test GMP's mp_set_memory_functions() need to set
+ tests_memory_disabled before calling tests_start_mpfr(). */
+#ifdef MPFR_USE_MINI_GMP
+/* disable since mini-gmp does not keep track of old_size in realloc/free */
+int tests_memory_disabled = 1;
+#else
+int tests_memory_disabled = 0;
+#endif
+
static mpfr_exp_t default_emin, default_emax;
static void tests_rand_start (void);
@@ -206,6 +215,10 @@ test_version (void)
" $LD_LIBRARY_PATH directory, you typically get this error. Do\n"
" not use $LD_LIBRARY_PATH on such platforms; it may also break\n"
" other things.\n"
+ " * You may have an ld option that specifies a library search path\n"
+ " where MPFR can be found, taking the precedence over the path\n"
+ " added by libtool. Check your environment variables, such as\n"
+ " LD_OPTIONS under Solaris.\n"
" * Then look at http://www.mpfr.org/mpfr-current/ for any update.\n"
" * Try again on a completely clean source (some errors might come\n"
" from a previous build or previous source changes).\n"
@@ -246,10 +259,8 @@ tests_start_mpfr (void)
feclearexcept (FE_ALL_EXCEPT);
#endif
-#ifndef MPFR_USE_MINI_GMP
- /* disable since mini-gmp does not keep track of old_size in realloc/free */
- tests_memory_start ();
-#endif
+ if (!tests_memory_disabled)
+ tests_memory_start ();
tests_rand_start ();
tests_limit_start ();
@@ -277,9 +288,8 @@ tests_end_mpfr (void)
mpfr_free_cache ();
mpfr_free_cache2 (MPFR_FREE_GLOBAL_CACHE);
tests_rand_end ();
-#ifndef MPFR_USE_MINI_GMP
- tests_memory_end ();
-#endif
+ if (!tests_memory_disabled)
+ tests_memory_end ();
#ifdef MPFR_TESTS_DIVBYZERO
/* Define to test the use of MPFR_ERRDIVZERO */
diff --git a/tests/tgeneric.c b/tests/tgeneric.c
index 4450df9d0..91c22ce7a 100644
--- a/tests/tgeneric.c
+++ b/tests/tgeneric.c
@@ -654,7 +654,7 @@ test_generic (mpfr_prec_t p0, mpfr_prec_t p1, unsigned int nmax)
TGENERIC_CHECK ("should have been min MPFR number (underflow)",
MPFR_IS_ZERO (y));
}
- else if (mpfr_can_round (y, yprec, rnd, rnd, prec))
+ else if (compare == 0 || mpfr_can_round (y, yprec, rnd, rnd, prec))
{
ctrn++;
mpfr_set (t, y, rnd);
diff --git a/tests/tmul.c b/tests/tmul.c
index 559c61e74..3d773856a 100644
--- a/tests/tmul.c
+++ b/tests/tmul.c
@@ -684,42 +684,57 @@ testall_rndf (mpfr_prec_t pmax)
}
}
-/* check underflow flag corresponds to *after* rounding */
+/* Check underflow flag corresponds to *after* rounding.
+ *
+ * More precisely, we want to test mpfr_mul on inputs b and c such that
+ * EXP(b*c) < emin but EXP(round(b*c, p, rnd)) = emin. Thus an underflow
+ * must not be generated.
+ */
static void
test_underflow (mpfr_prec_t pmax)
{
+ mpfr_exp_t emin;
mpfr_prec_t p;
- mpfr_t a, b, c;
+ mpfr_t a0, a, b, c;
int inex;
- /* for RNDN, we want b*c < 0.5*2^emin but round(b*c, p) = 0.5*2^emin thus
- b*c >= (0.5 - 1/4*ulp_p(0.5))*2^emin */
+ mpfr_init2 (a0, MPFR_PREC_MIN);
+ emin = mpfr_get_emin ();
+ mpfr_setmin (a0, emin); /* 0.5 * 2^emin */
+
+ /* for RNDN, we want b*c < 0.5 * 2^emin but RNDN(b*c, p) = 0.5 * 2^emin,
+ thus b*c >= (0.5 - 1/4 * ulp_p(0.5)) * 2^emin */
for (p = MPFR_PREC_MIN; p <= pmax; p++)
{
mpfr_init2 (a, p + 1);
mpfr_init2 (b, p + 10);
mpfr_init2 (c, p + 10);
- mpfr_urandomb (b, RANDS);
- mpfr_set_ui_2exp (a, 1, -1, MPFR_RNDZ); /* a = 0.5 */
+ do mpfr_urandomb (b, RANDS); while (MPFR_IS_ZERO (b));
+ inex = mpfr_set_ui_2exp (a, 1, -1, MPFR_RNDZ); /* a = 0.5 */
+ MPFR_ASSERTN (inex == 0);
mpfr_nextbelow (a); /* 0.5 - 1/2*ulp_{p+1}(0.5) = 0.5 - 1/4*ulp_p(0.5) */
inex = mpfr_div (c, a, b, MPFR_RNDU);
- /* 0.5 - 1/4*ulp_p(0.5) <= b * c < 0.5 */
- mpfr_mul_2si (b, b, mpfr_get_emin () / 2, MPFR_RNDZ);
- mpfr_mul_2si (c, c, (mpfr_get_emin () - 1) / 2, MPFR_RNDZ);
+ /* 0.5 - 1/4 * ulp_p(0.5) = a <= b*c < 0.5 */
+ mpfr_mul_2si (b, b, emin / 2, MPFR_RNDZ);
+ mpfr_mul_2si (c, c, (emin - 1) / 2, MPFR_RNDZ);
+ /* now (0.5 - 1/4 * ulp_p(0.5)) * 2^emin <= b*c < 0.5 * 2^emin,
+ thus b*c should be rounded to 0.5 * 2^emin */
mpfr_set_prec (a, p);
mpfr_clear_underflow ();
mpfr_mul (a, b, c, MPFR_RNDN);
- if (!mpfr_zero_p (a) && mpfr_underflow_p ())
+ if (mpfr_underflow_p () || ! mpfr_equal_p (a, a0))
{
- printf ("Error, underflow flag incorrectly set for emin=%ld, rnd=%s\n",
- mpfr_get_emin (), mpfr_print_rnd_mode (MPFR_RNDN));
+ printf ("Error, b*c incorrect or underflow flag incorrectly set"
+ " for emin=%" MPFR_EXP_FSPEC "d, rnd=%s\n",
+ (mpfr_eexp_t) emin, mpfr_print_rnd_mode (MPFR_RNDN));
printf ("b="); mpfr_dump (b);
printf ("c="); mpfr_dump (c);
printf ("a="); mpfr_dump (a);
mpfr_set_prec (a, mpfr_get_prec (b) + mpfr_get_prec (c));
mpfr_mul_2exp (b, b, 1, MPFR_RNDN);
- mpfr_mul (a, b, c, MPFR_RNDN);
- printf ("2*a="); mpfr_dump (a);
+ inex = mpfr_mul (a, b, c, MPFR_RNDN);
+ MPFR_ASSERTN (inex == 0);
+ printf ("Exact 2*a="); mpfr_dump (a);
exit (1);
}
mpfr_clear (a);
@@ -727,41 +742,50 @@ test_underflow (mpfr_prec_t pmax)
mpfr_clear (c);
}
- /* for RNDU, we want b*c < 0.5*2^emin but round(b*c, p) = 0.5*2^emin thus
- b*c > (0.5 - 1/2*ulp_p(0.5))*2^emin */
+ /* for RNDU, we want b*c < 0.5*2^emin but RNDU(b*c, p) = 0.5*2^emin thus
+ b*c > (0.5 - 1/2 * ulp_p(0.5)) * 2^emin */
for (p = MPFR_PREC_MIN; p <= pmax; p++)
{
mpfr_init2 (a, p);
mpfr_init2 (b, p + 10);
mpfr_init2 (c, p + 10);
- mpfr_urandomb (b, RANDS);
- mpfr_set_ui_2exp (a, 1, -1, MPFR_RNDZ); /* a = 0.5 */
- mpfr_nextbelow (a); /* 0.5 - 1/2*ulp_p(0.5) */
+ do mpfr_urandomb (b, RANDS); while (MPFR_IS_ZERO (b));
+ inex = mpfr_set_ui_2exp (a, 1, -1, MPFR_RNDZ); /* a = 0.5 */
+ MPFR_ASSERTN (inex == 0);
+ mpfr_nextbelow (a); /* 0.5 - 1/2 * ulp_p(0.5) */
inex = mpfr_div (c, a, b, MPFR_RNDU);
- /* 0.5 - 1/2*ulp_p(0.5) <= b * c < 0.5 */
- mpfr_mul_2si (b, b, mpfr_get_emin () / 2, MPFR_RNDZ);
- mpfr_mul_2si (c, c, (mpfr_get_emin () - 1) / 2, MPFR_RNDZ);
- if (inex)
- mpfr_nextabove (c); /* ensures that b*c > (0.5 - 1/2*ulp_p(0.5))*2^emin */
+ /* 0.5 - 1/2 * ulp_p(0.5) <= b*c < 0.5 */
+ mpfr_mul_2si (b, b, emin / 2, MPFR_RNDZ);
+ mpfr_mul_2si (c, c, (emin - 1) / 2, MPFR_RNDZ);
+ if (inex == 0)
+ mpfr_nextabove (c); /* ensures b*c > (0.5 - 1/2 * ulp_p(0.5)) * 2^emin.
+ Warning: for p=1, 0.5 - 1/2 * ulp_p(0.5)
+ = 0.25, thus b*c > 2^(emin-2), which should
+ also be rounded up with p=1 to 0.5 * 2^emin
+ with an unbounded exponent range. */
mpfr_clear_underflow ();
mpfr_mul (a, b, c, MPFR_RNDU);
- if (!mpfr_zero_p (a) && mpfr_underflow_p ())
+ if (mpfr_underflow_p () || ! mpfr_equal_p (a, a0))
{
- printf ("Error, underflow flag incorrectly set for emin=%ld, rnd=%s\n",
- mpfr_get_emin (), mpfr_print_rnd_mode (MPFR_RNDU));
+ printf ("Error, b*c incorrect or underflow flag incorrectly set"
+ " for emin=%" MPFR_EXP_FSPEC "d, rnd=%s\n",
+ (mpfr_eexp_t) emin, mpfr_print_rnd_mode (MPFR_RNDU));
printf ("b="); mpfr_dump (b);
printf ("c="); mpfr_dump (c);
printf ("a="); mpfr_dump (a);
mpfr_set_prec (a, mpfr_get_prec (b) + mpfr_get_prec (c));
mpfr_mul_2exp (b, b, 1, MPFR_RNDN);
- mpfr_mul (a, b, c, MPFR_RNDN);
- printf ("2*a="); mpfr_dump (a);
+ inex = mpfr_mul (a, b, c, MPFR_RNDN);
+ MPFR_ASSERTN (inex == 0);
+ printf ("Exact 2*a="); mpfr_dump (a);
exit (1);
}
mpfr_clear (a);
mpfr_clear (b);
mpfr_clear (c);
}
+
+ mpfr_clear (a0);
}
int
diff --git a/tests/tpow_z.c b/tests/tpow_z.c
index 004435473..3fc080c6c 100644
--- a/tests/tpow_z.c
+++ b/tests/tpow_z.c
@@ -25,7 +25,7 @@ http://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
#include "mpfr-test.h"
-#define ERROR(str) do { printf("Error for "str"\n"); exit (1); } while (0)
+#define ERROR(str) do { printf ("Error for " str "\n"); exit (1); } while (0)
static void
check_special (void)
diff --git a/tests/tset_si.c b/tests/tset_si.c
index 33d6d46bf..a2cd1ca47 100644
--- a/tests/tset_si.c
+++ b/tests/tset_si.c
@@ -22,7 +22,7 @@ http://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
#include "mpfr-test.h"
-#define ERROR(str) {printf("Error for "str"\n"); exit(1);}
+#define ERROR(str) do { printf ("Error for " str "\n"); exit (1); } while (0)
static void
test_2exp (void)
@@ -33,36 +33,36 @@ test_2exp (void)
mpfr_init2 (x, 32);
mpfr_set_ui_2exp (x, 1, 0, MPFR_RNDN);
- if (mpfr_cmp_ui(x, 1))
- ERROR("(1U,0)");
+ if (mpfr_cmp_ui (x, 1) != 0)
+ ERROR ("(1U,0)");
mpfr_set_ui_2exp (x, 1024, -10, MPFR_RNDN);
- if (mpfr_cmp_ui(x, 1))
- ERROR("(1024U,-10)");
+  if (mpfr_cmp_ui (x, 1) != 0)
+ ERROR ("(1024U,-10)");
mpfr_set_ui_2exp (x, 1024, 10, MPFR_RNDN);
- if (mpfr_cmp_ui(x, 1024*1024))
- ERROR("(1024U,+10)");
+ if (mpfr_cmp_ui (x, 1024 * 1024) != 0)
+ ERROR ("(1024U,+10)");
mpfr_set_si_2exp (x, -1024L * 1024L, -10, MPFR_RNDN);
- if (mpfr_cmp_si(x, -1024))
- ERROR("(1M,-10)");
+ if (mpfr_cmp_si (x, -1024) != 0)
+ ERROR ("(1M,-10)");
mpfr_set_ui_2exp (x, 0x92345678, 16, MPFR_RNDN);
- if (mpfr_cmp_str (x, "92345678@4", 16, MPFR_RNDN))
- ERROR("(x92345678U,+16)");
+ if (mpfr_cmp_str (x, "92345678@4", 16, MPFR_RNDN) != 0)
+ ERROR ("(x92345678U,+16)");
mpfr_set_si_2exp (x, -0x1ABCDEF0, -256, MPFR_RNDN);
- if (mpfr_cmp_str (x, "-1ABCDEF0@-64", 16, MPFR_RNDN))
- ERROR("(-x1ABCDEF0,-256)");
+ if (mpfr_cmp_str (x, "-1ABCDEF0@-64", 16, MPFR_RNDN) != 0)
+ ERROR ("(-x1ABCDEF0,-256)");
mpfr_set_prec (x, 2);
res = mpfr_set_si_2exp (x, 7, 10, MPFR_RNDU);
- if (mpfr_cmp_ui (x, 1<<13) || res <= 0)
+ if (mpfr_cmp_ui (x, 1<<13) != 0 || res <= 0)
ERROR ("Prec 2 + si_2exp");
res = mpfr_set_ui_2exp (x, 7, 10, MPFR_RNDU);
- if (mpfr_cmp_ui (x, 1<<13) || res <= 0)
+ if (mpfr_cmp_ui (x, 1<<13) != 0 || res <= 0)
ERROR ("Prec 2 + ui_2exp");
mpfr_clear_flags ();
@@ -96,6 +96,9 @@ test_macros (void)
mpfr_ptr p;
int r;
+ /* Note: the ++'s below allow one to check that the corresponding
+ arguments are evaluated only once by the macros. */
+
mpfr_inits (x[0], x[1], x[2], (mpfr_ptr) 0);
p = x[0];
r = 0;
@@ -299,7 +302,7 @@ main (int argc, char *argv[])
mpfr_init2 (x, 100);
- N = (argc==1) ? 100000 : atol (argv[1]);
+ N = (argc == 1) ? 100000 : atol (argv[1]);
for (k = 1; k <= N; k++)
{
diff --git a/tests/tset_sj.c b/tests/tset_sj.c
index d8a039b22..a7b8299d2 100644
--- a/tests/tset_sj.c
+++ b/tests/tset_sj.c
@@ -39,7 +39,7 @@ main (void)
#else
-#define ERROR(str) {printf("Error for "str"\n"); exit(1);}
+#define ERROR(str) do { printf ("Error for " str "\n"); exit (1); } while (0)
static int
inexact_sign (int x)
@@ -57,7 +57,7 @@ check_set_uj (mpfr_prec_t pmin, mpfr_prec_t pmax, int N)
mpfr_inits2 (pmax, x, y, (mpfr_ptr) 0);
- for ( p = pmin ; p < pmax ; p++)
+ for (p = pmin ; p < pmax ; p++)
{
mpfr_set_prec (x, p);
mpfr_set_prec (y, p);
@@ -91,7 +91,7 @@ check_set_uj (mpfr_prec_t pmin, mpfr_prec_t pmax, int N)
ERROR ("inexact / UINTMAX_MAX");
inex1 = mpfr_add_ui (x, x, 1, MPFR_RNDN);
if (inex1 != 0 || !mpfr_powerof2_raw (x)
- || MPFR_EXP (x) != (sizeof(uintmax_t)*CHAR_BIT+1) )
+ || MPFR_EXP (x) != sizeof(uintmax_t) * CHAR_BIT + 1)
ERROR ("power of 2");
mpfr_set_uj (x, 0, MPFR_RNDN);
if (!MPFR_IS_ZERO (x))
@@ -110,22 +110,22 @@ check_set_uj_2exp (void)
inex = mpfr_set_uj_2exp (x, 1, 0, MPFR_RNDN);
if (inex || mpfr_cmp_ui(x, 1))
- ERROR("(1U,0)");
+ ERROR ("(1U,0)");
inex = mpfr_set_uj_2exp (x, 1024, -10, MPFR_RNDN);
if (inex || mpfr_cmp_ui(x, 1))
- ERROR("(1024U,-10)");
+ ERROR ("(1024U,-10)");
inex = mpfr_set_uj_2exp (x, 1024, 10, MPFR_RNDN);
if (inex || mpfr_cmp_ui(x, 1024L * 1024L))
- ERROR("(1024U,+10)");
+ ERROR ("(1024U,+10)");
inex = mpfr_set_uj_2exp (x, MPFR_UINTMAX_MAX, 1000, MPFR_RNDN);
inex |= mpfr_div_2ui (x, x, 1000, MPFR_RNDN);
inex |= mpfr_add_ui (x, x, 1, MPFR_RNDN);
if (inex || !mpfr_powerof2_raw (x)
- || MPFR_EXP (x) != (sizeof(uintmax_t)*CHAR_BIT+1) )
- ERROR("(UINTMAX_MAX)");
+ || MPFR_EXP (x) != sizeof(uintmax_t) * CHAR_BIT + 1)
+ ERROR ("(UINTMAX_MAX)");
inex = mpfr_set_uj_2exp (x, MPFR_UINTMAX_MAX, MPFR_EMAX_MAX-10, MPFR_RNDN);
if (inex == 0 || !mpfr_inf_p (x))
@@ -149,8 +149,8 @@ check_set_sj (void)
inex = mpfr_set_sj (x, -MPFR_INTMAX_MAX, MPFR_RNDN);
inex |= mpfr_add_si (x, x, -1, MPFR_RNDN);
if (inex || mpfr_sgn (x) >=0 || !mpfr_powerof2_raw (x)
- || MPFR_EXP (x) != (sizeof(intmax_t)*CHAR_BIT) )
- ERROR("set_sj (-INTMAX_MAX)");
+ || MPFR_EXP (x) != sizeof(intmax_t) * CHAR_BIT)
+ ERROR ("set_sj (-INTMAX_MAX)");
inex = mpfr_set_sj (x, 1742, MPFR_RNDN);
if (inex || mpfr_cmp_ui (x, 1742))
@@ -169,8 +169,8 @@ check_set_sj_2exp (void)
inex = mpfr_set_sj_2exp (x, MPFR_INTMAX_MIN, 1000, MPFR_RNDN);
if (inex || mpfr_sgn (x) >=0 || !mpfr_powerof2_raw (x)
- || MPFR_EXP (x) != (sizeof(intmax_t)*CHAR_BIT+1000) )
- ERROR("set_sj_2exp (INTMAX_MIN)");
+ || MPFR_EXP (x) != sizeof(intmax_t) * CHAR_BIT + 1000)
+ ERROR ("set_sj_2exp (INTMAX_MIN)");
mpfr_clear (x);
}
diff --git a/tests/tsi_op.c b/tests/tsi_op.c
index f7c94bc88..6442a6663 100644
--- a/tests/tsi_op.c
+++ b/tests/tsi_op.c
@@ -27,14 +27,16 @@ http://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
tgeneric_ui.c should probably be replaced by tgeneric.c,
with some changes, since tgeneric.c does more checks. */
-#define ERROR1(s, i, z, exp) \
-{\
- printf("Error for "s" and i=%d\n", i);\
- printf("Expected %s\n", exp);\
- printf("Got "); mpfr_out_str (stdout, 16, 0, z, MPFR_RNDN);\
- putchar ('\n');\
- exit(1);\
-}
+#define ERROR1(s,i,z,exp) \
+ do \
+ { \
+ printf ("Error for " s " and i=%d\n", i); \
+ printf ("Expected %s\n", exp); \
+ printf ("Got "); mpfr_out_str (stdout, 16, 0, z, MPFR_RNDN); \
+ putchar ('\n'); \
+ exit(1); \
+ } \
+ while (0)
const struct {
const char * op1;
diff --git a/tests/tsqrt.c b/tests/tsqrt.c
index f70b0d735..533d5420d 100644
--- a/tests/tsqrt.c
+++ b/tests/tsqrt.c
@@ -66,20 +66,20 @@ check3 (const char *as, mpfr_rnd_t rnd_mode, const char *qs)
}
static void
-check4 (const char *as, mpfr_rnd_t rnd_mode, const char *Qs)
+check4 (const char *as, mpfr_rnd_t rnd_mode, const char *qs)
{
mpfr_t q;
mpfr_init2 (q, 53);
mpfr_set_str1 (q, as);
test_sqrt (q, q, rnd_mode);
- if (mpfr_cmp_str (q, Qs, 16, MPFR_RNDN))
+ if (mpfr_cmp_str (q, qs, 16, MPFR_RNDN))
{
printf ("mpfr_sqrt failed for a=%s, rnd_mode=%s\n",
- as, mpfr_print_rnd_mode(rnd_mode));
- printf ("expected ");
+ as, mpfr_print_rnd_mode (rnd_mode));
+ printf ("expected %s\ngot ", qs);
mpfr_out_str (stdout, 16, 0, q, MPFR_RNDN);
- printf ("\ngot %s\n", Qs);
+ printf ("\n");
mpfr_clear (q);
exit (1);
}
diff --git a/tests/tversion.c b/tests/tversion.c
index 45eed5042..723791b47 100644
--- a/tests/tversion.c
+++ b/tests/tversion.c
@@ -66,6 +66,81 @@ main (void)
printf (COMP2 "%s\n", __VERSION__);
#endif
+ /************** More information about the C implementation **************/
+
+ /* The following macros are currently used by src/mpfr-cvers.h and/or
+   src/mpfr-impl.h; they may have an influence on how MPFR is compiled. */
+
+#if defined(__STDC__) || defined(__STDC_VERSION__)
+ printf ("[tversion] C standard: __STDC__ = "
+#if defined(__STDC__)
+ MAKE_STR(__STDC__)
+#else
+ "undef"
+#endif
+ ", __STDC_VERSION__ = "
+#if defined(__STDC_VERSION__)
+ MAKE_STR(__STDC_VERSION__)
+#else
+ "undef"
+#endif
+ "\n");
+#endif
+
+#if defined(__GNUC__)
+ printf ("[tversion] __GNUC__ = " MAKE_STR(__GNUC__) ", __GNUC_MINOR__ = "
+#if defined(__GNUC_MINOR__)
+ MAKE_STR(__GNUC_MINOR__)
+#else
+ "undef"
+#endif
+ "\n");
+#endif
+
+#if defined(__ICC) || defined(__INTEL_COMPILER)
+ printf ("[tversion] Intel compiler: __ICC = "
+#if defined(__ICC)
+ MAKE_STR(__ICC)
+#else
+ "undef"
+#endif
+ ", __INTEL_COMPILER = "
+#if defined(__INTEL_COMPILER)
+ MAKE_STR(__INTEL_COMPILER)
+#else
+ "undef"
+#endif
+ "\n");
+#endif
+
+#if defined(_WIN32) || defined(_MSC_VER)
+ printf ("[tversion] MS Windows: _WIN32 = "
+#if defined(_WIN32)
+ MAKE_STR(_WIN32)
+#else
+ "undef"
+#endif
+ ", _MSC_VER = "
+#if defined(_MSC_VER)
+ MAKE_STR(_MSC_VER)
+#else
+ "undef"
+#endif
+ "\n");
+#endif
+
+#if defined(__GLIBC__)
+ printf ("[tversion] __GLIBC__ = " MAKE_STR(__GLIBC__) ", __GLIBC_MINOR__ = "
+#if defined(__GLIBC_MINOR__)
+ MAKE_STR(__GLIBC_MINOR__)
+#else
+ "undef"
+#endif
+ "\n");
+#endif
+
+ /*************************************************************************/
+
#ifdef __MPIR_VERSION
printf ("[tversion] MPIR: header %d.%d.%d, library %s\n",
__MPIR_VERSION, __MPIR_VERSION_MINOR, __MPIR_VERSION_PATCHLEVEL,
diff --git a/tests/tzeta.c b/tests/tzeta.c
index 4362127e1..9fb5a640e 100644
--- a/tests/tzeta.c
+++ b/tests/tzeta.c
@@ -415,7 +415,9 @@ main (int argc, char *argv[])
mpfr_clear (y);
mpfr_clear (z);
- test_generic (MPFR_PREC_MIN, 70, 5);
+ /* FIXME: change the last argument back to 5 once the mpfr_zeta issue
+ has been found (see TODO). */
+ test_generic (MPFR_PREC_MIN, 70, 1);
test2 ();
tests_end_mpfr ();
diff --git a/tools/coverage b/tools/coverage
index 8d897db36..993aaf1ce 100755
--- a/tools/coverage
+++ b/tools/coverage
@@ -3,6 +3,13 @@
# to compute the coverage of mpfr-x.y.z, just copy this script
# into mpfr-x.y.z/tools and run it
+# Warning! Do not run this script on a machine shared with other users,
+# otherwise your account can easily be compromised:
+# http://debbugs.gnu.org/cgi/bugreport.cgi?bug=21951
+#
+# To avoid being affected by this libtool bug, this script could be modified
+# to use a subdirectory of /tmp/ompfr-gcov for the MPFR build.
+
# Set up the right directory
cd $(dirname $0)/..
diff --git a/tools/mbench/mfv5-mpfr.cc b/tools/mbench/mfv5-mpfr.cc
index bdc8018aa..c36ed029d 100644
--- a/tools/mbench/mfv5-mpfr.cc
+++ b/tools/mbench/mfv5-mpfr.cc
@@ -128,19 +128,23 @@ public:
}
};
+#ifdef mpfr_fmma
class mpfr_fmma_test {
public:
int func(mpfr_ptr a, mpfr_srcptr b, mpfr_srcptr c, mpfr_srcptr d, mpfr_srcptr e, mp_rnd_t r) {
return mpfr_fmma (a,b,c,d,e,r);
}
};
+#endif
+#ifdef mpfr_fmms
class mpfr_fmms_test {
public:
int func(mpfr_ptr a, mpfr_srcptr b, mpfr_srcptr c, mpfr_srcptr d, mpfr_srcptr e, mp_rnd_t r) {
return mpfr_fmms (a,b,c,d,e,r);
}
};
+#endif
class mpfr_div_test {
public:
@@ -264,8 +268,12 @@ static mpfr_test<mpfr_sub_test> test2 ("mpfr_sub");
static mpfr_test<mpfr_mul_test> test3 ("mpfr_mul");
static mpfr_test3<mpfr_fma_test> test10 ("mpfr_fma");
static mpfr_test3<mpfr_fms_test> test11 ("mpfr_fms");
+#ifdef mpfr_fmma
static mpfr_test4<mpfr_fmma_test> test12 ("mpfr_fmma");
+#endif
+#ifdef mpfr_fmms
static mpfr_test4<mpfr_fmms_test> test13 ("mpfr_fmms");
+#endif
static mpfr_test<mpfr_div_test> test4 ("mpfr_div");
static mpfr_test<mpfr_set_test> test5 ("mpfr_set");
diff --git a/tools/mbench/mfv5.cc b/tools/mbench/mfv5.cc
index 4612b08da..b3588426a 100644
--- a/tools/mbench/mfv5.cc
+++ b/tools/mbench/mfv5.cc
@@ -27,7 +27,7 @@ http://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
#define USAGE \
"Bench functions for Pentium (V5++).\n" \
- __FILE__" " __DATE__" " __TIME__" GCC "__VERSION__ "\n" \
+ __FILE__ " " __DATE__ " " __TIME__ " GCC " __VERSION__ "\n" \
"Usage: mfv5 [-pPREC] [-sSEED] [-mSIZE] [-iPRIO] [-lLIST] [-xEXPORT_BASE] [-XIMPORT_BASE] [-rROUNDING_MODE] tests ...\n"
using namespace std;
diff --git a/tools/mpfrlint b/tools/mpfrlint
index 30538fea5..3cd884770 100755
--- a/tools/mpfrlint
+++ b/tools/mpfrlint
@@ -185,6 +185,26 @@ do
}' $file
done
+# Macros of the form:
+# #define FOO { ... }
+# may be unsafe and could yield obscure failures when one writes "FOO;",
+# as this is then a block followed by a null statement. The following form
+# is preferred in most cases:
+# #define FOO do { ... } while (0)
+# so that "FOO;" is a single statement.
+# To avoid false positives, a comment can be inserted, e.g.:
+# #define FOO /* initializer */ { 0, 17 }
+for file in $srctests
+do
+ err-if-output "Missing 'do ... while (0)'" perl -e '
+ while (<>) {
+ my $s = $_;
+ while ($s =~ s/\\\n//) { $s .= <> }
+ $s =~ /^#\s*define\s+\w+(\([^)]*\))?\s*{/
+ and $s =~ tr/ \t/ /s, print "$ARGV: $s";
+ }' $file
+done
+
# Do not use snprintf as it is not available in ISO C90.
# Even on platforms where it is available, the prototype
# may not be included (e.g. with gcc -ansi), so that the
@@ -221,6 +241,14 @@ grep '(mp_limb_t) *-\?[01]' $srctests | grep -v '#define MPFR_LIMB_' | \
err-if-output -t "Use simple mp_limb_t constants" \
grep -v MPFR_STAT_STATIC_ASSERT
+# Code possibly wrong with some C/GMP implementations. In short, if the
+# right operand of a shift depends on GMP_NUMB_BITS, then the left operand
+# should be of type mp_limb_t. There might be false positives.
+err-if-output \
+ --msg="Use a constant of type mp_limb_t instead of unsigned long?" \
+ -t "Suspicious code" \
+  grep -E '[^A-Za-z_][0-9]+[UL]* *<<.*GMP_NUMB_BITS' src/*.c
+
for file in $srctests */Makefile.am acinclude.m4 configure.ac
do
# Note: this is one less that the POSIX minimum limit in case
@@ -259,7 +287,7 @@ grep __gmp_ tests/*.c | \
err-if-output -t "__gmp_" grep -v '^tests/tabort_defalloc'
grep -E '[^a-z_](m|re)alloc *\(' tests/*.c | \
- err-if-output -t "alloc" grep -v '^tests/memory.c:'
+ err-if-output -t "alloc" grep -Ev '^tests/(memory|talloc-cache).c:'
err-if-output --dir=doc "check-typography" ./check-typography