Copyright 1999-2016 Free Software Foundation, Inc. Contributed by the AriC and Caramba projects, INRIA. This file is part of the GNU MPFR Library. The GNU MPFR Library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. The GNU MPFR Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with the GNU MPFR Library; see the file COPYING.LESSER. If not, see http://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. Table of contents: 1. Documentation 2. Compiler/library detection 3. Changes in existing functions 4. New functions to implement 5. Efficiency 6. Miscellaneous 7. Portability ############################################################################## 1. Documentation ############################################################################## - add a description of the algorithms used and a proof of correctness ############################################################################## 2. Compiler/library detection ############################################################################## - if we want to distinguish GMP and MPIR, we can check at configure time the following symbols which are only defined in MPIR: #define __MPIR_VERSION 0 #define __MPIR_VERSION_MINOR 9 #define __MPIR_VERSION_PATCHLEVEL 0 There is also a library symbol mpir_version, which should match VERSION, set by configure, for example 0.9.0. - update ICC detection. * Use only __INTEL_COMPILER instead of the obsolete macro __ICC? * Possibly enable some features with ICC. For instance, removing "&& !defined(__ICC)" from src/mpfr.h works with ICC 15.0.0 20140723. ############################################################################## 3. Changes in existing functions ############################################################################## - export mpfr_overflow and mpfr_underflow as public functions - many functions currently taking into account the precision of the *input* variable to set the initial working precison (acosh, asinh, cosh, ...). This is nonsense since the "average" working precision should only depend on the precision of the *output* variable (and maybe on the *value* of the input in case of cancellation). -> remove those dependencies from the input precision. - mpfr_can_round: change the meaning of the 2nd argument (err). Currently the error is at most 2^(MPFR_EXP(b)-err), i.e. err is the relative shift wrt the most significant bit of the approximation. I propose that the error is now at most 2^err ulps of the approximation, i.e. 2^(MPFR_EXP(b)-MPFR_PREC(b)+err). - mpfr_set_q first tries to convert the numerator and the denominator to mpfr_t. But this conversion may fail even if the correctly rounded result is representable. New way to implement: Function q = a/b. nq = PREC(q) na = PREC(a) nb = PREC(b) If na < nb a <- a*2^(nb-na) n <- na-nb+ (HIGH(a,nb) >= b) if (n >= nq) bb <- b*2^(n-nq) a = q*bb+r --> q has exactly n bits. else aa <- a*2^(nq-n) aa = q*b+r --> q has exactly n bits. If RNDN, takes nq+1 bits. (See also the new division function). - revisit the conversion functions between a MPFR number and a native floating-point value. * Consequences if some exception is trapped? * Specify under which conditions (current rounding direction and precision of the FPU, whether a format has been recognized...), correct rounding is guaranteed. Fix the code if need be. Do not forget subnormals. * Provide mpfr_buildopt_* functions to tell whether the format of a native type (float / double / long double) has been recognized and which format it is? * For functions that return a native floating-point value (mpfr_get_flt, mpfr_get_d, mpfr_get_ld, mpfr_get_decimal64), raise exception flags with feraiseexcept(), when supported. * For testing the lack of subnormal support: see the -mfpu GCC option for ARM and https://en.wikipedia.org/wiki/Denormal_number#Disabling_denormal_floats_at_the_code_level ############################################################################## 4. New functions to implement ############################################################################## - implement mpfr_get_decimal128 and mpfr_set_decimal128 - implement mpfr_log_ui to compute log(n) for an unsigned long n. We can write for argument reduction n = 2^k * n/2^k, where 2/3 <= n/2^k < 4/3, i.e., k = floor(log2(3n))-1, thus log(n) = k*log(2) + log(n/2^k), and we can use binary splitting on the Taylor expansion of log(1+x) to compute log(n/2^k), where at most p*log(2)/log(3) terms are needed for precision p. Other idea (from Fredrik Johansson): compute log(m) + log(n/m) where m=2^a*3^b*5^c*7^d and m is close to n. - implement mpfr_q_sub, mpfr_z_div, mpfr_q_div? - implement mpfr_pow_q and variants with two integers (native or mpz) instead of a rational? See IEEE P1788. - implement functions for random distributions, see for example https://sympa.inria.fr/sympa/arc/mpfr/2010-01/msg00034.html (suggested by Charles Karney , 18 Jan 2010): * a Bernoulli distribution with prob p/q (exact) * a general discrete distribution (i with prob w[i]/sum(w[i]) (Walker algorithm, but make it exact) * a uniform distribution in (a,b) * exponential distribution (mean lambda) (von Neumann's method?) * normal distribution (mean m, s.d. sigma) (ratio method?) - wanted for Magma [John Cannon , Tue, 19 Apr 2005]: HypergeometricU(a,b,s) = 1/gamma(a)*int(exp(-su)*u^(a-1)*(1+u)^(b-a-1), u=0..infinity) JacobiThetaNullK PolylogP, PolylogD, PolylogDold: see http://arxiv.org/abs/math.CA/0702243 and the references herein. JBessel(n, x) = BesselJ(n+1/2, x) KBessel, KBessel2 [2nd kind] JacobiTheta LogIntegral ExponentialIntegralEn (formula 5.1.4 of Abramowitz and Stegun) DawsonIntegral GammaD(x) = Gamma(x+1/2) - new functions of IEEE 754-2008, and more generally functions of the C binding draft TS 18661-4: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1946.pdf - functions defined in the LIA-2 standard + minimum and maximum (5.2.2): max, min, max_seq, min_seq, mmax_seq and mmin_seq (mpfr_min and mpfr_max correspond to mmin and mmax); + rounding_rest, floor_rest, ceiling_rest (5.2.4); + remr (5.2.5): x - round(x/y) y; + error functions from 5.2.7 (if useful in MPFR); + power1pm1 (5.3.6.7): (1 + x)^y - 1; + logbase (5.3.6.12): \log_x(y); + logbase1p1p (5.3.6.13): \log_{1+x}(1+y); + rad (5.3.9.1): x - round(x / (2 pi)) 2 pi = remr(x, 2 pi); + axis_rad (5.3.9.1) if useful in MPFR; + cycle (5.3.10.1): rad(2 pi x / u) u / (2 pi) = remr(x, u); + axis_cycle (5.3.10.1) if useful in MPFR; + sinu, cosu, tanu, cotu, secu, cscu, cossinu, arcsinu, arccosu, arctanu, arccotu, arcsecu, arccscu (5.3.10.{2..14}): sin(x 2 pi / u), etc.; [from which sinpi(x) = sin(Pi*x), ... are trivial to implement, with u=2.] + arcu (5.3.10.15): arctan2(y,x) u / (2 pi); + rad_to_cycle, cycle_to_rad, cycle_to_cycle (5.3.11.{1..3}). - From GSL, missing special functions (if useful in MPFR): (cf http://www.gnu.org/software/gsl/manual/gsl-ref.html#Special-Functions) + The Airy functions Ai(x) and Bi(x) defined by the integral representations: * Ai(x) = (1/\pi) \int_0^\infty \cos((1/3) t^3 + xt) dt * Bi(x) = (1/\pi) \int_0^\infty (e^(-(1/3) t^3) + \sin((1/3) t^3 + xt)) dt * Derivatives of Airy Functions + The Bessel functions for n integer and n fractional: * Regular Modified Cylindrical Bessel Functions I_n * Irregular Modified Cylindrical Bessel Functions K_n * Regular Spherical Bessel Functions j_n: j_0(x) = \sin(x)/x, j_1(x)= (\sin(x)/x-\cos(x))/x & j_2(x)= ((3/x^2-1)\sin(x)-3\cos(x)/x)/x Note: the "spherical" Bessel functions are solutions of x^2 y'' + 2 x y' + [x^2 - n (n+1)] y = 0 and satisfy j_n(x) = sqrt(Pi/(2x)) J_{n+1/2}(x). They should not be mixed with the classical Bessel Functions, also noted j0, j1, jn, y0, y1, yn in C99 and mpfr. Cf https://en.wikipedia.org/wiki/Bessel_function#Spherical_Bessel_functions *Irregular Spherical Bessel Functions y_n: y_0(x) = -\cos(x)/x, y_1(x)= -(\cos(x)/x+\sin(x))/x & y_2(x)= (-3/x^3+1/x)\cos(x)-(3/x^2)\sin(x) * Regular Modified Spherical Bessel Functions i_n: i_l(x) = \sqrt{\pi/(2x)} I_{l+1/2}(x) * Irregular Modified Spherical Bessel Functions: k_l(x) = \sqrt{\pi/(2x)} K_{l+1/2}(x). + Clausen Function: Cl_2(x) = - \int_0^x dt \log(2 \sin(t/2)) Cl_2(\theta) = \Im Li_2(\exp(i \theta)) (dilogarithm). + Dawson Function: \exp(-x^2) \int_0^x dt \exp(t^2). + Debye Functions: D_n(x) = n/x^n \int_0^x dt (t^n/(e^t - 1)) + Elliptic Integrals: * Definition of Legendre Forms: F(\phi,k) = \int_0^\phi dt 1/\sqrt((1 - k^2 \sin^2(t))) E(\phi,k) = \int_0^\phi dt \sqrt((1 - k^2 \sin^2(t))) P(\phi,k,n) = \int_0^\phi dt 1/((1 + n \sin^2(t))\sqrt(1 - k^2 \sin^2(t))) * Complete Legendre forms are denoted by K(k) = F(\pi/2, k) E(k) = E(\pi/2, k) * Definition of Carlson Forms RC(x,y) = 1/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1) RD(x,y,z) = 3/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-3/2) RF(x,y,z) = 1/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-1/2) RJ(x,y,z,p) = 3/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-1/2) (t+p)^(-1) + Elliptic Functions (Jacobi) + N-relative exponential: exprel_N(x) = N!/x^N (\exp(x) - \sum_{k=0}^{N-1} x^k/k!) + exponential integral: E_2(x) := \Re \int_1^\infty dt \exp(-xt)/t^2. Ei_3(x) = \int_0^x dt \exp(-t^3) for x >= 0. Ei(x) := - PV(\int_{-x}^\infty dt \exp(-t)/t) + Hyperbolic/Trigonometric Integrals Shi(x) = \int_0^x dt \sinh(t)/t Chi(x) := Re[ \gamma_E + \log(x) + \int_0^x dt (\cosh[t]-1)/t] Si(x) = \int_0^x dt \sin(t)/t Ci(x) = -\int_x^\infty dt \cos(t)/t for x > 0 AtanInt(x) = \int_0^x dt \arctan(t)/t [ \gamma_E is the Euler constant ] + Fermi-Dirac Function: F_j(x) := (1/r\Gamma(j+1)) \int_0^\infty dt (t^j / (\exp(t-x) + 1)) + Pochhammer symbol (a)_x := \Gamma(a + x)/\Gamma(a) : see [Smith01] in algorithms.bib logarithm of the Pochhammer symbol + Gegenbauer Functions + Laguerre Functions + Eta Function: \eta(s) = (1-2^{1-s}) \zeta(s) Hurwitz zeta function: \zeta(s,q) = \sum_0^\infty (k+q)^{-s}. + Lambert W Functions, W(x) are defined to be solutions of the equation: W(x) \exp(W(x)) = x. This function has multiple branches for x < 0 (2 funcs W0(x) and Wm1(x)) + Trigamma Function psi'(x). and Polygamma Function: psi^{(m)}(x) for m >= 0, x > 0. - functions from ISO/IEC 24747:2009 (Extensions to the C Library, to Support Mathematical Special Functions). Standard: http://www.iso.org/iso/catalogue_detail.htm?csnumber=38857 Draft: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1292.pdf Rationale: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1244.pdf See also: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3060.pdf (similar, for C++). Also check whether the functions that are already implemented in MPFR match this standard. - from gnumeric (www.gnome.org/projects/gnumeric/doc/function-reference.html): - beta (also incomplete beta function, see message from Martin Maechler on 18 Jan 2016, and Section 6.6 in Abramowitz & Stegun) - betaln - degrees - radians - sqrtpi - mpfr_inp_raw, mpfr_out_raw (cf mail "Serialization of mpfr_t" from Alexey and answer from Granlund on mpfr list, May 2007) - [maybe useful for SAGE] implement companion frac_* functions to the rint_* functions. For example mpfr_frac_floor(x) = x - floor(x). (The current mpfr_frac function corresponds to mpfr_rint_trunc.) - scaled erfc (https://sympa.inria.fr/sympa/arc/mpfr/2009-05/msg00054.html) - asec, acsc, acot, asech, acsch and acoth (mail from Björn Terelius on mpfr list, 18 June 2009) ############################################################################## 5. Efficiency ############################################################################## - for exp(x), Fredrik Johansson reports a 20% speed improvement starting from 4000 bits, and up to a 75% memory improvement in his Arb implementation, by using recursive instead of iterative binary splitting: https://github.com/fredrik-johansson/arb/blob/master/elefun/exp_sum_bs_powtab.c - improve mpfr_grandom using the algorithm in http://arxiv.org/abs/1303.6257 - use the src/x86_64/corei5/mparam.h file once GMP recognizes correctly the Core i5 processors (note that gcc -mtune=native gives __tune_corei7__ and not __tune_corei5__ on those processors) - implement a mpfr_sqrthigh algorithm based on Mulders' algorithm, with a basecase variant - use mpn_div_q to speed up mpfr_div. However mpn_div_q, which is new in GMP 5, is not documented in the GMP manual, thus we are not sure it guarantees to return the same quotient as mpn_tdiv_qr. Also mpfr_div uses the remainder computed by mpn_divrem. A workaround would be to first try with mpn_div_q, and if we cannot (easily) compute the rounding, then use the current code with mpn_divrem. - compute exp by using the series for cosh or sinh, which has half the terms (see Exercise 4.11 from Modern Computer Arithmetic, version 0.3) The same method can be used for log, using the series for atanh, i.e., atanh(x) = 1/2*log((1+x)/(1-x)). - improve mpfr_gamma (see https://code.google.com/p/fastfunlib/). A possible idea is to implement a fast algorithm for the argument reconstruction gamma(x+k): instead of performing k products by x+i, we could precompute x^2, ..., x^m for m ~ sqrt(k), and perform only sqrt(k) products. One could also use the series for 1/gamma(x), see for example http://dlmf.nist.gov/5/7/ or formula (36) from http://mathworld.wolfram.com/GammaFunction.html - improve the computation of Bernoulli numbers: instead of computing just one B[2n] at a time in mpfr_bernoulli_internal, we could compute several at a time, sharing the expensive computation of the 1/p^(2n) series. - fix regression with mpfr_mpz_root (from Keith Briggs, 5 July 2006), for example on 3Ghz P4 with gmp-4.2, x=12.345: prec=50000 k=2 k=3 k=10 k=100 mpz_root 0.036 0.072 0.476 7.628 mpfr_mpz_root 0.004 0.004 0.036 12.20 See also mail from Carl Witty on mpfr list, 09 Oct 2007. - for sparse input (say x=1 with 2 bits), mpfr_exp is not faster than for full precision when precision <= MPFR_EXP_THRESHOLD. The reason is that argument reduction kills sparsity. Maybe avoid argument reduction for sparse input? - speed up mpfr_atan for large arguments (to speed up mpc_log) see FR #6198 - improve mpfr_sin on values like ~pi (do not compute sin from cos, because of the cancellation). For instance, reduce the input modulo pi/2 in [-pi/4,pi/4], and define auxiliary functions for which the argument is assumed to be already reduced (so that the sin function can avoid unnecessary computations by calling the auxiliary cos function instead of the full cos function). This will require a native code for sin, for example using the reduction sin(3x)=3sin(x)-4sin(x)^3. See https://sympa.inria.fr/sympa/arc/mpfr/2007-08/msg00001.html and the following messages. - improve generic.c to work for number of terms <> 2^k - rewrite mpfr_greater_p... as native code. - mpf_t uses a scheme where the number of limbs actually present can be less than the selected precision, thereby allowing low precision values (for instance small integers) to be stored and manipulated in an mpf_t efficiently. Perhaps mpfr should get something similar, especially if looking to replace mpf with mpfr, though it'd be a major change. Alternately perhaps those mpfr routines like mpfr_mul where optimizations are possible through stripping low zero bits or limbs could check for that (this would be less efficient but easier). - try the idea of the paper "Reduced Cancellation in the Evaluation of Entire Functions and Applications to the Error Function" by W. Gawronski, J. Mueller and M. Reinhard, to be published in SIAM Journal on Numerical Analysis: to avoid cancellation in say erfc(x) for x large, they compute the Taylor expansion of erfc(x)*exp(x^2/2) instead (which has less cancellation), and then divide by exp(x^2/2) (which is simpler to compute). - replace the *_THRESHOLD macros by global (TLS) variables that can be changed at run time (via a function, like other variables)? One benefit is that users could use a single MPFR binary on several machines (e.g., a library provided by binary packages or shared via NFS) with different thresholds. On the default values, this would be a bit less efficient than the current code, but this isn't probably noticeable (this should be tested). Something like: long *mpfr_tune_get(void) to get the current values (the first value is the size of the array). int mpfr_tune_set(long *array) to set the tune values. int mpfr_tune_run(long level) to find the best values (the support for this feature is optional, this can also be done with an external function). - better distinguish different processors (for example Opteron and Core 2) and use corresponding default tuning parameters (as in GMP). This could be done in configure.ac to avoid hacking config.guess, for example define MPFR_HAVE_CORE2. Note (VL): the effect on cross-compilation (that can be a processor with the same architecture, e.g. compilation on a Core 2 for an Opteron) is not clear. The choice should be consistent with the build target (e.g. -march or -mtune value with gcc). Also choose better default values. For instance, the default value of MPFR_MUL_THRESHOLD is 40, while the best values that have been found are between 11 and 19 for 32 bits and between 4 and 10 for 64 bits! - during the Many Digits competition, we noticed that (our implantation of) Mulders short product was slower than a full product for large sizes. This should be precisely analyzed and fixed if needed. - for various functions, check the timings as a function of the magnitude of the input (and the input and/or output precisions?), and use better thresholds for asymptotic expansions. - improve the special case of mpfr_{add,sub} (x, x, y, ...) when |x| > |y| to do the addition in-place and have a complexity of O(prec(y)) in most cases. The mpfr_{add,sub}_{d,ui} functions should automatically benefit from this change. - in gmp_op.c, for functions with mpz_srcptr, check whether mpz_fits_slong_p is really useful in all cases (see TODO in this file). ############################################################################## 6. Miscellaneous ############################################################################## - [suggested by Tobias Burnus and Asher Langton , Wed, 01 Aug 2007] support quiet and signaling NaNs in mpfr: * functions to set/test a quiet/signaling NaN: mpfr_set_snan, mpfr_snan_p, mpfr_set_qnan, mpfr_qnan_p * correctly convert to/from double (if encoding of s/qNaN is fixed in 754R) Note: Signaling NaNs are not specified by the ISO C standard and may not be supported by the implementation. GCC needs the -fsignaling-nans option (but this does not affect the C library, which may or may not accept signaling NaNs). - check again coverage: on 2007-07-27, Patrick Pelissier reports that the following files are not tested at 100%: add1.c, atan.c, atan2.c, cache.c, cmp2.c, const_catalan.c, const_euler.c, const_log2.c, cos.c, gen_inverse.h, div_ui.c, eint.c, exp3.c, exp_2.c, expm1.c, fma.c, fms.c, lngamma.c, gamma.c, get_d.c, get_f.c, get_ld.c, get_str.c, get_z.c, inp_str.c, jn.c, jyn_asympt.c, lngamma.c, mpfr-gmp.c, mul.c, mul_ui.c, mulders.c, out_str.c, pow.c, print_raw.c, rint.c, root.c, round_near_x.c, round_raw_generic.c, set_d.c, set_ld.c, set_q.c, set_uj.c, set_z.c, sin.c, sin_cos.c, sinh.c, sqr.c, stack_interface.c, sub1.c, sub1sp.c, subnormal.c, uceil_exp2.c, uceil_log2.c, ui_pow_ui.c, urandomb.c, yn.c, zeta.c, zeta_ui.c. - check the constants mpfr_set_emin (-16382-63) and mpfr_set_emax (16383) in get_ld.c and the other constants, and provide a testcase for large and small numbers. - from Kevin Ryde : Also for pi.c, a pre-calculated compiled-in pi to a few thousand digits would be good value I think. After all, say 10000 bits using 1250 bytes would still be small compared to the code size! Store pi in round to zero mode (to recover other modes). - add other prototypes for round to nearest-away (mpfr_round_nearest_away only deals with the prototypes of say mpfr_sin) or implement it as a native rounding mode - add a new roundind mode: round to odd. If the result is not exactly representable, then round to the odd mantissa. This rounding has the nice property that for k > 1, if: y = round(x, p+k, TO_ODD) z = round(y, p, TO_NEAREST_EVEN), then z = round(x, p, TO_NEAREST_EVEN) so it avoids the double-rounding problem. VL: I prefer the (original?) term "sticky rounding", as used in J Strother Moore, Tom Lynch, Matt Kaufmann. A Mechanically Checked Proof of the Correctness of the Kernel of the AMD5K86 Floating-Point Division Algorithm. IEEE Transactions on Computers, 1996. and http://www.russinoff.com/libman/text/node26.html - new rounding mode MPFR_RNDE when the result is known to be exact? * In normal mode, this would allow MPFR to optimize using this information. * In debug mode, MPFR would check that the result is exact (i.e. that the ternary value is 0). - add tests of the ternary value for constants - When doing Extensive Check (--enable-assert=full), since all the functions use a similar use of MACROS (ZivLoop, ROUND_P), it should be possible to do such a scheme: For the first call to ROUND_P when we can round. Mark it as such and save the approximated rounding value in a temporary variable. Then after, if the mark is set, check if: - we still can round. - The rounded value is the same. It should be a complement to tgeneric tests. - in div.c, try to find a case for which cy != 0 after the line cy = mpn_sub_1 (sp + k, sp + k, qsize, cy); (which should be added to the tests), e.g. by having {vp, k} = 0, or prove that this cannot happen. - add a configure test for --enable-logging to ignore the option if it cannot be supported. Modify the "configure --help" description to say "on systems that support it". - add generic bad cases for functions that don't have an inverse function that is implemented (use a single Newton iteration). - add bad cases for the internal error bound (by using a dichotomy between a bad case for the correct rounding and some input value with fewer Ziv iterations?). - add an option to use a 32-bit exponent type (int) on LP64 machines, mainly for developers, in order to be able to test the case where the extended exponent range is the same as the default exponent range, on such platforms. Tests can be done with the exp-int branch (added on 2010-12-17, and many tests fail at this time). - test underflow/overflow detection of various functions (in particular mpfr_exp) in reduced exponent ranges, including ranges that do not contain 0. - add an internal macro that does the equivalent of the following? MPFR_IS_ZERO(x) || MPFR_GET_EXP(x) <= value - check whether __gmpfr_emin and __gmpfr_emax could be replaced by a constant (see README.dev). Also check the use of MPFR_EMIN_MIN and MPFR_EMAX_MAX. - add a test checking that no mpfr.h macros depend on mpfr-impl.h (the current tests cannot check that since mpfr-impl.h is always included). - move some macro definitions from acinclude.m4 to the m4 directory as suggested by the Automake manual? The reason is that the acinclude.m4 file is big and a bit difficult to read. - use symbol versioning. - check whether mpz_t caching is necessary. Timings with -static with details about the C / C library implementation should be put somewhere as a comment in the source or in the doc. Using -static is important because otherwise the cache saves the dynamic call to mpz_init and mpz_clear; so, what we're measuring is not clear. See thread: https://gmplib.org/list-archives/gmp-devel/2015-September/004147.html Summary: It will not be integrated in GMP because 1) This yields problems with threading (in MPFR, we have TLS variables, but this is not the case of GMP). 2) The gain (if confirmed with -static) would be due to a poor malloc implementation (timings would depend on the platform). 3) Applications would use more RAM. Additional notes [VL]: the major differences in the timings given by Patrick in 2014-01 were: Before: arccos(x) took 0.054689 ms (32767 eval in 1792 ms) arctan(x) took 0.042116 ms (32767 eval in 1380 ms) After: arccos(x) took 0.043580 ms (32767 eval in 1428 ms) arctan(x) took 0.035401 ms (32767 eval in 1160 ms) mpfr_acos doesn't use mpz, but calls mpfr_atan, so that the issue comes from mpfr_atan, which uses mpz a lot. But there are very few mpz_init and mpz_clear (just at the end). So, the differences in the timings are very surprising, even with a bad malloc implementation. Or is the reason due to that due to mpz_t caching, the mpz_t's are preallocated with a size that is large enough, thus avoiding internal reallocations in GMP (with memory copies)? In such a case, mpz_t caching would not be the solution, but mpz_init2 (see GMP manual, 3.11 "Efficiency"). - in tsum, add testcases for mpfr_sum triggering the bug fixed in r9722, that is, with a large error during the computation of the secondary term (when the TMD occurs). - add internal or public variants of some basic functions (+, -, *) with mpz_t as the exponent for correctly rounded polynomials (like fmma), in order to avoid internal overflow and underflow in extreme cases? Alternatively, for fmma, modify add*.c and sub*.c code to define mpfr_add_raw, which takes arrays of limbs and their precision, and the positive exponent delta (as mpfr_uexp_t to be always representable). The exponent delta should be sufficient for now since it can be bounded by MPFR_PREC_MAX+1 if need be. ############################################################################## 7. Portability ############################################################################## - add a web page with results of builds on different architectures - [Kevin about texp.c long strings] For strings longer than c99 guarantees, it might be cleaner to introduce a "tests_strdupcat" or something to concatenate literal strings into newly allocated memory. I thought I'd done that in a couple of places already. Arrays of chars are not much fun. - use https://gcc.gnu.org/viewcvs/gcc/trunk/config/stdint.m4 for mpfr-gmp.h - with MinGW, build with -D__USE_MINGW_ANSI_STDIO by default?