summaryrefslogtreecommitdiff
path: root/sysdeps/x86_64/fpu/multiarch
Commit message (Collapse)AuthorAgeFilesLines
* x86-64: Add vector exp10/exp10f implementation to libmvecSunil K Pandey2021-12-2918-0/+2330
| | | | | | | | Implement vectorized exp10/exp10f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector exp10/exp10f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector exp2/exp2f implementation to libmvecSunil K Pandey2021-12-2918-0/+2006
| | | | | | | | Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector exp2/exp2f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector hypot/hypotf implementation to libmvecSunil K Pandey2021-12-2918-0/+1864
| | | | | | | | Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector hypot/hypotf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector asin/asinf implementation to libmvecSunil K Pandey2021-12-2918-0/+1902
| | | | | | | | Implement vectorized asin/asinf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector asin/asinf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector atan/atanf implementation to libmvecSunil K Pandey2021-12-2918-0/+1454
| | | | | | | | Implement vectorized atan/atanf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atan/atanf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector acos/acosf implementation to libmvecSunil K Pandey2021-12-2219-0/+2024
| | | | | | | | Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add sysdeps/x86_64/fpu/MakeconfigH.J. Lu2021-10-201-48/+20
| | | | | | | | | | 1. Add sysdeps/x86_64/fpu/Makeconfig to auto-generate libmvec.mk, which contains libmvec ABI test dependencies and CFLAGS, in the build directory. 2. Include libmvec.mk for libmvec ABI test dependencies and CFLAGS. Tested on SSE4, AVX, AVX2 and AVX512 machines. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
* Add narrowing fma functionsJoseph Myers2021-09-221-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the narrowing fused multiply-add functions from TS 18661-1 / TS 18661-3 / C2X to glibc's libm: ffma, ffmal, dfmal, f32fmaf64, f32fmaf32x, f32xfmaf64 for all configurations; f32fmaf64x, f32fmaf128, f64fmaf64x, f64fmaf128, f32xfmaf64x, f32xfmaf128, f64xfmaf128 for configurations with _Float64x and _Float128; __f32fmaieee128 and __f64fmaieee128 aliases in the powerpc64le case (for calls to ffmal and dfmal when long double is IEEE binary128). Corresponding tgmath.h macro support is also added. The changes are mostly similar to those for the other narrowing functions previously added, especially that for sqrt, so the description of those generally applies to this patch as well. As with sqrt, I reused the same test inputs in auto-libm-test-in as for non-narrowing fma rather than adding extra or separate inputs for narrowing fma. The tests in libm-test-narrow-fma.inc also follow those for non-narrowing fma. The non-narrowing fma has a known bug (bug 6801) that it does not set errno on errors (overflow, underflow, Inf * 0, Inf - Inf). Rather than fixing this or having narrowing fma check for errors when non-narrowing does not (complicating the cases when narrowing fma can otherwise be an alias for a non-narrowing function), this patch does not attempt to check for errors from narrowing fma and set errno; the CHECK_NARROW_FMA macro is still present, but as a placeholder that does nothing, and this missing errno setting is considered to be covered by the existing bug rather than needing a separate open bug. missing-errno annotations are duly added to many of the auto-libm-test-in test inputs for fma. This completes adding all the new functions from TS 18661-1 to glibc, so will be followed by corresponding stdc-predef.h changes to define __STDC_IEC_60559_BFP__ and __STDC_IEC_60559_COMPLEX__, as the support for TS 18661-1 will be at a similar level to that for C standard floating-point facilities up to C11 (pragmas not implemented, but library functions done). (There are still further changes to be done to implement changes to the types of fromfp functions from N2548.) Tested as followed: natively with the full glibc testsuite for x86_64 (GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC 11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32 hard float, mips64 (all three ABIs, both hard and soft float). The different GCC versions are to cover the different cases in tgmath.h and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in glibc headers, GCC 7 has proper _Float* support, GCC 8 adds __builtin_tgmath).
* Redirect fma calls to __fma in libmJoseph Myers2021-09-152-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | include/math.h has a mechanism to redirect internal calls to various libm functions, that can often be inlined by the compiler, to call non-exported __* names for those functions in the case when the calls aren't inlined, with the redirection being disabled when NO_MATH_REDIRECT. Add fma to the functions to which this mechanism is applied. At present, libm-internal fma calls (generally to __builtin_fma* functions) are only done when it's known the call will be inlined, with alternative code not relying on an fma operation being used in the caller otherwise. This patch is in preparation for adding the TS 18661 / C2X narrowing fma functions to glibc; it will be natural for the narrowing function implementations to call the underlying fma functions unconditionally, with this either being inlined or resulting in an __fma* call. (Using two levels of round-to-odd computation like that, in the case where there isn't an fma hardware instruction, isn't optimal but is certainly a lot simpler for the initial implementation than writing different narrowing fma implementations for all the various pairs of formats.) Tested with build-many-glibcs.py that installed stripped shared libraries are unchanged by the patch (using <https://sourceware.org/pipermail/libc-alpha/2021-September/130991.html> to fix installed library stripping in build-many-glibcs.py). Also tested for x86_64.
* Remove "Contributed by" linesSiddhesh Poyarekar2021-09-039-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | We stopped adding "Contributed by" or similar lines in sources in 2012 in favour of git logs and keeping the Contributors section of the glibc manual up to date. Removing these lines makes the license header a bit more consistent across files and also removes the possibility of error in attribution when license blocks or files are copied across since the contributed-by lines don't actually reflect reality in those cases. Move all "Contributed by" and similar lines (Written by, Test by, etc.) into a new file CONTRIBUTED-BY to retain record of these contributions. These contributors are also mentioned in manual/contrib.texi, so we just maintain this additional record as a courtesy to the earlier developers. The following scripts were used to filter a list of files to edit in place and to clean up the CONTRIBUTED-BY file respectively. These were not added to the glibc sources because they're not expected to be of any use in future given that this is a one time task: https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02 Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* x86-64: Remove assembler AVX512DQ checkH.J. Lu2021-08-2412-96/+0
| | | | | The minimum GNU binutils requirement is 2.25 which supports AVX512DQ. Remove assembler AVX512DQ check.
* x86-64: Optimize load of all bits set into ZMM register [BZ #28252]H.J. Lu2021-08-2210-64/+11
| | | | | | | | | | | | | | | | Optimize loads of all bits set into ZMM register in AVX512 SVML codes by replacing vpbroadcastq .L_2il0floatpacket.16(%rip), %zmmX and vmovups .L_2il0floatpacket.13(%rip), %zmmX with vpternlogd $0xff, %zmmX, %zmmX, %zmmX This fixes BZ #28252.
* x86_64: roundeven with sse4.1 supportShen-Ta Hsieh2021-06-277-2/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the sse4.1 hardware floating point roundeven. Here is some benchmark results on my systems: =AMD Ryzen 9 3900X 12-Core Processor= * benchmark result before this commit | | roundeven | roundevenf | |------------|--------------|--------------| | duration | 3.75587e+09 | 3.75114e+09 | | iterations | 3.93053e+08 | 4.35402e+08 | | max | 52.592 | 58.71 | | min | 7.98 | 7.22 | | mean | 9.55563 | 8.61535 | * benchmark result after this commit | | roundeven | roundevenf | |------------|---------------|--------------| | duration | 3.73815e+09 | 3.73738e+09 | | iterations | 5.82692e+08 | 5.91498e+08 | | max | 56.468 | 51.642 | | min | 6.27 | 6.156 | | mean | 6.41532 | 6.3185 | =Intel(R) Pentium(R) CPU D1508 @ 2.20GHz= * benchmark result before this commit | | roundeven | roundevenf | |------------|--------------|--------------| | duration | 2.18208e+09 | 2.18258e+09 | | iterations | 2.39932e+08 | 2.46924e+08 | | max | 96.378 | 98.035 | | min | 6.776 | 5.94 | | mean | 9.09456 | 8.83907 | * benchmark result after this commit | | roundeven | roundevenf | |------------|--------------|--------------| | duration | 2.17415e+09 | 2.17005e+09 | | iterations | 3.56193e+08 | 4.09824e+08 | | max | 51.693 | 97.192 | | min | 5.926 | 5.093 | | mean | 6.10385 | 5.29507 | Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* math: Remove mpa files [BZ #15267]Wilco Dijkstra2021-03-1118-183/+3
| | | | | | | Finally remove all mpa related files, headers, declarations, probes, unused tables and update makefiles. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
* Update copyright dates with scripts/update-copyrightsPaul Eggert2021-01-02153-153/+153
| | | | | | | | | | | | | | | | I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO. I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah: remote: *** pre-commit check failed ... remote: *** error: lines with trailing whitespace found remote: error: hook declined to update refs/heads/master
* ieee754: Remove unused __sin32 and __cos32Anssi Hannula2020-12-184-8/+0
| | | | | The __sin32 and __cos32 functions were only used in the now removed slow path of asin and acos.
* x86-64: Fix FMA4 detection in ifunc [BZ #26534]Ondřej Hošek2020-09-021-1/+1
| | | | | | A typo in commit 107e6a3c2212ba7a3a4ec7cae8d82d73f7c95d0b causes the FMA4 code path to be taken on systems that support FMA, even if they do not support FMA4. Fix this to detect FMA4.
* x86: Support usable check for all CPU featuresH.J. Lu2020-07-139-19/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support usable check for all CPU features with the following changes: 1. Change struct cpu_features to struct cpuid_features { struct cpuid_registers cpuid; struct cpuid_registers usable; }; struct cpu_features { struct cpu_features_basic basic; struct cpuid_features features[COMMON_CPUID_INDEX_MAX]; unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX]; ... }; so that there is a usable bit for each cpuid bit. 2. After the cpuid bits have been initialized, copy the known bits to the usable bits. EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for CPU feature detection. 3. Clear the usable bits which require OS support. 4. If the feature is supported by OS, copy its cpuid bit to its usable bit. 5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE and CPU_FEATURE_USABLE_P to check if a feature is usable. 6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13. 7. Unset MPX feature since it has been deprecated. The results are 1. If the feature is known and doesn't requre OS support, its usable bit is copied from the cpuid bit. 2. Otherwise, its usable bit is copied from the cpuid bit only if the feature is known to supported by OS. 3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the feature can be used. 4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports the feature.
* Add libm_alias_finite for _finite symbolsWilco Dijkstra2020-01-0310-22/+28
| | | | | | | | | | | | | | | | | | | This patch adds a new macro, libm_alias_finite, to define all _finite symbol. It sets all _finite symbol as compat symbol based on its first version (obtained from the definition at built generated first-versions.h). The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need special treatment in code that is shared between long double and float128. It is done by adding a list, similar to internal symbol redifinition, on sysdeps/ieee754/float128/float128_private.h. Alpha also needs some tricky changes to ensure we still emit 2 compat symbols for sqrt(f). Passes buildmanyglibc. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* Update copyright dates with scripts/update-copyrights.Joseph Myers2020-01-01153-153/+153
|
* Always use wordsize-64 version of s_trunc.c.Stefan Liebler2019-12-111-1/+1
| | | | | | | | | | This patch replaces s_trunc.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 and sparc64 files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Always use wordsize-64 version of s_ceil.c.Stefan Liebler2019-12-111-1/+1
| | | | | | | | | | This patch replaces s_ceil.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 and sparc64 files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Always use wordsize-64 version of s_floor.c.Stefan Liebler2019-12-111-1/+1
| | | | | | | | | | This patch replaces s_floor.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 and sparc64 files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Always use wordsize-64 version of s_rint.c.Stefan Liebler2019-12-111-1/+1
| | | | | | | | | | This patch replaces s_rint.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 file. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Always use wordsize-64 version of s_nearbyint.c.Stefan Liebler2019-12-111-1/+1
| | | | | | | | | | This patch replaces s_nearbyint.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 file. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Remove x64 _finite tests and referencesWilco Dijkstra2019-10-2118-48/+48
| | | | | | | | | Remove _finite tests and references from x86_64. Rather than calling __exp_finite, use exp directly (since it's the same entry point). x86_64 builds and passes testsuite. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Prefer https to http for gnu.org and fsf.org URLsPaul Eggert2019-09-07153-153/+153
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also, change sources.redhat.com to sourceware.org. This patch was automatically generated by running the following shell script, which uses GNU sed, and which avoids modifying files imported from upstream: sed -ri ' s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g ' \ $(find $(git ls-files) -prune -type f \ ! -name '*.po' \ ! -name 'ChangeLog*' \ ! -path COPYING ! -path COPYING.LIB \ ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \ ! -path manual/texinfo.tex ! -path scripts/config.guess \ ! -path scripts/config.sub ! -path scripts/install-sh \ ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \ ! -path INSTALL ! -path locale/programs/charmap-kw.h \ ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \ ! '(' -name configure \ -execdir test -f configure.ac -o -f configure.in ';' ')' \ ! '(' -name preconfigure \ -execdir test -f preconfigure.ac ';' ')' \ -print) and then by running 'make dist-prepare' to regenerate files built from the altered files, and then executing the following to cleanup: chmod a+x sysdeps/unix/sysv/linux/riscv/configure # Omit irrelevant whitespace and comment-only changes, # perhaps from a slightly-different Autoconf version. git checkout -f \ sysdeps/csky/configure \ sysdeps/hppa/configure \ sysdeps/riscv/configure \ sysdeps/unix/sysv/linux/csky/configure # Omit changes that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines git checkout -f \ sysdeps/powerpc/powerpc64/ppc-mcount.S \ sysdeps/unix/sysv/linux/s390/s390-64/syscall.S # Omit change that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
* Update copyright dates with scripts/update-copyrights.Joseph Myers2019-01-01153-153/+153
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* x86-64: Remove s_sincosf-sse2.SH.J. Lu2018-12-262-2/+2
| | | | | | | | | | | | | | The current s_sincosf.c is faster than s_sincosf-sse2.S. On Broadwell with FMA disabled, bench-sincosf shows: Before After Improvement max 154.032 114.517 34% min 6.25 5.609 11% mean 14.8728 12.8589 15% * sysdeps/x86_64/fpu/s_sincosf.S: Removed. * sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.c: New file.
* x86-64: Vectorize sincosf_poly and update s_sincosf-fma.cH.J. Lu2018-12-261-270/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add <sincosf_poly.h> and include it in s_sincosf.h to allow vectorized sincosf_poly. Add x86 sincosf_poly.h to vectorize sincosf_poly. On Broadwell, bench-sincosf shows: Before After Improvement max 160.273 114.198 40% min 6.25 5.625 11% mean 13.0325 10.6462 22% Vectorized sincosf_poly shows Before After Improvement max 138.653 114.198 21% min 5.004 5.625 -11% mean 11.5934 10.6462 9% Tested on x86-64 and i686 as well as with build-many-glibcs.py. * sysdeps/ieee754/flt-32/s_sincosf.h: Include <sincosf_poly.h>. (sincos_t, sincosf_poly, sinf_poly): Moved to ... * sysdeps/ieee754/flt-32/sincosf_poly.h: Here. New file. * sysdeps/x86/fpu/s_sincosf_data.c: New file. * sysdeps/x86/fpu/sincosf_poly.h: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Just include <sysdeps/ieee754/flt-32/s_sincosf.c>.
* Remove the error handling wrapper from powSzabolcs Nagy2018-11-214-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce new pow symbol version that doesn't do SVID compatible error handling. The standard errno and fp exception based error handling is inline in the new code and does not have significant overhead. The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty w_pow.c and enabled for targets with their own pow implementation or ifunc dispatch on __ieee754_pow by including math/w_pow.c. The compatibility symbol version still uses the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). On targets where previously powl was an alias of pow, now it points to the compatibility symbol with the wrapper, because it still need the SVID compatible error handling. This affects NO_LONG_DOUBLE (e.g. arm) and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well. The __pow_finite symbol is now an alias of pow. Both __pow_finite and pow set errno and thus not const functions. The ia64 asm is changed so the compat and new symbol versions map to the same address. On x86_64 #include <math.h> was added before macro definitions that may affect that header. Tested with build-many-glibcs.py. * math/Versions (GLIBC_2.29): Add pow. * math/w_pow_compat.c (__pow_compat): Change to versioned compat symbol. * math/w_pow.c: New file. * sysdeps/i386/fpu/w_pow.c: New file. * sysdeps/ia64/fpu/e_pow.S: Add versioned symbols. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Rename to __pow and add necessary aliases. * sysdeps/ieee754/dbl-64/w_pow.c: New file. * sysdeps/m68k/m680x0/fpu/w_pow.c: New file. * sysdeps/mach/hurd/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update. * sysdeps/unix/sysv/linux/alpha/libm.abilist: Update. * sysdeps/unix/sysv/linux/arm/libm.abilist: Update. * sysdeps/unix/sysv/linux/hppa/libm.abilist: Update. * sysdeps/unix/sysv/linux/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/ia64/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update. * sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update. * sysdeps/unix/sysv/linux/nios2/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update. * sysdeps/unix/sysv/linux/sh/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update. * sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__ieee754_pow): Rename to __pow. * sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__ieee754_pow): Likewise. * sysdeps/x86_64/fpu/multiarch/e_pow.c (__ieee754_pow): Likewise. * sysdeps/x86_64/fpu/multiarch/w_pow.c: New file.
* Remove the error handling wrapper from logSzabolcs Nagy2018-11-215-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce new log symbol version that doesn't do SVID compatible error handling. The standard errno and fp exception based error handling is inline in the new code and does not have significant overhead. The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty w_log.c and enabled for targets with their own log implementation by including math/w_log.c. The compatibility symbol version still uses the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). On targets where previously logl was an alias of log, now it points to the compatibility symbol with the wrapper, because it still need the SVID compatible error handling. This affects NO_LONG_DOUBLE (e.g. arm) and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well. The __log_finite symbol is now an alias of log. Both __log_finite and log set errno and thus not const functions. The ia64 asm is changed so the compat and new symbol versions map to the same address. On x86_64 #include <math.h> was added before macro definitions that may affect that header. Tested with build-many-glibcs.py. * math/Versions (GLIBC_2.29): Add log. * math/w_log_compat.c (__log_compat): Change to versioned compat symbol. * math/w_log.c: New file. * sysdeps/i386/fpu/w_log.c: New file. * sysdeps/ia64/fpu/e_log.S: Update. * sysdeps/ieee754/dbl-64/e_log.c (__ieee754_log): Rename to __log and add necessary aliases. * sysdeps/ieee754/dbl-64/w_log.c: New file. * sysdeps/m68k/m680x0/fpu/w_log.c: New file. * sysdeps/mach/hurd/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update. * sysdeps/unix/sysv/linux/alpha/libm.abilist: Update. * sysdeps/unix/sysv/linux/arm/libm.abilist: Update. * sysdeps/unix/sysv/linux/hppa/libm.abilist: Update. * sysdeps/unix/sysv/linux/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/ia64/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update. * sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update. * sysdeps/unix/sysv/linux/nios2/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update. * sysdeps/unix/sysv/linux/sh/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update. * sysdeps/x86_64/fpu/multiarch/e_log-avx.c (__ieee754_log): Rename to __log. * sysdeps/x86_64/fpu/multiarch/e_log-fma.c (__ieee754_log): Likewise. * sysdeps/x86_64/fpu/multiarch/e_log-fma4.c (__ieee754_log): Likewise. * sysdeps/x86_64/fpu/multiarch/e_log.c (__ieee754_log): Likewise. * sysdeps/x86_64/fpu/multiarch/w_log.c: New file.
* Remove the error handling wrapper from exp and exp2Szabolcs Nagy2018-11-215-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce new exp and exp2 symbol version that don't do SVID compatible error handling. The standard errno and fp exception based error handling is inline in the new code and does not have significant overhead. The double precision wrappers are disabled for sysdeps/ieee754/dbl-64 by using empty w_exp.c and w_exp2.c files, the math/w_exp.c and math/w_exp2.c files use the wrapper template and can be included by targets that have their own exp and exp2 implementations or use ifunc on the glibc internal __ieee754_exp symbol. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). On targets where previously expl and exp2l were aliases of exp and exp2, now they point to the compatibility symbols with the wrapper, because they still need the SVID compatible error handling. This affects NO_LONG_DOUBLE (e.g arm) and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well. The _finite symbols are now aliases of the standard symbols (they have no performance advantage anymore). Both the standard symbols and _finite symbols set errno and thus not const functions. The ia64 asm is changed so the compat and new symbol versions map to the same address. On x86_64 #include <math.h> was added before macro definitions that may affect that header (the new macro name is __exp instead of __ieee754_exp which breaks some math.h macros). Tested with build-many-glibcs.py. * math/Versions (GLIBC_2.29): Add exp and exp2. * math/w_exp2_compat.c (__exp2_compat): Change to versioned compat symbol, handle NO_LONG_DOUBLE and LONG_DOUBLE_COMPAT explicitly. * math/w_exp_compat.c (__exp_compat): Likewise. * math/w_exp.c: New file. * math/w_exp2.c: New file. * sysdeps/i386/fpu/w_exp.c: New file. * sysdeps/i386/fpu/w_exp2.c: New file. * sysdeps/ia64/fpu/e_exp.S: Add versioned symbols. * sysdeps/ia64/fpu/e_exp2.S: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Rename to __exp and add necessary aliases. * sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Rename to __exp2 and add necessary aliases. * sysdeps/ieee754/dbl-64/w_exp.c: New file. * sysdeps/ieee754/dbl-64/w_exp2.c: New file. * sysdeps/m68k/m680x0/fpu/w_exp.c: New file. * sysdeps/m68k/m680x0/fpu/w_exp2.c: New file. * sysdeps/mach/hurd/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update. * sysdeps/unix/sysv/linux/alpha/libm.abilist: Update. * sysdeps/unix/sysv/linux/arm/libm.abilist: Update. * sysdeps/unix/sysv/linux/hppa/libm.abilist: Update. * sysdeps/unix/sysv/linux/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/ia64/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update. * sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update. * sysdeps/unix/sysv/linux/nios2/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update. * sysdeps/unix/sysv/linux/sh/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update. * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__exp1): Remove. (__ieee754_exp): Rename to __exp. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__exp1): Remove. (__ieee754_exp): Rename to __exp. * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__exp1): Remove. (__ieee754_exp): Rename to __exp. * sysdeps/x86_64/fpu/multiarch/e_exp.c (__ieee754_exp): Rename to __exp. * sysdeps/x86_64/fpu/multiarch/w_exp.c: New file.
* Use trunc functions not __trunc functions in glibc libm.Joseph Myers2018-09-202-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __trunc functions to call the corresponding trunc names instead, with asm redirection to __trunc when the calls are not inlined. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (trunc): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_trunc.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_truncf.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_trunc.c: Likewise. * sysdeps/ieee754/float128/s_truncf128.c: Likewise. * sysdeps/ieee754/dbl-64/s_trunc.c: Likewise. * sysdeps/ieee754/flt-32/s_truncf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_truncl.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise. * sysdeps/riscv/rvf/s_truncf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_trunc_template.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c: Likewise. (ceil): Redirect to __ceil. (floor): Redirect to __floor. (trunc): Redirect to __trunc. (__truncl): Call trunc instead of __trunc. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__trunc): Remove macro. [_ARCH_PWR5X] (__truncf): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Use trunc functions instead of __trunc variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise.
* Add new pow implementationSzabolcs Nagy2018-09-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The algorithm is exp(y * log(x)), where log(x) is computed with about 1.3*2^-68 relative error (1.5*2^-68 without fma), returning the result in two doubles, and the exp part uses the same algorithm (and lookup tables) as exp, but takes the input as two doubles and a sign (to handle negative bases with odd integer exponent). The __exp1 internal symbol is no longer necessary. There is separate code path when fma is not available but the worst case error is about 0.54 ULP in both cases. The lookup table and consts for log are 4168 bytes. The .rodata+.text is decreased by 37908 bytes on aarch64. The non-nearest rounding error is less than 1 ULP. Improvements on Cortex-A72 compared to current glibc master: pow thruput: 2.40x in [0.01 11.1]x[0.01 11.1] pow latency: 1.84x in [0.01 11.1]x[0.01 11.1] Tested on aarch64-linux-gnu (defined __FP_FAST_FMA, TOINT_INTRINSICS) and arm-linux-gnueabihf (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and x86_64-linux-gnu (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and powerpc64le-linux-gnu (defined __FP_FAST_FMA, !TOINT_INTRINSICS) targets. * NEWS: Mention pow improvements. * math/Makefile (type-double-routines): Add e_pow_log_data. * sysdeps/generic/math_private.h (__exp1): Remove. * sysdeps/i386/fpu/e_pow_log_data.c: New file. * sysdeps/ia64/fpu/e_pow_log_data.c: New file. * sysdeps/ieee754/dbl-64/Makefile (CFLAGS-e_pow.c): Allow fma contraction. * sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove. (exp_inline): Remove. (__ieee754_exp): Only single double input is handled. * sysdeps/ieee754/dbl-64/e_pow.c: Rewrite. * sysdeps/ieee754/dbl-64/e_pow_log_data.c: New file. * sysdeps/ieee754/dbl-64/math_config.h (issignaling_inline): Define. (__pow_log_data): Define. * sysdeps/ieee754/dbl-64/upow.h: Remove. * sysdeps/ieee754/dbl-64/upow.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_pow_log_data.c: New file. * sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma.c): Allow fma contraction. (CFLAGS-e_pow-fma4.c): Likewise.
* Use ceil functions not __ceil functions in glibc libm.Joseph Myers2018-09-172-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __ceil functions to call the corresponding ceil names instead, with asm redirection to __ceil when the calls are not inlined. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (ceil): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_ceil.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_ceilf.c: Likewise. * sysdeps/ieee754/dbl-64/s_ceil.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_ceil.c: Likewise. * sysdeps/ieee754/float128/s_ceilf128.c: Likewise. * sysdeps/ieee754/flt-32/s_ceilf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_ceill.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_ceill.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_ceil_template.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise. * sysdeps/riscv/rvf/s_ceilf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__ceil): Remove macro. * sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use ceil functions instead of __ceil variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
* Use rint functions not __rint functions in glibc libm.Joseph Myers2018-09-142-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __rint functions to call the corresponding rint names instead, with asm redirection to __rint when the calls are not inlined. The x86_64 math_private.h is removed as no longer useful after this patch. This patch is relative to a tree with my floor patch <https://sourceware.org/ml/libc-alpha/2018-09/msg00148.html> applied, and much the same considerations arise regarding possibly replacing an IFUNC call with a direct inline expansion. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (rint): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_rint.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_rintf.c: Likewise. * sysdeps/alpha/fpu/s_rint.c: Likewise. * sysdeps/alpha/fpu/s_rintf.c: Likewise. * sysdeps/i386/fpu/s_rintl.c: Likewise. * sysdeps/ieee754/dbl-64/s_rint.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_rint.c: Likewise. * sysdeps/ieee754/float128/s_rintf128.c: Likewise. * sysdeps/ieee754/flt-32/s_rintf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_rintl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise. * sysdeps/m68k/coldfire/fpu/s_rint.c: Likewise. * sysdeps/m68k/coldfire/fpu/s_rintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rint.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rintl.c: Likewise. * sysdeps/powerpc/fpu/s_rint.c: Likewise. * sysdeps/powerpc/fpu/s_rintf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_rint.c: Likewise. * sysdeps/riscv/rvf/s_rintf.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rint.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_rint.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/x86_64/fpu/math_private.h: Remove file. * math/e_scalb.c (invalid_fn): Use rint functions instead of __rint variants. * math/e_scalbf.c (invalid_fn): Likewise. * math/e_scalbl.c (invalid_fn): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Likewise. * sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise. * sysdeps/ieee754/k_standardl.c (__kernel_standard_l): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.
* Use floor functions not __floor functions in glibc libm.Joseph Myers2018-09-142-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to the changes that were made to call sqrt functions directly in glibc, instead of __ieee754_sqrt variants, so that the compiler could inline them automatically without needing special inline definitions in lots of math_private.h headers, this patch makes libm code call floor functions directly instead of __floor variants, removing the inlines / macros for x86_64 (SSE4.1) and powerpc (POWER5). The redirection used to ensure that __ieee754_sqrt does still get called when the compiler doesn't inline a built-in function expansion is refactored so it can be applied to other functions; the refactoring is arranged so it's not limited to unary functions either (it would be reasonable to use this mechanism for copysign - removing the inline in math_private_calls.h but also eliminating unnecessary local PLT entry use in the cases (powerpc soft-float and e500v1, for IBM long double) where copysign calls don't get inlined). The point of this change is that more architectures can get floor calls inlined where they weren't previously (AArch64, for example), without needing special inline definitions in their math_private.h, and existing such definitions in math_private.h headers can be removed. Note that it's possible that in some cases an inline may be used where an IFUNC call was previously used - this is the case on x86_64, for example. I think the direct calls to floor are still appropriate; if there's any significant performance cost from inline SSE2 floor instead of an IFUNC call ending up with SSE4.1 floor, that indicates that either the function should be doing something else that's faster than using floor at all, or it should itself have IFUNC variants, or that the compiler choice of inlining for generic tuning should change to allow for the possibility that, by not inlining, an SSE4.1 IFUNC might be called at runtime - but not that glibc should avoid calling floor internally. (After all, all the same considerations would apply to any user program calling floor, where it might either be inlined or left as an out-of-line call allowing for a possible IFUNC.) Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT): New macro. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_LDBL): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_F128): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_UNARY_ARGS): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (sqrt): Redirect using MATH_REDIRECT. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (floor): Likewise. * sysdeps/aarch64/fpu/s_floor.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_floorf.c: Likewise. * sysdeps/ieee754/dbl-64/s_floor.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_floor.c: Likewise. * sysdeps/ieee754/float128/s_floorf128.c: Likewise. * sysdeps/ieee754/flt-32/s_floorf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_floorl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_floorl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_floor.c: Likewise. * sysdeps/riscv/rvf/s_floorf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__floor): Remove macro. [_ARCH_PWR5X] (__floorf): Likewise. * sysdeps/x86_64/fpu/math_private.h [__SSE4_1__] (__floor): Remove inline function. [__SSE4_1__] (__floorf): Likewise. * math/w_lgamma_main.c (LGFUNC (__lgamma)): Use floor functions instead of __floor variants. * math/w_lgamma_r_compat.c (__lgamma_r): Likewise. * math/w_lgammaf_main.c (LGFUNC (__lgammaf)): Likewise. * math/w_lgammaf_r_compat.c (__lgammaf_r): Likewise. * math/w_lgammal_main.c (LGFUNC (__lgammal)): Likewise. * math/w_lgammal_r_compat.c (__lgammal_r): Likewise. * math/w_tgamma_compat.c (__tgamma): Likewise. * math/w_tgamma_template.c (M_DECL_FUNC (__tgamma)): Likewise. * math/w_tgammaf_compat.c (__tgammaf): Likewise. * math/w_tgammal_compat.c (__tgammal): Likewise. * sysdeps/ieee754/dbl-64/e_lgamma_r.c (sin_pi): Likewise. * sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2): Likewise. * sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise. * sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Likewise. * sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise. * sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r): Likewise. * sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c (__ieee754_lgammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise. * sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise. * sysdeps/ieee754/ldbl-96/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
* Improve performance of sinf and cosfWilco Dijkstra2018-08-141-1/+32
| | | | | | | | | | | | | | | | | | | | | | | | | The second patch improves performance of sinf and cosf using the same algorithms and polynomials. The returned values are identical to sincosf for the same input. ULP definitions for AArch64 and x64 are updated. sinf/cosf througput gains on Cortex-A72: * |x| < 0x1p-12 : 1.2x * |x| < M_PI_4 : 1.8x * |x| < 2 * M_PI: 1.7x * |x| < 120.0 : 2.3x * |x| < Inf : 3.0x * NEWS: Mention sinf, cosf, sincosf. * sysdeps/aarch64/libm-test-ulps: Update ULP for sinf, cosf, sincosf. * sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sinf and cosf. * sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Add definitions of constants rather than including generic sincosf.h. * sysdeps/x86_64/fpu/s_sincosf_data.c: Remove. * sysdeps/ieee754/flt-32/s_cosf.c (cosf): Rewrite. * sysdeps/ieee754/flt-32/s_sincosf.h (reduced_sin): Remove. (reduced_cos): Remove. (sinf_poly): New function. * sysdeps/ieee754/flt-32/s_sinf.c (sinf): Rewrite.
* x86: Don't include <init-arch.h> in assembly codesH.J. Lu2018-08-032-2/+0
| | | | | | | | | | | | There is no need to include <init-arch.h> in assembly codes since all x86 IFUNC selector functions are written in C. Tested on i686 and x86-64. There is no code change in libc.so, ld.so and libmvec.so. * sysdeps/i386/i686/multiarch/bzero-ia32.S: Don't include <init-arch.h>. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core-avx2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core-avx2.S: Likewise. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise.
* Remove mplog and mpexpWilco Dijkstra2018-02-1510-75/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Remove the now unused mplog and mpexp files. * math/Makefile: Remove mpexp.c and mplog.c * sysdeps/i386/fpu/mpexp.c: Delete file. * sysdeps/i386/fpu/mplog.c: Likewise. * sysdeps/ia64/fpu/mpexp.c: Likewise. * sysdeps/ia64/fpu/mplog.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c: Remove mention of mpexp and mplog. * sysdeps/ieee754/dbl-64/mpa.h (__pow_mp): Remove unused function. * sysdeps/ieee754/dbl-64/mpexp.c: Delete file. * sysdeps/ieee754/dbl-64/mplog.c: Likewise. * sysdeps/m68k/m680x0/fpu/mpexp.c: Likewise. * sysdeps/m68k/m680x0/fpu/mplog.c: Likewise. * sysdeps/x86_64/fpu/multiarch/Makefile: Remove mpexp* and mplog*. * sysdeps/x86_64/fpu/multiarch/e_log-avx.c: Remove unused defines. * sysdeps/x86_64/fpu/multiarch/e_log-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_log-fma4.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mpexp-avx.c: Delete file. * sysdeps/x86_64/fpu/multiarch/mpexp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mpexp-fma4.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mplog-avx.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mplog-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mplog-fma4.c: Likewise.
* Remove slow paths from expSzabolcs Nagy2018-02-127-36/+3
| | | | | | | | | | | | | | | | | | | | | | | | | Remove the __slowexp code, so exp is no longer correctly rounded. The result is computed to about 70 bits precision so the worst case ulp error is about 0.500007 in nearest rounding mode. * manual/probes.texi: Remove slowexp probes. * math/Makefile: Remove slowexp. * sysdeps/generic/math_private.h (__slowexp): Remove. * sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Remove __slowexp and document error bounds. * sysdeps/i386/fpu/slowexp.c: Remove. * sysdeps/ia64/fpu/slowexp.c: Remove. * sysdeps/ieee754/dbl-64/slowexp.c: Remove. * sysdeps/ieee754/dbl-64/uexp.h (err_0): Remove. * sysdeps/m68k/m680x0/fpu/slowexp.c: Remove. * sysdeps/powerpc/power4/fpu/Makefile (CPPFLAGS-slowexp.c): Remove. * sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowexp-fma. * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Remove. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Remove. * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Remove. * sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Remove. * sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Remove. * sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Remove.
* Remove slow paths from powWilco Dijkstra2018-02-127-40/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the slow paths from pow. Like several other double precision math functions, pow is exactly rounded. This is not required from math functions and causes major overheads as it requires multiple fallbacks using higher precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns of up to 100000x have been reported when the highest precision path triggers. All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1). The worst case error is ~0.506ULP. A simple test over a few hundred million values shows pow is 10% faster on average. This fixes BZ #13932. [BZ #13932] * sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove. * benchtests/pow-inputs: Update comment for slow path cases. * manual/probes.texi (slowpow_p10): Delete removed probe. (slowpow_p10): Likewise. * math/Makefile: Remove halfulp.c and slowpow.c. * sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1. * sysdeps/generic/math_private.h (__exp1): Remove error argument. (__halfulp): Remove. (__slowpow): Remove. * sysdeps/i386/fpu/halfulp.c: Delete file. * sysdeps/i386/fpu/slowpow.c: Likewise. * sysdeps/ia64/fpu/halfulp.c: Likewise. * sysdeps/ia64/fpu/slowpow.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument, improve comments and add error analysis. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis. (power1): Remove function: (log1): Remove error argument, add error analysis. (my_log2): Remove function. * sysdeps/ieee754/dbl-64/halfulp.c: Delete file. * sysdeps/ieee754/dbl-64/slowpow.c: Likewise. * sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise. * sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c. * sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1. * sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c, slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c. * sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define. * sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise. * sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file. * sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
* x86-64: Add sincosf with vector FMAH.J. Lu2018-01-084-2/+273
| | | | | | | | | | | | | | | | | | | | | | | Since the x86-64 assembly version of sincosf is higly optimized with vector instructions, there isn't much room for improvement. However s_sincosf.c written in C with vector math and intrinsics can be optimized by GCC with FMA. On Skylake, bench-sincosf reports performance improvement: Assembly FMA improvement max 104.042 101.008 3% min 9.426 8.586 10% mean 20.6209 18.2238 13% * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_sincosf-sse2 and s_sincosf-fma. (CFLAGS-s_sincosf-fma.c): New. * sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: New file. * sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sincosf.c: Likewise. * sysdeps/x86_64/fpu/s_sincosf.S: Don't add alias if __sincosf is defined.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2018-01-01152-152/+152
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* Revert exp reimplementation (causes test failures).Joseph Myers2017-12-197-3/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Revert: 2017-12-19 Joseph Myers <joseph@codesourcery.com> * sysdeps/x86_64/fpu/libm-test-ulps: Update. 2017-12-19 Patrick McGehearty <patrick.mcgehearty@oracle.com> * sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and <errno.h>. Include "eexp.tbl". (half): New constant. (one): Likewise. (__ieee754_exp): Rewrite. (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/eexp.tbl: New file. * sysdeps/ieee754/dbl-64/slowexp.c: Remove file. * sysdeps/i386/fpu/slowexp.c: Likewise. * sysdeps/ia64/fpu/slowexp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise. * sysdeps/generic/math_private.h (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in comment. * sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math] (CPPFLAGS-slowexp.c): Remove variable. * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove slowexp-fma, slowexp-fma4 and slowexp-avx. (CFLAGS-slowexp-fma.c): Remove variable. (CFLAGS-slowexp-fma4.c): Likewise. (CFLAGS-slowexp-avx.c): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not define as macro. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise. * math/Makefile (type-double-routines): Remove slowexp. * manual/probes.texi (slowexp_p6): Remove. (slowexp_p32): Likewise.
* Improve __ieee754_exp() performance by greater than 5x on sparc/x86.Patrick McGehearty2017-12-197-36/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These changes will be active for all platforms that don't provide their own exp() routines. They will also be active for ieee754 versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and erf. Typical performance gains is typically around 5x when measured on Sparc s7 for common values between exp(1) and exp(40). Using the glibc perf tests on sparc, sparc (nsec) x86 (nsec) old new old new max 17629 395 5173 144 min 399 54 15 13 mean 5317 200 1349 23 The extreme max times for the old (ieee754) exp are due to the multiprecision computation in the old algorithm when the true value is very near 0.5 ulp away from an value representable in double precision. The new algorithm does not take special measures for those cases. The current glibc exp perf tests overrepresent those values. Informal testing suggests approximately one in 200 cases might invoke the high cost computation. The performance advantage of the new algorithm for other values is still large but not as large as indicated by the chart above. Glibc correctness tests for exp() and expf() were run. Within the test suite 3 input values were found to cause 1 bit differences (ulp) when "FE_TONEAREST" rounding mode is set. No differences in exp() were seen for the tested values for the other rounding modes. Typical example: exp(-0x1.760cd2p+0) (-1.46113312244415283203125) new code: 2.31973271630014299393707e-01 0x1.db14cd799387ap-3 old code: 2.31973271630014271638132e-01 0x1.db14cd7993879p-3 exp = 2.31973271630014285508337 (high precision) Old delta: off by 0.49 ulp New delta: off by 0.51 ulp In addition, because ieee754_exp() is used by other routines, cexp() showed test results with very small imaginary input values where the imaginary portion of the result was off by 3 ulp when in upward rounding mode, but not in the other rounding modes. For x86, tgamma showed a few values where the ulp increased to 6 (max ulp for tgamma is 5). Sparc tgamma did not show these failures. I presume the tgamma differences are due to compiler optimization differences within the gamma function.The gamma function is known to be difficult to compute accurately. * sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and <errno.h>. Include "eexp.tbl". (half): New constant. (one): Likewise. (__ieee754_exp): Rewrite. (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/eexp.tbl: New file. * sysdeps/ieee754/dbl-64/slowexp.c: Remove file. * sysdeps/i386/fpu/slowexp.c: Likewise. * sysdeps/ia64/fpu/slowexp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise. * sysdeps/generic/math_private.h (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in comment. * sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math] (CPPFLAGS-slowexp.c): Remove variable. * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove slowexp-fma, slowexp-fma4 and slowexp-avx. (CFLAGS-slowexp-fma.c): Remove variable. (CFLAGS-slowexp-fma4.c): Likewise. (CFLAGS-slowexp-avx.c): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not define as macro. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise. * math/Makefile (type-double-routines): Remove slowexp. * manual/probes.texi (slowexp_p6): Remove. (slowexp_p32): Likewise.
* x86-64: Add cosf with FMAH.J. Lu2017-12-124-2/+35
| | | | | | | | | | | | | | | | | | On Skylake, bench-cosf reports performance improvement: Before After Improvement max 135.362 94.552 43% min 8.532 7.688 11% mean 17.1446 11.8128 45% * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_cosf-sse2 and s_cosf-fma. (CFLAGS-s_cosf-fma.c): New. * sysdeps/x86_64/fpu/multiarch/s_cosf-fma.c: New file. * sysdeps/x86_64/fpu/multiarch/s_cosf-sse2.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_cosf.c: Likewise. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86-64: Add sinf with FMAH.J. Lu2017-12-074-1/+36
| | | | | | | | | | | | | | | | On Skylake, bench-sinf reports performance improvement: Before After Improvement max 153.996 100.094 54% min 8.546 6.852 25% mean 18.1223 11.802 54% * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_sinf-sse2 and s_sinf-fma. (CFLAGS-s_sinf-fma.c): New. * sysdeps/x86_64/fpu/multiarch/s_sinf-fma.c: New file. * sysdeps/x86_64/fpu/multiarch/s_sinf-sse2.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sinf.c: Likewise.
* Use libm_alias_float for x86_64.Joseph Myers2017-11-2911-11/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the preparation for additional _FloatN / _FloatNx function aliases, this patch makes x86_64 libm function implementations use libm_alias_float to define function aliases, or libm_alias_float_other where the main name is defined with versioned_symbol. Tested with the glibc testsuite for x86_64, and tested with build-many-glibcs.py for all its x86_64 configurations that installed stripped shared libraries are unchanged by the patch. * sysdeps/x86_64/fpu/multiarch/e_exp2f.c: Include <libm-alias-float.h>. (exp2f): Define using libm_alias_float, or libm_alias_float_other if [SHARED]. * sysdeps/x86_64/fpu/multiarch/e_expf.c: Include <libm-alias-float.h>. (exp2f): Define using libm_alias_float, or libm_alias_float_other if [SHARED]. * sysdeps/x86_64/fpu/multiarch/e_log2f.c: Include <libm-alias-float.h>. (exp2f): Define using libm_alias_float, or libm_alias_float_other if [SHARED]. * sysdeps/x86_64/fpu/multiarch/e_logf.c: Include <libm-alias-float.h>. (exp2f): Define using libm_alias_float, or libm_alias_float_other if [SHARED]. * sysdeps/x86_64/fpu/multiarch/e_powf.c: Include <libm-alias-float.h>. (exp2f): Define using libm_alias_float, or libm_alias_float_other if [SHARED]. * sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Include <libm-alias-float.h>. (ceilf): Define using libm_alias_float. * sysdeps/x86_64/fpu/multiarch/s_floorf.c: Include <libm-alias-float.h>. (floorf): Define using libm_alias_float. * sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Include <libm-alias-float.h>. (fmaf): Define using libm_alias_float. * sysdeps/x86_64/fpu/multiarch/s_nearbyintf.c: Include <libm-alias-float.h>. (nearbyintf): Define using libm_alias_float. * sysdeps/x86_64/fpu/multiarch/s_rintf.c: Include <libm-alias-float.h>. (rintf): Define using libm_alias_float. * sysdeps/x86_64/fpu/multiarch/s_truncf.c: Include <libm-alias-float.h>. (truncf): Define using libm_alias_float. * sysdeps/x86_64/fpu/s_copysignf.S: Include <libm-alias-float.h>. (copysignf): Define using libm_alias_float. * sysdeps/x86_64/fpu/s_cosf.S: Include <libm-alias-float.h>. (cosf): Define using libm_alias_float. * sysdeps/x86_64/fpu/s_fabsf.c: Include <libm-alias-float.h>. (fabsf): Define using libm_alias_float. * sysdeps/x86_64/fpu/s_fmaxf.S: Include <libm-alias-float.h>. (fmaxf): Define using libm_alias_float. * sysdeps/x86_64/fpu/s_fminf.S: Include <libm-alias-float.h>. (fminf): Define using libm_alias_float. * sysdeps/x86_64/fpu/s_llrintf.S: Include <libm-alias-float.h>. (llrintf): Define using libm_alias_float. [!__ILP32__] (lrintf): Likewise. * sysdeps/x86_64/fpu/s_sincosf.S: Include <libm-alias-float.h>. (sincosf): Define using libm_alias_float. * sysdeps/x86_64/fpu/s_sinf.S: Include <libm-alias-float.h>. (sinf): Define using libm_alias_float. * sysdeps/x86_64/x32/fpu/s_lrintf.S: Include <libm-alias-float.h>. (lrintf): Define using libm_alias_float.