| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
X86-64 memset-vec-unaligned-erms.S aligns many jump targets, which
increases code sizes, but not necessarily improve performance. As
memset benchtest data of align vs no align on various Intel and AMD
processors
https://sourceware.org/bugzilla/attachment.cgi?id=9277
shows that aligning jump targets isn't necessary.
[BZ #20115]
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__memset):
Remove alignments on jump targets.
|
|
|
|
|
|
|
|
|
|
| |
In static executable, since init_cpu_features is called early from
__libc_start_main, there is no need to call it again in dl_platform_init.
[BZ #20072]
* sysdeps/i386/dl-machine.h (dl_platform_init): Call
init_cpu_features only if SHARED is defined.
* sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Merge x86 ifunc-defines.sym with x86 cpu-features-offsets.sym. Remove
x86 ifunc-defines.sym and rtld-global-offsets.sym. No code changes on
i686 and x86-64.
* sysdeps/i386/i686/multiarch/Makefile (gen-as-const-headers):
Remove ifunc-defines.sym.
* sysdeps/x86_64/multiarch/Makefile (gen-as-const-headers):
Likewise.
* sysdeps/i386/i686/multiarch/ifunc-defines.sym: Removed.
* sysdeps/x86/rtld-global-offsets.sym: Likewise.
* sysdeps/x86_64/multiarch/ifunc-defines.sym: Likewise.
* sysdeps/x86/Makefile (gen-as-const-headers): Remove
rtld-global-offsets.sym.
* sysdeps/x86_64/multiarch/ifunc-defines.sym: Merged with ...
* sysdeps/x86/cpu-features-offsets.sym: This.
* sysdeps/x86/cpu-features.h: Include <cpu-features-offsets.h>
instead of <ifunc-defines.h> and <rtld-global-offsets.h>.
|
|
|
|
|
|
|
|
|
|
| |
Move sysdeps/x86_64/cacheinfo.c to sysdeps/x86. No code changes on x86
and x86_64.
* sysdeps/i386/cacheinfo.c: Include <sysdeps/x86/cacheinfo.c>
instead of <sysdeps/x86_64/cacheinfo.c>.
* sysdeps/x86_64/cacheinfo.c: Moved to ...
* sysdeps/x86/cacheinfo.c: Here.
|
|
|
|
|
|
|
|
|
| |
Since x86-64 doesn't use memory copy functions, add dummy memcopy.h and
wordcopy.c to reduce code size. It reduces the size of libc.so by about
1 KB.
* sysdeps/x86_64/memcopy.h: New file.
* sysdeps/x86_64/wordcopy.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the new SSE2/AVX2 memcpy/memmove are faster than the previous ones,
we can remove the previous SSE2/AVX2 memcpy/memmove and replace them with
the new ones.
No change in IFUNC selection if SSE2 and AVX2 memcpy/memmove weren't used
before. If SSE2 or AVX2 memcpy/memmove were used, the new SSE2 or AVX2
memcpy/memmove optimized with Enhanced REP MOVSB will be used for
processors with ERMS. The new AVX512 memcpy/memmove will be used for
processors with AVX512 which prefer vzeroupper.
Since the new SSE2 memcpy/memmove are faster than the previous default
memcpy/memmove used in libc.a and ld.so, we also remove the previous
default memcpy/memmove and make them the default memcpy/memmove, except
that non-temporal store isn't used in ld.so.
Together, it reduces the size of libc.so by about 6 KB and the size of
ld.so by about 2 KB.
[BZ #19776]
* sysdeps/x86_64/memcpy.S: Make it dummy.
* sysdeps/x86_64/mempcpy.S: Likewise.
* sysdeps/x86_64/memmove.S: New file.
* sysdeps/x86_64/memmove_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memmove.S: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.S: Likewise.
* sysdeps/x86_64/memmove.c: Removed.
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-avx-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memmove.c: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
memcpy-sse2-unaligned, memmove-avx-unaligned,
memcpy-avx-unaligned and memmove-sse2-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Replace
__memmove_chk_avx512_unaligned_2 with
__memmove_chk_avx512_unaligned. Remove
__memmove_chk_avx_unaligned_2. Replace
__memmove_chk_sse2_unaligned_2 with
__memmove_chk_sse2_unaligned. Remove __memmove_chk_sse2 and
__memmove_avx_unaligned_2. Replace __memmove_avx512_unaligned_2
with __memmove_avx512_unaligned. Replace
__memmove_sse2_unaligned_2 with __memmove_sse2_unaligned.
Remove __memmove_sse2. Replace __memcpy_chk_avx512_unaligned_2
with __memcpy_chk_avx512_unaligned. Remove
__memcpy_chk_avx_unaligned_2. Replace
__memcpy_chk_sse2_unaligned_2 with __memcpy_chk_sse2_unaligned.
Remove __memcpy_chk_sse2. Remove __memcpy_avx_unaligned_2.
Replace __memcpy_avx512_unaligned_2 with
__memcpy_avx512_unaligned. Remove __memcpy_sse2_unaligned_2
and __memcpy_sse2. Replace __mempcpy_chk_avx512_unaligned_2
with __mempcpy_chk_avx512_unaligned. Remove
__mempcpy_chk_avx_unaligned_2. Replace
__mempcpy_chk_sse2_unaligned_2 with
__mempcpy_chk_sse2_unaligned. Remove __mempcpy_chk_sse2.
Replace __mempcpy_avx512_unaligned_2 with
__mempcpy_avx512_unaligned. Remove __mempcpy_avx_unaligned_2.
Replace __mempcpy_sse2_unaligned_2 with
__mempcpy_sse2_unaligned. Remove __mempcpy_sse2.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Support
__memcpy_avx512_unaligned_erms and __memcpy_avx512_unaligned.
Use __memcpy_avx_unaligned_erms and __memcpy_sse2_unaligned_erms
if processor has ERMS. Default to __memcpy_sse2_unaligned.
(ENTRY): Removed.
(END): Likewise.
(ENTRY_CHK): Likewise.
(libc_hidden_builtin_def): Likewise.
Don't include ../memcpy.S.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Support
__memcpy_chk_avx512_unaligned_erms and
__memcpy_chk_avx512_unaligned. Use
__memcpy_chk_avx_unaligned_erms and
__memcpy_chk_sse2_unaligned_erms if if processor has ERMS.
Default to __memcpy_chk_sse2_unaligned.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
Change function suffix from unaligned_2 to unaligned.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Support
__mempcpy_avx512_unaligned_erms and __mempcpy_avx512_unaligned.
Use __mempcpy_avx_unaligned_erms and __mempcpy_sse2_unaligned_erms
if processor has ERMS. Default to __mempcpy_sse2_unaligned.
(ENTRY): Removed.
(END): Likewise.
(ENTRY_CHK): Likewise.
(libc_hidden_builtin_def): Likewise.
Don't include ../mempcpy.S.
(mempcpy): New. Add a weak alias.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Support
__mempcpy_chk_avx512_unaligned_erms and
__mempcpy_chk_avx512_unaligned. Use
__mempcpy_chk_avx_unaligned_erms and
__mempcpy_chk_sse2_unaligned_erms if if processor has ERMS.
Default to __mempcpy_chk_sse2_unaligned.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the new SSE2/AVX2 memsets are faster than the previous ones, we
can remove the previous SSE2/AVX2 memsets and replace them with the
new ones. This reduces the size of libc.so by about 900 bytes.
No change in IFUNC selection if SSE2 and AVX2 memsets weren't used
before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset
optimized with Enhanced REP STOSB will be used for processors with
ERMS. The new AVX512 memset will be used for processors with AVX512
which prefer vzeroupper.
[BZ #19881]
* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded
into ...
* sysdeps/x86_64/memset.S: This.
(__bzero): Removed.
(__memset_tail): Likewise.
(__memset_chk): Likewise.
(memset): Likewise.
(MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't
defined.
(MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined.
* sysdeps/x86_64/multiarch/memset-avx2.S: Removed.
(__memset_zero_constant_len_parameter): Check SHARED instead of
PIC.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
memset-avx2 and memset-sse2-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Remove __memset_chk_sse2,
__memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
(__bzero): Enabled.
* sysdeps/x86_64/multiarch/memset.S (memset): Replace
__memset_sse2 and __memset_avx2 with __memset_sse2_unaligned
and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms
or __memset_avx2_unaligned_erms if processor has ERMS. Support
__memset_avx512_unaligned_erms and __memset_avx512_unaligned.
(memset): Removed.
(__memset_chk): Likewise.
(MEMSET_SYMBOL): New.
(libc_hidden_builtin_def): Replace __memset_sse2 with
__memset_sse2_unaligned.
* sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace
__memset_chk_sse2 and __memset_chk_avx2 with
__memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms.
Use __memset_chk_sse2_unaligned_erms or
__memset_chk_avx2_unaligned_erms if processor has ERMS. Support
__memset_chk_avx512_unaligned_erms and
__memset_chk_avx512_unaligned.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The large memcpy micro benchmark in glibc shows that there is a
regression with large data on Haswell machine. non-temporal store in
memcpy on large data can improve performance significantly. This
patch adds a threshold to use non temporal store which is 6 times of
shared cache size. When size is above the threshold, non temporal
store will be used, but avoid non-temporal store if there is overlap
between destination and source since destination may be in cache when
source is loaded.
For size below 8 vector register width, we load all data into registers
and store them together. Only forward and backward loops, which move 4
vector registers at a time, are used to support overlapping addresses.
For forward loop, we load the last 4 vector register width of data and
the first vector register width of data into vector registers before the
loop and store them after the loop. For backward loop, we load the first
4 vector register width of data and the last vector register width of
data into vector registers before the loop and store them after the loop.
[BZ #19928]
* sysdeps/x86_64/cacheinfo.c (__x86_shared_non_temporal_threshold):
New.
(init_cacheinfo): Set __x86_shared_non_temporal_threshold to
6 times of shared cache size.
* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S
(VMOVNT): New.
* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S
(VMOVNT): Likewise.
* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S
(VMOVNT): Likewise.
(VMOVU): Changed to movups for smaller code sizes.
(VMOVA): Changed to movaps for smaller code sizes.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Update
comments.
(PREFETCH): New.
(PREFETCH_SIZE): Likewise.
(PREFETCHED_LOAD_SIZE): Likewise.
(PREFETCH_ONE_SET): Likewise.
Rewrite to use forward and backward loops, which move 4 vector
registers at a time, to support overlapping addresses and use
non temporal store if size is above the threshold and there is
no overlap between destination and source.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prepare memmove-vec-unaligned-erms.S to make the SSE2 version as the
default memcpy, mempcpy and memmove.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
(MEMCPY_SYMBOL): New.
(MEMPCPY_SYMBOL): Likewise.
(MEMMOVE_CHK_SYMBOL): Likewise.
Replace MEMMOVE_SYMBOL with MEMMOVE_CHK_SYMBOL on __mempcpy_chk
symbols. Replace MEMMOVE_SYMBOL with MEMPCPY_SYMBOL on
__mempcpy symbols. Provide alias for __memcpy_chk in libc.a.
Provide alias for memcpy in libc.a and ld.so.
(cherry picked from commit a7d1c51482d15ab6c07e2ee0ae5e007067b18bfb)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prepare memset-vec-unaligned-erms.S to make the SSE2 version as the
default memset.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
(MEMSET_CHK_SYMBOL): New. Define if not defined.
(__bzero): Check VEC_SIZE == 16 instead of USE_MULTIARCH.
Disabled fro now.
Replace MEMSET_SYMBOL with MEMSET_CHK_SYMBOL on __memset_chk
symbols. Properly check USE_MULTIARCH on __memset symbols.
(cherry picked from commit 4af1bb06c59d24f35bf8dc55897838d926c05892)
|
|
|
|
|
|
|
| |
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Force
32-bit displacement to avoid long nop between instructions.
(cherry picked from commit ec0cac9a1f4094bd0db6f77c1b329e7a40eecc10)
|
|
|
|
|
|
|
| |
* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Add
a comment on VMOVU and VMOVA.
(cherry picked from commit 696ac774847b80cf994438739478b0c3003b5958)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since memmove and memset in ld.so don't use IFUNC, don't put SSE2, AVX
and AVX512 memmove and memset in ld.so.
* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip
if not in libc.
* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
Likewise.
(cherry picked from commit 5cd7af016d8587ff53b20ba259746f97edbddbf7)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
__mempcpy_erms and __memmove_erms can't be placed between __memmove_chk
and __memmove it breaks __memmove_chk.
Don't check source == destination first since it is less common.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
(__mempcpy_erms, __memmove_erms): Moved before __mempcpy_chk
with unaligned_erms.
(__memmove_erms): Skip if source == destination.
(__memmove_unaligned_erms): Don't check source == destination
first.
(cherry picked from commit ea2785e96fa503f3a2b5dd9f3a6ca65622b3c5f2)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement x86-64 memset with unaligned store and rep movsb. Support
16-byte, 32-byte and 64-byte vector register sizes. A single file
provides 2 implementations of memset, one with rep stosb and the other
without rep stosb. They share the same codes when size is between 2
times of vector register size and REP_STOSB_THRESHOLD which defaults
to 2KB.
Key features:
1. Use overlapping store to avoid branch.
2. For size <= 4 times of vector register size, fully unroll the loop.
3. For size > 4 times of vector register size, store 4 times of vector
register size at a time.
[BZ #19881]
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and
memset-avx512-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned,
__memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned,
__memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned,
__memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned,
__memset_sse2_unaligned_erms, __memset_erms,
__memset_avx2_unaligned, __memset_avx2_unaligned_erms,
__memset_avx512_unaligned_erms and __memset_avx512_unaligned.
* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New
file.
* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:
Likewise.
(cherry picked from commit 830566307f038387ca0af3fd327706a8d1a2f595)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement x86-64 memmove with unaligned load/store and rep movsb.
Support 16-byte, 32-byte and 64-byte vector register sizes. When
size <= 8 times of vector register size, there is no check for
address overlap bewteen source and destination. Since overhead for
overlap check is small when size > 8 times of vector register size,
memcpy is an alias of memmove.
A single file provides 2 implementations of memmove, one with rep movsb
and the other without rep movsb. They share the same codes when size is
between 2 times of vector register size and REP_MOVSB_THRESHOLD which
is 2KB for 16-byte vector register size and scaled up by large vector
register size.
Key features:
1. Use overlapping load and store to avoid branch.
2. For size <= 8 times of vector register size, load all sources into
registers and store them together.
3. If there is no address overlap bewteen source and destination, copy
from both ends with 4 times of vector register size at a time.
4. If address of destination > address of source, backward copy 8 times
of vector register size at a time.
5. Otherwise, forward copy 8 times of vector register size at a time.
6. Use rep movsb only for forward copy. Avoid slow backward rep movsb
by fallbacking to backward copy 8 times of vector register size at a
time.
7. Skip when address of destination == address of source.
[BZ #19776]
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and
memmove-avx512-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Test
__memmove_chk_avx512_unaligned_2,
__memmove_chk_avx512_unaligned_erms,
__memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms,
__memmove_chk_sse2_unaligned_2,
__memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2,
__memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2,
__memmove_avx512_unaligned_erms, __memmove_erms,
__memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms,
__memcpy_chk_avx512_unaligned_2,
__memcpy_chk_avx512_unaligned_erms,
__memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms,
__memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms,
__memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms,
__memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms,
__memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms,
__memcpy_erms, __mempcpy_chk_avx512_unaligned_2,
__mempcpy_chk_avx512_unaligned_erms,
__mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms,
__mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms,
__mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms,
__mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms,
__mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and
__mempcpy_erms.
* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New
file.
* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
Likwise.
* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
Likwise.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
Likwise.
(cherry picked from commit 88b57b8ed41d5ecf2e1bdfc19556f9246a665ebb)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since x86-64 memcpy-avx512-no-vzeroupper.S implements memmove, make
__memcpy_avx512_no_vzeroupper an alias of __memmove_avx512_no_vzeroupper
to reduce code size of libc.so.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
memcpy-avx512-no-vzeroupper.
* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Renamed
to ...
* sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: This.
(MEMCPY): Don't define.
(MEMCPY_CHK): Likewise.
(MEMPCPY): Likewise.
(MEMPCPY_CHK): Likewise.
(MEMPCPY_CHK): Renamed to ...
(__mempcpy_chk_avx512_no_vzeroupper): This.
(MEMPCPY_CHK): Renamed to ...
(__mempcpy_chk_avx512_no_vzeroupper): This.
(MEMCPY_CHK): Renamed to ...
(__memmove_chk_avx512_no_vzeroupper): This.
(MEMCPY): Renamed to ...
(__memmove_avx512_no_vzeroupper): This.
(__memcpy_avx512_no_vzeroupper): New alias.
(__memcpy_chk_avx512_no_vzeroupper): Likewise.
(cherry picked from commit 064f01b10b57ff09cda7025f484b848c38ddd57a)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement x86-64 multiarch mempcpy in memcpy to share most of code. It
reduces code size of libc.so.
[BZ #18858]
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
mempcpy-ssse3, mempcpy-ssse3-back, mempcpy-avx-unaligned
and mempcpy-avx512-no-vzeroupper.
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMPCPY_CHK):
New.
(MEMPCPY): Likewise.
* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S
(MEMPCPY_CHK): New.
(MEMPCPY): Likewise.
* sysdeps/x86_64/multiarch/memcpy-ssse3-back.S (MEMPCPY_CHK): New.
(MEMPCPY): Likewise.
* sysdeps/x86_64/multiarch/memcpy-ssse3.S (MEMPCPY_CHK): New.
(MEMPCPY): Likewise.
* sysdeps/x86_64/multiarch/mempcpy-avx-unaligned.S: Removed.
* sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S:
Likewise.
* sysdeps/x86_64/multiarch/mempcpy-ssse3-back.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy-ssse3.S: Likewise.
(cherry picked from commit c365e615f7429aee302f8af7bf07ae262278febb)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On AMD processors, memcpy optimized with unaligned SSE load is
slower than emcpy optimized with aligned SSSE3 while other string
functions are faster with unaligned SSE load. A feature bit,
Fast_Unaligned_Copy, is added to select memcpy optimized with
unaligned SSE load.
[BZ #19583]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel
processors. Set Fast_Copy_Backward for AMD Excavator
processors.
* sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy):
New.
(index_arch_Fast_Unaligned_Copy): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check
Fast_Unaligned_Copy instead of Fast_Unaligned_Load.
(cherry picked from commit e41b395523040fcb58c7d378475720c2836d280c)
|
|
|
|
|
|
|
|
| |
[BZ# 19860]
* sysdeps/x86_64/tst-audit10.c (avx512_enabled): Always return
zero if the compiler does not provide the AVX512F bit.
(cherry picked from commit f327f5b47be57bc05a4077344b381016c1bb2c11)
|
|
|
|
|
|
|
| |
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMCPY):
Don't set %rcx twice before "rep movsb".
(cherry picked from commit 3c9a4cd16cbc7b79094fec68add2df66061ab5d7)
|
|
|
|
| |
(cherry picked from commit 3bd80c0de2f8e7ca8020d37739339636d169957e)
|
|
|
|
|
|
|
| |
This ensures that GCC will not use unsupported instructions before
the run-time check to ensure support.
(cherry picked from commit 3c0f7407eedb524c9114bb675cd55b903c71daaa)
|
|
|
|
|
|
|
|
|
| |
* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S:
Replace .text with .text.avx512.
* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S:
Likewise.
(cherry picked from commit fee9eb6200f0e44a4b684903bc47fde36d46f1a5)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Chek Fast_Unaligned_Load, instead of Slow_BSF, and also check for
Fast_Copy_Backward to enable __memcpy_ssse3_back. Existing selection
order is updated with following selection order:
1. __memcpy_avx_unaligned if AVX_Fast_Unaligned_Load bit is set.
2. __memcpy_sse2_unaligned if Fast_Unaligned_Load bit is set.
3. __memcpy_sse2 if SSSE3 isn't available.
4. __memcpy_ssse3_back if Fast_Copy_Backward bit it set.
5. __memcpy_ssse3
[BZ #18880]
* sysdeps/x86_64/multiarch/memcpy.S: Check Fast_Unaligned_Load,
instead of Slow_BSF, and also check for Fast_Copy_Backward to
enable __memcpy_ssse3_back.
(cherry picked from commit 14a1d7cc4c4fd5ee8e4e66b777221dd32a84efe8)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to GCC bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
__tls_get_addr may be called with 8-byte stack alignment. Although
this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
that stack will be always aligned at 16 bytes. Since SSE optimized
memory/string functions with aligned SSE register load and store are
used in the dynamic linker, we must set DL_RUNTIME_UNALIGNED_VEC_SIZE
to 8 so that _dl_runtime_resolve_sse will align the stack before
calling _dl_fixup:
Dump of assembler code for function _dl_runtime_resolve_sse:
0x00007ffff7deea90 <+0>: push %rbx
0x00007ffff7deea91 <+1>: mov %rsp,%rbx
0x00007ffff7deea94 <+4>: and $0xfffffffffffffff0,%rsp
^^^^^^^^^^^ Align stack to 16 bytes
0x00007ffff7deea98 <+8>: sub $0x100,%rsp
0x00007ffff7deea9f <+15>: mov %rax,0xc0(%rsp)
0x00007ffff7deeaa7 <+23>: mov %rcx,0xc8(%rsp)
0x00007ffff7deeaaf <+31>: mov %rdx,0xd0(%rsp)
0x00007ffff7deeab7 <+39>: mov %rsi,0xd8(%rsp)
0x00007ffff7deeabf <+47>: mov %rdi,0xe0(%rsp)
0x00007ffff7deeac7 <+55>: mov %r8,0xe8(%rsp)
0x00007ffff7deeacf <+63>: mov %r9,0xf0(%rsp)
0x00007ffff7deead7 <+71>: movaps %xmm0,(%rsp)
0x00007ffff7deeadb <+75>: movaps %xmm1,0x10(%rsp)
0x00007ffff7deeae0 <+80>: movaps %xmm2,0x20(%rsp)
0x00007ffff7deeae5 <+85>: movaps %xmm3,0x30(%rsp)
0x00007ffff7deeaea <+90>: movaps %xmm4,0x40(%rsp)
0x00007ffff7deeaef <+95>: movaps %xmm5,0x50(%rsp)
0x00007ffff7deeaf4 <+100>: movaps %xmm6,0x60(%rsp)
0x00007ffff7deeaf9 <+105>: movaps %xmm7,0x70(%rsp)
[BZ #19679]
* sysdeps/x86_64/dl-trampoline.S (DL_RUNIME_UNALIGNED_VEC_SIZE):
Renamed to ...
(DL_RUNTIME_UNALIGNED_VEC_SIZE): This. Set to 8.
(DL_RUNIME_RESOLVE_REALIGN_STACK): Renamed to ...
(DL_RUNTIME_RESOLVE_REALIGN_STACK): This. Updated.
(DL_RUNIME_RESOLVE_REALIGN_STACK): Renamed to ...
(DL_RUNTIME_RESOLVE_REALIGN_STACK): This.
* sysdeps/x86_64/dl-trampoline.h
(DL_RUNIME_RESOLVE_REALIGN_STACK): Renamed to ...
(DL_RUNTIME_RESOLVE_REALIGN_STACK): This.
|
|
|
|
|
|
|
|
|
| |
Since libmvec_nonshared.a may be linked into shared objects, ALIAS_IMPL
should use PIC relocation.
[BZ #19590]
* sysdeps/x86_64/fpu/svml_finite_alias.S (ALIAS_IMPL): Use PIC
relocation.
|
|
|
|
|
|
|
|
|
| |
[BZ #19490]
* sysdeps/unix/sysv/linux/x86_64/pthread_cond_broadcast.S (pthread_cond_broadcast): Use ENTRY/END
* sysdeps/unix/sysv/linux/x86_64/pthread_cond_signal.S (pthread_cond_signal): Likewise
* sysdeps/x86_64/nptl/pthread_spin_lock.S (pthread_spin_lock): Likewise
* sysdeps/x86_64/nptl/pthread_spin_trylock.S (pthread_spin_trylock): Likewise
* sysdeps/x86_64/nptl/pthread_spin_unlock.S (pthread_spin_unlock): Likewise
|
|
|
|
|
| |
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Fixed build with
assembler not supporting AVX-512.
|
|
|
|
| |
* sysdeps/x86_64/multiarch/memcpy_chk.S: Fixed typos.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added AVX512 implementations of memcpy, mempcpy, memmove, memcpy_chk,
mempcpy_chk, memmove_chk.
It shows average improvement more than 30% over AVX versions on KNL
hardware (performance results in the thread
<https://sourceware.org/ml/libc-alpha/2016-01/msg00258.html>).
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Added new files.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests.
* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: New file.
* sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy.S: Added new IFUNC branch.
* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memmove.c: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It shows improvement up to 28% over AVX2 memset (performance results
attached at <https://sourceware.org/ml/libc-alpha/2015-12/msg00052.html>).
* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: New file.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Added new file.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests.
* sysdeps/x86_64/multiarch/memset.S: Added new IFUNC branch.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86/cpu-features.h (bit_Prefer_No_VZEROUPPER,
index_Prefer_No_VZEROUPPER): New.
* sysdeps/x86/cpu-features.c (init_cpu_features): Set the
Prefer_No_VZEROUPPER for Knights Landing.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Old workaround based on assembly aliases can lead to link fail (bug 19058).
This patch makes workaround in another way to avoid it.
[BZ #19058]
* math/Makefile ($(inst_libdir)/libm.so): Added libmvec_nonshared.a
to AS_NEEDED.
* sysdeps/x86/fpu/bits/math-vector.h: Removed code with old workaround.
* sysdeps/x86_64/fpu/Makefile (libmvec-support,
libmvec-static-only-routines): Added new file.
* sysdeps/x86_64/fpu/svml_finite_alias.S: New file.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the -ffinite-math-only versions of various x86_64 and x86 log*
functions, a zero result from log* (1) is returned with incorrect sign
in round-downward mode. This patch fixes this in a similar way to the
previous fixes for the non-*_finite versions of the functions.
Tested for x86_64 and x86 (including an i586 build), together with a
patch that will be applied separately to enable the main libm-test.inc
tests for the finite-math-only functions.
[BZ #19213]
* sysdeps/i386/fpu/e_log.S (__log_finite): Ensure +0 is always
returned for argument 1.
* sysdeps/i386/fpu/e_logf.S (__logf_finite): Likewise.
* sysdeps/i386/fpu/e_logl.S (__logl_finite): Likewise.
* sysdeps/i386/i686/fpu/e_logl.S (__logl_finite): Likewise.
* sysdeps/x86_64/fpu/e_log10l.S (__log10l_finite): Likewise.
* sysdeps/x86_64/fpu/e_log2l.S (__log2l_finite): Likewise.
* sysdeps/x86_64/fpu/e_logl.S (__logl_finite): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes miscellaneous __GNUC_PREREQ (4, 7) conditionals
that are now dead.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch).
* sysdeps/arm/atomic-machine.h
[__GNUC_PREREQ (4, 7) && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4]:
Change conditional to [__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4].
[__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 && !__GNUC_PREREQ (4, 7)]:
Remove conditional code.
[!__GNUC_PREREQ (4, 7) || !__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4]:
Change conditional to [!__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4].
* sysdeps/i386/sysdep.h [__ASSEMBLER__ && __GNUC_PREREQ (4, 7)]:
Change conditional to [__ASSEMBLER__].
[__ASSEMBLER__ && !__GNUC_PREREQ (4, 7)]: Remove conditional code.
[!__ASSEMBLER__ && __GNUC_PREREQ (4, 7)]: Change conditional to
[!__ASSEMBLER__].
[!__ASSEMBLER__ && !__GNUC_PREREQ (4, 7)]: Remove conditional
code.
* sysdeps/unix/sysv/linux/sh/atomic-machine.h (rNOSP): Remove
conditional macro definitions.
(__arch_compare_and_exchange_val_8_acq): Use "u" instead of rNOSP.
(__arch_compare_and_exchange_val_16_acq): Likewise.
(__arch_compare_and_exchange_val_32_acq): Likewise.
(atomic_exchange_and_add): Likewise.
(atomic_add): Likewise.
(atomic_add_negative): Likewise.
(atomic_add_zero): Likewise.
(atomic_bit_set): Likewise.
(atomic_bit_test_set): Likewise.
* sysdeps/x86_64/atomic-machine.h [__GNUC_PREREQ (4, 7)]: Make
code unconditional.
[!__GNUC_PREREQ (4, 7)]: Remove conditional code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are configure tests for the cpuid.h header for x86 / x86_64.
GCC 4.3 and later install this header, so those tests are obsolete.
This patch removes them.
Tested for x86_64 and x86 (testsuite, and that installed shared
libraries are unchanged by the patch).
* sysdeps/i386/configure.ac (cpuid.h): Do not test for header.
* sysdeps/i386/configure: Regenerated.
* sysdeps/x86_64/configure.ac (cpuid.h): Do not test for header.
* sysdeps/x86_64/configure: Regenerated.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fenv_t should include architecture-specific floating-point modes and
status flags. i386 and x86_64 fesetenv limit which bits they use from
the x87 status and control words, when using saved state, and limit
which parts of the state they set to fixed values, when using
FE_DFL_ENV / FE_NOMASK_ENV. The following should be included but are
excluded in at least some cases: status and masking for the "denormal
operand" exception (which isn't part of FE_ALL_EXCEPT); precision
control (explicitly mentioned in Annex F as something that counts as
part of the floating-point environment); MXCSR FZ and DAZ bits (for
FE_DFL_ENV and FE_NOMASK_ENV). This patch arranges for this extra
state to be handled by fesetenv (and thereby by feupdateenv, which
calls fesetenv).
(Note that glibc functions using floating point are not generally
expected to work correctly with non-default values of this state,
especially precision control, but it is still logically part of the
floating-point environment and should be handled as such by fesetenv.
Changes to the state relating to subnormals ought generally to work
with libm functions when the arguments aren't subnormal and neither
are the expected results; that's a consequence of functions avoiding
spurious internal underflows.)
A question arising from this is whether FE_NOMASK_ENV should or should
not mask the "denormal operand" exception. I decided it should mask
that exception. This is the status quo - previously that exception
could only be unmasked by direct manipulation of control registers
(possibly via <fpu_control.h>). In addition, it means that use of
FE_NOMASK_ENV leaves a floating-point environment the same as could be
obtained by fesetenv (FE_DFL_ENV); feenableexcept (FE_ALL_EXCEPT);,
rather than an environment in which an exception is unmasked that
could only be masked again by using fesetenv with FE_DFL_ENV (or a
previously saved environment) - this exception not being usable with
other <fenv.h> functions because it's outside FE_ALL_EXCEPT.
Tested for x86_64 and x86.
[BZ #16068]
* sysdeps/i386/fpu/fesetenv.c: Include <fpu_control.h>.
(FE_ALL_EXCEPT_X86): New macro.
(__fesetenv): Use FE_ALL_EXCEPT_X86 in most places instead of
FE_ALL_EXCEPT. Ensure precision control is included in
floating-point state. Ensure that FE_DFL_ENV and FE_NOMASK_ENV
handle "denormal operand exception" and clear FZ and DAZ bits.
* sysdeps/x86_64/fpu/fesetenv.c: Include <fpu_control.h>.
(FE_ALL_EXCEPT_X86): New macro.
(__fesetenv): Use FE_ALL_EXCEPT_X86 in most places instead of
FE_ALL_EXCEPT. Ensure precision control is included in
floating-point state. Ensure that FE_DFL_ENV and FE_NOMASK_ENV
handle "denormal operand exception" and clear FZ and DAZ bits.
* sysdeps/x86/fpu/test-fenv-sse-2.c: New file.
* sysdeps/x86/fpu/test-fenv-x87.c: Likewise.
* sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add
test-fenv-x87 and test-fenv-sse-2.
[$(subdir) = math] (CFLAGS-test-fenv-sse-2.c): New variable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The i386 and x86_64 versions of fesetenv, when called with FE_DFL_ENV
or FE_NOMASK_ENV as argument, do not clear SSE exceptions raised in
MXCSR. These arguments should, like other fenv_t values, represent
the whole of the floating-point state, so such exceptions should be
cleared; this patch adds the required clearing. (Discovered while
working on bug 16068.)
Tested for x86_64 and x86.
[BZ #19181]
* sysdeps/i386/fpu/fesetenv.c (__fesetenv): Clear already-raised
SSE exceptions when argument is FE_DFL_ENV or FE_NOMASK_ENV.
* sysdeps/x86_64/fpu/fesetenv.c (__fesetenv): Likewise.
* math/test-fenv-clear-main.c: New file.
* math/test-fenv-clear.c: Likewise.
* math/Makefile (tests): Add test-fenv-clear.
* sysdeps/x86/fpu/test-fenv-clear-sse.c: New file.
* sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add
test-fenv-clear-sse.
[$(subdir) = math] (CFLAGS-test-fenv-clear-sse.c): New variable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are configure tests for the -mavx2 compiler option. AVX2
support was added in GCC 4.7, so these tests are now obsolete; this
patch removes them.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch).
* sysdeps/i386/configure.ac (libc_cv_cc_avx2): Remove configure
test.
* sysdeps/i386/configure: Regenerated.
* sysdeps/x86_64/configure.ac (libc_cv_cc_avx2): Remove configure
test.
* sysdeps/x86_64/configure: Regenerated.
* config.h.in (HAVE_AVX2_SUPPORT): Remove #undef.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memset-avx2 unconditionally instead of conditionally on
[$(config-cflags-avx2) = yes].
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list) [HAVE_AVX2_SUPPORT]: Make code
unconditional.
* sysdeps/x86_64/multiarch/memset.S [HAVE_AVX2_SUPPORT]: Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S
[IS_IN (libc) && SHARED && HAVE_AVX2_SUPPORT]: Change conditional
to [IS_IN (libc) && SHARED].
|
|
|
|
| |
This comes from running “make regen-ulps” on AMD Opteron 6272 CPUs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch improves the libm test coverage for a few more functions.
Tested for x86_64 and x86.
* math/auto-libm-test-in: Add more tests of log, log10, log1p and
log2.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (MAX_EXP): New macro.
(ilogb_test_data): Add more tests.
(isfinite_test_data): Likewise.
(isgreater_test_data): Likewise.
(isgreaterequal_test_data): Likewise.
(isinf_test_data): Likewise.
(isless_test_data): Likewise.
(islessequal_test_data): Likewise.
(islessgreater_test_data): Likewise.
(isnan_test_data): Likewise.
(isnormal_test_data): Likewise.
(issignaling_test_data): Likewise.
(isunordered_test_data): Likewise.
(j0_test_data): Likewise.
(j1_test_data): Likewise.
(jn_test_data): Likewise.
(lgamma_test_data): Likewise.
(log_test_data): Likewise.
(log10_test_data): Likewise.
(log1p_test_data): Likewise.
(log2_test_data): Likewise.
(logb_test_data): Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The implementations of nearbyint functions using x87 floating point
(i386 all versions, x86_64 long double only) use the fclex
instruction, which clears any exceptions that were raised before the
function was called. These functions must not clear exceptions that
were raised before they were called.
This patch fixes these functions to save and restore the whole
floating-point environment (fnstenv / fldenv) as the way of avoiding
raising "inexact" (recall that there isn't an x87 instruction for
loading just the status word, so the whole environment has to be saved
and loaded instead - the code already saved and loaded the control
word, which is now obtained from the saved environment after this
patch, to disable traps on "inexact"). In the case of the long double
functions, any "invalid" exception from frndint (applied to a
signaling NaN) needs merging into the saved state; this issue doesn't
apply to the float and double functions because that exception would
have been raised when the argument is loaded, before the environment
is saved.
[BZ #15491]
* sysdeps/i386/fpu/s_nearbyint.S (__nearbyint): Save and restore
floating-point environment instead of clearing all exceptions.
* sysdeps/i386/fpu/s_nearbyintf.S (__nearbyintf): Likewise.
* sysdeps/i386/fpu/s_nearbyintl.S (__nearbyintl): Likewise,
merging in "invalid" exceptions from frndint.
* sysdeps/x86_64/fpu/s_nearbyintl.S (__nearbyintl): Likewise.
* math/test-nearbyint-except.c: New file.
* math/Makefile (tests): Add test-nearbyint-except.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This mostly automatically-generated patch converts 231 sysdeps
function definitions in glibc from old-style K&R to prototype-style.
For __aio_sigqueue and __gai_sigqueue I had to add internal_function
to the definitions as noted by Florian in
<https://sourceware.org/ml/libc-alpha/2015-10/msg00595.html> to keep
the functions compiling on x86 after conversion to prototype
definitions. Otherwise, the patch is automatically generated with all
the same exclusions and caveats as in
<https://sourceware.org/ml/libc-alpha/2015-10/msg00594.html> except
that it's a patch for sysdeps files.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch). Also tested for arm,
mips64 and powerpc32 that installed stripped shared libraries are
unchanged by the patch.
* sysdeps/arm/backtrace.c (__backtrace): Convert to
prototype-style function definition.
* sysdeps/i386/backtrace.c (__backtrace): Likewise.
* sysdeps/i386/ffs.c (__ffs): Likewise.
* sysdeps/i386/i686/ffs.c (__ffs): Likewise.
* sysdeps/ia64/nptl/pthread_spin_lock.c (pthread_spin_lock):
Likewise.
* sysdeps/ia64/nptl/pthread_spin_trylock.c (pthread_spin_trylock):
Likewise.
* sysdeps/ieee754/ldbl-128/e_log2l.c (__ieee754_log2l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_log2l.c (__ieee754_log2l):
Likewise.
* sysdeps/m68k/ffs.c (__ffs): Likewise.
* sysdeps/m68k/m680x0/fpu/e_acos.c (FUNC): Likewise.
* sysdeps/m68k/m680x0/fpu/e_fmod.c (FUNC): Likewise.
* sysdeps/mach/adjtime.c (__adjtime): Likewise.
* sysdeps/mach/gettimeofday.c (__gettimeofday): Likewise.
* sysdeps/mach/hurd/_exit.c (_exit): Likewise.
* sysdeps/mach/hurd/access.c (__access): Likewise.
* sysdeps/mach/hurd/adjtime.c (__adjtime): Likewise.
* sysdeps/mach/hurd/chdir.c (__chdir): Likewise.
* sysdeps/mach/hurd/chmod.c (__chmod): Likewise.
* sysdeps/mach/hurd/chown.c (__chown): Likewise.
* sysdeps/mach/hurd/cthreads.c (cthread_keycreate): Likewise.
(cthread_getspecific): Likewise.
(cthread_setspecific): Likewise.
(__libc_getspecific): Likewise.
* sysdeps/mach/hurd/euidaccess.c (__euidaccess): Likewise.
* sysdeps/mach/hurd/faccessat.c (faccessat): Likewise.
* sysdeps/mach/hurd/fchdir.c (__fchdir): Likewise.
* sysdeps/mach/hurd/fchmod.c (__fchmod): Likewise.
* sysdeps/mach/hurd/fchmodat.c (fchmodat): Likewise.
* sysdeps/mach/hurd/fchown.c (__fchown): Likewise.
* sysdeps/mach/hurd/fchownat.c (fchownat): Likewise.
* sysdeps/mach/hurd/flock.c (__flock): Likewise.
* sysdeps/mach/hurd/fsync.c (fsync): Likewise.
* sysdeps/mach/hurd/ftruncate.c (__ftruncate): Likewise.
* sysdeps/mach/hurd/getgroups.c (__getgroups): Likewise.
* sysdeps/mach/hurd/gethostname.c (__gethostname): Likewise.
* sysdeps/mach/hurd/getitimer.c (__getitimer): Likewise.
* sysdeps/mach/hurd/getlogin_r.c (__getlogin_r): Likewise.
* sysdeps/mach/hurd/getpgid.c (__getpgid): Likewise.
* sysdeps/mach/hurd/getrusage.c (__getrusage): Likewise.
* sysdeps/mach/hurd/getsockname.c (__getsockname): Likewise.
* sysdeps/mach/hurd/group_member.c (__group_member): Likewise.
* sysdeps/mach/hurd/isatty.c (__isatty): Likewise.
* sysdeps/mach/hurd/lchown.c (__lchown): Likewise.
* sysdeps/mach/hurd/link.c (__link): Likewise.
* sysdeps/mach/hurd/linkat.c (linkat): Likewise.
* sysdeps/mach/hurd/listen.c (__listen): Likewise.
* sysdeps/mach/hurd/mkdir.c (__mkdir): Likewise.
* sysdeps/mach/hurd/mkdirat.c (mkdirat): Likewise.
* sysdeps/mach/hurd/openat.c (__openat): Likewise.
* sysdeps/mach/hurd/poll.c (__poll): Likewise.
* sysdeps/mach/hurd/readlink.c (__readlink): Likewise.
* sysdeps/mach/hurd/readlinkat.c (readlinkat): Likewise.
* sysdeps/mach/hurd/recv.c (__recv): Likewise.
* sysdeps/mach/hurd/rename.c (rename): Likewise.
* sysdeps/mach/hurd/renameat.c (renameat): Likewise.
* sysdeps/mach/hurd/revoke.c (revoke): Likewise.
* sysdeps/mach/hurd/rewinddir.c (__rewinddir): Likewise.
* sysdeps/mach/hurd/rmdir.c (__rmdir): Likewise.
* sysdeps/mach/hurd/seekdir.c (seekdir): Likewise.
* sysdeps/mach/hurd/send.c (__send): Likewise.
* sysdeps/mach/hurd/setdomain.c (setdomainname): Likewise.
* sysdeps/mach/hurd/setegid.c (setegid): Likewise.
* sysdeps/mach/hurd/seteuid.c (seteuid): Likewise.
* sysdeps/mach/hurd/setgid.c (__setgid): Likewise.
* sysdeps/mach/hurd/setgroups.c (setgroups): Likewise.
* sysdeps/mach/hurd/sethostid.c (sethostid): Likewise.
* sysdeps/mach/hurd/sethostname.c (sethostname): Likewise.
* sysdeps/mach/hurd/setlogin.c (setlogin): Likewise.
* sysdeps/mach/hurd/setpgid.c (__setpgid): Likewise.
* sysdeps/mach/hurd/setregid.c (__setregid): Likewise.
* sysdeps/mach/hurd/setreuid.c (__setreuid): Likewise.
* sysdeps/mach/hurd/settimeofday.c (__settimeofday): Likewise.
* sysdeps/mach/hurd/setuid.c (__setuid): Likewise.
* sysdeps/mach/hurd/shutdown.c (shutdown): Likewise.
* sysdeps/mach/hurd/sigaction.c (__sigaction): Likewise.
* sysdeps/mach/hurd/sigaltstack.c (__sigaltstack): Likewise.
* sysdeps/mach/hurd/sigpending.c (sigpending): Likewise.
* sysdeps/mach/hurd/sigprocmask.c (__sigprocmask): Likewise.
* sysdeps/mach/hurd/sigsuspend.c (__sigsuspend): Likewise.
* sysdeps/mach/hurd/socket.c (__socket): Likewise.
* sysdeps/mach/hurd/symlink.c (__symlink): Likewise.
* sysdeps/mach/hurd/symlinkat.c (symlinkat): Likewise.
* sysdeps/mach/hurd/telldir.c (telldir): Likewise.
* sysdeps/mach/hurd/truncate.c (__truncate): Likewise.
* sysdeps/mach/hurd/umask.c (__umask): Likewise.
* sysdeps/mach/hurd/unlink.c (__unlink): Likewise.
* sysdeps/mach/hurd/unlinkat.c (unlinkat): Likewise.
* sysdeps/mips/mips64/__longjmp.c (__longjmp): Likewise.
* sysdeps/posix/alarm.c (alarm): Likewise.
* sysdeps/posix/cuserid.c (cuserid): Likewise.
* sysdeps/posix/dirfd.c (dirfd): Likewise.
* sysdeps/posix/dup.c (__dup): Likewise.
* sysdeps/posix/dup2.c (__dup2): Likewise.
* sysdeps/posix/euidaccess.c (euidaccess): Likewise.
(main): Likewise.
* sysdeps/posix/flock.c (__flock): Likewise.
* sysdeps/posix/fpathconf.c (__fpathconf): Likewise.
* sysdeps/posix/getcwd.c (__getcwd): Likewise.
* sysdeps/posix/gethostname.c (__gethostname): Likewise.
* sysdeps/posix/gettimeofday.c (__gettimeofday): Likewise.
* sysdeps/posix/isatty.c (__isatty): Likewise.
* sysdeps/posix/killpg.c (killpg): Likewise.
* sysdeps/posix/libc_fatal.c (__libc_fatal): Likewise.
* sysdeps/posix/mkfifoat.c (mkfifoat): Likewise.
* sysdeps/posix/raise.c (raise): Likewise.
* sysdeps/posix/remove.c (remove): Likewise.
* sysdeps/posix/rename.c (rename): Likewise.
* sysdeps/posix/rewinddir.c (__rewinddir): Likewise.
* sysdeps/posix/seekdir.c (seekdir): Likewise.
* sysdeps/posix/sigblock.c (__sigblock): Likewise.
* sysdeps/posix/sigignore.c (sigignore): Likewise.
* sysdeps/posix/sigintr.c (siginterrupt): Likewise.
* sysdeps/posix/signal.c (__bsd_signal): Likewise.
* sysdeps/posix/sigset.c (sigset): Likewise.
* sysdeps/posix/sigsuspend.c (__sigsuspend): Likewise.
* sysdeps/posix/sysconf.c (__sysconf): Likewise.
* sysdeps/posix/sysv_signal.c (__sysv_signal): Likewise.
* sysdeps/posix/time.c (time): Likewise.
* sysdeps/posix/ttyname.c (getttyname): Likewise.
(ttyname): Likewise.
* sysdeps/posix/ttyname_r.c (__ttyname_r): Likewise.
* sysdeps/posix/utime.c (utime): Likewise.
* sysdeps/powerpc/fpu/s_isnan.c (__isnan): Likewise.
* sysdeps/powerpc/nptl/pthread_spin_lock.c (pthread_spin_lock):
Likewise.
* sysdeps/powerpc/nptl/pthread_spin_trylock.c
(pthread_spin_trylock): Likewise.
* sysdeps/pthread/aio_error.c (aio_error): Likewise.
* sysdeps/pthread/aio_read.c (aio_read): Likewise.
* sysdeps/pthread/aio_read64.c (aio_read64): Likewise.
* sysdeps/pthread/aio_write.c (aio_write): Likewise.
* sysdeps/pthread/aio_write64.c (aio_write64): Likewise.
* sysdeps/pthread/flockfile.c (__flockfile): Likewise.
* sysdeps/pthread/ftrylockfile.c (__ftrylockfile): Likewise.
* sysdeps/pthread/funlockfile.c (__funlockfile): Likewise.
* sysdeps/pthread/timer_create.c (timer_create): Likewise.
* sysdeps/pthread/timer_getoverr.c (timer_getoverrun): Likewise.
* sysdeps/pthread/timer_gettime.c (timer_gettime): Likewise.
* sysdeps/s390/ffs.c (__ffs): Likewise.
* sysdeps/s390/nptl/pthread_spin_lock.c (pthread_spin_lock):
Likewise.
* sysdeps/s390/nptl/pthread_spin_trylock.c (pthread_spin_trylock):
Likewise.
* sysdeps/sh/nptl/pthread_spin_lock.c (pthread_spin_lock):
Likewise.
* sysdeps/sparc/nptl/pthread_barrier_destroy.c
(pthread_barrier_destroy): Likewise.
* sysdeps/sparc/nptl/pthread_barrier_wait.c
(__pthread_barrier_wait): Likewise.
* sysdeps/sparc/sparc32/e_sqrt.c (__ieee754_sqrt): Likewise.
* sysdeps/sparc/sparc32/pthread_barrier_wait.c
(__pthread_barrier_wait): Likewise.
* sysdeps/sparc/sparc32/sem_init.c (__old_sem_init): Likewise.
* sysdeps/tile/memcmp.c (memcmp_common_alignment): Likewise.
(memcmp_not_common_alignment): Likewise.
(MEMCMP): Likewise.
* sysdeps/tile/wordcopy.c (_wordcopy_fwd_aligned): Likewise.
(_wordcopy_fwd_dest_aligned): Likewise.
(_wordcopy_bwd_aligned): Likewise.
(_wordcopy_bwd_dest_aligned): Likewise.
* sysdeps/unix/bsd/ftime.c (ftime): Likewise.
* sysdeps/unix/bsd/gtty.c (gtty): Likewise.
* sysdeps/unix/bsd/stty.c (stty): Likewise.
* sysdeps/unix/bsd/tcflow.c (tcflow): Likewise.
* sysdeps/unix/bsd/tcflush.c (tcflush): Likewise.
* sysdeps/unix/bsd/tcgetattr.c (__tcgetattr): Likewise.
* sysdeps/unix/bsd/tcgetpgrp.c (tcgetpgrp): Likewise.
* sysdeps/unix/bsd/tcsendbrk.c (tcsendbreak): Likewise.
* sysdeps/unix/bsd/tcsetattr.c (tcsetattr): Likewise.
* sysdeps/unix/bsd/tcsetpgrp.c (tcsetpgrp): Likewise.
* sysdeps/unix/bsd/ualarm.c (ualarm): Likewise.
* sysdeps/unix/bsd/wait3.c (__wait3): Likewise.
* sysdeps/unix/getlogin_r.c (__getlogin_r): Likewise.
* sysdeps/unix/sockatmark.c (sockatmark): Likewise.
* sysdeps/unix/stime.c (stime): Likewise.
* sysdeps/unix/sysv/linux/_exit.c (_exit): Likewise.
* sysdeps/unix/sysv/linux/aio_sigqueue.c (__aio_sigqueue):
Likewise. Use internal_function.
* sysdeps/unix/sysv/linux/arm/sigaction.c (__libc_sigaction):
Convert to prototype-style function definition.
* sysdeps/unix/sysv/linux/faccessat.c (faccessat): Likewise.
* sysdeps/unix/sysv/linux/fchmodat.c (fchmodat): Likewise.
* sysdeps/unix/sysv/linux/fpathconf.c (__fpathconf): Likewise.
* sysdeps/unix/sysv/linux/gai_sigqueue.c (__gai_sigqueue):
Likewise. Use internal_function.
* sysdeps/unix/sysv/linux/gethostid.c (sethostid): Convert to
prototype-style function definition
* sysdeps/unix/sysv/linux/getlogin_r.c (__getlogin_r_loginuid):
Likewise.
(__getlogin_r): Likewise.
* sysdeps/unix/sysv/linux/getpt.c (__posix_openpt): Likewise.
* sysdeps/unix/sysv/linux/hppa/pthread_cond_broadcast.c
(__pthread_cond_broadcast): Likewise.
* sysdeps/unix/sysv/linux/hppa/pthread_cond_destroy.c
(__pthread_cond_destroy): Likewise.
* sysdeps/unix/sysv/linux/hppa/pthread_cond_init.c
(__pthread_cond_init): Likewise.
* sysdeps/unix/sysv/linux/hppa/pthread_cond_signal.c
(__pthread_cond_signal): Likewise.
* sysdeps/unix/sysv/linux/hppa/pthread_cond_wait.c
(__pthread_cond_wait): Likewise.
* sysdeps/unix/sysv/linux/i386/getmsg.c (getmsg): Likewise.
* sysdeps/unix/sysv/linux/i386/setegid.c (setegid): Likewise.
* sysdeps/unix/sysv/linux/ia64/sigaction.c (__libc_sigaction):
Likewise.
* sysdeps/unix/sysv/linux/ia64/sigpending.c (sigpending):
Likewise.
* sysdeps/unix/sysv/linux/ia64/sigprocmask.c (__sigprocmask):
Likewise.
* sysdeps/unix/sysv/linux/mips/sigaction.c (__libc_sigaction):
Likewise.
* sysdeps/unix/sysv/linux/msgget.c (msgget): Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/ftruncate64.c
(__ftruncate64): Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/truncate64.c
(truncate64): Likewise.
* sysdeps/unix/sysv/linux/pt-raise.c (raise): Likewise.
* sysdeps/unix/sysv/linux/pthread_getcpuclockid.c
(pthread_getcpuclockid): Likewise.
* sysdeps/unix/sysv/linux/pthread_getname.c (pthread_getname_np):
Likewise.
* sysdeps/unix/sysv/linux/pthread_setname.c (pthread_setname_np):
Likewise.
* sysdeps/unix/sysv/linux/pthread_sigmask.c (pthread_sigmask):
Likewise.
* sysdeps/unix/sysv/linux/pthread_sigqueue.c (pthread_sigqueue):
Likewise.
* sysdeps/unix/sysv/linux/raise.c (raise): Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/sigaction.c
(__libc_sigaction): Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/sigpending.c (sigpending):
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/sigprocmask.c
(__sigprocmask): Likewise.
* sysdeps/unix/sysv/linux/semget.c (semget): Likewise.
* sysdeps/unix/sysv/linux/semop.c (semop): Likewise.
* sysdeps/unix/sysv/linux/setrlimit64.c (setrlimit64): Likewise.
* sysdeps/unix/sysv/linux/shmat.c (shmat): Likewise.
* sysdeps/unix/sysv/linux/shmdt.c (shmdt): Likewise.
* sysdeps/unix/sysv/linux/shmget.c (shmget): Likewise.
* sysdeps/unix/sysv/linux/sigaction.c (__libc_sigaction):
Likewise.
* sysdeps/unix/sysv/linux/sigpending.c (sigpending): Likewise.
* sysdeps/unix/sysv/linux/sigprocmask.c (__sigprocmask): Likewise.
* sysdeps/unix/sysv/linux/sigqueue.c (__sigqueue): Likewise.
* sysdeps/unix/sysv/linux/sigstack.c (sigstack): Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/sigpending.c (sigpending):
Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/sigprocmask.c
(__sigprocmask): Likewise.
* sysdeps/unix/sysv/linux/speed.c (cfgetospeed): Likewise.
(cfgetispeed): Likewise.
(cfsetospeed): Likewise.
(cfsetispeed): Likewise.
* sysdeps/unix/sysv/linux/tcflow.c (tcflow): Likewise.
* sysdeps/unix/sysv/linux/tcflush.c (tcflush): Likewise.
* sysdeps/unix/sysv/linux/tcgetattr.c (__tcgetattr): Likewise.
* sysdeps/unix/sysv/linux/tcsetattr.c (tcsetattr): Likewise.
* sysdeps/unix/sysv/linux/time.c (time): Likewise.
* sysdeps/unix/sysv/linux/timer_create.c (timer_create): Likewise.
* sysdeps/unix/sysv/linux/timer_delete.c (timer_delete): Likewise.
* sysdeps/unix/sysv/linux/timer_getoverr.c (timer_getoverrun):
Likewise.
* sysdeps/unix/sysv/linux/timer_gettime.c (timer_gettime):
Likewise.
* sysdeps/unix/sysv/linux/x86_64/sigpending.c (sigpending):
Likewise.
* sysdeps/unix/sysv/linux/x86_64/sigprocmask.c (__sigprocmask):
Likewise.
* sysdeps/x86_64/backtrace.c (__backtrace): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since x86 _dl_unmap and _dl_make_tlsdesc_dynamic are only used
internally in ld.so, they can be made hidden.
[BZ #19122]
* sysdeps/i386/dl-lookupcfg.h (_dl_unmap): Add attribute_hidden.
* sysdeps/i386/dl-tlsdesc.h (_dl_make_tlsdesc_dynamic):
Likewise.
* sysdeps/x86_64/dl-tlsdesc.h (_dl_make_tlsdesc_dynamic):
Likewise.
* sysdeps/x86_64/dl-lookupcfg.h (_dl_unmap): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linker in binutils 2.26 and newer generate GOT references instead
PLT references when -z now is passed to linker. We need to extend
scripts/localplt.awk to allow PLT or GOT references.
[BZ #19007]
* scripts/localplt.awk: Also allow GOT references.
* sysdeps/unix/sysv/linux/i386/localplt.data: Mark
_Unwind_Find_FDE, calloc, memalign, realloc and __libc_memalign
with "+ REL R_386_GLOB_DAT".
* sysdeps/x86_64/localplt.data: Mark calloc, memalign, realloc
and __libc_memalign with "+ RELA R_X86_64_GLOB_DAT".
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When x86-64 assmebler doesn't support AVX512, we should make
_dl_runtime_resolve_avx512/_dl_runtime_profile_avx512 as aliases of
_dl_runtime_resolve_avx/_dl_runtime_profile_avx. Tested on x86-64
using GCC 5.2 with binutils 20151008 and GCC 4.8 with binutils 20130219.
There are no differences in ld.so with binutils 20151008. There are no
unexpected failures with binutils 20130219 and 20151008.
[BZ #19124]
* sysdeps/x86_64/dl-trampoline.S [!HAVE_AVX512_ASM_SUPPORT]
(_dl_runtime_resolve_avx512): Make it a hidden alias of
_dl_runtime_resolve_avx.
(_dl_runtime_profile_avx512): Make it a hidden alias of
_dl_runtime_profile_avx.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The x86_64 versions of lrint/lrintf/ lrintl are aliases for the long
long versions which isn't correct for x32, where exceptions must respect
overflow for 32-bit long. Separate versions of the long functions for
x32 that convert to 32-bit long and raise the right exceptions for that
conversion, while keeping the aliases in the non-x32 case.
Tested on x86_64 and x32. There are no code changes in libm.so on
x86_64.
* sysdeps/x86_64/fpu/s_llrint.S (__lrint): Add alias only if
__ILP32__ isn't defined.
(lrint): Likewise.
* sysdeps/x86_64/fpu/s_llrintf.S (__lrintf): Likewise.
(lrintf): Likewise.
* sysdeps/x86_64/fpu/s_llrintl.S (__lrintl): Likewise.
(lrintl): Likewise.
* sysdeps/x86_64/x32/fpu/s_lrint.S: New file.
* sysdeps/x86_64/x32/fpu/s_lrintf.S: Likewise.
* sysdeps/x86_64/x32/fpu/s_lrintl.S: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GCC added support for -mno-vzeroupper in version 4.6. Thus the
configure tests for this support are obsolete, and this patch removes
them.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by this patch).
* sysdeps/i386/configure.ac (libc_cv_cc_novzeroupper): Remove
configure test.
* sysdeps/i386/configure: Regenerated.
* sysdeps/x86_64/configure.ac (libc_cv_cc_novzeroupper): Remove
configure test.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/Makefile [$(config-cflags-novzeroupper) = yes]:
Make code unconditional.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GCC added support for -mfma4 in version 4.5. Thus the configure tests
for this support are obsolete, and this patch removes them.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by this patch).
* sysdeps/i386/configure.ac (libc_cv_cc_fma4): Remove configure
test.
* sysdeps/i386/configure: Regenerated.
* sysdeps/x86_64/configure.ac (libc_cv_cc_fma4): Remove configure
test.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/fpu/multiarch/Makefile [$(have-mfma4) = yes]:
Make code unconditional.
* sysdeps/x86_64/fpu/multiarch/e_asin.c [HAVE_FMA4_SUPPORT]:
Likewise.
* sysdeps/x86_64/fpu/multiarch/e_atan2.c [HAVE_FMA4_SUPPORT]:
Likewise.
[!HAVE_FMA4_SUPPORT]: Remove conditional code.
* sysdeps/x86_64/fpu/multiarch/e_exp.c [HAVE_FMA4_SUPPORT]: Make
code unconditional.
[!HAVE_FMA4_SUPPORT]: Remove conditional code.
* sysdeps/x86_64/fpu/multiarch/e_log.c [HAVE_FMA4_SUPPORT]: Make
code unconditional.
[!HAVE_FMA4_SUPPORT]: Remove conditional code.
* sysdeps/x86_64/fpu/multiarch/e_pow.c [HAVE_FMA4_SUPPORT]: Make
code unconditional.
* sysdeps/x86_64/fpu/multiarch/s_atan.c [HAVE_FMA4_SUPPORT]: Make
code unconditional.
[!HAVE_FMA4_SUPPORT]: Remove conditional code.
* sysdeps/x86_64/fpu/multiarch/s_fma.c [HAVE_FMA4_SUPPORT]: Make
code unconditional.
[!HAVE_FMA4_SUPPORT]: Remove conditional code.
* sysdeps/x86_64/fpu/multiarch/s_fmaf.c [HAVE_FMA4_SUPPORT]: Make
code unconditional.
[!HAVE_FMA4_SUPPORT]: Remove conditional code.
* sysdeps/x86_64/fpu/multiarch/s_sin.c [HAVE_FMA4_SUPPORT]: Make
code unconditional.
[!HAVE_FMA4_SUPPORT]: Remove conditional code.
* sysdeps/x86_64/fpu/multiarch/s_tan.c [HAVE_FMA4_SUPPORT]: Make
code unconditional.
[!HAVE_FMA4_SUPPORT]: Remove conditional code.
* config.h.in (HAVE_FMA4_SUPPORT): Remove #undef.
|