author | hjl <hjl@138bc75d-0d04-0410-961f-82ee72b054a4> | 2010-10-27 12:36:15 +0000 |
---|---|---|
committer | hjl <hjl@138bc75d-0d04-0410-961f-82ee72b054a4> | 2010-10-27 12:36:15 +0000 |
commit | 3970ad845c9e8831fed616742bf3e269df28f3b3 (patch) | |
tree | e6bb0c094c327d8c2c95394432623ac76fa04746 /gcc/doc | |
parent | 31ccad94ad5151ede14d26b78431d4798c99af04 (diff) | |
Add -mvzeroupper to x86.

gcc/
2010-10-27 H.J. Lu <hongjiu.lu@intel.com>
* config/i386/i386-protos.h (init_cumulative_args): Add an int.
* config/i386/i386.c (block_info): New.
(BLOCK_INFO): Likewise.
(call_avx256_state): Likewise.
(check_avx256_stores): Likewise.
(move_or_delete_vzeroupper_2): Likewise.
(move_or_delete_vzeroupper_1): Likewise.
(move_or_delete_vzeroupper): Likewise.
(use_avx256_p): Likewise.
(function_pass_avx256_p): Likewise.
(flag_opts): Add -mvzeroupper.
(ix86_option_override_internal): Turn on MASK_VZEROUPPER by
default for TARGET_AVX. Turn off MASK_VZEROUPPER if TARGET_AVX
is disabled.
(ix86_function_ok_for_sibcall): Disable sibcall if we need to
generate vzeroupper.
(init_cumulative_args): Add an int to indicate caller. Set
use_avx256_p, callee_return_avx256_p and caller_use_avx256_p
based on return type.
(ix86_function_arg): Set use_avx256_p, callee_pass_avx256_p and
caller_pass_avx256_p based on argument type.
(ix86_expand_epilogue): Emit vzeroupper if 256bit AVX register
is used, but not returned by caller.
(ix86_expand_call): Emit vzeroupper if 256bit AVX register is
used.
(ix86_local_alignment): Set use_avx256_p if 256bit AVX register
is used.
(ix86_minimum_alignment): Likewise.
(ix86_expand_special_args_builtin): Set target to
GEN_INT (vzeroupper_intrinsic) for CODE_FOR_avx_vzeroupper.
(ix86_reorg): Run the vzeroupper optimization if needed.
* config/i386/i386.h (ix86_args): Add caller.
(INIT_CUMULATIVE_ARGS): Updated.
(machine_function): Add use_vzeroupper_p, use_avx256_p,
caller_pass_avx256_p, caller_return_avx256_p,
callee_pass_avx256_p and callee_return_avx256_p.
* config/i386/i386.opt (-mvzeroupper): New.
* config/i386/predicates.md (vzeroupper_operation): Removed.
* config/i386/sse.md (avx_vzeroupper): Removed.
(*avx_vzeroupper): Removed.
(avx_vzeroupper): New.
* doc/invoke.texi: Document -mvzeroupper.

gcc/testsuite/
2010-10-27 H.J. Lu <hongjiu.lu@intel.com>
* gcc.target/i386/avx-vzeroupper-1.c: Add -mtune=generic.
* gcc.target/i386/avx-vzeroupper-2.c: Likewise.
* gcc.target/i386/avx-vzeroupper-3.c: New.
* gcc.target/i386/avx-vzeroupper-4.c: Likewise.
* gcc.target/i386/avx-vzeroupper-5.c: Likewise.
* gcc.target/i386/avx-vzeroupper-6.c: Likewise.
* gcc.target/i386/avx-vzeroupper-7.c: Likewise.
* gcc.target/i386/avx-vzeroupper-8.c: Likewise.
* gcc.target/i386/avx-vzeroupper-9.c: Likewise.
* gcc.target/i386/avx-vzeroupper-10.c: Likewise.
* gcc.target/i386/avx-vzeroupper-11.c: Likewise.
* gcc.target/i386/avx-vzeroupper-12.c: Likewise.
* gcc.target/i386/avx-vzeroupper-13.c: Likewise.
* gcc.target/i386/avx-vzeroupper-14.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@166000 138bc75d-0d04-0410-961f-82ee72b054a4
Diffstat (limited to 'gcc/doc')
-rw-r--r-- | gcc/doc/invoke.texi | 9 |
1 files changed, 8 insertions, 1 deletions
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7ea042f6775..365b8c3af43 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -594,7 +594,7 @@ Objective-C and Objective-C++ Dialects}.
 -mno-wide-multiply -mrtd -malign-double @gol
 -mpreferred-stack-boundary=@var{num}
 -mincoming-stack-boundary=@var{num} @gol
--mcld -mcx16 -msahf -mmovbe -mcrc32 -mrecip @gol
+-mcld -mcx16 -msahf -mmovbe -mcrc32 -mrecip -mvzeroupper @gol
 -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
 -maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfused-madd @gol
 -msse4a -m3dnow -mpopcnt -mabm -mfma4 -mxop -mlwp @gol
@@ -12466,6 +12466,13 @@ GCC with the @option{--enable-cld} configure option. Generation of @code{cld}
 instructions can be suppressed with the @option{-mno-cld} compiler option
 in this case.
 
+@item -mvzeroupper
+@opindex mvzeroupper
+This option instructs GCC to emit a @code{vzeroupper} instruction
+before a transfer of control flow out of the function to minimize
+AVX to SSE transition penalty as well as remove unnecessary zeroupper
+intrinsics.
+
 @item -mcx16
 @opindex mcx16
 This option will enable GCC to use CMPXCHG16B instruction in generated code.
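
To illustrate the documented behavior, here is a minimal sketch of a function that uses a 256-bit AVX register internally but returns a plain float. Going by the ChangeLog entry for ix86_expand_epilogue, this looks like the kind of function for which a vzeroupper would be emitted before the return. The function name sum8 and the scalar fallback are assumptions made for this example and are not part of the commit. Where the instruction actually ends up depends on the GCC version and optimization level.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical example function (not from the commit): sums eight floats.
   When compiled with -mavx it loads them into one 256-bit register, so the
   upper halves of the YMM registers become dirty inside the function.  */
float sum8(const float *p)
{
    assert(p != NULL);
#if defined(__AVX__)
    __m256 v = _mm256_loadu_ps(p);            /* 256-bit unaligned load */
    __m128 lo = _mm256_castps256_ps128(v);    /* elements 0..3 */
    __m128 hi = _mm256_extractf128_ps(v, 1);  /* elements 4..7 */
    __m128 s = _mm_add_ps(lo, hi);            /* four pairwise sums */
    s = _mm_hadd_ps(s, s);                    /* two partial sums */
    s = _mm_hadd_ps(s, s);                    /* total in every lane */
    return _mm_cvtss_f32(s);                  /* scalar result, no YMM returned */
#else
    /* Scalar fallback so the sketch still builds without AVX.  */
    float total = 0.0f;
    for (size_t i = 0; i < 8; i++)
        total += p[i];
    return total;
#endif
}
```

One way to check the effect is to compile the file to assembly with something like `gcc -O2 -mavx -mvzeroupper -S` and again with `-mno-vzeroupper`, then compare the two outputs for a vzeroupper instruction near the return.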