author | Kyrylo Tkachov <kyrylo.tkachov@arm.com> | 2019-11-07 10:39:39 +0000 |
---|---|---|
committer | Kyrylo Tkachov <ktkachov@gcc.gnu.org> | 2019-11-07 10:39:39 +0000 |
commit | cf16f980e5278c146f04587ea2a378fab950d7b3 (patch) | |
tree | 7d0163547be22d04c3d43c5bc0d6c02b9f1c5773 /gcc/config/arm/arm.md | |
parent | e9d01715bd7e033eacfbadbfab5f1f206221305c (diff) | |
download | gcc-cf16f980e5278c146f04587ea2a378fab950d7b3.tar.gz |
[arm][1/X] Add initial support for saturation intrinsics
This patch adds the plumbing for and an implementation of the saturation
intrinsics from ACLE, in particular the __ssat, __usat intrinsics.
These intrinsics set the Q sticky bit in APSR if an overflow occurred.
ACLE allows the user to read that bit using the __saturation_occurred
intrinsic and reset it using __set_saturation_occurred. The value of the bit
is only defined within a single function, not across function boundaries.
Thus, a user who cares about the Q bit would use a flow such as:
__set_saturation_occurred (0); // reset the Q bit
...
__ssat (...) // Do some calculations involving __ssat
...
if (__saturation_occurred ()) // if Q bit set handle overflow
...
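As a rough illustration of the semantics in the flow above, the following is a portable software model, not the real intrinsics. The names model_ssat, model_usat, model_q_bit and the two accessor functions are hypothetical stand-ins; the clamping ranges follow the bounds computed in the arm_ssat and arm_usat expanders of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software stand-in for the sticky Q bit, which on real
   hardware lives in the APSR.  */
static int model_q_bit = 0;

/* Model of __ssat (x, bits): clamp x to the signed range of a BITS-bit
   integer, with 1 <= bits <= 32.  Sets the sticky flag on saturation.  */
static int32_t
model_ssat (int32_t x, unsigned bits)
{
  assert (bits >= 1 && bits <= 32);
  int64_t upper = ((int64_t) 1 << (bits - 1)) - 1;
  int64_t lower = -upper - 1;
  if (x > upper)
    {
      model_q_bit = 1;
      return (int32_t) upper;
    }
  if (x < lower)
    {
      model_q_bit = 1;
      return (int32_t) lower;
    }
  return x;
}

/* Model of __usat (x, bits): clamp x to [0, 2^bits - 1], with
   0 <= bits <= 31.  Sets the sticky flag on saturation.  */
static uint32_t
model_usat (int32_t x, unsigned bits)
{
  assert (bits <= 31);
  int64_t upper = ((int64_t) 1 << bits) - 1;
  if (x > upper)
    {
      model_q_bit = 1;
      return (uint32_t) upper;
    }
  if (x < 0)
    {
      model_q_bit = 1;
      return 0;
    }
  return (uint32_t) x;
}

/* Models of __set_saturation_occurred and __saturation_occurred.  */
static void
model_set_saturation_occurred (int value)
{
  model_q_bit = (value != 0);
}

static int
model_saturation_occurred (void)
{
  return model_q_bit;
}
```

The flag is sticky in this model: once a saturating operation sets it, later non-saturating operations leave it set until it is explicitly reset.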
For the implementation this has a few implications:
* We must track the Q-setting side-effects of these instructions to make
sure saturation reading/writing intrinsics are ordered properly.
This is done by introducing a new "apsrq" register (and associated
APSRQ_REGNUM) in a similar way to the "fake" cc register.
* The RTL patterns coming out of these intrinsics can have two forms:
one where they set the APSRQ_REGNUM and one where they don't.
Which one is used depends on whether the function cares about reading the Q
flag. This is detected using the TARGET_CHECK_BUILTIN_CALL hook on the
__saturation_occurred, __set_saturation_occurred occurrences.
If no Q-flag read is present in the function we'll use the simpler
non-Q-setting form to allow for more aggressive scheduling and such.
If a Q-bit read is present then the Q-setting form is emitted.
To avoid adding two patterns for each intrinsic to the MD file we make
use of define_subst to auto-generate the Q-setting forms.
* Some existing patterns already produce instructions that may clobber the
Q bit, but they don't model it (as we didn't care about that bit up till now).
Since these patterns can be generated from straight-line C code they can
affect the Q-bit reads from intrinsics. Therefore they have to be disabled
when a Q-bit read is present. These are mostly patterns in arm-fixed.md that
are not very common anyway, but there are also a couple of widening
multiply-accumulate patterns in arm.md that can set the Q-bit during
accumulation.
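To make the last point concrete, here is a small portable model (a sketch, not compiler or hardware code) of why the fused multiply-accumulate has to be split when the Q bit is read. The names q_flag, model_smlabb and model_mul_then_add are hypothetical. The model assumes the architectural behaviour of SMLABB: the accumulation wraps and sets Q on overflow, whereas a separate multiply followed by a plain add wraps without touching Q. Both results are identical, only the flag side-effect differs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software stand-in for the sticky Q flag.  */
static int q_flag = 0;

/* Model of the fused form (SMLABB): 16x16 signed multiply plus a 32-bit
   accumulate.  The result wraps, and Q is set if the accumulation
   overflows.  */
static int32_t
model_smlabb (int16_t a, int16_t b, int32_t acc)
{
  int64_t wide = (int64_t) a * b + acc;
  if (wide > INT32_MAX || wide < INT32_MIN)
    q_flag = 1;
  /* Conversion to int32_t is implementation-defined for out-of-range
     values; GCC and Clang wrap modulo 2^32.  */
  return (int32_t) (uint32_t) wide;
}

/* Model of the split form used when the Q bit is read: a multiply
   followed by an ordinary wrapping add.  Q is never touched.  */
static int32_t
model_mul_then_add (int16_t a, int16_t b, int32_t acc)
{
  int32_t prod = (int32_t) a * b;
  /* A 16x16 signed product is at most 2^30, so it cannot overflow.  */
  assert (prod <= INT32_C (0x40000000));
  return (int32_t) ((uint32_t) prod + (uint32_t) acc);
}
```

With a = b = 1 and acc = INT32_MAX, both functions return the same wrapped value, but only the fused model sets the flag. That stray flag write is what would corrupt a later __saturation_occurred read.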
There are more Q-setting intrinsics in ACLE, but these will be implemented
in a more mechanical fashion once the infrastructure in this patch goes in.
* config/arm/aout.h (REGISTER_NAMES): Add apsrq.
* config/arm/arm.md (APSRQ_REGNUM): Define.
(add_setq): New define_subst.
(add_clobber_q_name): New define_subst_attr.
(add_clobber_q_pred): Likewise.
(maddhisi4): Change to define_expand. Split into mult and add if
ARM_Q_BIT_READ.
(*arm_maddhisi4): New define_insn.
(*maddhisi4tb): Disable for ARM_Q_BIT_READ.
(*maddhisi4tt): Likewise.
(arm_ssat): New define_expand.
(arm_usat): Likewise.
(arm_get_apsr): New define_insn.
(arm_set_apsr): Likewise.
(arm_saturation_occurred): New define_expand.
(arm_set_saturation): Likewise.
(*satsi_<SAT:code>): Rename to...
(satsi_<SAT:code><add_clobber_q_name>): ... This.
(*satsi_<SAT:code>_shift): Disable for ARM_Q_BIT_READ.
* config/arm/arm.h (FIXED_REGISTERS): Mark apsrq as fixed.
(CALL_USED_REGISTERS): Mark apsrq.
(FIRST_PSEUDO_REGISTER): Update value.
(REG_ALLOC_ORDER): Add APSRQ_REGNUM.
(machine_function): Add q_bit_access.
(ARM_Q_BIT_READ): Define.
* config/arm/arm.c (TARGET_CHECK_BUILTIN_CALL): Define.
(arm_conditional_register_usage): Clear APSRQ_REGNUM from
operand_reg_set.
(arm_q_bit_access): Define.
* config/arm/arm-builtins.c: Include stringpool.h.
(arm_sat_binop_imm_qualifiers,
arm_unsigned_sat_binop_unsigned_imm_qualifiers,
arm_sat_occurred_qualifiers, arm_set_sat_qualifiers): Define.
(SAT_BINOP_UNSIGNED_IMM_QUALIFIERS,
UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS, SAT_OCCURRED_QUALIFIERS,
SET_SAT_QUALIFIERS): Likewise.
(arm_builtins): Define ARM_BUILTIN_SAT_IMM_CHECK.
(arm_init_acle_builtins): Initialize __builtin_sat_imm_check.
Handle 0 argument expander.
(arm_expand_acle_builtin): Handle ARM_BUILTIN_SAT_IMM_CHECK.
(arm_check_builtin_call): Define.
* config/arm/arm.md (ssmulsa3, usmulusa3, usmuluha3,
arm_ssatsihi_shift, arm_usatsihi): Disable when ARM_Q_BIT_READ.
* config/arm/arm-protos.h (arm_check_builtin_call): Declare prototype.
(arm_q_bit_access): Likewise.
* config/arm/arm_acle.h (__ssat, __usat, __ignore_saturation,
__saturation_occurred, __set_saturation_occurred): Define.
* config/arm/arm_acle_builtins.def: Define builtins for ssat, usat,
saturation_occurred, set_saturation_occurred.
* config/arm/unspecs.md (UNSPEC_Q_SET): Define.
(UNSPEC_APSR_READ): Likewise.
(VUNSPEC_APSR_WRITE): Likewise.
* config/arm/arm-fixed.md (ssadd<mode>3): Convert to define_expand.
(*arm_ssadd<mode>3): New define_insn.
(sssub<mode>3): Convert to define_expand.
(*arm_sssub<mode>3): New define_insn.
(ssmulsa3): Convert to define_expand.
(*arm_ssmulsa3): New define_insn.
(usmulusa3): Convert to define_expand.
(*arm_usmulusa3): New define_insn.
(ssmulha3): FAIL if ARM_Q_BIT_READ.
(arm_ssatsihi_shift, arm_usatsihi): Disable for ARM_Q_BIT_READ.
* config/arm/iterators.md (qaddsub_clob_q): New mode attribute.
* gcc.target/arm/acle/saturation.c: New test.
* gcc.target/arm/acle/sat_no_smlatb.c: Likewise.
* lib/target-supports.exp (check_effective_target_arm_qbit_ok_nocache):
Define.
(check_effective_target_arm_qbit_ok): Likewise.
(add_options_for_arm_qbit): Likewise.
From-SVN: r277914
Diffstat (limited to 'gcc/config/arm/arm.md')
-rw-r--r-- | gcc/config/arm/arm.md | 152 |
1 files changed, 145 insertions, 7 deletions
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4f035cbfddd..992d7b60bbc 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -39,6 +39,7 @@
    (LAST_ARM_REGNUM 15) ;
    (CC_REGNUM 100) ; Condition code pseudo register
    (VFPCC_REGNUM 101) ; VFP Condition code pseudo register
+   (APSRQ_REGNUM 104) ; Q bit pseudo register
   ]
 )
 ;; 3rd operand to select_dominance_cc_mode
@@ -423,6 +424,20 @@
 (include "marvell-pj4.md")
 (include "xgene1.md")

+;; define_subst and associated attributes
+
+(define_subst "add_setq"
+  [(set (match_operand:SI 0 "" "")
+        (match_operand:SI 1 "" ""))]
+  ""
+  [(set (match_dup 0)
+        (match_dup 1))
+   (set (reg:CC APSRQ_REGNUM)
+        (unspec:CC [(reg:CC APSRQ_REGNUM)] UNSPEC_Q_SET))])
+
+(define_subst_attr "add_clobber_q_name" "add_setq" "" "_setq")
+(define_subst_attr "add_clobber_q_pred" "add_setq" "!ARM_Q_BIT_READ"
+                   "ARM_Q_BIT_READ")

 ;;---------------------------------------------------------------------------
 ;; Insn patterns
@@ -2515,14 +2530,36 @@
    (set_attr "predicable" "yes")]
 )

-(define_insn "maddhisi4"
+(define_expand "maddhisi4"
+  [(set (match_operand:SI 0 "s_register_operand")
+        (plus:SI (mult:SI (sign_extend:SI
+                           (match_operand:HI 1 "s_register_operand"))
+                          (sign_extend:SI
+                           (match_operand:HI 2 "s_register_operand")))
+                 (match_operand:SI 3 "s_register_operand")))]
+  "TARGET_DSP_MULTIPLY"
+  {
+    /* If this function reads the Q bit from ACLE intrinsics break up the
+       multiplication and accumulation as an overflow during accumulation will
+       clobber the Q flag.  */
+    if (ARM_Q_BIT_READ)
+      {
+        rtx tmp = gen_reg_rtx (SImode);
+        emit_insn (gen_mulhisi3 (tmp, operands[1], operands[2]));
+        emit_insn (gen_addsi3 (operands[0], tmp, operands[3]));
+        DONE;
+      }
+  }
+)
+
+(define_insn "*arm_maddhisi4"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
         (plus:SI (mult:SI (sign_extend:SI
                            (match_operand:HI 1 "s_register_operand" "r"))
                           (sign_extend:SI
                            (match_operand:HI 2 "s_register_operand" "r")))
                  (match_operand:SI 3 "s_register_operand" "r")))]
-  "TARGET_DSP_MULTIPLY"
+  "TARGET_DSP_MULTIPLY && !ARM_Q_BIT_READ"
   "smlabb%?\\t%0, %1, %2, %3"
   [(set_attr "type" "smlaxy")
    (set_attr "predicable" "yes")]
@@ -2537,7 +2574,7 @@
                           (sign_extend:SI
                            (match_operand:HI 2 "s_register_operand" "r")))
                  (match_operand:SI 3 "s_register_operand" "r")))]
-  "TARGET_DSP_MULTIPLY"
+  "TARGET_DSP_MULTIPLY && !ARM_Q_BIT_READ"
   "smlatb%?\\t%0, %1, %2, %3"
   [(set_attr "type" "smlaxy")
    (set_attr "predicable" "yes")]
@@ -2552,7 +2589,7 @@
                            (match_operand:SI 2 "s_register_operand" "r")
                            (const_int 16)))
                  (match_operand:SI 3 "s_register_operand" "r")))]
-  "TARGET_DSP_MULTIPLY"
+  "TARGET_DSP_MULTIPLY && !ARM_Q_BIT_READ"
   "smlatt%?\\t%0, %1, %2, %3"
   [(set_attr "type" "smlaxy")
    (set_attr "predicable" "yes")]
@@ -4044,12 +4081,113 @@
 (define_code_attr SATlo [(smin "1") (smax "2")])
 (define_code_attr SAThi [(smin "2") (smax "1")])

-(define_insn "*satsi_<SAT:code>"
+(define_expand "arm_ssat"
+  [(match_operand:SI 0 "s_register_operand")
+   (match_operand:SI 1 "s_register_operand")
+   (match_operand:SI 2 "const_int_operand")]
+  "TARGET_32BIT && arm_arch6"
+  {
+    HOST_WIDE_INT val = INTVAL (operands[2]);
+    /* The builtin checking code should have ensured the right
+       range for the immediate.  */
+    gcc_assert (IN_RANGE (val, 1, 32));
+    HOST_WIDE_INT upper_bound = (HOST_WIDE_INT_1 << (val - 1)) - 1;
+    HOST_WIDE_INT lower_bound = -upper_bound - 1;
+    rtx up_rtx = gen_int_mode (upper_bound, SImode);
+    rtx lo_rtx = gen_int_mode (lower_bound, SImode);
+    if (ARM_Q_BIT_READ)
+      emit_insn (gen_satsi_smin_setq (operands[0], lo_rtx,
+                                      up_rtx, operands[1]));
+    else
+      emit_insn (gen_satsi_smin (operands[0], lo_rtx, up_rtx, operands[1]));
+    DONE;
+  }
+)
+
+(define_expand "arm_usat"
+  [(match_operand:SI 0 "s_register_operand")
+   (match_operand:SI 1 "s_register_operand")
+   (match_operand:SI 2 "const_int_operand")]
+  "TARGET_32BIT && arm_arch6"
+  {
+    HOST_WIDE_INT val = INTVAL (operands[2]);
+    /* The builtin checking code should have ensured the right
+       range for the immediate.  */
+    gcc_assert (IN_RANGE (val, 0, 31));
+    HOST_WIDE_INT upper_bound = (HOST_WIDE_INT_1 << val) - 1;
+    rtx up_rtx = gen_int_mode (upper_bound, SImode);
+    rtx lo_rtx = CONST0_RTX (SImode);
+    if (ARM_Q_BIT_READ)
+      emit_insn (gen_satsi_smin_setq (operands[0], lo_rtx, up_rtx,
+                                      operands[1]));
+    else
+      emit_insn (gen_satsi_smin (operands[0], lo_rtx, up_rtx, operands[1]));
+    DONE;
+  }
+)
+
+(define_insn "arm_get_apsr"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (unspec:SI [(reg:CC APSRQ_REGNUM)] UNSPEC_APSR_READ))]
+  "TARGET_ARM_QBIT"
+  "mrs%?\t%0, APSR"
+  [(set_attr "predicable" "yes")
+   (set_attr "conds" "use")]
+)
+
+(define_insn "arm_set_apsr"
+  [(set (reg:CC APSRQ_REGNUM)
+        (unspec_volatile:CC
+          [(match_operand:SI 0 "s_register_operand" "r")] VUNSPEC_APSR_WRITE))]
+  "TARGET_ARM_QBIT"
+  "msr%?\tAPSR_nzcvq, %0"
+  [(set_attr "predicable" "yes")
+   (set_attr "conds" "set")]
+)
+
+;; Read the APSR and extract the Q bit (bit 27)
+(define_expand "arm_saturation_occurred"
+  [(match_operand:SI 0 "s_register_operand")]
+  "TARGET_ARM_QBIT"
+  {
+    rtx apsr = gen_reg_rtx (SImode);
+    emit_insn (gen_arm_get_apsr (apsr));
+    emit_insn (gen_extzv (operands[0], apsr, CONST1_RTX (SImode),
+                          gen_int_mode (27, SImode)));
+    DONE;
+  }
+)
+
+;; Read the APSR and set the Q bit (bit position 27) according to operand 0
+(define_expand "arm_set_saturation"
+  [(match_operand:SI 0 "reg_or_int_operand")]
+  "TARGET_ARM_QBIT"
+  {
+    rtx apsr = gen_reg_rtx (SImode);
+    emit_insn (gen_arm_get_apsr (apsr));
+    rtx to_insert = gen_reg_rtx (SImode);
+    if (CONST_INT_P (operands[0]))
+      emit_move_insn (to_insert, operands[0] == CONST0_RTX (SImode)
+                                 ? CONST0_RTX (SImode) : CONST1_RTX (SImode));
+    else
+      {
+        rtx cmp = gen_rtx_NE (SImode, operands[0], CONST0_RTX (SImode));
+        emit_insn (gen_cstoresi4 (to_insert, cmp, operands[0],
+                                  CONST0_RTX (SImode)));
+      }
+    emit_insn (gen_insv (apsr, CONST1_RTX (SImode),
+                         gen_int_mode (27, SImode), to_insert));
+    emit_insn (gen_arm_set_apsr (apsr));
+    DONE;
+  }
+)
+
+(define_insn "satsi_<SAT:code><add_clobber_q_name>"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
         (SAT:SI (<SATrev>:SI (match_operand:SI 3 "s_register_operand" "r")
                              (match_operand:SI 1 "const_int_operand" "i"))
                 (match_operand:SI 2 "const_int_operand" "i")))]
-  "TARGET_32BIT && arm_arch6
+  "TARGET_32BIT && arm_arch6 && <add_clobber_q_pred>
    && arm_sat_operator_match (operands[<SAT:SATlo>], operands[<SAT:SAThi>], NULL, NULL)"
 {
   int mask;
@@ -4075,7 +4213,7 @@
                   (match_operand:SI 5 "const_int_operand" "i")])
                  (match_operand:SI 1 "const_int_operand" "i"))
                 (match_operand:SI 2 "const_int_operand" "i")))]
-  "TARGET_32BIT && arm_arch6
+  "TARGET_32BIT && arm_arch6 && !ARM_Q_BIT_READ
    && arm_sat_operator_match (operands[<SAT:SATlo>], operands[<SAT:SAThi>], NULL, NULL)"
 {
   int mask;
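
The arm_saturation_occurred and arm_set_saturation expanders in this diff read the APSR, then extract or insert a single bit at position 27. The bit arithmetic they rely on can be sketched in plain C as below. This is only an illustration of the extzv/insv steps on an ordinary 32-bit value; extract_q, insert_q and APSR_Q_BIT are hypothetical names, and no real status register is accessed.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the Q bit in the APSR, as stated in the pattern comments.  */
#define APSR_Q_BIT 27u

/* Counterpart of the extzv step: zero-extract a 1-bit field at bit 27.  */
static uint32_t
extract_q (uint32_t apsr)
{
  return (apsr >> APSR_Q_BIT) & 1u;
}

/* Counterpart of the insv step: replace bit 27 with (value != 0) and
   leave every other bit of the APSR image unchanged.  */
static uint32_t
insert_q (uint32_t apsr, int32_t value)
{
  uint32_t bit = (value != 0);
  uint32_t result = (apsr & ~(UINT32_C (1) << APSR_Q_BIT))
                    | (bit << APSR_Q_BIT);
  /* Reading the bit back must give what was just inserted.  */
  assert (extract_q (result) == bit);
  return result;
}
```

Normalising the operand to 0 or 1 before the insert mirrors the expander, which uses a compare-and-store for register operands so that any non-zero argument sets the bit.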