From cf16f980e5278c146f04587ea2a378fab950d7b3 Mon Sep 17 00:00:00 2001
From: Kyrylo Tkachov
Date: Thu, 7 Nov 2019 10:39:39 +0000
Subject: [arm][1/X] Add initial support for saturation intrinsics

This patch adds the plumbing for, and an implementation of, the saturation
intrinsics from ACLE, in particular the __ssat and __usat intrinsics.
These intrinsics set the sticky Q bit in the APSR if an overflow occurred.
ACLE allows the user to read that bit (only within the same function; it is
not defined across function boundaries) using the __saturation_occurred
intrinsic and to reset it using __set_saturation_occurred.
Thus, if the user cares about the Q bit they would use a flow such as:

  __set_saturation_occurred (0); // reset the Q bit
  ...
  __ssat (...) // Do some calculations involving __ssat
  ...
  if (__saturation_occurred ()) // if Q bit set handle overflow
  ...

For the implementation this has a few implications:

* We must track the Q-setting side-effects of these instructions to make
  sure saturation reading/writing intrinsics are ordered properly.
  This is done by introducing a new "apsrq" register (and associated
  APSRQ_REGNUM) in a similar way to the "fake" cc register.

* The RTL patterns coming out of these intrinsics can have two forms:
  one where they set the APSRQ_REGNUM and one where they don't.
  Which one is used depends on whether the function cares about reading
  the Q flag.  This is detected using the TARGET_CHECK_BUILTIN_CALL hook
  on the __saturation_occurred and __set_saturation_occurred occurrences.
  If no Q-flag read is present in the function we'll use the simpler
  non-Q-setting form to allow for more aggressive scheduling and such.
  If a Q-bit read is present then the Q-setting form is emitted.
  To avoid adding two patterns for each intrinsic to the MD file we make
  use of define_subst to auto-generate the Q-setting forms.

* Some existing patterns already produce instructions that may clobber the
  Q bit, but they don't model it (as we didn't care about that bit up till
  now).
  Since these patterns can be generated from straight-line C code they can
  affect the Q-bit reads from intrinsics.  Therefore they have to be
  disabled when a Q-bit read is present.  These are mostly patterns in
  arm-fixed.md that are not very common anyway, but there are also a couple
  of widening multiply-accumulate patterns in arm.md that can set the Q-bit
  during accumulation.

There are more Q-setting intrinsics in ACLE, but these will be implemented
in a more mechanical fashion once the infrastructure in this patch goes in.

        * config/arm/aout.h (REGISTER_NAMES): Add apsrq.
        * config/arm/arm.md (APSRQ_REGNUM): Define.
        (add_setq): New define_subst.
        (add_clobber_q_name): New define_subst_attr.
        (add_clobber_q_pred): Likewise.
        (maddhisi4): Change to define_expand.  Split into mult and add if
        ARM_Q_BIT_READ.
        (arm_maddhisi4): New define_insn.
        (*maddhisi4tb): Disable for ARM_Q_BIT_READ.
        (*maddhisi4tt): Likewise.
        (arm_ssat): New define_expand.
        (arm_usat): Likewise.
        (arm_get_apsr): New define_insn.
        (arm_set_apsr): Likewise.
        (arm_saturation_occurred): New define_expand.
        (arm_set_saturation): Likewise.
        (*satsi_): Rename to...
        (satsi_): ... This.
        (*satsi__shift): Disable for ARM_Q_BIT_READ.
        * config/arm/arm.h (FIXED_REGISTERS): Mark apsrq as fixed.
        (CALL_USED_REGISTERS): Mark apsrq.
        (FIRST_PSEUDO_REGISTER): Update value.
        (REG_ALLOC_ORDER): Add APSRQ_REGNUM.
        (machine_function): Add q_bit_access.
        (ARM_Q_BIT_READ): Define.
        * config/arm/arm.c (TARGET_CHECK_BUILTIN_CALL): Define.
        (arm_conditional_register_usage): Clear APSRQ_REGNUM from
        operand_reg_set.
        (arm_q_bit_access): Define.
        * config/arm/arm-builtins.c: Include stringpool.h.
        (arm_sat_binop_imm_qualifiers,
        arm_unsigned_sat_binop_unsigned_imm_qualifiers,
        arm_sat_occurred_qualifiers, arm_set_sat_qualifiers): Define.
        (SAT_BINOP_UNSIGNED_IMM_QUALIFIERS,
        UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS,
        SAT_OCCURRED_QUALIFIERS, SET_SAT_QUALIFIERS): Likewise.
        (arm_builtins): Define ARM_BUILTIN_SAT_IMM_CHECK.
        (arm_init_acle_builtins): Initialize __builtin_sat_imm_check.
        Handle 0 argument expander.
        (arm_expand_acle_builtin): Handle ARM_BUILTIN_SAT_IMM_CHECK.
        (arm_check_builtin_call): Define.
        * config/arm/arm.md (ssmulsa3, usmulusa3, usmuluha3,
        arm_ssatsihi_shift, arm_usatsihi): Disable when ARM_Q_BIT_READ.
        * config/arm/arm-protos.h (arm_check_builtin_call): Declare prototype.
        (arm_q_bit_access): Likewise.
        * config/arm/arm_acle.h (__ssat, __usat, __ignore_saturation,
        __saturation_occurred, __set_saturation_occurred): Define.
        * config/arm/arm_acle_builtins.def: Define builtins for ssat, usat,
        saturation_occurred, set_saturation_occurred.
        * config/arm/unspecs.md (UNSPEC_Q_SET): Define.
        (UNSPEC_APSR_READ): Likewise.
        (VUNSPEC_APSR_WRITE): Likewise.
        * config/arm/arm-fixed.md (ssadd3): Convert to define_expand.
        (*arm_ssadd3): New define_insn.
        (sssub3): Convert to define_expand.
        (*arm_sssub3): New define_insn.
        (ssmulsa3): Convert to define_expand.
        (*arm_ssmulsa3): New define_insn.
        (usmulusa3): Convert to define_expand.
        (*arm_usmulusa3): New define_insn.
        (ssmulha3): FAIL if ARM_Q_BIT_READ.
        (arm_ssatsihi_shift, arm_usatsihi): Disable for ARM_Q_BIT_READ.
        * config/arm/iterators.md (qaddsub_clob_q): New mode attribute.

        * gcc.target/arm/acle/saturation.c: New test.
        * gcc.target/arm/acle/sat_no_smlatb.c: Likewise.
        * lib/target-supports.exp
        (check_effective_target_arm_qbit_ok_nocache): Define.
        (check_effective_target_arm_qbit_ok): Likewise.
        (add_options_for_arm_qbit): Likewise.
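The reset/compute/check flow described in the message above can be modelled in plain C, which also makes the saturation bounds concrete. This is a minimal sketch under stated assumptions. q_flag and the emul_* functions are hypothetical stand-ins for the APSR Q bit and the ACLE intrinsics; they are not GCC or arm_acle.h code. The bounds mirror the ones the arm_ssat and arm_usat expanders compute: [-2^(n-1), 2^(n-1) - 1] for SSAT with 1 <= n <= 32, and [0, 2^n - 1] for USAT with 0 <= n <= 31.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the sticky Q bit: saturating operations
   can only set it, and only an explicit reset clears it.  */
static bool q_flag = false;

/* Stand-in for __set_saturation_occurred: any non-zero value sets Q.  */
static void
emul_set_saturation_occurred (int value)
{
  q_flag = (value != 0);
}

/* Stand-in for __saturation_occurred: report the sticky flag.  */
static int
emul_saturation_occurred (void)
{
  return q_flag ? 1 : 0;
}

/* Model of SSAT #n: clamp X to [-(2^(n-1)), 2^(n-1) - 1] and set Q if
   clamping was needed.  Bounds computed as in the arm_ssat expander.  */
static int32_t
emul_ssat (int32_t x, unsigned n)
{
  assert (n >= 1 && n <= 32);   /* Mirrors the gcc_assert range check.  */
  const int64_t upper = ((int64_t) 1 << (n - 1)) - 1;
  const int64_t lower = -upper - 1;
  if (x > upper)
    {
      q_flag = true;
      return (int32_t) upper;
    }
  if (x < lower)
    {
      q_flag = true;
      return (int32_t) lower;
    }
  return x;
}

/* Model of USAT #n: clamp X to [0, 2^n - 1] and set Q if clamping was
   needed.  Upper bound computed as in the arm_usat expander.  */
static uint32_t
emul_usat (int32_t x, unsigned n)
{
  assert (n <= 31);
  const int64_t upper = ((int64_t) 1 << n) - 1;
  if (x > upper)
    {
      q_flag = true;
      return (uint32_t) upper;
    }
  if (x < 0)
    {
      q_flag = true;
      return 0;
    }
  return (uint32_t) x;
}
```

For example, after emul_set_saturation_occurred (0), emul_ssat (300, 8) returns 127 and emul_saturation_occurred () then returns 1, whereas emul_ssat (100, 8) returns 100 and leaves the flag untouched. Because the flag is sticky and only an explicit reset clears it, the compiler must keep the reads, the writes and the Q-setting instructions in order.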
From-SVN: r277914
---
 gcc/config/arm/arm.md | 152 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 145 insertions(+), 7 deletions(-)

(limited to 'gcc/config/arm/arm.md')

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4f035cbfddd..992d7b60bbc 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -39,6 +39,7 @@
    (LAST_ARM_REGNUM 15)         ;
    (CC_REGNUM       100)        ; Condition code pseudo register
    (VFPCC_REGNUM    101)        ; VFP Condition code pseudo register
+   (APSRQ_REGNUM    104)        ; Q bit pseudo register
   ]
 )
 ;; 3rd operand to select_dominance_cc_mode
@@ -423,6 +424,20 @@
 (include "marvell-pj4.md")
 (include "xgene1.md")
 
+;; define_subst and associated attributes
+
+(define_subst "add_setq"
+  [(set (match_operand:SI 0 "" "")
+        (match_operand:SI 1 "" ""))]
+  ""
+  [(set (match_dup 0)
+        (match_dup 1))
+   (set (reg:CC APSRQ_REGNUM)
+        (unspec:CC [(reg:CC APSRQ_REGNUM)] UNSPEC_Q_SET))])
+
+(define_subst_attr "add_clobber_q_name" "add_setq" "" "_setq")
+(define_subst_attr "add_clobber_q_pred" "add_setq" "!ARM_Q_BIT_READ"
+                                          "ARM_Q_BIT_READ")
 ;;---------------------------------------------------------------------------
 ;; Insn patterns
 
@@ -2515,14 +2530,36 @@
    (set_attr "predicable" "yes")]
 )
 
-(define_insn "maddhisi4"
+(define_expand "maddhisi4"
+  [(set (match_operand:SI 0 "s_register_operand")
+        (plus:SI (mult:SI (sign_extend:SI
+                           (match_operand:HI 1 "s_register_operand"))
+                          (sign_extend:SI
+                           (match_operand:HI 2 "s_register_operand")))
+                 (match_operand:SI 3 "s_register_operand")))]
+  "TARGET_DSP_MULTIPLY"
+  {
+    /* If this function reads the Q bit from ACLE intrinsics break up the
+       multiplication and accumulation as an overflow during accumulation will
+       clobber the Q flag.  */
+    if (ARM_Q_BIT_READ)
+      {
+        rtx tmp = gen_reg_rtx (SImode);
+        emit_insn (gen_mulhisi3 (tmp, operands[1], operands[2]));
+        emit_insn (gen_addsi3 (operands[0], tmp, operands[3]));
+        DONE;
+      }
+  }
+)
+
+(define_insn "*arm_maddhisi4"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
         (plus:SI (mult:SI (sign_extend:SI
                            (match_operand:HI 1 "s_register_operand" "r"))
                           (sign_extend:SI
                            (match_operand:HI 2 "s_register_operand" "r")))
                  (match_operand:SI 3 "s_register_operand" "r")))]
-  "TARGET_DSP_MULTIPLY"
+  "TARGET_DSP_MULTIPLY && !ARM_Q_BIT_READ"
   "smlabb%?\\t%0, %1, %2, %3"
   [(set_attr "type" "smlaxy")
    (set_attr "predicable" "yes")]
@@ -2537,7 +2574,7 @@
                           (sign_extend:SI
                            (match_operand:HI 2 "s_register_operand" "r")))
                  (match_operand:SI 3 "s_register_operand" "r")))]
-  "TARGET_DSP_MULTIPLY"
+  "TARGET_DSP_MULTIPLY && !ARM_Q_BIT_READ"
   "smlatb%?\\t%0, %1, %2, %3"
   [(set_attr "type" "smlaxy")
    (set_attr "predicable" "yes")]
@@ -2552,7 +2589,7 @@
                            (match_operand:SI 2 "s_register_operand" "r")
                            (const_int 16)))
                  (match_operand:SI 3 "s_register_operand" "r")))]
-  "TARGET_DSP_MULTIPLY"
+  "TARGET_DSP_MULTIPLY && !ARM_Q_BIT_READ"
   "smlatt%?\\t%0, %1, %2, %3"
   [(set_attr "type" "smlaxy")
    (set_attr "predicable" "yes")]
@@ -4044,12 +4081,113 @@
 (define_code_attr SATlo [(smin "1") (smax "2")])
 (define_code_attr SAThi [(smin "2") (smax "1")])
 
-(define_insn "*satsi_"
+(define_expand "arm_ssat"
+  [(match_operand:SI 0 "s_register_operand")
+   (match_operand:SI 1 "s_register_operand")
+   (match_operand:SI 2 "const_int_operand")]
+  "TARGET_32BIT && arm_arch6"
+  {
+    HOST_WIDE_INT val = INTVAL (operands[2]);
+    /* The builtin checking code should have ensured the right
+       range for the immediate.  */
+    gcc_assert (IN_RANGE (val, 1, 32));
+    HOST_WIDE_INT upper_bound = (HOST_WIDE_INT_1 << (val - 1)) - 1;
+    HOST_WIDE_INT lower_bound = -upper_bound - 1;
+    rtx up_rtx = gen_int_mode (upper_bound, SImode);
+    rtx lo_rtx = gen_int_mode (lower_bound, SImode);
+    if (ARM_Q_BIT_READ)
+      emit_insn (gen_satsi_smin_setq (operands[0], lo_rtx,
+                                      up_rtx, operands[1]));
+    else
+      emit_insn (gen_satsi_smin (operands[0], lo_rtx, up_rtx, operands[1]));
+    DONE;
+  }
+)
+
+(define_expand "arm_usat"
+  [(match_operand:SI 0 "s_register_operand")
+   (match_operand:SI 1 "s_register_operand")
+   (match_operand:SI 2 "const_int_operand")]
+  "TARGET_32BIT && arm_arch6"
+  {
+    HOST_WIDE_INT val = INTVAL (operands[2]);
+    /* The builtin checking code should have ensured the right
+       range for the immediate.  */
+    gcc_assert (IN_RANGE (val, 0, 31));
+    HOST_WIDE_INT upper_bound = (HOST_WIDE_INT_1 << val) - 1;
+    rtx up_rtx = gen_int_mode (upper_bound, SImode);
+    rtx lo_rtx = CONST0_RTX (SImode);
+    if (ARM_Q_BIT_READ)
+      emit_insn (gen_satsi_smin_setq (operands[0], lo_rtx, up_rtx,
+                                      operands[1]));
+    else
+      emit_insn (gen_satsi_smin (operands[0], lo_rtx, up_rtx, operands[1]));
+    DONE;
+  }
+)
+
+(define_insn "arm_get_apsr"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (unspec:SI [(reg:CC APSRQ_REGNUM)] UNSPEC_APSR_READ))]
+  "TARGET_ARM_QBIT"
+  "mrs%?\t%0, APSR"
+  [(set_attr "predicable" "yes")
+   (set_attr "conds" "use")]
+)
+
+(define_insn "arm_set_apsr"
+  [(set (reg:CC APSRQ_REGNUM)
+        (unspec_volatile:CC
+          [(match_operand:SI 0 "s_register_operand" "r")] VUNSPEC_APSR_WRITE))]
+  "TARGET_ARM_QBIT"
+  "msr%?\tAPSR_nzcvq, %0"
+  [(set_attr "predicable" "yes")
+   (set_attr "conds" "set")]
+)
+
+;; Read the APSR and extract the Q bit (bit 27)
+(define_expand "arm_saturation_occurred"
+  [(match_operand:SI 0 "s_register_operand")]
+  "TARGET_ARM_QBIT"
+  {
+    rtx apsr = gen_reg_rtx (SImode);
+    emit_insn (gen_arm_get_apsr (apsr));
+    emit_insn (gen_extzv (operands[0], apsr, CONST1_RTX (SImode),
+                          gen_int_mode (27, SImode)));
+    DONE;
+  }
+)
+
+;; Read the APSR and set the Q bit (bit position 27) according to operand 0
+(define_expand "arm_set_saturation"
+  [(match_operand:SI 0 "reg_or_int_operand")]
+  "TARGET_ARM_QBIT"
+  {
+    rtx apsr = gen_reg_rtx (SImode);
+    emit_insn (gen_arm_get_apsr (apsr));
+    rtx to_insert = gen_reg_rtx (SImode);
+    if (CONST_INT_P (operands[0]))
+      emit_move_insn (to_insert, operands[0] == CONST0_RTX (SImode)
+                                 ? CONST0_RTX (SImode) : CONST1_RTX (SImode));
+    else
+      {
+        rtx cmp = gen_rtx_NE (SImode, operands[0], CONST0_RTX (SImode));
+        emit_insn (gen_cstoresi4 (to_insert, cmp, operands[0],
+                                  CONST0_RTX (SImode)));
+      }
+    emit_insn (gen_insv (apsr, CONST1_RTX (SImode),
+                         gen_int_mode (27, SImode), to_insert));
+    emit_insn (gen_arm_set_apsr (apsr));
+    DONE;
+  }
+)
+
+(define_insn "satsi_"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
         (SAT:SI (:SI (match_operand:SI 3 "s_register_operand" "r")
                            (match_operand:SI 1 "const_int_operand" "i"))
                 (match_operand:SI 2 "const_int_operand" "i")))]
-  "TARGET_32BIT && arm_arch6
+  "TARGET_32BIT && arm_arch6 &&
    && arm_sat_operator_match (operands[], operands[], NULL, NULL)"
   {
     int mask;
@@ -4075,7 +4213,7 @@
                             (match_operand:SI 5 "const_int_operand" "i")])
                            (match_operand:SI 1 "const_int_operand" "i"))
                 (match_operand:SI 2 "const_int_operand" "i")))]
-  "TARGET_32BIT && arm_arch6
+  "TARGET_32BIT && arm_arch6 && !ARM_Q_BIT_READ
    && arm_sat_operator_match (operands[], operands[], NULL, NULL)"
   {
     int mask;
-- 
cgit v1.2.1
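The maddhisi4 expander splits the multiply-accumulate when ARM_Q_BIT_READ holds. SMLABB sets the Q flag if the final accumulation overflows, whereas a separate 16x16 multiply followed by a plain add leaves the flag alone. The C sketch below models both sequences. model_smlabb and model_mul_then_add are hypothetical helpers, and the Q flag is an ordinary bool rather than the APSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of SMLABB: ACC + A * B on the bottom halfwords.
   Q is set (sticky) if the accumulation overflows a signed 32-bit value;
   the register result still wraps modulo 2^32.  */
static int32_t
model_smlabb (int16_t a, int16_t b, int32_t acc, bool *q)
{
  int64_t product = (int64_t) a * b;
  /* A 16x16 signed product always fits in 32 bits, so only the
     accumulation step can overflow.  */
  assert (product >= INT32_MIN && product <= INT32_MAX);
  int64_t sum = product + acc;
  if (sum > INT32_MAX || sum < INT32_MIN)
    *q = true;                  /* Never cleared here: the flag is sticky.  */
  /* Wrap to 32 bits.  Converting an out-of-range value to int32_t is
     implementation-defined in ISO C; GCC and Clang reduce modulo 2^32.  */
  return (int32_t) (uint32_t) sum;
}

/* Model of the split form used under ARM_Q_BIT_READ: a 16x16 multiply
   (mulhisi3) followed by an ordinary 32-bit add (addsi3).  Neither step
   touches the Q flag, so there is no flag parameter.  */
static int32_t
model_mul_then_add (int16_t a, int16_t b, int32_t acc)
{
  uint32_t product = (uint32_t) ((int32_t) a * b);
  return (int32_t) (product + (uint32_t) acc);
}
```

With a = b = -32768 and an accumulator of INT32_MAX, both functions return the same wrapped value, -1073741825. Only model_smlabb records a saturation, and that Q-bit update is the one a later __saturation_occurred check would otherwise observe without any saturation intrinsic having caused it.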
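arm_saturation_occurred and arm_set_saturation reduce to single-bit field operations on the value that MRS reads from the APSR. The first performs a one-bit zero-extract at bit 27 (extzv). The second performs a one-bit insert at bit 27 (insv) and then writes the value back with MSR APSR_nzcvq. The following C sketch shows that bit manipulation, assuming the APSR is modelled as a uint32_t. extract_field, insert_field and the model_* functions are hypothetical helpers, not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position of the Q flag in the APSR.  */
#define APSR_Q_BIT 27u

/* Return WIDTH bits of X starting at bit POS (what extzv computes).
   The assert keeps the shifts below well-defined.  */
static uint32_t
extract_field (uint32_t x, unsigned width, unsigned pos)
{
  assert (width >= 1 && width < 32 && pos + width <= 32);
  return (x >> pos) & ((UINT32_C (1) << width) - 1);
}

/* Return X with the WIDTH bits at POS replaced by the low bits of VAL
   (what insv does to its destination).  */
static uint32_t
insert_field (uint32_t x, unsigned width, unsigned pos, uint32_t val)
{
  assert (width >= 1 && width < 32 && pos + width <= 32);
  uint32_t mask = ((UINT32_C (1) << width) - 1) << pos;
  return (x & ~mask) | ((val << pos) & mask);
}

/* Model of arm_saturation_occurred: APSR value in, Q bit out.  */
static uint32_t
model_saturation_occurred (uint32_t apsr)
{
  return extract_field (apsr, 1, APSR_Q_BIT);
}

/* Model of arm_set_saturation: normalise VALUE to 0 or 1, as the
   comparison against zero in the expander does, then insert it at bit 27.
   The result is what would be written back with MSR APSR_nzcvq.  */
static uint32_t
model_set_saturation (uint32_t apsr, int value)
{
  uint32_t bit = (value != 0) ? 1u : 0u;
  return insert_field (apsr, 1, APSR_Q_BIT, bit);
}
```

For instance, model_set_saturation (0xF0000000, 5) returns 0xF8000000. The non-zero argument is normalised to 1, and the N, Z, C and V bits (28 to 31) are preserved.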