summaryrefslogtreecommitdiff
path: root/gcc/config/arm/mve.md
Commit message (Collapse)AuthorAgeFilesLines
* arm: Auto-vectorization for MVE: vcmpChristophe Lyon2021-05-171-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since MVE has a different set of vector comparison operators from Neon, we have to update the expansion to take into account the new ones, for instance 'NE' for which MVE does not require to use 'EQ' with the inverted condition. Conversely, Neon supports comparisons with #0, MVE does not. For: typedef long int vs32 __attribute__((vector_size(16))); vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; } we now generate: cmp_eq_vs32_reg: vldr.64 d4, .L123 @ 8 [c=8 l=4] *mve_movv4si/8 vldr.64 d5, .L123+8 vldr.64 d6, .L123+16 @ 9 [c=8 l=4] *mve_movv4si/8 vldr.64 d7, .L123+24 vcmp.i32 eq, q0, q1 @ 7 [c=16 l=4] mve_vcmpeqq_v4si vpsel q0, q3, q2 @ 15 [c=8 l=4] mve_vpselq_sv4si bx lr @ 26 [c=8 l=4] *thumb2_return .L124: .align 3 .L123: .word 0 .word 0 .word 0 .word 0 .word 1 .word 1 .word 1 .word 1 For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode)) produces a pair of vldr instead of vmov.i32, qX, #0 2021-05-17 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/arm-protos.h (arm_expand_vector_compare): Update prototype. * config/arm/arm.c (arm_expand_vector_compare): Add support for MVE. (arm_expand_vcond): Likewise. * config/arm/iterators.md (supf): Remove VCMPNEQ_S, VCMPEQQ_S, VCMPEQQ_N_S, VCMPNEQ_N_S. (VCMPNEQ, VCMPEQQ, VCMPEQQ_N, VCMPNEQ_N): Remove. * config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>): Add '@' prefix. (@mve_vcmp<mve_cmp_op>q_f<mode>): Likewise. (@mve_vcmp<mve_cmp_op>q_n_f<mode>): Likewise. (@mve_vpselq_<supf><mode>): Likewise. (@mve_vpselq_f<mode>"): Likewise. * config/arm/neon.md (vec_cmp<mode><v_cmp_result): Enable for MVE and move to vec-common.md. (vec_cmpu<mode><mode>): Likewise. (vcond<mode><mode>): Likewise. (vcond<V_cvtto><mode>): Likewise. (vcondu<mode><v_cmp_result>): Likewise. (vcond_mask_<mode><v_cmp_result>): Likewise. * config/arm/unspecs.md (VCMPNEQ_U, VCMPNEQ_S, VCMPEQQ_S) (VCMPEQQ_N_S, VCMPNEQ_N_S, VCMPEQQ_U, CMPEQQ_N_U, VCMPNEQ_N_U) (VCMPGEQ_N_S, VCMPGEQ_S, VCMPGTQ_N_S, VCMPGTQ_S, VCMPLEQ_N_S) (VCMPLEQ_S, VCMPLTQ_N_S, VCMPLTQ_S, VCMPCSQ_N_U, VCMPCSQ_U) (VCMPHIQ_N_U, VCMPHIQ_U): Remove. * config/arm/vec-common.md (vec_cmp<mode><v_cmp_result): Moved from neon.md. (vec_cmpu<mode><mode>): Likewise. (vcond<mode><mode>): Likewise. (vcond<V_cvtto><mode>): Likewise. (vcondu<mode><v_cmp_result>): Likewise. (vcond_mask_<mode><v_cmp_result>): Likewise. Added unsafe math condition. gcc/testsuite * gcc.target/arm/simd/mve-compare-1.c: New test with GCC vectors. * gcc.target/arm/simd/mve-compare-2.c: New test with GCC vectors. * gcc.target/arm/simd/mve-compare-scalar-1.c: New test with GCC vectors. * gcc.target/arm/simd/mve-vcmp-f32.c: New test for auto-vectorization. * gcc.target/arm/simd/mve-vcmp.c: New test for auto-vectorization.
* arm: MVE: Factorize vcmp_*f*Christophe Lyon2021-05-101-162/+10
| | | | | | | | | | | | | | | | | | | | | | Like in the previous, we factorize the vcmp_*f* patterns to make maintenance easier. 2021-05-10 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/iterators.md (MVE_FP_COMPARISONS): New. * config/arm/mve.md (mve_vcmp<mve_cmp_op>q_f<mode>) (mve_vcmp<mve_cmp_op>q_n_f<mode>): New, merge all vcmp_*f* patterns. (mve_vcmpeqq_f<mode>, mve_vcmpeqq_n_f<mode>, mve_vcmpgeq_f<mode>) (mve_vcmpgeq_n_f<mode>, mve_vcmpgtq_f<mode>) (mve_vcmpgtq_n_f<mode>, mve_vcmpleq_f<mode>) (mve_vcmpleq_n_f<mode>, mve_vcmpltq_f<mode>) (mve_vcmpltq_n_f<mode>, mve_vcmpneq_f<mode>) (mve_vcmpneq_n_f<mode>): Remove. * config/arm/unspecs.md (VCMPEQQ_F, VCMPEQQ_N_F, VCMPGEQ_F) (VCMPGEQ_N_F, VCMPGTQ_F, VCMPGTQ_N_F, VCMPLEQ_F, VCMPLEQ_N_F) (VCMPLTQ_F, VCMPLTQ_N_F, VCMPNEQ_F, VCMPNEQ_N_F): Remove.
* arm: MVE: Factorize all vcmp* integer patternsChristophe Lyon2021-05-101-231/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | After removing the signed and unsigned suffixes in the previous patches, we can now factorize the vcmp* patterns: there is no longer an asymmetry where operators do not have the same set of signed and unsigned variants. The will make maintenance easier. MVE has a different set of vector comparison operators than Neon, so we have to introduce dedicated iterators. 2021-05-10 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/iterators.md (MVE_COMPARISONS): New. (mve_cmp_op): New. (mve_cmp_type): New. * config/arm/mve.md (mve_vcmp<mve_cmp_op>q_<mode>): New, merge all mve_vcmp patterns. (mve_vcmpneq_<mode>, mve_vcmpcsq_n_<mode>, mve_vcmpcsq_<mode>) (mve_vcmpeqq_n_<mode>, mve_vcmpeqq_<mode>, mve_vcmpgeq_n_<mode>) (mve_vcmpgeq_<mode>, mve_vcmpgtq_n_<mode>, mve_vcmpgtq_<mode>) (mve_vcmphiq_n_<mode>, mve_vcmphiq_<mode>, mve_vcmpleq_n_<mode>) (mve_vcmpleq_<mode>, mve_vcmpltq_n_<mode>, mve_vcmpltq_<mode>) (mve_vcmpneq_n_<mode>, mve_vcmpltq_n_<mode>, mve_vcmpltq_<mode>) (mve_vcmpneq_n_<mode>): Remove.
* arm: MVE: Remove _s and _u suffixes from vcmp* builtins.Christophe Lyon2021-05-101-32/+32
| | | | | | | | | | | | | | This patch brings more unification in the vector comparison builtins, by removing the useless 's' (signed) suffix since we no longer need unsigned versions. 2021-05-10 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/arm_mve.h (__arm_vcmp*): Remove 's' suffix. * config/arm/arm_mve_builtins.def (vcmp*): Remove 's' suffix. * config/arm/mve.md (mve_vcmp*): Remove 's' suffix in pattern names.
* arm: MVE: Cleanup vcmpne/vcmpeq builtinsChristophe Lyon2021-05-101-8/+8
| | | | | | | | | | | | | | | | | | | | After the previous patch, we no longer need to emit the unsigned variants of vcmpneq/vcmpeqq. This patch removes them as well as the corresponding iterator entries. 2021-05-10 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/arm_mve_builtins.def (vcmpneq_u): Remove. (vcmpneq_n_u): Likewise. (vcmpeqq_u,): Likewise. (vcmpeqq_n_u): Likewise. * config/arm/iterators.md (supf): Remove VCMPNEQ_U, VCMPEQQ_U, VCMPEQQ_N_U and VCMPNEQ_N_U. * config/arm/mve.md (mve_vcmpneq): Remove <supf> iteration. (mve_vcmpeqq_n): Likewise. (mve_vcmpeqq): Likewise. (mve_vcmpneq_n): Likewise.
* arm: Fix wrong code with MVE V2DImode loads and stores [PR99960]Alex Coplan2021-05-101-30/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the PR shows, we currently miscompile V2DImode loads and stores for MVE. We're currently using 64-bit loads/stores, but need to be using 128-bit vector loads and stores. Fixed thusly. Some intrinsics tests were checking that we (incorrectly) used the 64-bit loads/stores: these have been updated. gcc/ChangeLog: PR target/99960 * config/arm/mve.md (*mve_mov<mode>): Simplify output code. Use vldrw.u32 and vstrw.32 for V2D[IF]mode loads and stores. gcc/testsuite/ChangeLog: PR target/99960 * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c: Update now that we're (correctly) using full 128-bit vector loads/stores. * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c: Likewise. * gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise. * gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c: Likewise.
* arm: Various MVE vec_duplicate fixes [PR99647]Alex Coplan2021-04-081-18/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes various issues with vec_duplicate in the MVE patterns. Currently there are two patterns named *mve_mov<mode>. The second of these is really a vector duplicate rather than a move, so I've renamed it accordingly. As it stands, there are several issues with this pattern: 1. The MVE_types iterator has an entry for TImode, but vec_duplicate:TI is invalid. 2. The mode of the operand to vec_duplicate is SImode, but it should vary according to the vector mode iterator. 3. The second alternative of this pattern is bogus: it allows matching symbol_refs (the cause of the PR) and const_ints (which means that it matches (vec_duplicate (const_int ...)) which is non-canonical: such rtxes should be const_vectors instead and handled by the main vector move pattern). This patch fixes all of these issues, and removes the redundant *mve_vec_duplicate<mode> pattern. gcc/ChangeLog: PR target/99647 * config/arm/iterators.md (MVE_vecs): New. (V_elem): Also handle V2DF. * config/arm/mve.md (*mve_mov<mode>): Rename to ... (*mve_vdup<mode>): ... this. Remove second alternative since vec_duplicate of const_int is not canonical RTL, and we don't want to match symbol_refs. (*mve_vec_duplicate<mode>): Delete (pattern is redundant). gcc/testsuite/ChangeLog: PR target/99647 * gcc.c-torture/compile/pr99647.c: New test.
* arm: Fix MVE constraints for movmisalign [PR target/99727]Christophe Lyon2021-03-241-2/+2
| | | | | | | | | | | | | | | | MVE has different constraints than Neon for load/store: we should use the Ux constraint instead of Um. 2021-03-24 Christophe Lyon <christophe.lyon@linaro.org> PR target/99727 gcc/ * config/arm/mve.md (movmisalign<mode>_mve_store): Use Ux constraint. (movmisalign<mode>_mve_load): Likewise. gcc/testsuite/ * gcc.target/arm/pr99727.c: New test.
* arm: Fix MVE ICEs with vector moves and -mpure-code [PR97252]Alex Coplan2021-03-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes around 500 ICEs in the testsuite which can be seen when testing with -march=armv8.1-m.main+mve -mfloat-abi=hard -mpure-code (leaving the testsuite free of ICEs in this configuration). All of the ICEs are in arm_print_operand (which is expecting a mem and gets another rtx, e.g. a const_vector) when running the output code for *mve_mov<mode> in alternative 4. The issue is that MVE vector moves were relying on the arm_reorg pass to move constant vectors that we can't easily synthesize to the literal pool. This doesn't work for -mpure-code where the literal pool is disabled. LLVM puts these in .rodata: I've chosen to do the same here. With this change, for -mpure-code, we no longer want to allow a constant on the RHS of a vector load in RA. To achieve this, I added a new constraint which matches constants only if the literal pool is available. gcc/ChangeLog: PR target/97252 * config/arm/arm-protos.h (neon_make_constant): Add generate argument to guard emitting insns, default to true. * config/arm/arm.c (arm_legitimate_constant_p_1): Reject CONST_VECTORs which neon_make_constant can't handle. (neon_vdup_constant): Add generate argument, avoid emitting insns if it's not set. (neon_make_constant): Plumb new generate argument through. * config/arm/constraints.md (Ui): New. Use it... * config/arm/mve.md (*mve_mov<mode>): ... here. * config/arm/vec-common.md (movv8hf): Use neon_make_constant to synthesize constants.
* arm: Auto-vectorization for MVE: vornChristophe Lyon2021-02-021-8/+15
| | | | | | | | | | | | | | | | | | | | This patch enables MVE vornq instructions for auto-vectorization. MVE vornq insns in mve.md are modified to use ior instead of unspec expression. 2021-02-01 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/iterators.md (supf): Remove VORNQ_S and VORNQ_U. (VORNQ): Remove. * config/arm/mve.md (mve_vornq_s<mode>): New entry for vorn instruction using expression ior. (mve_vornq_u<mode>): New expander. (mve_vornq_f<mode>): Use ior code instead of unspec. * config/arm/unspecs.md (VORNQ_S, VORNQ_U, VORNQ_F): Remove. gcc/testsuite/ * gcc.target/arm/simd/mve-vorn.c: Add vorn tests.
* Arm: Add NEON and MVE complex mul, mla and mls patterns.Tamar Christina2021-01-251-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds implementation for the optabs for complex operations. With this the following C code: void g (float complex a[restrict N], float complex b[restrict N], float complex c[restrict N]) { for (int i=0; i < N; i++) c[i] = a[i] * b[i]; } generates NEON: g: vmov.f32 q11, #0.0 @ v4sf add r3, r2, #1600 .L2: vmov q8, q11 @ v4sf vld1.32 {q10}, [r1]! vld1.32 {q9}, [r0]! vcmla.f32 q8, q9, q10, #0 vcmla.f32 q8, q9, q10, #90 vst1.32 {q8}, [r2]! cmp r3, r2 bne .L2 bx lr MVE: g: push {lr} mov lr, #100 dls lr, lr .L2: vldrw.32 q1, [r1], #16 vldrw.32 q2, [r0], #16 vcmul.f32 q3, q2, q1, #0 vcmla.f32 q3, q2, q1, #90 vstrw.32 q3, [r2], #16 le lr, .L2 ldr pc, [sp], #4 instead of g: add r3, r2, #1600 .L2: vld2.32 {d20-d23}, [r0]! vld2.32 {d16-d19}, [r1]! vmul.f32 q14, q11, q9 vmul.f32 q15, q11, q8 vneg.f32 q14, q14 vfma.f32 q15, q10, q9 vfma.f32 q14, q10, q8 vmov q13, q15 @ v4sf vmov q12, q14 @ v4sf vst2.32 {d24-d27}, [r2]! cmp r3, r2 bne .L2 bx lr and g: add r3, r2, #1600 .L2: vld2.32 {d20-d23}, [r0]! vld2.32 {d16-d19}, [r1]! vmul.f32 q15, q10, q8 vmul.f32 q14, q10, q9 vmls.f32 q15, q11, q9 vmla.f32 q14, q11, q8 vmov q12, q15 @ v4sf vmov q13, q14 @ v4sf vst2.32 {d24-d27}, [r2]! cmp r3, r2 bne .L2 bx lr respectively. gcc/ChangeLog: * config/arm/iterators.md (rotsplit1, rotsplit2, conj_op, fcmac1, VCMLA_OP, VCMUL_OP): New. * config/arm/mve.md (mve_vcmlaq<mve_rot><mode>): Support vec_dup 0. * config/arm/neon.md (cmul<conj_op><mode>3): New. * config/arm/unspecs.md (UNSPEC_VCMLA_CONJ, UNSPEC_VCMLA180_CONJ, UNSPEC_VCMUL_CONJ): New. * config/arm/vec-common.md (cmul<conj_op><mode>3, arm_vcmla<rot><mode>, cml<fcmac1><conj_op><mode>4): New.
* arm: Auto-vectorization for MVE: vshrChristophe Lyon2021-01-151-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | This patch enables MVE vshr instructions for auto-vectorization. New MVE patterns are introduced that take a vector of constants as second operand, all constants being equal. The existing mve_vshrq_n_<supf><mode> is kept, as it takes a single immediate as second operand, and is used by arm_mve.h. The vashr<mode>3 and vlshr<mode>3 expanders are moved fron neon.md to vec-common.md, updated to rely on the normal expansion scheme to generate shifts by immediate. 2020-12-03 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/mve.md (mve_vshrq_n_s<mode>_imm): New entry. (mve_vshrq_n_u<mode>_imm): Likewise. * config/arm/neon.md (vashr<mode>3, vlshr<mode>3): Move to ... * config/arm/vec-common.md: ... here. gcc/testsuite/ * gcc.target/arm/simd/mve-vshr.c: Add tests for vshr.
* arm: Auto-vectorization for MVE: vshlChristophe Lyon2021-01-151-12/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables MVE vshlq instructions for auto-vectorization. The existing mve_vshlq_n_<supf><mode> is kept, as it takes a single immediate as second operand, and is used by arm_mve.h. We move the vashl<mode>3 insn from neon.md to an expander in vec-common.md, and the mve_vshlq_<supf><mode> insn from mve.md to vec-common.md, adding the second alternative fron neon.md. mve_vshlq_<supf><mode> will be used by a later patch enabling vectorization for vshr, as a unified version of ashl3<mode3>_[signed|unsigned] from neon.md. Keeping the use of unspec VSHLQ enables to generate both 's' and 'u' variants. It is not clear whether the neon_shift_[reg|imm]<q> attribute is still suitable, since this insn is also used for MVE. I kept the mve_vshlq_<supf><mode> naming instead of renaming it to ashl3_<supf>_<mode> as discussed because the reference in arm_mve_builtins.def automatically inserts the "mve_" prefix and I didn't want to make a special case for this. I haven't yet found why the v16qi and v8hi tests are not vectorized. With dest[i] = a[i] << b[i] and: { int i; unsigned int i.24_1; unsigned int _2; int16_t * _3; short int _4; int _5; int16_t * _6; short int _7; int _8; int _9; int16_t * _10; short int _11; unsigned int ivtmp_42; unsigned int ivtmp_43; <bb 2> [local count: 119292720]: <bb 3> [local count: 954449105]: i.24_1 = (unsigned int) i_23; _2 = i.24_1 * 2; _3 = a_15(D) + _2; _4 = *_3; _5 = (int) _4; _6 = b_16(D) + _2; _7 = *_6; _8 = (int) _7; _9 = _5 << _8; _10 = dest_17(D) + _2; _11 = (short int) _9; *_10 = _11; i_19 = i_23 + 1; ivtmp_42 = ivtmp_43 - 1; if (ivtmp_42 != 0) goto <bb 5>; [87.50%] else goto <bb 4>; [12.50%] <bb 5> [local count: 835156386]: goto <bb 3>; [100.00%] <bb 4> [local count: 119292720]: return; } the vectorizer says: mve-vshl.c:37:96: note: ==> examining statement: _5 = (int) _4; mve-vshl.c:37:96: note: vect_is_simple_use: operand *_3, type of def: internal mve-vshl.c:37:96: note: vect_is_simple_use: vectype vector(8) short int mve-vshl.c:37:96: missed: conversion not supported by target. mve-vshl.c:37:96: note: vect_is_simple_use: operand *_3, type of def: internal mve-vshl.c:37:96: note: vect_is_simple_use: vectype vector(8) short int mve-vshl.c:37:96: note: vect_is_simple_use: operand *_3, type of def: internal mve-vshl.c:37:96: note: vect_is_simple_use: vectype vector(8) short int mve-vshl.c:37:117: missed: not vectorized: relevant stmt not supported: _5 = (int) _4; mve-vshl.c:37:96: missed: bad operation or unsupported loop bound. mve-vshl.c:37:96: note: ***** Analysis failed with vector mode V8HI 2020-12-03 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/mve.md (mve_vshlq_<supf><mode>): Move to vec-commond.md. * config/arm/neon.md (vashl<mode>3): Delete. * config/arm/vec-common.md (mve_vshlq_<supf><mode>): New. (vasl<mode>3): New expander. gcc/testsuite/ * gcc.target/arm/simd/mve-vshl.c: Add tests for vshl.
* arm: Add movmisalign patterns for MVE (PR target/97875)Christophe Lyon2021-01-121-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds new movmisalign<mode>_mve_load and store patterns for MVE to help vectorization. They are very similar to their Neon counterparts, but use different iterators and instructions. Indeed MVE supports less vectors modes than Neon, so we use the MVE_VLD_ST iterator where Neon uses VQX. Since the supported modes are different from the ones valid for arithmetic operators, we introduce two new sets of macros: ARM_HAVE_NEON_<MODE>_LDST true if Neon has vector load/store instructions for <MODE> ARM_HAVE_<MODE>_LDST true if any vector extension has vector load/store instructions for <MODE> We move the movmisalign<mode> expander from neon.md to vec-commond.md, and replace the TARGET_NEON enabler with ARM_HAVE_<MODE>_LDST. The patch also updates the mve-vneg.c test to scan for the better code generation when loading and storing the vectors involved: it checks that no 'orr' instruction is generated to cope with misalignment at runtime. This test was chosen among the other mve tests, but any other should be OK. Using a plain vector copy loop (dest[i] = a[i]) is not a good test because the compiler chooses to use memcpy. For instance we now generate: test_vneg_s32x4: vldrw.32 q3, [r1] vneg.s32 q3, q3 vstrw.32 q3, [r0] bx lr instead of: test_vneg_s32x4: orr r3, r1, r0 lsls r3, r3, #28 bne .L15 vldrw.32 q3, [r1] vneg.s32 q3, q3 vstrw.32 q3, [r0] bx lr .L15: push {r4, r5} ldrd r2, r3, [r1, #8] ldrd r5, r4, [r1] rsbs r2, r2, #0 rsbs r5, r5, #0 rsbs r4, r4, #0 rsbs r3, r3, #0 strd r5, r4, [r0] pop {r4, r5} strd r2, r3, [r0, #8] bx lr 2021-01-12 Christophe Lyon <christophe.lyon@linaro.org> PR target/97875 gcc/ * config/arm/arm.h (ARM_HAVE_NEON_V8QI_LDST): New macro. (ARM_HAVE_NEON_V16QI_LDST, ARM_HAVE_NEON_V4HI_LDST): Likewise. (ARM_HAVE_NEON_V8HI_LDST, ARM_HAVE_NEON_V2SI_LDST): Likewise. (ARM_HAVE_NEON_V4SI_LDST, ARM_HAVE_NEON_V4HF_LDST): Likewise. (ARM_HAVE_NEON_V8HF_LDST, ARM_HAVE_NEON_V4BF_LDST): Likewise. (ARM_HAVE_NEON_V8BF_LDST, ARM_HAVE_NEON_V2SF_LDST): Likewise. (ARM_HAVE_NEON_V4SF_LDST, ARM_HAVE_NEON_DI_LDST): Likewise. (ARM_HAVE_NEON_V2DI_LDST): Likewise. (ARM_HAVE_V8QI_LDST, ARM_HAVE_V16QI_LDST): Likewise. (ARM_HAVE_V4HI_LDST, ARM_HAVE_V8HI_LDST): Likewise. (ARM_HAVE_V2SI_LDST, ARM_HAVE_V4SI_LDST, ARM_HAVE_V4HF_LDST): Likewise. (ARM_HAVE_V8HF_LDST, ARM_HAVE_V4BF_LDST, ARM_HAVE_V8BF_LDST): Likewise. (ARM_HAVE_V2SF_LDST, ARM_HAVE_V4SF_LDST, ARM_HAVE_DI_LDST): Likewise. (ARM_HAVE_V2DI_LDST): Likewise. * config/arm/mve.md (*movmisalign<mode>_mve_store): New pattern. (*movmisalign<mode>_mve_load): New pattern. * config/arm/neon.md (movmisalign<mode>): Move to ... * config/arm/vec-common.md: ... here. PR target/97875 gcc/testsuite/ * gcc.target/arm/simd/mve-vneg.c: Update test.
* Update copyright years.Jakub Jelinek2021-01-041-1/+1
|
* Arm: MVE: Split refactoring of remaining complex instrinsicsTamar Christina2020-12-161-105/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This refactors the complex numbers bits of MVE to go through the same unspecs as the NEON variant. This is pre-work to allow code to be shared between NEON and MVE for the complex vectorization patches. gcc/ChangeLog: * config/arm/arm_mve.h (__arm_vcmulq_rot90_f16): (__arm_vcmulq_rot270_f16, _arm_vcmulq_rot180_f16, __arm_vcmulq_f16, __arm_vcmulq_rot90_f32, __arm_vcmulq_rot270_f32, __arm_vcmulq_rot180_f32, __arm_vcmulq_f32, __arm_vcmlaq_f16, __arm_vcmlaq_rot180_f16, __arm_vcmlaq_rot270_f16, __arm_vcmlaq_rot90_f16, __arm_vcmlaq_f32, __arm_vcmlaq_rot180_f32, __arm_vcmlaq_rot270_f32, __arm_vcmlaq_rot90_f32): Update builtin calls. * config/arm/arm_mve_builtins.def (vcmulq_f, vcmulq_rot90_f, vcmulq_rot180_f, vcmulq_rot270_f, vcmlaq_f, vcmlaq_rot90_f, vcmlaq_rot180_f, vcmlaq_rot270_f): Removed. (vcmulq, vcmulq_rot90, vcmulq_rot180, vcmulq_rot270, vcmlaq, vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270): New. * config/arm/iterators.md (mve_rot): Add UNSPEC_VCMLA, UNSPEC_VCMLA90, UNSPEC_VCMLA180, UNSPEC_VCMLA270, UNSPEC_VCMUL, UNSPEC_VCMUL90, UNSPEC_VCMUL180, UNSPEC_VCMUL270. (VCMUL): New. * config/arm/mve.md (mve_vcmulq_f<mode, mve_vcmulq_rot180_f<mode>, mve_vcmulq_rot270_f<mode>, mve_vcmulq_rot90_f<mode>, mve_vcmlaq_f<mode>, mve_vcmlaq_rot180_f<mode>, mve_vcmlaq_rot270_f<mode>, mve_vcmlaq_rot90_f<mode>): Removed. (mve_vcmlaq<mve_rot><mode>, mve_vcmulq<mve_rot><mode>, mve_vcaddq<mve_rot><mode>, cadd<rot><mode>3, mve_vcaddq<mve_rot><mode>): New. * config/arm/unspecs.md (UNSPEC_VCMUL90, UNSPEC_VCMUL270, UNSPEC_VCMUL, UNSPEC_VCMUL180): New. (VCMULQ_F, VCMULQ_ROT180_F, VCMULQ_ROT270_F, VCMULQ_ROT90_F, VCMLAQ_F, VCMLAQ_ROT180_F, VCMLAQ_ROT90_F, VCMLAQ_ROT270_F): Removed.
* Arm: Add NEON and MVE RTL patterns for Complex Addition.Tamar Christina2020-12-161-37/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds implementation for the optabs for complex additions. With this the following C code: void f90 (float complex a[restrict N], float complex b[restrict N], float complex c[restrict N]) { for (int i=0; i < N; i++) c[i] = a[i] + (b[i] * I); } generates f90: add r3, r2, #1600 .L2: vld1.32 {q8}, [r0]! vld1.32 {q9}, [r1]! vcadd.f32 q8, q8, q9, #90 vst1.32 {q8}, [r2]! cmp r3, r2 bne .L2 bx lr instead of f90: add r3, r2, #1600 .L2: vld2.32 {d24-d27}, [r0]! vld2.32 {d20-d23}, [r1]! vsub.f32 q8, q12, q11 vadd.f32 q9, q13, q10 vst2.32 {d16-d19}, [r2]! cmp r3, r2 bne .L2 bx lr gcc/ChangeLog: * config/arm/arm_mve.h (__arm_vcaddq_rot90_u8, __arm_vcaddq_rot270_u8, __arm_vcaddq_rot90_s8, __arm_vcaddq_rot270_s8, __arm_vcaddq_rot90_u16, __arm_vcaddq_rot270_u16, __arm_vcaddq_rot90_s16, __arm_vcaddq_rot270_s16, __arm_vcaddq_rot90_u32, __arm_vcaddq_rot270_u32, __arm_vcaddq_rot90_s32, __arm_vcaddq_rot270_s32, __arm_vcaddq_rot90_f16, __arm_vcaddq_rot270_f16, __arm_vcaddq_rot90_f32, __arm_vcaddq_rot270_f32): Update builtin calls. * config/arm/arm_mve_builtins.def (vcaddq_rot90_u, vcaddq_rot270_u, vcaddq_rot90_s, vcaddq_rot270_s, vcaddq_rot90_f, vcaddq_rot270_f): Removed. (vcaddq_rot90, vcaddq_rot270): New. * config/arm/constraints.md (Dz): Include MVE. * config/arm/iterators.md (mve_rot): New. (supf): Remove VCADDQ_ROT270_S, VCADDQ_ROT270_U, VCADDQ_ROT90_S, VCADDQ_ROT90_U. (VCADDQ_ROT270, VCADDQ_ROT90): Removed. * config/arm/mve.md (mve_vcaddq_rot270_<supf><mode, mve_vcaddq_rot90_<supf><mode>, mve_vcaddq_rot270_f<mode>, mve_vcaddq_rot90_f<mode>): Removed. (mve_vcaddq<mve_rot><mode>, mve_vcaddq<mve_rot><mode>): New. * config/arm/unspecs.md (VCADDQ_ROT270_S, VCADDQ_ROT90_S, VCADDQ_ROT270_U, VCADDQ_ROT90_U, VCADDQ_ROT270_F, VCADDQ_ROT90_F): Removed. * config/arm/vec-common.md (cadd<rot><mode>3): New.
* arm: Auto-vectorization for MVE: vnegChristophe Lyon2020-12-141-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables MVE vneg instructions for auto-vectorization. MVE vnegq insns in mve.md are modified to use 'neg' instead of unspec expression. The neg<mode>2 expander is added to vec-common.md. Existing patterns in neon.md are prefixed with neon_. It's not clear why we have different patterns for VDQW and VH in neon.md, when WDQWH handles both, and patterns with VDQ have provision for attributes for FP modes. Another question is why <absneg_str><mode>2 always sets neon_abs<q> type when it also handles neon_neq<q> cases. 2020-12-11 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/mve.md (mve_vnegq_f): Use 'neg' instead of unspec. (mve_vnegq_s): Likewise. * config/arm/neon.md (neg<mode>2): Rename into neon_neg<mode>2. (<absneg_str><mode>2): Rename into neon_<absneg_str><mode>2. (neon_v<absneg_str><mode>): Call gen_neon_<absneg_str><mode>2. (vashr<mode>3): Call gen_neon_neg<mode>2. (vlshr<mode>3): Call gen_neon_neg<mode>2. (neon_vneg<mode>): Call gen_neon_neg<mode>2. * config/arm/unspecs.md (VNEGQ_F, VNEGQ_S): Remove. * config/arm/vec-common.md (neg<mode>2): New expander. gcc/testsuite/ * gcc.target/arm/simd/mve-vneg.c: Add tests for vneg.
* arm: Auto-vectorization for MVE: vmvnChristophe Lyon2020-12-141-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | This patch enables MVE vmvnq instructions for auto-vectorization. MVE vmvnq insns in mve.md are modified to use 'not' instead of unspec expression to support one_cmpl<mode>2. The one_cmpl<mode>2 expander is added to vec-common.md. 2020-12-11 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/iterators.md (VDQNOTM2): New mode iterator. (supf): Remove VMVNQ_S and VMVNQ_U. (VMVNQ): Remove. * config/arm/mve.md (mve_vmvnq_u<mode>): New entry for vmvn instruction using expression not. (mve_vmvnq_s<mode>): New expander. * config/arm/neon.md (one_cmpl<mode>2): Renamed into one_cmpl<mode>2_neon. * config/arm/unspecs.md (VMVNQ_S, VMVNQ_U): Remove. * config/arm/vec-common.md (one_cmpl<mode>2): New expander. gcc/testsuite/ * gcc.target/arm/simd/mve-vmvn.c: Add tests for vmvn.
* arm: Auto-vectorization for MVE: vbicChristophe Lyon2020-12-141-8/+15
| | | | | | | | | | | | | | | | | | | | This patch enables MVE vbic instructions for auto-vectorization. MVE vbicq insns in mve.md are modified to use 'and not' instead of unspec expression. 2020-12-11 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/iterators.md (supf): Remove VBICQ_S and VBICQ_U. (VBICQ): Remove. * config/arm/mve.md (mve_vbicq_u<mode>): New entry for vbic instruction using expression and not. (mve_vbicq_s<mode>): New expander. (mve_vbicq_f<mode>): Replace use of unspec by 'and not'. * config/arm/unspecs.md (VBICQ_S, VBICQ_U, VBICQ_F): Remove. gcc/testsuite/ * gcc.target/arm/simd/mve-vbic.c: Add tests for vbic.
* arm: Auto-vectorization for MVE: veorChristophe Lyon2020-12-141-8/+14
| | | | | | | | | | | | | | | | | | | | | | | This patch enables MVE veorq instructions for auto-vectorization. MVE veorq insns in mve.md are modified to use xor instead of unspec expression to support xor<mode>3. The xor<mode>3 expander is added to vec-common.md 2020-12-11 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/iterators.md (supf): Remove VEORQ_S and VEORQ_U. (VEORQ): Remove. * config/arm/mve.md (mve_veorq_u<mode>): New entry for veor instruction using expression xor. (mve_veorq_s<mode>): New expander. (mve_veorq_f<mode>): Use 'xor' code instead of unspec. * config/arm/neon.md (xor<mode>3): Renamed into xor<mode>3_neon. * config/arm/unspecs.md (VEORQ_S, VEORQ_U, VEORQ_F): Remove. * config/arm/vec-common.md (xor<mode>3): New expander. gcc/testsuite/ * gcc.target/arm/simd/mve-veor.c: Add tests for veor.
* Revert "Arm: Add NEON and MVE RTL patterns for Complex Addition, Multiply ↵Tamar Christina2020-12-131-30/+142
| | | | | | | | and FMA." This reverts commit 3b8a82f97dd48e153ce93b317c44254839e11461. Has a dependency on the AArch64 patch which hasn't been approved yet.
* Arm: Add NEON and MVE RTL patterns for Complex Addition, Multiply and FMA.Tamar Christina2020-12-131-142/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds implementation for the optabs for complex additions. With this the following C code: void f90 (float complex a[restrict N], float complex b[restrict N], float complex c[restrict N]) { for (int i=0; i < N; i++) c[i] = a[i] + (b[i] * I); } generates f90: add r3, r2, #1600 .L2: vld1.32 {q8}, [r0]! vld1.32 {q9}, [r1]! vcadd.f32 q8, q8, q9, #90 vst1.32 {q8}, [r2]! cmp r3, r2 bne .L2 bx lr instead of f90: add r3, r2, #1600 .L2: vld2.32 {d24-d27}, [r0]! vld2.32 {d20-d23}, [r1]! vsub.f32 q8, q12, q11 vadd.f32 q9, q13, q10 vst2.32 {d16-d19}, [r2]! cmp r3, r2 bne .L2 bx lr gcc/ChangeLog: * config/arm/arm_mve.h (__arm_vcaddq_rot90_u8, __arm_vcaddq_rot270_u8, , __arm_vcaddq_rot90_s8, __arm_vcaddq_rot270_s8, __arm_vcaddq_rot90_u16, __arm_vcaddq_rot270_u16, __arm_vcaddq_rot90_s16, __arm_vcaddq_rot270_s16, __arm_vcaddq_rot90_u32, __arm_vcaddq_rot270_u32, __arm_vcaddq_rot90_s32, __arm_vcaddq_rot270_s32, __arm_vcmulq_rot90_f16, __arm_vcmulq_rot270_f16, __arm_vcmulq_rot180_f16, __arm_vcmulq_f16, __arm_vcaddq_rot90_f16, __arm_vcaddq_rot270_f16, __arm_vcmulq_rot90_f32, __arm_vcmulq_rot270_f32, __arm_vcmulq_rot180_f32, __arm_vcmulq_f32, __arm_vcaddq_rot90_f32, __arm_vcaddq_rot270_f32, __arm_vcmlaq_f16, __arm_vcmlaq_rot180_f16, __arm_vcmlaq_rot270_f16, __arm_vcmlaq_rot90_f16, __arm_vcmlaq_f32, __arm_vcmlaq_rot180_f32, __arm_vcmlaq_rot270_f32, __arm_vcmlaq_rot90_f32): Update builtin calls. * config/arm/arm_mve_builtins.def (vcaddq_rot90_u, vcaddq_rot270_u, vcaddq_rot90_s, vcaddq_rot270_s, vcaddq_rot90_f, vcaddq_rot270_f, vcmulq_f, vcmulq_rot90_f, vcmulq_rot180_f, vcmulq_rot270_f, vcmlaq_f, vcmlaq_rot90_f, vcmlaq_rot180_f, vcmlaq_rot270_f): Removed. (vcaddq_rot90, vcaddq_rot270, vcmulq, vcmulq_rot90, vcmulq_rot180, vcmulq_rot270, vcmlaq, vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270): New. * config/arm/constraints.md (Dz): Include MVE. * config/arm/iterators.md (mve_rotsplit1, mve_rotsplit2): New. (rot): Add UNSPEC_VCMLS, UNSPEC_VCMUL and UNSPEC_VCMUL180. (rot_op, rotsplit1, rotsplit2, fcmac1, VCMLA_OP, VCMUL_OP): New. * config/arm/mve.md (VCADDQ_ROT270_S, VCADDQ_ROT90_S, VCADDQ_ROT270_U, VCADDQ_ROT90_U, VCADDQ_ROT270_F, VCADDQ_ROT90_F, VCMULQ_F, VCMULQ_ROT180_F, VCMULQ_ROT270_F, VCMULQ_ROT90_F, VCMLAQ_F, VCMLAQ_ROT180_F, VCMLAQ_ROT90_F, VCMLAQ_ROT270_F, VCADDQ_ROT270_S, VCADDQ_ROT270, VCADDQ_ROT90): Removed. (mve_rot, VCMUL): New. (mve_vcaddq_rot270_<supf><mode, mve_vcaddq_rot90_<supf><mode>, mve_vcaddq_rot270_f<mode>, mve_vcaddq_rot90_f<mode>, mve_vcmulq_f<mode, mve_vcmulq_rot180_f<mode>, mve_vcmulq_rot270_f<mode>, mve_vcmulq_rot90_f<mode>, mve_vcmlaq_f<mode>, mve_vcmlaq_rot180_f<mode>, mve_vcmlaq_rot270_f<mode>, mve_vcmlaq_rot90_f<mode>): Removed. (mve_vcmlaq<mve_rot><mode>, mve_vcmulq<mve_rot><mode>, mve_vcaddq<mve_rot><mode>, cadd<rot><mode>3, mve_vcaddq<mve_rot><mode>): New. (cmul<rot_op><mode>3): Exclude MVE types. * config/arm/unspecs.md (UNSPEC_VCMUL90, UNSPEC_VCMUL270): New. * config/arm/vec-common.md (cadd<rot><mode>3, cmul<rot_op><mode>3, arm_vcmla<rot><mode>, cml<fcmac1><rot_op><mode>4): New. * config/arm/unspecs.md (UNSPEC_VCMUL, UNSPEC_VCMUL180, UNSPEC_VCMLS, UNSPEC_VCMLS180): New. * config/arm/neon.md (cmul<rot_op><mode>3): New.
* arm: Auto-vectorization for MVE: vorrChristophe Lyon2020-12-111-9/+21
| | | | | | | | | | | | | | | | | | | | | | | | | This patch enables MVE vorrq instructions for auto-vectorization. MVE vorrq insns in mve.md are modified to use ior instead of unspec expression to support ior<mode>3. The ior<mode>3 expander is added to vec-common.md 2020-12-03 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/iterators.md (supf): Remove VORRQ_S and VORRQ_U. (VORRQ): Remove. * config/arm/mve.md (mve_vorrq_s<mode>): New entry for vorr instruction using expression ior. (mve_vorrq_u<mode>): New expander. (mve_vorrq_f<mode>): Use ior code instead of unspec. * config/arm/neon.md (ior<mode>3): Renamed into ior<mode>3_neon. * config/arm/predicates.md (imm_for_neon_logic_operand): Enable for MVE. * config/arm/unspecs.md (VORRQ_S, VORRQ_U, VORRQ_F): Remove. * config/arm/vec-common.md (ior<mode>3): New expander. gcc/testsuite/ * gcc.target/arm/simd/mve-vorr.c: Add vorr tests.
* arm: Auto-vectorization for MVE: vandChristophe Lyon2020-12-101-9/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables MVE vandq instructions for auto-vectorization. MVE vandq insns in mve.md are modified to use 'and' instead of unspec expression to support and<mode>3. The and<mode>3 expander is added to vec-common.md 2020-12-03 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * config/arm/iterators.md (supf): Remove VANDQ_S and VANDQ_U. (VANQ): Remove. (VDQ): Add TARGET_HAVE_MVE condition where relevant. * config/arm/mve.md (mve_vandq_u<mode>): New entry for vand instruction using expression 'and'. (mve_vandq_s<mode>): New expander. (mve_vaddq_n_f<mode>): Use 'and' code instead of unspec. * config/arm/neon.md (and<mode>3): Rename into and<mode>3_neon. * config/arm/predicates.md (imm_for_neon_inv_logic_operand): Enable for MVE. * config/arm/unspecs.md (VANDQ_S, VANDQ_U, VANDQ_F): Remove. * config/arm/vec-common.md (and<mode>3): New expander. gcc/testsuite/ * gcc.target/arm/simd/mve-vand.c: New test.
* arm: Auto-vectorization for MVE: vsubDennis Zhang2020-10-231-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | This patch enables MVE vsub instructions for auto-vectorization. The sub<mode>3 in vec-common.md is modified to use new mode macros to include MVE extension for vectorization. MVE vsub insns in mve.md are modified to use 'minus' instead of unspec expression to support sub<mode>3. Use VDQ instead fo VALL to cover all supported modes. The redundant sub<mode>3 insns in neon.md are then removed. gcc/ChangeLog: 2020-10-23 Dennis Zhang <dennis.zhang@arm.com> * config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction using expression 'minus'. (mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec. * config/arm/neon.md (sub<mode>3, sub<mode>3_fp16): Removed. (neon_vsub<mode>): Use gen_sub<mode>3 instead of gen_sub<mode>3_fp16. * config/arm/vec-common.md (sub<mode>3): Use the new mode macros ARM_HAVE_<MODE>_ARITH. Use iterator VDQ instead of VALL. gcc/testsuite/ChangeLog: * gcc.target/arm/simd/mve-vsub_1.c: New test.
* arm: Auto-vectorization for MVE: vmin/vmaxDennis Zhang2020-10-221-16/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables MVE vmin/vmax instructions for auto-vectorization. MVE target is included in expander smin<mode>3, umin<mode>3, smax<mode>3 and umax<mode>3 for vectorization. Related insns for vmin/vmax in mve.md are modified to use smin, umin, smax and umax expressions instead of unspec to support the expanders. gcc/ChangeLog: 2020-10-22 Dennis Zhang <dennis.zhang@arm.com> * config/arm/mve.md (mve_vmaxq_<supf><mode>): Replace with ... (mve_vmaxq_s<mode>, mve_vmaxq_u<mode>): ... these new insns to use smax/umax instead of VMAXQ. (mve_vminq_<supf><mode>): Replace with ... (mve_vminq_s<mode>, mve_vminq_u<mode>): ... these new insns to use smin/umin instead of VMINQ. (mve_vmaxnmq_f<mode>): Use smax instead of VMAXNMQ_F. (mve_vminnmq_f<mode>): Use smin instead of VMINNMQ_F. * config/arm/vec-common.md (smin<mode>3): Use the new mode macros ARM_HAVE_<MODE>_ARITH. (umin<mode>3, smax<mode>3, umax<mode>3): Likewise. gcc/testsuite/ChangeLog: * gcc.target/arm/simd/mve-vminmax_1.c: New test.
* arm: Auto-vectorization for MVE: vmulDennis Zhang2020-10-221-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables MVE vmul instructions for auto-vectorization. It includes MVE in expander mul<mode>3 to enable vectorization for MVE. Related MVE vmul insns are modified to support the expander by using expression 'mult' instead of unspec. The mul<mode>3 for vectorization in vec-common.md uses mode iterator VDQWH instead of VALLW to cover all supported modes. The macros ARM_HAVE_NEON_<MODE>_ARITH are used to select supported modes for different targets. The redundant mul<mode>3 in neon.md is removed. gcc/ChangeLog: 2020-10-22 Dennis Zhang <dennis.zhang@arm.com> * config/arm/mve.md (mve_vmulq<mode>): New entry for vmul instruction using expression 'mult'. (mve_vmulq_f<mode>): Use mult instead of VMULQ_F. * config/arm/neon.md (mul<mode>3): Removed. * config/arm/vec-common.md (mul<mode>3): Use the new mode macros ARM_HAVE_<MODE>_ARITH. Use mode iterator VDQWH instead of VALLW. gcc/testsuite/ChangeLog: * gcc.target/arm/simd/mve-vmul_1.c: New test.
* arm: Fix wrong code generated for mve scatter store with writeback ↵Srinath Parvathaneni2020-10-161-188/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | intrinsics with -O2 (PR97271). This patch fixes (PR97271) the wrong code-gen for mve scatter store with writeback intrinsics with -O2. $cat bug.c void foo (uint32x4_t * addr, const int offset, int32x4_t value) { vstrwq_scatter_base_wb_s32 (addr, 8, value); } $ arm-none-eabi-gcc bug.c -S -O2 -march=armv8.1-m.main+mve -mfloat-abi=hard -o - Without this patch: ... foo: vldrw.32 q3, [r0] vstrw.u32 q0, [q3, #8]! ---> (A) vldr.64 d4, .L3 vldr.64 d5, .L3+8 vldrw.32 q3, [r0] vstrw.u32 q2, [q3, #8]! ---> (B) bx lr ... With this patch: ... foo: vldrw.32 q3, [r0] vstrw.u32 q0, [q3, #8]! --> (C) vstrw.32 q3, [r0] bx lr ... Without this patch 2 vstrw assembly instructions (A and B) are generated for vstrwq_scatter_base_wb_s32 intrinsic where as fix generates only one vstrw assembly instruction (C). gcc/ChangeLog: 2020-10-06 Srinath Parvathaneni <srinath.parvathaneni@arm.com> PR target/97291 * config/arm/arm-builtins.c (arm_strsbwbs_qualifiers): Modify array. (arm_strsbwbu_qualifiers): Likewise. (arm_strsbwbs_p_qualifiers): Likewise. (arm_strsbwbu_p_qualifiers): Likewise. * config/arm/arm_mve.h (__arm_vstrdq_scatter_base_wb_s64): Modify function definition. (__arm_vstrdq_scatter_base_wb_u64): Likewise. (__arm_vstrdq_scatter_base_wb_p_s64): Likewise. (__arm_vstrdq_scatter_base_wb_p_u64): Likewise. (__arm_vstrwq_scatter_base_wb_p_s32): Likewise. (__arm_vstrwq_scatter_base_wb_p_u32): Likewise. (__arm_vstrwq_scatter_base_wb_s32): Likewise. (__arm_vstrwq_scatter_base_wb_u32): Likewise. (__arm_vstrwq_scatter_base_wb_f32): Likewise. (__arm_vstrwq_scatter_base_wb_p_f32): Likewise. * config/arm/arm_mve_builtins.def (vstrwq_scatter_base_wb_add_u): Remove expansion for the builtin. (vstrwq_scatter_base_wb_add_s): Likewise. (vstrwq_scatter_base_wb_add_f): Likewise. (vstrdq_scatter_base_wb_add_u): Likewise. (vstrdq_scatter_base_wb_add_s): Likewise. (vstrwq_scatter_base_wb_p_add_u): Likewise. (vstrwq_scatter_base_wb_p_add_s): Likewise. (vstrwq_scatter_base_wb_p_add_f): Likewise. (vstrdq_scatter_base_wb_p_add_u): Likewise. (vstrdq_scatter_base_wb_p_add_s): Likewise. * config/arm/mve.md (mve_vstrwq_scatter_base_wb_<supf>v4si): Remove expand. (mve_vstrwq_scatter_base_wb_add_<supf>v4si): Likewise. (mve_vstrwq_scatter_base_wb_<supf>v4si_insn): Rename pattern to ... (mve_vstrwq_scatter_base_wb_<supf>v4si): This. (mve_vstrwq_scatter_base_wb_p_<supf>v4si): Remove expand. (mve_vstrwq_scatter_base_wb_p_add_<supf>v4si): Likewise. (mve_vstrwq_scatter_base_wb_p_<supf>v4si_insn): Rename pattern to ... (mve_vstrwq_scatter_base_wb_p_<supf>v4si): This. (mve_vstrwq_scatter_base_wb_fv4sf): Remove expand. (mve_vstrwq_scatter_base_wb_add_fv4sf): Likewise. (mve_vstrwq_scatter_base_wb_fv4sf_insn): Rename pattern to ... (mve_vstrwq_scatter_base_wb_fv4sf): This. (mve_vstrwq_scatter_base_wb_p_fv4sf): Remove expand. (mve_vstrwq_scatter_base_wb_p_add_fv4sf): Likewise. (mve_vstrwq_scatter_base_wb_p_fv4sf_insn): Rename pattern to ... (mve_vstrwq_scatter_base_wb_p_fv4sf): This. (mve_vstrdq_scatter_base_wb_<supf>v2di): Remove expand. (mve_vstrdq_scatter_base_wb_add_<supf>v2di): Likewise. (mve_vstrdq_scatter_base_wb_<supf>v2di_insn): Rename pattern to ... (mve_vstrdq_scatter_base_wb_<supf>v2di): This. (mve_vstrdq_scatter_base_wb_p_<supf>v2di): Remove expand. (mve_vstrdq_scatter_base_wb_p_add_<supf>v2di): Likewise. (mve_vstrdq_scatter_base_wb_p_<supf>v2di_insn): Rename pattern to ... (mve_vstrdq_scatter_base_wb_p_<supf>v2di): This. gcc/testsuite/ChangeLog: PR target/97291 * gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_p_s64.c: Modify. * gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_p_u64.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_s64.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_u64.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_u32.c: Likewise.
* arm: [MVE] Remove illegal intrinsics (PR target/96914)Christophe Lyon2020-10-081-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A few MVE intrinsics had an unsigned variant implement while they are supported by the hardware. This patch removes them: __arm_vqrdmlashq_n_u8 __arm_vqrdmlahq_n_u8 __arm_vqdmlahq_n_u8 __arm_vqrdmlashq_n_u16 __arm_vqrdmlahq_n_u16 __arm_vqdmlahq_n_u16 __arm_vqrdmlashq_n_u32 __arm_vqrdmlahq_n_u32 __arm_vqdmlahq_n_u32 __arm_vmlaldavaxq_p_u32 __arm_vmlaldavaxq_p_u16 2020-10-08 Christophe Lyon <christophe.lyon@linaro.org> gcc/ PR target/96914 * config/arm/arm_mve.h (vqrdmlashq_n_u8, vqrdmlashq_n_u16) (vqrdmlashq_n_u32, vqrdmlahq_n_u8, vqrdmlahq_n_u16) (vqrdmlahq_n_u32, vqdmlahq_n_u8, vqdmlahq_n_u16, vqdmlahq_n_u32) (vmlaldavaxq_p_u16, vmlaldavaxq_p_u32): Remove. * config/arm/arm_mve_builtins.def (vqrdmlashq_n_u, vqrdmlahq_n_u) (vqdmlahq_n_u, vmlaldavaxq_p_u): Remove. * config/arm/unspecs.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U) (VQRDMLASHQ_N_U) (VMLALDAVAXQ_P_U): Remove unspecs. * config/arm/iterators.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U) (VQRDMLASHQ_N_U, VMLALDAVAXQ_P_U): Remove attributes. (VQDMLAHQ_N, VQRDMLAHQ_N, VQRDMLASHQ_N, VMLALDAVAXQ_P): Remove unsigned variants from iterators. * config/arm/mve.md (mve_vqdmlahq_n_<supf><mode>) (mve_vqrdmlahq_n_<supf><mode>) (mve_vqrdmlashq_n_<supf><mode>, mve_vmlaldavaxq_p_<supf><mode>): Update comment. gcc/testsuite/ PR target/96914 * gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u16.c: Remove. * gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u32.c: Remove. * gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c: Remove. * gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c: Remove. * gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c: Remove. * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u16.c: Remove. * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u32.c: Remove. * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c: Remove. * gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u16.c: Remove. * gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u32.c: Remove. * gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u8.c: Remove.
* arm: [MVE[ Add vqdmlashq intrinsics (PR target/96914)Christophe Lyon2020-10-081-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds: vqdmlashq_m_n_s16 vqdmlashq_m_n_s32 vqdmlashq_m_n_s8 vqdmlashq_n_s16 vqdmlashq_n_s32 vqdmlashq_n_s8 2020-10-08 Christophe Lyon <christophe.lyon@linaro.org> gcc/ PR target/96914 * config/arm/arm_mve.h (vqdmlashq, vqdmlashq_m): Define. * config/arm/arm_mve_builtins.def (vqdmlashq_n_s) (vqdmlashq_m_n_s,): New. * config/arm/unspecs.md (VQDMLASHQ_N_S, VQDMLASHQ_M_N_S): New unspecs. * config/arm/iterators.md (VQDMLASHQ_N_S, VQDMLASHQ_M_N_S): New attributes. (VQDMLASHQ_N): New iterator. * config/arm/mve.md (mve_vqdmlashq_n_, mve_vqdmlashq_m_n_s): New patterns. gcc/testsuite/ PR target/96914 * gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c: New test. * gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c: New test. * gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c: New test. * gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c: New test. * gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c: New test. * gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c: New test.
* [PATCH][GCC] arm: Move iterators from mve.md to iterators.md to maintain ↵Srinath Parvathaneni2020-10-061-648/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | consistency. To maintain consistency with other Arm Architectures backend, iterators and iterator attributes are moved from mve.md file to iterators.md. Also move enumerators for MVE unspecs from mve.md file to unspecs.md file. gcc/ChangeLog: 2020-10-06 Srinath Parvathaneni <srinath.parvathaneni@arm.com> * config/arm/iterators.md (MVE_types): Move mode iterator from mve.md to iterators.md. (MVE_VLD_ST): Likewise. (MVE_0): Likewise. (MVE_1): Likewise. (MVE_3): Likewise. (MVE_2): Likewise. (MVE_5): Likewise. (MVE_6): Likewise. (MVE_CNVT): Move mode attribute iterator from mve.md to iterators.md. (MVE_LANES): Likewise. (MVE_constraint): Likewise. (MVE_constraint1): Likewise. (MVE_constraint2): Likewise. (MVE_constraint3): Likewise. (MVE_pred): Likewise. (MVE_pred1): Likewise. (MVE_pred2): Likewise. (MVE_pred3): Likewise. (MVE_B_ELEM): Likewise. (MVE_H_ELEM): Likewise. (V_sz_elem1): Likewise. (V_extr_elem): Likewise. (earlyclobber_32): Likewise. (supf): Move int attribute from mve.md to iterators.md. (mode1): Likewise. (VCVTQ_TO_F): Move int iterator from mve.md to iterators.md. (VMVNQ_N): Likewise. (VREV64Q): Likewise. (VCVTQ_FROM_F): Likewise. (VREV16Q): Likewise. (VCVTAQ): Likewise. (VMVNQ): Likewise. (VDUPQ_N): Likewise. (VCLZQ): Likewise. (VADDVQ): Likewise. (VREV32Q): Likewise. (VMOVLBQ): Likewise. (VMOVLTQ): Likewise. (VCVTPQ): Likewise. (VCVTNQ): Likewise. (VCVTMQ): Likewise. (VADDLVQ): Likewise. (VCTPQ): Likewise. (VCTPQ_M): Likewise. (VCVTQ_N_TO_F): Likewise. (VCREATEQ): Likewise. (VSHRQ_N): Likewise. (VCVTQ_N_FROM_F): Likewise. (VADDLVQ_P): Likewise. (VCMPNEQ): Likewise. (VSHLQ): Likewise. (VABDQ): Likewise. (VADDQ_N): Likewise. (VADDVAQ): Likewise. (VADDVQ_P): Likewise. (VANDQ): Likewise. (VBICQ): Likewise. (VBRSRQ_N): Likewise. (VCADDQ_ROT270): Likewise. (VCADDQ_ROT90): Likewise. (VCMPEQQ): Likewise. (VCMPEQQ_N): Likewise. (VCMPNEQ_N): Likewise. (VEORQ): Likewise. (VHADDQ): Likewise. (VHADDQ_N): Likewise. (VHSUBQ): Likewise. (VHSUBQ_N): Likewise. (VMAXQ): Likewise. (VMAXVQ): Likewise. (VMINQ): Likewise. (VMINVQ): Likewise. (VMLADAVQ): Likewise. (VMULHQ): Likewise. (VMULLBQ_INT): Likewise. (VMULLTQ_INT): Likewise. (VMULQ): Likewise. (VMULQ_N): Likewise. (VORNQ): Likewise. (VORRQ): Likewise. (VQADDQ): Likewise. (VQADDQ_N): Likewise. (VQRSHLQ): Likewise. (VQRSHLQ_N): Likewise. (VQSHLQ): Likewise. (VQSHLQ_N): Likewise. (VQSHLQ_R): Likewise. (VQSUBQ): Likewise. (VQSUBQ_N): Likewise. (VRHADDQ): Likewise. (VRMULHQ): Likewise. (VRSHLQ): Likewise. (VRSHLQ_N): Likewise. (VRSHRQ_N): Likewise. (VSHLQ_N): Likewise. (VSHLQ_R): Likewise. (VSUBQ): Likewise. (VSUBQ_N): Likewise. (VADDLVAQ): Likewise. (VBICQ_N): Likewise. (VMLALDAVQ): Likewise. (VMLALDAVXQ): Likewise. (VMOVNBQ): Likewise. (VMOVNTQ): Likewise. (VORRQ_N): Likewise. (VQMOVNBQ): Likewise. (VQMOVNTQ): Likewise. (VSHLLBQ_N): Likewise. (VSHLLTQ_N): Likewise. (VRMLALDAVHQ): Likewise. (VBICQ_M_N): Likewise. (VCVTAQ_M): Likewise. (VCVTQ_M_TO_F): Likewise. (VQRSHRNBQ_N): Likewise. (VABAVQ): Likewise. (VSHLCQ): Likewise. (VRMLALDAVHAQ): Likewise. (VADDVAQ_P): Likewise. (VCLZQ_M): Likewise. (VCMPEQQ_M_N): Likewise. (VCMPEQQ_M): Likewise. (VCMPNEQ_M_N): Likewise. (VCMPNEQ_M): Likewise. (VDUPQ_M_N): Likewise. (VMAXVQ_P): Likewise. (VMINVQ_P): Likewise. (VMLADAVAQ): Likewise. (VMLADAVQ_P): Likewise. (VMLAQ_N): Likewise. (VMLASQ_N): Likewise. (VMVNQ_M): Likewise. (VPSELQ): Likewise. (VQDMLAHQ_N): Likewise. (VQRDMLAHQ_N): Likewise. (VQRDMLASHQ_N): Likewise. (VQRSHLQ_M_N): Likewise. (VQSHLQ_M_R): Likewise. (VREV64Q_M): Likewise. (VRSHLQ_M_N): Likewise. (VSHLQ_M_R): Likewise. (VSLIQ_N): Likewise. (VSRIQ_N): Likewise. (VMLALDAVQ_P): Likewise. (VQMOVNBQ_M): Likewise. (VMOVLTQ_M): Likewise. (VMOVNBQ_M): Likewise. (VRSHRNTQ_N): Likewise. (VORRQ_M_N): Likewise. (VREV32Q_M): Likewise. (VREV16Q_M): Likewise. (VQRSHRNTQ_N): Likewise. (VMOVNTQ_M): Likewise. (VMOVLBQ_M): Likewise. (VMLALDAVAQ): Likewise. (VQSHRNBQ_N): Likewise. (VSHRNBQ_N): Likewise. (VRSHRNBQ_N): Likewise. (VMLALDAVXQ_P): Likewise. (VQMOVNTQ_M): Likewise. (VMVNQ_M_N): Likewise. (VQSHRNTQ_N): Likewise. (VMLALDAVAXQ): Likewise. (VSHRNTQ_N): Likewise. (VCVTMQ_M): Likewise. (VCVTNQ_M): Likewise. (VCVTPQ_M): Likewise. (VCVTQ_M_N_FROM_F): Likewise. (VCVTQ_M_FROM_F): Likewise. (VRMLALDAVHQ_P): Likewise. (VADDLVAQ_P): Likewise. (VABAVQ_P): Likewise. (VSHLQ_M): Likewise. (VSRIQ_M_N): Likewise. (VSUBQ_M): Likewise. (VCVTQ_M_N_TO_F): Likewise. (VHSUBQ_M): Likewise. (VSLIQ_M_N): Likewise. (VRSHLQ_M): Likewise. (VMINQ_M): Likewise. (VMULLBQ_INT_M): Likewise. (VMULHQ_M): Likewise. (VMULQ_M): Likewise. (VHSUBQ_M_N): Likewise. (VHADDQ_M_N): Likewise. (VORRQ_M): Likewise. (VRMULHQ_M): Likewise. (VQADDQ_M): Likewise. (VRSHRQ_M_N): Likewise. (VQSUBQ_M_N): Likewise. (VADDQ_M): Likewise. (VORNQ_M): Likewise. (VRHADDQ_M): Likewise. (VQSHLQ_M): Likewise. (VANDQ_M): Likewise. (VBICQ_M): Likewise. (VSHLQ_M_N): Likewise. (VCADDQ_ROT270_M): Likewise. (VQRSHLQ_M): Likewise. (VQADDQ_M_N): Likewise. (VADDQ_M_N): Likewise. (VMAXQ_M): Likewise. (VQSUBQ_M): Likewise. (VMLASQ_M_N): Likewise. (VMLADAVAQ_P): Likewise. (VBRSRQ_M_N): Likewise. (VMULQ_M_N): Likewise. (VCADDQ_ROT90_M): Likewise. (VMULLTQ_INT_M): Likewise. (VEORQ_M): Likewise. (VSHRQ_M_N): Likewise. (VSUBQ_M_N): Likewise. (VHADDQ_M): Likewise. (VABDQ_M): Likewise. (VMLAQ_M_N): Likewise. (VQSHLQ_M_N): Likewise. (VMLALDAVAQ_P): Likewise. (VMLALDAVAXQ_P): Likewise. (VQRSHRNBQ_M_N): Likewise. (VQRSHRNTQ_M_N): Likewise. (VQSHRNBQ_M_N): Likewise. (VQSHRNTQ_M_N): Likewise. (VRSHRNBQ_M_N): Likewise. (VRSHRNTQ_M_N): Likewise. (VSHLLBQ_M_N): Likewise. (VSHLLTQ_M_N): Likewise. (VSHRNBQ_M_N): Likewise. (VSHRNTQ_M_N): Likewise. (VSTRWSBQ): Likewise. (VSTRBSOQ): Likewise. (VSTRBQ): Likewise. (VLDRBGOQ): Likewise. (VLDRBQ): Likewise. (VLDRWGBQ): Likewise. (VLD1Q): Likewise. (VLDRHGOQ): Likewise. (VLDRHGSOQ): Likewise. (VLDRHQ): Likewise. (VLDRWQ): Likewise. (VLDRDGBQ): Likewise. (VLDRDGOQ): Likewise. (VLDRDGSOQ): Likewise. (VLDRWGOQ): Likewise. (VLDRWGSOQ): Likewise. (VST1Q): Likewise. (VSTRHSOQ): Likewise. (VSTRHSSOQ): Likewise. (VSTRHQ): Likewise. (VSTRWQ): Likewise. (VSTRDSBQ): Likewise. (VSTRDSOQ): Likewise. (VSTRDSSOQ): Likewise. (VSTRWSOQ): Likewise. (VSTRWSSOQ): Likewise. (VSTRWSBWBQ): Likewise. (VLDRWGBWBQ): Likewise. (VSTRDSBWBQ): Likewise. (VLDRDGBWBQ): Likewise. (VADCIQ): Likewise. (VADCIQ_M): Likewise. (VSBCQ): Likewise. (VSBCQ_M): Likewise. (VSBCIQ): Likewise. (VSBCIQ_M): Likewise. (VADCQ): Likewise. (VADCQ_M): Likewise. (UQRSHLLQ): Likewise. (SQRSHRLQ): Likewise. (VSHLCQ_M): Likewise. * config/arm/mve.md (MVE_types): Move mode iterator to iterators.md from mve.md. (MVE_VLD_ST): Likewise. (MVE_0): Likewise. (MVE_1): Likewise. (MVE_3): Likewise. (MVE_2): Likewise. (MVE_5): Likewise. (MVE_6): Likewise. (MVE_CNVT): Move mode attribute iterator to iterators.md from mve.md. (MVE_LANES): Likewise. (MVE_constraint): Likewise. (MVE_constraint1): Likewise. (MVE_constraint2): Likewise. (MVE_constraint3): Likewise. (MVE_pred): Likewise. (MVE_pred1): Likewise. (MVE_pred2): Likewise. (MVE_pred3): Likewise. (MVE_B_ELEM): Likewise. (MVE_H_ELEM): Likewise. (V_sz_elem1): Likewise. (V_extr_elem): Likewise. (earlyclobber_32): Likewise. (supf): Move int attribute to iterators.md from mve.md. (mode1): Likewise. (VCVTQ_TO_F): Move int iterator to iterators.md from mve.md. (VMVNQ_N): Likewise. (VREV64Q): Likewise. (VCVTQ_FROM_F): Likewise. (VREV16Q): Likewise. (VCVTAQ): Likewise. (VMVNQ): Likewise. (VDUPQ_N): Likewise. (VCLZQ): Likewise. (VADDVQ): Likewise. (VREV32Q): Likewise. (VMOVLBQ): Likewise. (VMOVLTQ): Likewise. (VCVTPQ): Likewise. (VCVTNQ): Likewise. (VCVTMQ): Likewise. (VADDLVQ): Likewise. (VCTPQ): Likewise. (VCTPQ_M): Likewise. (VCVTQ_N_TO_F): Likewise. (VCREATEQ): Likewise. (VSHRQ_N): Likewise. (VCVTQ_N_FROM_F): Likewise. (VADDLVQ_P): Likewise. (VCMPNEQ): Likewise. (VSHLQ): Likewise. (VABDQ): Likewise. (VADDQ_N): Likewise. (VADDVAQ): Likewise. (VADDVQ_P): Likewise. (VANDQ): Likewise. (VBICQ): Likewise. (VBRSRQ_N): Likewise. (VCADDQ_ROT270): Likewise. (VCADDQ_ROT90): Likewise. (VCMPEQQ): Likewise. (VCMPEQQ_N): Likewise. (VCMPNEQ_N): Likewise. (VEORQ): Likewise. (VHADDQ): Likewise. (VHADDQ_N): Likewise. (VHSUBQ): Likewise. (VHSUBQ_N): Likewise. (VMAXQ): Likewise. (VMAXVQ): Likewise. (VMINQ): Likewise. (VMINVQ): Likewise. (VMLADAVQ): Likewise. (VMULHQ): Likewise. (VMULLBQ_INT): Likewise. (VMULLTQ_INT): Likewise. (VMULQ): Likewise. (VMULQ_N): Likewise. (VORNQ): Likewise. (VORRQ): Likewise. (VQADDQ): Likewise. (VQADDQ_N): Likewise. (VQRSHLQ): Likewise. (VQRSHLQ_N): Likewise. (VQSHLQ): Likewise. (VQSHLQ_N): Likewise. (VQSHLQ_R): Likewise. (VQSUBQ): Likewise. (VQSUBQ_N): Likewise. (VRHADDQ): Likewise. (VRMULHQ): Likewise. (VRSHLQ): Likewise. (VRSHLQ_N): Likewise. (VRSHRQ_N): Likewise. (VSHLQ_N): Likewise. (VSHLQ_R): Likewise. (VSUBQ): Likewise. (VSUBQ_N): Likewise. (VADDLVAQ): Likewise. (VBICQ_N): Likewise. (VMLALDAVQ): Likewise. (VMLALDAVXQ): Likewise. (VMOVNBQ): Likewise. (VMOVNTQ): Likewise. (VORRQ_N): Likewise. (VQMOVNBQ): Likewise. (VQMOVNTQ): Likewise. (VSHLLBQ_N): Likewise. (VSHLLTQ_N): Likewise. (VRMLALDAVHQ): Likewise. (VBICQ_M_N): Likewise. (VCVTAQ_M): Likewise. (VCVTQ_M_TO_F): Likewise. (VQRSHRNBQ_N): Likewise. (VABAVQ): Likewise. (VSHLCQ): Likewise. (VRMLALDAVHAQ): Likewise. (VADDVAQ_P): Likewise. (VCLZQ_M): Likewise. (VCMPEQQ_M_N): Likewise. (VCMPEQQ_M): Likewise. (VCMPNEQ_M_N): Likewise. (VCMPNEQ_M): Likewise. (VDUPQ_M_N): Likewise. (VMAXVQ_P): Likewise. (VMINVQ_P): Likewise. (VMLADAVAQ): Likewise. (VMLADAVQ_P): Likewise. (VMLAQ_N): Likewise. (VMLASQ_N): Likewise. (VMVNQ_M): Likewise. (VPSELQ): Likewise. (VQDMLAHQ_N): Likewise. (VQRDMLAHQ_N): Likewise. (VQRDMLASHQ_N): Likewise. (VQRSHLQ_M_N): Likewise. (VQSHLQ_M_R): Likewise. (VREV64Q_M): Likewise. (VRSHLQ_M_N): Likewise. (VSHLQ_M_R): Likewise. (VSLIQ_N): Likewise. (VSRIQ_N): Likewise. (VMLALDAVQ_P): Likewise. (VQMOVNBQ_M): Likewise. (VMOVLTQ_M): Likewise. (VMOVNBQ_M): Likewise. (VRSHRNTQ_N): Likewise. (VORRQ_M_N): Likewise. (VREV32Q_M): Likewise. (VREV16Q_M): Likewise. (VQRSHRNTQ_N): Likewise. (VMOVNTQ_M): Likewise. (VMOVLBQ_M): Likewise. (VMLALDAVAQ): Likewise. (VQSHRNBQ_N): Likewise. (VSHRNBQ_N): Likewise. (VRSHRNBQ_N): Likewise. (VMLALDAVXQ_P): Likewise. (VQMOVNTQ_M): Likewise. (VMVNQ_M_N): Likewise. (VQSHRNTQ_N): Likewise. (VMLALDAVAXQ): Likewise. (VSHRNTQ_N): Likewise. (VCVTMQ_M): Likewise. (VCVTNQ_M): Likewise. (VCVTPQ_M): Likewise. (VCVTQ_M_N_FROM_F): Likewise. (VCVTQ_M_FROM_F): Likewise. (VRMLALDAVHQ_P): Likewise. (VADDLVAQ_P): Likewise. (VABAVQ_P): Likewise. (VSHLQ_M): Likewise. (VSRIQ_M_N): Likewise. (VSUBQ_M): Likewise. (VCVTQ_M_N_TO_F): Likewise. (VHSUBQ_M): Likewise. (VSLIQ_M_N): Likewise. (VRSHLQ_M): Likewise. (VMINQ_M): Likewise. (VMULLBQ_INT_M): Likewise. (VMULHQ_M): Likewise. (VMULQ_M): Likewise. (VHSUBQ_M_N): Likewise. (VHADDQ_M_N): Likewise. (VORRQ_M): Likewise. (VRMULHQ_M): Likewise. (VQADDQ_M): Likewise. (VRSHRQ_M_N): Likewise. (VQSUBQ_M_N): Likewise. (VADDQ_M): Likewise. (VORNQ_M): Likewise. (VRHADDQ_M): Likewise. (VQSHLQ_M): Likewise. (VANDQ_M): Likewise. (VBICQ_M): Likewise. (VSHLQ_M_N): Likewise. (VCADDQ_ROT270_M): Likewise. (VQRSHLQ_M): Likewise. (VQADDQ_M_N): Likewise. (VADDQ_M_N): Likewise. (VMAXQ_M): Likewise. (VQSUBQ_M): Likewise. (VMLASQ_M_N): Likewise. (VMLADAVAQ_P): Likewise. (VBRSRQ_M_N): Likewise. (VMULQ_M_N): Likewise. (VCADDQ_ROT90_M): Likewise. (VMULLTQ_INT_M): Likewise. (VEORQ_M): Likewise. (VSHRQ_M_N): Likewise. (VSUBQ_M_N): Likewise. (VHADDQ_M): Likewise. (VABDQ_M): Likewise. (VMLAQ_M_N): Likewise. (VQSHLQ_M_N): Likewise. (VMLALDAVAQ_P): Likewise. (VMLALDAVAXQ_P): Likewise. (VQRSHRNBQ_M_N): Likewise. (VQRSHRNTQ_M_N): Likewise. (VQSHRNBQ_M_N): Likewise. (VQSHRNTQ_M_N): Likewise. (VRSHRNBQ_M_N): Likewise. (VRSHRNTQ_M_N): Likewise. (VSHLLBQ_M_N): Likewise. (VSHLLTQ_M_N): Likewise. (VSHRNBQ_M_N): Likewise. (VSHRNTQ_M_N): Likewise. (VSTRWSBQ): Likewise. (VSTRBSOQ): Likewise. (VSTRBQ): Likewise. (VLDRBGOQ): Likewise. (VLDRBQ): Likewise. (VLDRWGBQ): Likewise. (VLD1Q): Likewise. (VLDRHGOQ): Likewise. (VLDRHGSOQ): Likewise. (VLDRHQ): Likewise. (VLDRWQ): Likewise. (VLDRDGBQ): Likewise. (VLDRDGOQ): Likewise. (VLDRDGSOQ): Likewise. (VLDRWGOQ): Likewise. (VLDRWGSOQ): Likewise. (VST1Q): Likewise. (VSTRHSOQ): Likewise. (VSTRHSSOQ): Likewise. (VSTRHQ): Likewise. (VSTRWQ): Likewise. (VSTRDSBQ): Likewise. (VSTRDSOQ): Likewise. (VSTRDSSOQ): Likewise. (VSTRWSOQ): Likewise. (VSTRWSSOQ): Likewise. (VSTRWSBWBQ): Likewise. (VLDRWGBWBQ): Likewise. (VSTRDSBWBQ): Likewise. (VLDRDGBWBQ): Likewise. (VADCIQ): Likewise. (VADCIQ_M): Likewise. (VSBCQ): Likewise. (VSBCQ_M): Likewise. (VSBCIQ): Likewise. (VSBCIQ_M): Likewise. (VADCQ): Likewise. (VADCQ_M): Likewise. (UQRSHLLQ): Likewise. (SQRSHRLQ): Likewise. (VSHLCQ_M): Likewise. (define_c_enum "unspec"): Move MVE enumerator to unspecs.md from mve.md. * config/arm/unspecs.md (define_c_enum "unspec"): Move MVE enumerator from mve.md to unspecs.md.
* arm: Require MVE memory operand for destination of vst1q intrinsicJoe Ramsay2020-08-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | Previously, the machine description patterns for vst1q accepted a generic memory operand for the destination, which could lead to an unrecognised builtin when expanding vst1q* intrinsics. This change fixes the pattern to only accept MVE memory operands. gcc/ChangeLog: PR target/96683 * config/arm/mve.md (mve_vst1q_f<mode>): Require MVE memory operand for destination. (mve_vst1q_<supf><mode>): Likewise. gcc/testsuite/ChangeLog: PR target/96683 * gcc.target/arm/mve/intrinsics/vst1q_f16.c: New test. * gcc.target/arm/mve/intrinsics/vst1q_s16.c: New test. * gcc.target/arm/mve/intrinsics/vst1q_s8.c: New test. * gcc.target/arm/mve/intrinsics/vst1q_u16.c: New test. * gcc.target/arm/mve/intrinsics/vst1q_u8.c: New test.
* [PATCH][GCC] arm: Fix MVE scalar shift intrinsics code-gen.Srinath Parvathaneni2020-06-161-36/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch modifies the MVE scalar shift RTL patterns. The current patterns have wrong constraints and predicates due to which the values returned from MVE scalar shift instructions are overwritten in the code-gen. example: $ cat x.c int32_t foo(int64_t acc, int shift) { return sqrshrl_sat48 (acc, shift); } Code-gen before applying this patch: $ arm-none-eabi-gcc -march=armv8.1-m.main+mve -mfloat-abi=hard -O2 -S $ cat x.s foo: push {r4, r5} sqrshrl r0, r1, #48, r2 ----> (a) mov r0, r4 ----> (b) pop {r4, r5} bx lr Code-gen after applying this patch: foo: sqrshrl r0, r1, #48, r2 bx lr In the current compiler the return value (r0) from sqrshrl (a) is getting overwritten by the mov statement (b). This patch fixes above issue. 2020-06-12 Srinath Parvathaneni <srinath.parvathaneni@arm.com> gcc/ * config/arm/mve.md (mve_uqrshll_sat<supf>_di): Correct the predicate and constraint of all the operands. (mve_sqrshrl_sat<supf>_di): Likewise. (mve_uqrshl_si): Likewise. (mve_sqrshr_si): Likewise. (mve_uqshll_di): Likewise. (mve_urshrl_di): Likewise. (mve_uqshl_si): Likewise. (mve_urshr_si): Likewise. (mve_sqshl_si): Likewise. (mve_srshr_si): Likewise. (mve_srshrl_di): Likewise. (mve_sqshll_di): Likewise. * config/arm/predicates.md (arm_low_register_operand): Define. gcc/testsuite/ * gcc.target/arm/mve/intrinsics/mve_scalar_shifts1.c: New test. * gcc.target/arm/mve/intrinsics/mve_scalar_shifts2.c: Likewise. * gcc.target/arm/mve/intrinsics/mve_scalar_shifts3.c: Likewise. * gcc.target/arm/mve/intrinsics/mve_scalar_shifts4.c: Likewise.
* [ARM]: Correct the grouping of operands in MVE vector scatter store ↵Srinath Parvathaneni2020-06-041-321/+507
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | intrinsics (PR94735). The operands in RTL patterns of MVE vector scatter store intrinsics are wrongly grouped, because of which few vector loads and stores instructions are wrongly getting optimized out with -O2. A new predicate "mve_scatter_memory" is defined in this patch, this predicate returns TRUE on matching: (mem(reg)) for MVE scatter store intrinsics. This patch fixes the issue by adding define_expand pattern with "mve_scatter_memory" predicate and calls the corresponding define_insn by passing register_operand as first argument. This register_operand is extracted from the operand with "mve_scatter_memory" predicate in define_expand pattern. gcc/ChangeLog: 2020-06-01 Srinath Parvathaneni <srinath.parvathaneni@arm.com> PR target/94735 * config/arm/predicates.md (mve_scatter_memory): Define to match (mem (reg)) for scatter store memory. * config/arm/mve.md (mve_vstrbq_scatter_offset_<supf><mode>): Modify define_insn to define_expand. (mve_vstrbq_scatter_offset_p_<supf><mode>): Likewise. (mve_vstrhq_scatter_offset_<supf><mode>): Likewise. (mve_vstrhq_scatter_shifted_offset_p_<supf><mode>): Likewise. (mve_vstrhq_scatter_shifted_offset_<supf><mode>): Likewise. (mve_vstrdq_scatter_offset_p_<supf>v2di): Likewise. (mve_vstrdq_scatter_offset_<supf>v2di): Likewise. (mve_vstrdq_scatter_shifted_offset_p_<supf>v2di): Likewise. (mve_vstrdq_scatter_shifted_offset_<supf>v2di): Likewise. (mve_vstrhq_scatter_offset_fv8hf): Likewise. (mve_vstrhq_scatter_offset_p_fv8hf): Likewise. (mve_vstrhq_scatter_shifted_offset_fv8hf): Likewise. (mve_vstrhq_scatter_shifted_offset_p_fv8hf): Likewise. (mve_vstrwq_scatter_offset_fv4sf): Likewise. (mve_vstrwq_scatter_offset_p_fv4sf): Likewise. (mve_vstrwq_scatter_offset_p_<supf>v4si): Likewise. (mve_vstrwq_scatter_offset_<supf>v4si): Likewise. (mve_vstrwq_scatter_shifted_offset_fv4sf): Likewise. (mve_vstrwq_scatter_shifted_offset_p_fv4sf): Likewise. (mve_vstrwq_scatter_shifted_offset_p_<supf>v4si): Likewise. (mve_vstrwq_scatter_shifted_offset_<supf>v4si): Likewise. (mve_vstrbq_scatter_offset_<supf><mode>_insn): Define insn for scatter stores. (mve_vstrbq_scatter_offset_p_<supf><mode>_insn): Likewise. (mve_vstrhq_scatter_offset_<supf><mode>_insn): Likewise. (mve_vstrhq_scatter_shifted_offset_p_<supf><mode>_insn): Likewise. (mve_vstrhq_scatter_shifted_offset_<supf><mode>_insn): Likewise. (mve_vstrdq_scatter_offset_p_<supf>v2di_insn): Likewise. (mve_vstrdq_scatter_offset_<supf>v2di_insn): Likewise. (mve_vstrdq_scatter_shifted_offset_p_<supf>v2di_insn): Likewise. (mve_vstrdq_scatter_shifted_offset_<supf>v2di_insn): Likewise. (mve_vstrhq_scatter_offset_fv8hf_insn): Likewise. (mve_vstrhq_scatter_offset_p_fv8hf_insn): Likewise. (mve_vstrhq_scatter_shifted_offset_fv8hf_insn): Likewise. (mve_vstrhq_scatter_shifted_offset_p_fv8hf_insn): Likewise. (mve_vstrwq_scatter_offset_fv4sf_insn): Likewise. (mve_vstrwq_scatter_offset_p_fv4sf_insn): Likewise. (mve_vstrwq_scatter_offset_p_<supf>v4si_insn): Likewise. (mve_vstrwq_scatter_offset_<supf>v4si_insn): Likewise. (mve_vstrwq_scatter_shifted_offset_fv4sf_insn): Likewise. (mve_vstrwq_scatter_shifted_offset_p_fv4sf_insn): Likewise. (mve_vstrwq_scatter_shifted_offset_p_<supf>v4si_insn): Likewise. (mve_vstrwq_scatter_shifted_offset_<supf>v4si_insn): Likewise. gcc/testsuite/ChangeLog: 2020-06-01 Srinath Parvathaneni <srinath.parvathaneni@arm.com> PR target/94735 * gcc.target/arm/mve/intrinsics/mve_vstore_scatter_base.c: New test. * gcc.target/arm/mve/intrinsics/mve_vstore_scatter_base_p.c: Likewise. * gcc.target/arm/mve/intrinsics/mve_vstore_scatter_offset.c: Likewise. * gcc.target/arm/mve/intrinsics/mve_vstore_scatter_offset_p.c: Likewise. * gcc.target/arm/mve/intrinsics/mve_vstore_scatter_shifted_offset.c: Likewise. * gcc.target/arm/mve/intrinsics/mve_vstore_scatter_shifted_offset_p.c: Likewise.
* [ARM]: Fix the wrong code-gen generated by MVE vector load/store intrinsics ↵Srinath Parvathaneni2020-05-201-54/+102
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (PR94959). Few MVE intrinsics like vldrbq_s32, vldrhq_s32 etc., the assembler instructions generated by current compiler are wrong. eg: vldrbq_s32 generates an assembly instructions `vldrb.s32 q0,[ip]`. But as per Arm-arm second argument in above instructions must also be a low register (<= r7). This patch fixes this issue by creating a new predicate "mve_memory_operand" and constraint "Ux" which allows low registers as arguments to the generated instructions depending on the mode of the argument. A new constraint "Ul" is created to handle loading to PC-relative addressing modes for vector store/load intrinsiscs. All the corresponding MVE intrinsic generating wrong code-gen as vldrbq_s32 are modified in this patch. gcc/ChangeLog: 2020-05-20 Srinath Parvathaneni <srinath.parvathaneni@arm.com> Andre Vieira <andre.simoesdiasvieira@arm.com> PR target/94959 * config/arm/arm-protos.h (arm_mode_base_reg_class): Function declaration. (mve_vector_mem_operand): Likewise. * config/arm/arm.c (thumb2_legitimate_address_p): For MVE target check the load from memory to a core register is legitimate for give mode. (mve_vector_mem_operand): Define function. (arm_print_operand): Modify comment. (arm_mode_base_reg_class): Define. * config/arm/arm.h (MODE_BASE_REG_CLASS): Modify to add check for TARGET_HAVE_MVE and expand to arm_mode_base_reg_class on TRUE. * config/arm/constraints.md (Ux): Likewise. (Ul): Likewise. * config/arm/mve.md (mve_mov): Replace constraint Us with Ux and also add support for missing Vector Store Register and Vector Load Register. Add a new alternative to support load from memory to PC (or label) in vector store/load. (mve_vstrbq_<supf><mode>): Modify constraint Us to Ux. (mve_vldrbq_<supf><mode>): Modify constriant Us to Ux, predicate to mve_memory_operand and also modify the MVE instructions to emit. (mve_vldrbq_z_<supf><mode>): Modify constraint Us to Ux. (mve_vldrhq_fv8hf): Modify constriant Us to Ux, predicate to mve_memory_operand and also modify the MVE instructions to emit. (mve_vldrhq_<supf><mode>): Modify constriant Us to Ux, predicate to mve_memory_operand and also modify the MVE instructions to emit. (mve_vldrhq_z_fv8hf): Likewise. (mve_vldrhq_z_<supf><mode>): Likewise. (mve_vldrwq_fv4sf): Likewise. (mve_vldrwq_<supf>v4si): Likewise. (mve_vldrwq_z_fv4sf): Likewise. (mve_vldrwq_z_<supf>v4si): Likewise. (mve_vld1q_f<mode>): Modify constriant Us to Ux. (mve_vld1q_<supf><mode>): Likewise. (mve_vstrhq_fv8hf): Modify constriant Us to Ux, predicate to mve_memory_operand. (mve_vstrhq_p_fv8hf): Modify constriant Us to Ux, predicate to mve_memory_operand and also modify the MVE instructions to emit. (mve_vstrhq_p_<supf><mode>): Likewise. (mve_vstrhq_<supf><mode>): Modify constriant Us to Ux, predicate to mve_memory_operand. (mve_vstrwq_fv4sf): Modify constriant Us to Ux. (mve_vstrwq_p_fv4sf): Modify constriant Us to Ux and also modify the MVE instructions to emit. (mve_vstrwq_p_<supf>v4si): Likewise. (mve_vstrwq_<supf>v4si): Likewise.Modify constriant Us to Ux. * config/arm/predicates.md (mve_memory_operand): Define. gcc/testsuite/ChangeLog: 2020-05-20 Srinath Parvathaneni <srinath.parvathaneni@arm.com> PR target/94959 * gcc.target/arm/mve/intrinsics/mve_vector_float2.c: Modify. * gcc.target/arm/mve/intrinsics/mve_vldr.c: New test. * gcc.target/arm/mve/intrinsics/mve_vldr_z.c: Likewise. * gcc.target/arm/mve/intrinsics/mve_vstr.c: Likewise. * gcc.target/arm/mve/intrinsics/mve_vstr_p.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_f16.c: Modify. * gcc.target/arm/mve/intrinsics/vld1q_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrhq_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrhq_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrhq_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrhq_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrhq_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vuninitializedq_float.c: Likewise. * gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c: Likewise. * gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise. * gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c: Likewise.
* [GCC][PATCH][ARM]: Change arm constraint name from "e" to "Te".Srinath Parvathaneni2020-04-271-28/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches changes the constraint "e" to "Te". gcc/ChangeLog: 2020-04-24 Srinath Parvathaneni <srinath.parvathaneni@arm.com> * config/arm/constraints.md (e): Remove constraint. (Te): Define constraint. * config/arm/mve.md (vaddvq_<supf><mode>): Modify constraint in operand 0 from "e" to "Te". (vaddvaq_<supf><mode>): Likewise. (vaddvq_p_<supf><mode>): Likewise. (vmladavq_<supf><mode>): Likewise. (vmladavxq_s<mode>): Likewise. (vmlsdavq_s<mode>): Likewise. (vmlsdavxq_s<mode>): Likewise. (vaddvaq_p_<supf><mode>): Likewise. (vmladavaq_<supf><mode>): Likewise. (vmladavq_p_<supf><mode>): Likewise. (vmladavxq_p_s<mode>): Likewise. (vmlsdavq_p_s<mode>): Likewise. (vmlsdavxq_p_s<mode>): Likewise. (vmlsdavaxq_s<mode>): Likewise. (vmlsdavaq_s<mode>): Likewise. (vmladavaxq_s<mode>): Likewise. (vmladavaq_p_<supf><mode>): Likewise. (vmladavaxq_p_s<mode>): Likewise. (vmlsdavaq_p_s<mode>): Likewise. (vmlsdavaxq_p_s<mode>): Likewise.
* Arm: MVE: Add mve vec_duplicate patternAndre Vieira2020-04-151-2/+7
| | | | | | | | | | | | | | | | This patch fixes an ICE we were seeing due to a missing vec_duplicate pattern. gcc/ChangeLog: 2020-04-15 Andre Vieira <andre.simoesdiasvieira@arm.com> * config/arm/mve.md (mve_vec_duplicate<mode>): New pattern. (V_sz_elem2): Remove unused mode attribute. gcc/testsuite/ChangeLog: 2020-04-15 Andre Vieira <andre.simoesdiasvieira@arm.com> Srinath Parvathaneni <srinath.parvathaneni@arm.com> * gcc.target/arm/mve/intrinsics/mve_vec_duplicate.c: New test.
* [Arm] Implement CDE predicated intrinsics for MVE registersMatthew Malcomson2020-04-081-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These intrinsics are the predicated version of the intrinsics inroduced in https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542725.html. These are not yet public on developer.arm.com but we have reached internal consensus on them. The approach follows the same method as for the CDE intrinsics for MVE registers, most notably using the same arm_resolve_overloaded_builtin function with minor modifications. The resolver hook has been moved from arm-builtins.c to arm-c.c so it can access the c-common function build_function_call_vec. This function is needed to perform the same checks on arguments as a normal C or C++ function would perform. It is fine to put this resolver in arm-c.c since it's only use is for the ACLE functions, and these are only available in C/C++. So that the resolver function has access to information it needs from the builtins, we put two query functions into arm-builtins.c and use them from arm-c.c. We rely on the order that the builtins are defined in gcc/config/arm/arm_cde_builtins.def, knowing that the predicated versions come after the non-predicated versions. The machine description patterns for these builtins are simpler than those for the non-predicated versions, since the accumulator versions *and* non-accumulator versions both need an input vector now. The input vector is needed for the non-accumulator version to describe the original values for those lanes that are not updated during the merge operation. We additionally need to introduce qualifiers for these new builtins, which follow the same pattern as the non-predicated versions but with an extra argument to describe the predicate. Error message changes: - We directly mention the builtin argument when complaining that an argument is not in the correct range. This more closely matches the C error messages. - We ensure the resolver complains about *all* invalid arguments to a function instead of just the first one. - The resolver error messages index arguments from 1 instead of 0 to match the arguments coming from the C/C++ frontend. In order to allow the user to give an argument for the merging predicate when they don't care what data is stored in the 'false' lanes, we also move the __arm_vuninitializedq* intrinsics from arm_mve.h to arm_mve_types.h which is shared with arm_cde.h. We only move the fully type-specified `__arm_vuninitializedq*` intrinsics and not the polymorphic versions, since moving the polymorphic versions requires moving the _Generic framework as well as just the intrinsics we're interested in. This matches the approach taken for the `__arm_vreinterpret*` functions in this include file. This patch also contains a slight change in spacing of an existing assembly instruction to be emitted. This is just to help writing tests -- vmsr usually has a tab and a space between the mnemonic and the first argument, but in one case it just has a tab -- making all the same helps make test regexps simpler. Testing Done: Bootstrap and full regtest on arm-none-linux-gnueabihf Full regtest on arm-none-eabi All testing done with a local fix for the bugzilla PR below. That bugzilla currently causes multiple ICE's on the tests added in this patch. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94341 gcc/ChangeLog: 2020-04-02 Matthew Malcomson <matthew.malcomson@arm.com> * config/arm/arm-builtins.c (CX_UNARY_UNONE_QUALIFIERS): New. (CX_BINARY_UNONE_QUALIFIERS): New. (CX_TERNARY_UNONE_QUALIFIERS): New. (arm_resolve_overloaded_builtin): Move to arm-c.c. (arm_expand_builtin_args): Update error message. (enum resolver_ident): New. (arm_describe_resolver): New. (arm_cde_end_args): New. * config/arm/arm-builtins.h: New file. * config/arm/arm-c.c (arm_resolve_overloaded_builtin): New. (arm_resolve_cde_builtin): Moved from arm-builtins.c. * config/arm/arm_cde.h (__arm_vcx1q_m, __arm_vcx1qa_m, __arm_vcx2q_m, __arm_vcx2qa_m, __arm_vcx3q_m, __arm_vcx3qa_m): New. * config/arm/arm_cde_builtins.def (vcx1q_p_, vcx1qa_p_, vcx2q_p_, vcx2qa_p_, vcx3q_p_, vcx3qa_p_): New builtin defs. * config/arm/iterators.md (CDE_VCX): New int iterator. (a) New int attribute. * config/arm/mve.md (arm_vcx1q<a>_p_v16qi, arm_vcx2q<a>_p_v16qi, arm_vcx3q<a>_p_v16qi): New patterns. * config/arm/vfp.md (thumb2_movhi_fp16): Extra space in assembly. gcc/testsuite/ChangeLog: 2020-04-02 Matthew Malcomson <matthew.malcomson@arm.com> * gcc.target/arm/acle/cde-errors.c: Add predicated forms. * gcc.target/arm/acle/cde-mve-error-1.c: Add predicated forms. * gcc.target/arm/acle/cde-mve-error-2.c: Add predicated forms. * gcc.target/arm/acle/cde-mve-error-3.c: Add predicated forms. * gcc.target/arm/acle/cde-mve-full-assembly.c: Add predicated forms. * gcc.target/arm/acle/cde-mve-tests.c: Add predicated forms. * gcc.target/arm/acle/cde_v_1_err.c (test_imm_range): Update for error message format change. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c: Update scan-assembler regexp.
* [Arm] Implement CDE intrinsics for MVE registers.Matthew Malcomson2020-04-081-0/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement CDE intrinsics on MVE registers. Other than the basics required for adding intrinsics this patch consists of three changes. ** We separate out the MVE types and casts from the arm_mve.h header. This is so that the types can be used in arm_cde.h without the need to include the entire arm_mve.h header. The only type that arm_cde.h needs is `uint8x16_t`, so this separation could be avoided by using a `typedef` in this file. Since the introduced intrinsics are all defined to act on the full range of MVE types, declaring all such types seems intuitive since it will provide their declaration to the user too. This arm_mve_types.h header not only includes the MVE types, but also the conversion intrinsics between them. Some of the conversion intrinsics are needed for arm_cde.h, but most are not. We include all conversion intrinsics to keep the definition of such conversion functions all in one place, on the understanding that extra conversion functions being defined when including `arm_cde.h` is not a problem. ** We define the TARGET_RESOLVE_OVERLOADED_BUILTIN hook for the Arm backend. This is needed to implement the polymorphism for the required intrinsics. The intrinsics have no specialised version, and the resulting assembly instruction for all different types should be exactly the same. Due to this we have implemented these intrinsics via one builtin on one type. All other calls to the intrinsic with different types are implicitly cast to the one type that is defined, and hence are all expanded to the same RTL pattern that is only defined for one machine mode. ** We seperate the initialisation of the CDE intrinsics from others. This allows us to ensure that the CDE intrinsics acting on MVE registers are only created when both CDE and MVE are available. Only initialising these builtins when both features are available is especially important since they require a type that is only initialised when the target supports hard float. Hence trying to initialise these builtins on a soft float target would cause an ICE. Testing done: Full bootstrap and regtest on arm-none-linux-gnueabihf Regression test on arm-none-eabi Ok for trunk? gcc/ChangeLog: 2020-03-10 Matthew Malcomson <matthew.malcomson@arm.com> * config.gcc (arm_mve_types.h): New extra_header for arm. * config/arm/arm-builtins.c (arm_resolve_overloaded_builtin): New. (arm_init_cde_builtins): New. (arm_init_acle_builtins): Remove initialisation of CDE builtins. (arm_init_builtins): Call arm_init_cde_builtins when target supports CDE. * config/arm/arm-c.c (arm_resolve_overloaded_builtin): New declaration. (arm_register_target_pragmas): Initialise resolve_overloaded_builtin hook to the implementation for the arm backend. * config/arm/arm.h (ARM_MVE_CDE_CONST_1): New. (ARM_MVE_CDE_CONST_2): New. (ARM_MVE_CDE_CONST_3): New. * config/arm/arm_cde.h (__arm_vcx1q_u8): New. (__arm_vcx1qa): New. (__arm_vcx2q): New. (__arm_vcx2q_u8): New. (__arm_vcx2qa): New. (__arm_vcx3q): New. (__arm_vcx3q_u8): New. (__arm_vcx3qa): New. * config/arm/arm_cde_builtins.def (vcx1q, vcx1qa, vcx2q, vcx2qa, vcx3q, vcx3qa): New builtins defined. * config/arm/arm_mve.h: Move typedefs and conversion intrinsics to arm_mve_types.h header. * config/arm/arm_mve_types.h: New file. * config/arm/mve.md (arm_vcx1qv16qi, arm_vcx1qav16qi, arm_vcx2qv16qi, arm_vcx2qav16qi, arm_vcx3qv16qi, arm_vcx3qav16qi): New patterns. * config/arm/predicates.md (const_int_mve_cde1_operand, const_int_mve_cde2_operand, const_int_mve_cde3_operand): New. gcc/testsuite/ChangeLog: 2020-03-23 Matthew Malcomson <matthew.malcomson@arm.com> Dennis Zhang <dennis.zhang@arm.com> * gcc.target/arm/acle/cde-mve-error-1.c: New test. * gcc.target/arm/acle/cde-mve-error-2.c: New test. * gcc.target/arm/acle/cde-mve-error-3.c: New test. * gcc.target/arm/acle/cde-mve-full-assembly.c: New test. * gcc.target/arm/acle/cde-mve-tests.c: New test. * lib/target-supports.exp (arm_v8_1m_main_cde_mve_fp): New check effective. (arm_v8_1m_main_cde_mve, arm_v8m_main_cde_fp): Use -mfpu=auto so we only check configurations that make sense.
* arm: MVE: Fix vec extracts to memoryAndre Simoes Dias Vieira2020-04-071-3/+3
| | | | | | | | | | | | | | | | | | | This patch fixes vec extracts to memory that can arise from code as seen in the testcase added. The patch fixes this by allowing mem operands in the set of mve_vec_extract patterns, which given the only '=r' constraint will lead to the scalar value being written to a register and then stored in memory using scalar store pattern. gcc/ChangeLog: 2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com> * config/arm/mve.md (mve_vec_extract*): Allow memory operands in set. gcc/testsuite/ChangeLog: 2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com> * gcc.target/arm/mve/intrinsics/mve_vec_extracts_from_memory.c: New test.
* arm: MVE Fix immediate constraints on some vector instructionsAndre Simoes Dias Vieira2020-04-071-16/+18
| | | | | | | | | | | | | | | | | | | | | | | | | Hi, This patch fixes the immediate checks on vcvt and vqshr(u)n[bt] instructions. It also removes the 'arm_mve_immediate_check' as the check was wrong and the error message is not much better than the constraint one, which albeit isn't great either. gcc/ChangeLog: 2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com> * config/arm/arm.c (arm_mve_immediate_check): Removed. * config/arm/mve.md (MVE_pred2, MVE_constraint2): Added FP types. (mve_vcvtq_n_to_f_*, mve_vcvtq_n_from_f_*, mve_vqshrnbq_n_*, mve_vqshrntq_n_*, mve_vqshrunbq_n_s*, mve_vqshruntq_n_s*, mve_vcvtq_m_n_from_f_*, mve_vcvtq_m_n_to_f_*, mve_vqshrnbq_m_n_*, mve_vqrshruntq_m_n_s*, mve_vqshrunbq_m_n_s*, mve_vqshruntq_m_n_s*): Fixed immediate constraints. gcc/testsuite/ChangeLog: 2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com> * gcc.target/arm/mve/intrinsics/mve_immediates_1_n.c: New test.
* arm: MVE: Fix v[id]wdup'sAndre Simoes Dias Vieira2020-04-071-22/+23
| | | | | | | | | | | | | | | | | | This patch fixes v[id]wdup intrinsics. They had two issues: 1) the predicated versions did not link the incoming inactive vector parameter to the output 2) The backend didn't enforce the wrap limit operand be in an odd register. 1) was fixed like we did for all other predicated intrinsics 2) requires a temporary hack where we pass the value in the top end of DImode operand. The proper fix would be to add a register CLASS but this interacted badly with other existing targets codegen. We will look to fix this properly in GCC 11. gcc/ChangeLog: 2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com> * config/arm/arm_mve.h: Fix v[id]wdup intrinsics. * config/arm/mve/md: Fix v[id]wdup patterns.
* arm: MVE: Fix constant load patternAndre Simoes Dias Vieira2020-04-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the constant load pattern for MVE, this was not accounting correctly for label + offset cases. Added test that ICE'd before and removed the scan assemblers for the mve_vector* tests as they were too fragile. gcc/ChangeLog: 2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com> * config/arm/arm.c (output_move_neon): Deal with label + offset cases. * config/arm/mve.md (*mve_mov<mode>): Handle const vectors. gcc/testsuite/ChangeLog: 2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com> * gcc.target/arm/mve/intrinsics/mve_load_from_array.c: New test. * gcc.target/arm/mve/intrinsics/mve_vector_float.c: Remove scan-assembler. * gcc.target/arm/mve/intrinsics/mve_vector_float1.c: Likewise. * gcc.target/arm/mve/intrinsics/mve_vector_int1.c: Likewise. * gcc.target/arm/mve/intrinsics/mve_vector_int2.c: Likewise.
* [ARM]: Fix for MVE ACLE intrinsics with writeback (PR94317).Srinath Parvathaneni2020-04-021-3/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Following MVE ACLE intrinsics have an issue with writeback to the base address. vldrdq_gather_base_wb_s64, vldrdq_gather_base_wb_u64, vldrdq_gather_base_wb_z_s64, vldrdq_gather_base_wb_z_u64, vldrwq_gather_base_wb_s32, vldrwq_gather_base_wb_u32, vldrwq_gather_base_wb_z_s32, vldrwq_gather_base_wb_z_u32, vldrwq_gather_base_wb_f32, vldrwq_gather_base_wb_z_f32. This patch fixes the bug reported in PR94317 by adding separate builtin calls to update the result and writeback to base address for the above intrinsics. 2020-04-02 Srinath Parvathaneni <srinath.parvathaneni@arm.com> PR target/94317 * config/arm/arm-builtins.c (LDRGBWBXU_QUALIFIERS): Define. (LDRGBWBXU_Z_QUALIFIERS): Likewise. * config/arm/arm_mve.h (__arm_vldrdq_gather_base_wb_s64): Modify intrinsic defintion by adding a new builtin call to writeback into base address. (__arm_vldrdq_gather_base_wb_u64): Likewise. (__arm_vldrdq_gather_base_wb_z_s64): Likewise. (__arm_vldrdq_gather_base_wb_z_u64): Likewise. (__arm_vldrwq_gather_base_wb_s32): Likewise. (__arm_vldrwq_gather_base_wb_u32): Likewise. (__arm_vldrwq_gather_base_wb_z_s32): Likewise. (__arm_vldrwq_gather_base_wb_z_u32): Likewise. (__arm_vldrwq_gather_base_wb_f32): Likewise. (__arm_vldrwq_gather_base_wb_z_f32): Likewise. * config/arm/arm_mve_builtins.def (vldrwq_gather_base_wb_z_u): Modify builtin's qualifier. (vldrdq_gather_base_wb_z_u): Likewise. (vldrwq_gather_base_wb_u): Likewise. (vldrdq_gather_base_wb_u): Likewise. (vldrwq_gather_base_wb_z_s): Likewise. (vldrwq_gather_base_wb_z_f): Likewise. (vldrdq_gather_base_wb_z_s): Likewise. (vldrwq_gather_base_wb_s): Likewise. (vldrwq_gather_base_wb_f): Likewise. (vldrdq_gather_base_wb_s): Likewise. (vldrwq_gather_base_nowb_z_u): Define builtin. (vldrdq_gather_base_nowb_z_u): Likewise. (vldrwq_gather_base_nowb_u): Likewise. (vldrdq_gather_base_nowb_u): Likewise. (vldrwq_gather_base_nowb_z_s): Likewise. (vldrwq_gather_base_nowb_z_f): Likewise. (vldrdq_gather_base_nowb_z_s): Likewise. (vldrwq_gather_base_nowb_s): Likewise. (vldrwq_gather_base_nowb_f): Likewise. (vldrdq_gather_base_nowb_s): Likewise. * config/arm/mve.md (mve_vldrwq_gather_base_nowb_<supf>v4si): Define RTL pattern. (mve_vldrwq_gather_base_wb_<supf>v4si): Modify RTL pattern. (mve_vldrwq_gather_base_nowb_z_<supf>v4si): Define RTL pattern. (mve_vldrwq_gather_base_wb_z_<supf>v4si): Modify RTL pattern. (mve_vldrwq_gather_base_wb_fv4sf): Modify RTL pattern. (mve_vldrwq_gather_base_nowb_fv4sf): Define RTL pattern. (mve_vldrwq_gather_base_wb_z_fv4sf): Modify RTL pattern. (mve_vldrwq_gather_base_nowb_z_fv4sf): Define RTL pattern. (mve_vldrdq_gather_base_nowb_<supf>v4di): Define RTL pattern. (mve_vldrdq_gather_base_wb_<supf>v4di): Modify RTL pattern. (mve_vldrdq_gather_base_nowb_z_<supf>v4di): Define RTL pattern. (mve_vldrdq_gather_base_wb_z_<supf>v4di): Modify RTL pattern. gcc/testsuite/ChangeLog: 2020-04-02 Srinath Parvathaneni <srinath.parvathaneni@arm.com> PR target/94317 * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c: Modify. * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c: Likewise.
* [ARM][GCC][14x]: MVE ACLE whole vector left shift with carry intrinsics.Srinath Parvathaneni2020-03-231-3/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch supports following MVE ACLE whole vector left shift with carry intrinsics. vshlcq_m_s8, vshlcq_m_s16, vshlcq_m_s32, vshlcq_m_u8, vshlcq_m_u16, vshlcq_m_u32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics 2020-03-23 Srinath Parvathaneni <srinath.parvathaneni@arm.com> Andre Vieira <andre.simoesdiasvieira@arm.com> Mihail Ionescu <mihail.ionescu@arm.com> * config/arm/arm_mve.h (vshlcq_m_s8): Define macro. (vshlcq_m_u8): Likewise. (vshlcq_m_s16): Likewise. (vshlcq_m_u16): Likewise. (vshlcq_m_s32): Likewise. (vshlcq_m_u32): Likewise. (__arm_vshlcq_m_s8): Define intrinsic. (__arm_vshlcq_m_u8): Likewise. (__arm_vshlcq_m_s16): Likewise. (__arm_vshlcq_m_u16): Likewise. (__arm_vshlcq_m_s32): Likewise. (__arm_vshlcq_m_u32): Likewise. (vshlcq_m): Define polymorphic variant. * config/arm/arm_mve_builtins.def (QUADOP_NONE_NONE_UNONE_IMM_UNONE): Use builtin qualifier. (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE): Likewise. * config/arm/mve.md (mve_vshlcq_m_vec_<supf><mode>): Define RTL pattern. (mve_vshlcq_m_carry_<supf><mode>): Likewise. (mve_vshlcq_m_<supf><mode>): Likewise. gcc/testsuite/ChangeLog: 2020-03-23 Srinath Parvathaneni <srinath.parvathaneni@arm.com> Andre Vieira <andre.simoesdiasvieira@arm.com> Mihail Ionescu <mihail.ionescu@arm.com> * gcc.target/arm/mve/intrinsics/vshlcq_m_s16.c: New test. * gcc.target/arm/mve/intrinsics/vshlcq_m_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vshlcq_m_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vshlcq_m_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vshlcq_m_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vshlcq_m_u8.c: Likewise.
* [ARM][GCC][13x]: MVE ACLE scalar shift intrinsics.Srinath Parvathaneni2020-03-231-3/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch supports following MVE ACLE scalar shift intrinsics. sqrshr, sqrshrl, sqrshrl_sat48, sqshl, sqshll, srshr, srshrl, uqrshl, uqrshll, uqrshll_sat48, uqshl, uqshll, urshr, urshrl, lsll, asrl. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics 2020-03-23 Srinath Parvathaneni <srinath.parvathaneni@arm.com> * config/arm/arm-builtins.c (LSLL_QUALIFIERS): Define builtin qualifier. (UQSHL_QUALIFIERS): Likewise. (ASRL_QUALIFIERS): Likewise. (SQSHL_QUALIFIERS): Likewise. * config/arm/arm_mve.h (__ARM_BIG_ENDIAN): Check to not support MVE in Big-Endian Mode. (sqrshr): Define macro. (sqrshrl): Likewise. (sqrshrl_sat48): Likewise. (sqshl): Likewise. (sqshll): Likewise. (srshr): Likewise. (srshrl): Likewise. (uqrshl): Likewise. (uqrshll): Likewise. (uqrshll_sat48): Likewise. (uqshl): Likewise. (uqshll): Likewise. (urshr): Likewise. (urshrl): Likewise. (lsll): Likewise. (asrl): Likewise. (__arm_lsll): Define intrinsic. (__arm_asrl): Likewise. (__arm_uqrshll): Likewise. (__arm_uqrshll_sat48): Likewise. (__arm_sqrshrl): Likewise. (__arm_sqrshrl_sat48): Likewise. (__arm_uqshll): Likewise. (__arm_urshrl): Likewise. (__arm_srshrl): Likewise. (__arm_sqshll): Likewise. (__arm_uqrshl): Likewise. (__arm_sqrshr): Likewise. (__arm_uqshl): Likewise. (__arm_urshr): Likewise. (__arm_sqshl): Likewise. (__arm_srshr): Likewise. * config/arm/arm_mve_builtins.def (LSLL_QUALIFIERS): Use builtin qualifier. (UQSHL_QUALIFIERS): Likewise. (ASRL_QUALIFIERS): Likewise. (SQSHL_QUALIFIERS): Likewise. * config/arm/mve.md (mve_uqrshll_sat<supf>_di): Define RTL pattern. (mve_sqrshrl_sat<supf>_di): Likewise. (mve_uqrshl_si): Likewise. (mve_sqrshr_si): Likewise. (mve_uqshll_di): Likewise. (mve_urshrl_di): Likewise. (mve_uqshl_si): Likewise. (mve_urshr_si): Likewise. (mve_sqshl_si): Likewise. (mve_srshr_si): Likewise. (mve_srshrl_di): Likewise. (mve_sqshll_di): Likewise. gcc/testsuite/ChangeLog: 2020-03-23 Srinath Parvathaneni <srinath.parvathaneni@arm.com> * gcc.target/arm/mve/intrinsics/asrl.c: New test. * gcc.target/arm/mve/intrinsics/lsll.c: Likewise. * gcc.target/arm/mve/intrinsics/sqrshr.c: Likewise. * gcc.target/arm/mve/intrinsics/sqrshrl_sat48.c: Likewise. * gcc.target/arm/mve/intrinsics/sqrshrl_sat64.c: Likewise. * gcc.target/arm/mve/intrinsics/sqshl.c: Likewise. * gcc.target/arm/mve/intrinsics/sqshll.c: Likewise. * gcc.target/arm/mve/intrinsics/srshr.c: Likewise. * gcc.target/arm/mve/intrinsics/srshrl.c: Likewise. * gcc.target/arm/mve/intrinsics/uqrshl.c: Likewise. * gcc.target/arm/mve/intrinsics/uqrshll_sat48.c: Likewise. * gcc.target/arm/mve/intrinsics/uqrshll_sat64.c: Likewise. * gcc.target/arm/mve/intrinsics/uqshl.c: Likewise. * gcc.target/arm/mve/intrinsics/uqshll.c: Likewise. * gcc.target/arm/mve/intrinsics/urshr.c: Likewise. * gcc.target/arm/mve/intrinsics/urshrl.c: Likewise. * lib/target-supports.exp: (check_effective_target_arm_v8_1m_mve_fp_ok_nocache): Modify to not support MVE floating point in Big Endian mode. (check_effective_target_arm_v8_1m_mve_ok_nocache): Modify to not support MVE integer in Big Endian mode.
* [ARM][GCC][12x]: MVE ACLE intrinsics to set and get vector lane.Srinath Parvathaneni2020-03-231-0/+121
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch supports following MVE ACLE intrinsics to get and set vector lane. vsetq_lane_f16, vsetq_lane_f32, vsetq_lane_s16, vsetq_lane_s32, vsetq_lane_s8, vsetq_lane_s64, vsetq_lane_u8, vsetq_lane_u16, vsetq_lane_u32, vsetq_lane_u64, vgetq_lane_f16, vgetq_lane_f32, vgetq_lane_s16, vgetq_lane_s32, vgetq_lane_s8, vgetq_lane_s64, vgetq_lane_u8, vgetq_lane_u16, vgetq_lane_u32, vgetq_lane_u64. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics 2020-03-23 Srinath Parvathaneni <srinath.parvathaneni@arm.com> Andre Vieira <andre.simoesdiasvieira@arm.com> Mihail Ionescu <mihail.ionescu@arm.com> * config/arm/arm_mve.h (vsetq_lane_f16): Define macro. (vsetq_lane_f32): Likewise. (vsetq_lane_s16): Likewise. (vsetq_lane_s32): Likewise. (vsetq_lane_s8): Likewise. (vsetq_lane_s64): Likewise. (vsetq_lane_u8): Likewise. (vsetq_lane_u16): Likewise. (vsetq_lane_u32): Likewise. (vsetq_lane_u64): Likewise. (vgetq_lane_f16): Likewise. (vgetq_lane_f32): Likewise. (vgetq_lane_s16): Likewise. (vgetq_lane_s32): Likewise. (vgetq_lane_s8): Likewise. (vgetq_lane_s64): Likewise. (vgetq_lane_u8): Likewise. (vgetq_lane_u16): Likewise. (vgetq_lane_u32): Likewise. (vgetq_lane_u64): Likewise. (__ARM_NUM_LANES): Likewise. (__ARM_LANEQ): Likewise. (__ARM_CHECK_LANEQ): Likewise. (__arm_vsetq_lane_s16): Define intrinsic. (__arm_vsetq_lane_s32): Likewise. (__arm_vsetq_lane_s8): Likewise. (__arm_vsetq_lane_s64): Likewise. (__arm_vsetq_lane_u8): Likewise. (__arm_vsetq_lane_u16): Likewise. (__arm_vsetq_lane_u32): Likewise. (__arm_vsetq_lane_u64): Likewise. (__arm_vgetq_lane_s16): Likewise. (__arm_vgetq_lane_s32): Likewise. (__arm_vgetq_lane_s8): Likewise. (__arm_vgetq_lane_s64): Likewise. (__arm_vgetq_lane_u8): Likewise. (__arm_vgetq_lane_u16): Likewise. (__arm_vgetq_lane_u32): Likewise. (__arm_vgetq_lane_u64): Likewise. (__arm_vsetq_lane_f16): Likewise. (__arm_vsetq_lane_f32): Likewise. (__arm_vgetq_lane_f16): Likewise. (__arm_vgetq_lane_f32): Likewise. (vgetq_lane): Define polymorphic variant. (vsetq_lane): Likewise. * config/arm/mve.md (mve_vec_extract<mode><V_elem_l>): Define RTL pattern. (mve_vec_extractv2didi): Likewise. (mve_vec_extract_sext_internal<mode>): Likewise. (mve_vec_extract_zext_internal<mode>): Likewise. (mve_vec_set<mode>_internal): Likewise. (mve_vec_setv2di_internal): Likewise. * config/arm/neon.md (vec_set<mode>): Move RTL pattern to vec-common.md file. (vec_extract<mode><V_elem_l>): Rename to "neon_vec_extract<mode><V_elem_l>". (vec_extractv2didi): Rename to "neon_vec_extractv2didi". * config/arm/vec-common.md (vec_extract<mode><V_elem_l>): Define RTL pattern common for MVE and NEON. (vec_set<mode>): Move RTL pattern from neon.md and modify to accept both MVE and NEON. gcc/testsuite/ChangeLog: 2020-03-23 Srinath Parvathaneni <srinath.parvathaneni@arm.com> Andre Vieira <andre.simoesdiasvieira@arm.com> Mihail Ionescu <mihail.ionescu@arm.com> * gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: New test. * gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c: Likewise.
* arm: Add earlyclobber to MVE instructions that require itAndre Simoes Dias Vieira2020-03-231-34/+36
| | | | | | | | | | | | | This patch adds an earlyclobber to the MVE instructions that require it and were missing it. These are vrev64 and 32-bit element variants of vcadd, vhcadd vcmul, vmull[bt] and vqdmull[bt]. gcc/ChangeLog: 2020-03-23 Andre Vieira <andre.simoesdiasvieira@arm.com> * config/arm/mve.md (earlyclobber_32): New mode attribute. (mve_vrev64q_*, mve_vcaddq*, mve_vhcaddq_*, mve_vcmulq_*, mve_vmull[bt]q_*, mve_vqdmull[bt]q_*): Add appropriate early clobbers.
* [ARM][GCC][11x]: MVE ACLE vector interleaving store and deinterleaving load ↵Srinath Parvathaneni2020-03-201-1/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | intrinsics and also aliases to vstr and vldr intrinsics. This patch supports following MVE ACLE intrinsics which are aliases of vstr and vldr intrinsics. vst1q_p_u8, vst1q_p_s8, vld1q_z_u8, vld1q_z_s8, vst1q_p_u16, vst1q_p_s16, vld1q_z_u16, vld1q_z_s16, vst1q_p_u32, vst1q_p_s32, vld1q_z_u32, vld1q_z_s32, vld1q_z_f16, vst1q_p_f16, vld1q_z_f32, vst1q_p_f32. This patch also supports following MVE ACLE vector deinterleaving loads and vector interleaving stores. vst2q_s8, vst2q_u8, vld2q_s8, vld2q_u8, vld4q_s8, vld4q_u8, vst2q_s16, vst2q_u16, vld2q_s16, vld2q_u16, vld4q_s16, vld4q_u16, vst2q_s32, vst2q_u32, vld2q_s32, vld2q_u32, vld4q_s32, vld4q_u32, vld4q_f16, vld2q_f16, vst2q_f16, vld4q_f32, vld2q_f32, vst2q_f32. Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details. [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics 2020-03-20 Srinath Parvathaneni <srinath.parvathaneni@arm.com> Andre Vieira <andre.simoesdiasvieira@arm.com> Mihail Ionescu <mihail.ionescu@arm.com> * config/arm/arm_mve.h (vst1q_p_u8): Define macro. (vst1q_p_s8): Likewise. (vst2q_s8): Likewise. (vst2q_u8): Likewise. (vld1q_z_u8): Likewise. (vld1q_z_s8): Likewise. (vld2q_s8): Likewise. (vld2q_u8): Likewise. (vld4q_s8): Likewise. (vld4q_u8): Likewise. (vst1q_p_u16): Likewise. (vst1q_p_s16): Likewise. (vst2q_s16): Likewise. (vst2q_u16): Likewise. (vld1q_z_u16): Likewise. (vld1q_z_s16): Likewise. (vld2q_s16): Likewise. (vld2q_u16): Likewise. (vld4q_s16): Likewise. (vld4q_u16): Likewise. (vst1q_p_u32): Likewise. (vst1q_p_s32): Likewise. (vst2q_s32): Likewise. (vst2q_u32): Likewise. (vld1q_z_u32): Likewise. (vld1q_z_s32): Likewise. (vld2q_s32): Likewise. (vld2q_u32): Likewise. (vld4q_s32): Likewise. (vld4q_u32): Likewise. (vld4q_f16): Likewise. (vld2q_f16): Likewise. (vld1q_z_f16): Likewise. (vst2q_f16): Likewise. (vst1q_p_f16): Likewise. (vld4q_f32): Likewise. (vld2q_f32): Likewise. (vld1q_z_f32): Likewise. (vst2q_f32): Likewise. (vst1q_p_f32): Likewise. (__arm_vst1q_p_u8): Define intrinsic. (__arm_vst1q_p_s8): Likewise. (__arm_vst2q_s8): Likewise. (__arm_vst2q_u8): Likewise. (__arm_vld1q_z_u8): Likewise. (__arm_vld1q_z_s8): Likewise. (__arm_vld2q_s8): Likewise. (__arm_vld2q_u8): Likewise. (__arm_vld4q_s8): Likewise. (__arm_vld4q_u8): Likewise. (__arm_vst1q_p_u16): Likewise. (__arm_vst1q_p_s16): Likewise. (__arm_vst2q_s16): Likewise. (__arm_vst2q_u16): Likewise. (__arm_vld1q_z_u16): Likewise. (__arm_vld1q_z_s16): Likewise. (__arm_vld2q_s16): Likewise. (__arm_vld2q_u16): Likewise. (__arm_vld4q_s16): Likewise. (__arm_vld4q_u16): Likewise. (__arm_vst1q_p_u32): Likewise. (__arm_vst1q_p_s32): Likewise. (__arm_vst2q_s32): Likewise. (__arm_vst2q_u32): Likewise. (__arm_vld1q_z_u32): Likewise. (__arm_vld1q_z_s32): Likewise. (__arm_vld2q_s32): Likewise. (__arm_vld2q_u32): Likewise. (__arm_vld4q_s32): Likewise. (__arm_vld4q_u32): Likewise. (__arm_vld4q_f16): Likewise. (__arm_vld2q_f16): Likewise. (__arm_vld1q_z_f16): Likewise. (__arm_vst2q_f16): Likewise. (__arm_vst1q_p_f16): Likewise. (__arm_vld4q_f32): Likewise. (__arm_vld2q_f32): Likewise. (__arm_vld1q_z_f32): Likewise. (__arm_vst2q_f32): Likewise. (__arm_vst1q_p_f32): Likewise. (vld1q_z): Define polymorphic variant. (vld2q): Likewise. (vld4q): Likewise. (vst1q_p): Likewise. (vst2q): Likewise. * config/arm/arm_mve_builtins.def (STORE1): Use builtin qualifier. (LOAD1): Likewise. * config/arm/mve.md (mve_vst2q<mode>): Define RTL pattern. (mve_vld2q<mode>): Likewise. (mve_vld4q<mode>): Likewise. gcc/testsuite/ChangeLog: 2020-03-20 Srinath Parvathaneni <srinath.parvathaneni@arm.com> Andre Vieira <andre.simoesdiasvieira@arm.com> Mihail Ionescu <mihail.ionescu@arm.com> * gcc.target/arm/mve/intrinsics/vld1q_z_f16.c: New test. * gcc.target/arm/mve/intrinsics/vld1q_z_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld1q_z_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vld2q_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld2q_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld2q_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld2q_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld2q_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vld2q_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld2q_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld2q_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vld4q_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld4q_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld4q_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld4q_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld4q_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vld4q_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vld4q_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vld4q_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vst1q_p_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vst1q_p_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vst1q_p_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vst1q_p_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vst1q_p_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vst1q_p_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vst1q_p_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vst1q_p_u8.c: Likewise. * gcc.target/arm/mve/intrinsics/vst2q_f16.c: Likewise. * gcc.target/arm/mve/intrinsics/vst2q_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vst2q_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vst2q_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vst2q_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vst2q_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vst2q_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vst2q_u8.c: Likewise.