| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since MVE has a different set of vector comparison operators from
Neon, we have to update the expansion to take into account the new
ones, for instance 'NE' for which MVE does not require to use 'EQ'
with the inverted condition.
Conversely, Neon supports comparisons with #0, MVE does not.
For:
typedef long int vs32 __attribute__((vector_size(16)));
vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
we now generate:
cmp_eq_vs32_reg:
vldr.64 d4, .L123 @ 8 [c=8 l=4] *mve_movv4si/8
vldr.64 d5, .L123+8
vldr.64 d6, .L123+16 @ 9 [c=8 l=4] *mve_movv4si/8
vldr.64 d7, .L123+24
vcmp.i32 eq, q0, q1 @ 7 [c=16 l=4] mve_vcmpeqq_v4si
vpsel q0, q3, q2 @ 15 [c=8 l=4] mve_vpselq_sv4si
bx lr @ 26 [c=8 l=4] *thumb2_return
.L124:
.align 3
.L123:
.word 0
.word 0
.word 0
.word 0
.word 1
.word 1
.word 1
.word 1
For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode)) produces
a pair of vldr instead of vmov.i32, qX, #0
2021-05-17 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm-protos.h (arm_expand_vector_compare): Update
prototype.
* config/arm/arm.c (arm_expand_vector_compare): Add support for
MVE.
(arm_expand_vcond): Likewise.
* config/arm/iterators.md (supf): Remove VCMPNEQ_S, VCMPEQQ_S,
VCMPEQQ_N_S, VCMPNEQ_N_S.
(VCMPNEQ, VCMPEQQ, VCMPEQQ_N, VCMPNEQ_N): Remove.
* config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>): Add '@' prefix.
(@mve_vcmp<mve_cmp_op>q_f<mode>): Likewise.
(@mve_vcmp<mve_cmp_op>q_n_f<mode>): Likewise.
(@mve_vpselq_<supf><mode>): Likewise.
(@mve_vpselq_f<mode>"): Likewise.
* config/arm/neon.md (vec_cmp<mode><v_cmp_result): Enable for MVE
and move to vec-common.md.
(vec_cmpu<mode><mode>): Likewise.
(vcond<mode><mode>): Likewise.
(vcond<V_cvtto><mode>): Likewise.
(vcondu<mode><v_cmp_result>): Likewise.
(vcond_mask_<mode><v_cmp_result>): Likewise.
* config/arm/unspecs.md (VCMPNEQ_U, VCMPNEQ_S, VCMPEQQ_S)
(VCMPEQQ_N_S, VCMPNEQ_N_S, VCMPEQQ_U, CMPEQQ_N_U, VCMPNEQ_N_U)
(VCMPGEQ_N_S, VCMPGEQ_S, VCMPGTQ_N_S, VCMPGTQ_S, VCMPLEQ_N_S)
(VCMPLEQ_S, VCMPLTQ_N_S, VCMPLTQ_S, VCMPCSQ_N_U, VCMPCSQ_U)
(VCMPHIQ_N_U, VCMPHIQ_U): Remove.
* config/arm/vec-common.md (vec_cmp<mode><v_cmp_result): Moved
from neon.md.
(vec_cmpu<mode><mode>): Likewise.
(vcond<mode><mode>): Likewise.
(vcond<V_cvtto><mode>): Likewise.
(vcondu<mode><v_cmp_result>): Likewise.
(vcond_mask_<mode><v_cmp_result>): Likewise. Added unsafe math
condition.
gcc/testsuite
* gcc.target/arm/simd/mve-compare-1.c: New test with GCC vectors.
* gcc.target/arm/simd/mve-compare-2.c: New test with GCC vectors.
* gcc.target/arm/simd/mve-compare-scalar-1.c: New test with GCC
vectors.
* gcc.target/arm/simd/mve-vcmp-f32.c: New test for
auto-vectorization.
* gcc.target/arm/simd/mve-vcmp.c: New test for auto-vectorization.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Like in the previous, we factorize the vcmp_*f* patterns to make
maintenance easier.
2021-05-10 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/iterators.md (MVE_FP_COMPARISONS): New.
* config/arm/mve.md (mve_vcmp<mve_cmp_op>q_f<mode>)
(mve_vcmp<mve_cmp_op>q_n_f<mode>): New, merge all vcmp_*f*
patterns.
(mve_vcmpeqq_f<mode>, mve_vcmpeqq_n_f<mode>, mve_vcmpgeq_f<mode>)
(mve_vcmpgeq_n_f<mode>, mve_vcmpgtq_f<mode>)
(mve_vcmpgtq_n_f<mode>, mve_vcmpleq_f<mode>)
(mve_vcmpleq_n_f<mode>, mve_vcmpltq_f<mode>)
(mve_vcmpltq_n_f<mode>, mve_vcmpneq_f<mode>)
(mve_vcmpneq_n_f<mode>): Remove.
* config/arm/unspecs.md (VCMPEQQ_F, VCMPEQQ_N_F, VCMPGEQ_F)
(VCMPGEQ_N_F, VCMPGTQ_F, VCMPGTQ_N_F, VCMPLEQ_F, VCMPLEQ_N_F)
(VCMPLTQ_F, VCMPLTQ_N_F, VCMPNEQ_F, VCMPNEQ_N_F): Remove.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After removing the signed and unsigned suffixes in the previous
patches, we can now factorize the vcmp* patterns: there is no longer
an asymmetry where operators do not have the same set of signed and
unsigned variants.
The will make maintenance easier.
MVE has a different set of vector comparison operators than Neon,
so we have to introduce dedicated iterators.
2021-05-10 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/iterators.md (MVE_COMPARISONS): New.
(mve_cmp_op): New.
(mve_cmp_type): New.
* config/arm/mve.md (mve_vcmp<mve_cmp_op>q_<mode>): New, merge all
mve_vcmp patterns.
(mve_vcmpneq_<mode>, mve_vcmpcsq_n_<mode>, mve_vcmpcsq_<mode>)
(mve_vcmpeqq_n_<mode>, mve_vcmpeqq_<mode>, mve_vcmpgeq_n_<mode>)
(mve_vcmpgeq_<mode>, mve_vcmpgtq_n_<mode>, mve_vcmpgtq_<mode>)
(mve_vcmphiq_n_<mode>, mve_vcmphiq_<mode>, mve_vcmpleq_n_<mode>)
(mve_vcmpleq_<mode>, mve_vcmpltq_n_<mode>, mve_vcmpltq_<mode>)
(mve_vcmpneq_n_<mode>, mve_vcmpltq_n_<mode>, mve_vcmpltq_<mode>)
(mve_vcmpneq_n_<mode>): Remove.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch brings more unification in the vector comparison builtins,
by removing the useless 's' (signed) suffix since we no longer need
unsigned versions.
2021-05-10 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm_mve.h (__arm_vcmp*): Remove 's' suffix.
* config/arm/arm_mve_builtins.def (vcmp*): Remove 's' suffix.
* config/arm/mve.md (mve_vcmp*): Remove 's' suffix in pattern
names.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After the previous patch, we no longer need to emit the unsigned
variants of vcmpneq/vcmpeqq. This patch removes them as well as the
corresponding iterator entries.
2021-05-10 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/arm_mve_builtins.def (vcmpneq_u): Remove.
(vcmpneq_n_u): Likewise.
(vcmpeqq_u,): Likewise.
(vcmpeqq_n_u): Likewise.
* config/arm/iterators.md (supf): Remove VCMPNEQ_U, VCMPEQQ_U,
VCMPEQQ_N_U and VCMPNEQ_N_U.
* config/arm/mve.md (mve_vcmpneq): Remove <supf> iteration.
(mve_vcmpeqq_n): Likewise.
(mve_vcmpeqq): Likewise.
(mve_vcmpneq_n): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As the PR shows, we currently miscompile V2DImode loads and stores for
MVE. We're currently using 64-bit loads/stores, but need to be using
128-bit vector loads and stores. Fixed thusly.
Some intrinsics tests were checking that we (incorrectly) used the
64-bit loads/stores: these have been updated.
gcc/ChangeLog:
PR target/99960
* config/arm/mve.md (*mve_mov<mode>): Simplify output code. Use
vldrw.u32 and vstrw.32 for V2D[IF]mode loads and stores.
gcc/testsuite/ChangeLog:
PR target/99960
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c:
Update now that we're (correctly) using full 128-bit vector
loads/stores.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c:
Likewise.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c:
Likewise.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c:
Likewise.
* gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise.
* gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c:
Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes various issues with vec_duplicate in the MVE patterns.
Currently there are two patterns named *mve_mov<mode>. The second of
these is really a vector duplicate rather than a move, so I've renamed
it accordingly.
As it stands, there are several issues with this pattern:
1. The MVE_types iterator has an entry for TImode, but
vec_duplicate:TI is invalid.
2. The mode of the operand to vec_duplicate is SImode, but it should
vary according to the vector mode iterator.
3. The second alternative of this pattern is bogus: it allows matching
symbol_refs (the cause of the PR) and const_ints (which means that it
matches (vec_duplicate (const_int ...)) which is non-canonical: such
rtxes should be const_vectors instead and handled by the main vector
move pattern).
This patch fixes all of these issues, and removes the redundant
*mve_vec_duplicate<mode> pattern.
gcc/ChangeLog:
PR target/99647
* config/arm/iterators.md (MVE_vecs): New.
(V_elem): Also handle V2DF.
* config/arm/mve.md (*mve_mov<mode>): Rename to ...
(*mve_vdup<mode>): ... this. Remove second alternative since
vec_duplicate of const_int is not canonical RTL, and we don't
want to match symbol_refs.
(*mve_vec_duplicate<mode>): Delete (pattern is redundant).
gcc/testsuite/ChangeLog:
PR target/99647
* gcc.c-torture/compile/pr99647.c: New test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MVE has different constraints than Neon for load/store: we should use
the Ux constraint instead of Um.
2021-03-24 Christophe Lyon <christophe.lyon@linaro.org>
PR target/99727
gcc/
* config/arm/mve.md (movmisalign<mode>_mve_store): Use Ux
constraint.
(movmisalign<mode>_mve_load): Likewise.
gcc/testsuite/
* gcc.target/arm/pr99727.c: New test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes around 500 ICEs in the testsuite which can be seen when
testing with -march=armv8.1-m.main+mve -mfloat-abi=hard -mpure-code
(leaving the testsuite free of ICEs in this configuration). All of the
ICEs are in arm_print_operand (which is expecting a mem and gets another
rtx, e.g. a const_vector) when running the output code for
*mve_mov<mode> in alternative 4.
The issue is that MVE vector moves were relying on the arm_reorg pass to
move constant vectors that we can't easily synthesize to the literal
pool. This doesn't work for -mpure-code where the literal pool is
disabled. LLVM puts these in .rodata: I've chosen to do the same here.
With this change, for -mpure-code, we no longer want to allow a constant
on the RHS of a vector load in RA. To achieve this, I added a new
constraint which matches constants only if the literal pool is
available.
gcc/ChangeLog:
PR target/97252
* config/arm/arm-protos.h (neon_make_constant): Add generate
argument to guard emitting insns, default to true.
* config/arm/arm.c (arm_legitimate_constant_p_1): Reject
CONST_VECTORs which neon_make_constant can't handle.
(neon_vdup_constant): Add generate argument, avoid emitting
insns if it's not set.
(neon_make_constant): Plumb new generate argument through.
* config/arm/constraints.md (Ui): New. Use it...
* config/arm/mve.md (*mve_mov<mode>): ... here.
* config/arm/vec-common.md (movv8hf): Use neon_make_constant to
synthesize constants.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables MVE vornq instructions for auto-vectorization. MVE
vornq insns in mve.md are modified to use ior instead of unspec
expression.
2021-02-01 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/iterators.md (supf): Remove VORNQ_S and VORNQ_U.
(VORNQ): Remove.
* config/arm/mve.md (mve_vornq_s<mode>): New entry for vorn
instruction using expression ior.
(mve_vornq_u<mode>): New expander.
(mve_vornq_f<mode>): Use ior code instead of unspec.
* config/arm/unspecs.md (VORNQ_S, VORNQ_U, VORNQ_F): Remove.
gcc/testsuite/
* gcc.target/arm/simd/mve-vorn.c: Add vorn tests.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds implementation for the optabs for complex operations. With this the
following C code:
void g (float complex a[restrict N], float complex b[restrict N],
float complex c[restrict N])
{
for (int i=0; i < N; i++)
c[i] = a[i] * b[i];
}
generates
NEON:
g:
vmov.f32 q11, #0.0 @ v4sf
add r3, r2, #1600
.L2:
vmov q8, q11 @ v4sf
vld1.32 {q10}, [r1]!
vld1.32 {q9}, [r0]!
vcmla.f32 q8, q9, q10, #0
vcmla.f32 q8, q9, q10, #90
vst1.32 {q8}, [r2]!
cmp r3, r2
bne .L2
bx lr
MVE:
g:
push {lr}
mov lr, #100
dls lr, lr
.L2:
vldrw.32 q1, [r1], #16
vldrw.32 q2, [r0], #16
vcmul.f32 q3, q2, q1, #0
vcmla.f32 q3, q2, q1, #90
vstrw.32 q3, [r2], #16
le lr, .L2
ldr pc, [sp], #4
instead of
g:
add r3, r2, #1600
.L2:
vld2.32 {d20-d23}, [r0]!
vld2.32 {d16-d19}, [r1]!
vmul.f32 q14, q11, q9
vmul.f32 q15, q11, q8
vneg.f32 q14, q14
vfma.f32 q15, q10, q9
vfma.f32 q14, q10, q8
vmov q13, q15 @ v4sf
vmov q12, q14 @ v4sf
vst2.32 {d24-d27}, [r2]!
cmp r3, r2
bne .L2
bx lr
and
g:
add r3, r2, #1600
.L2:
vld2.32 {d20-d23}, [r0]!
vld2.32 {d16-d19}, [r1]!
vmul.f32 q15, q10, q8
vmul.f32 q14, q10, q9
vmls.f32 q15, q11, q9
vmla.f32 q14, q11, q8
vmov q12, q15 @ v4sf
vmov q13, q14 @ v4sf
vst2.32 {d24-d27}, [r2]!
cmp r3, r2
bne .L2
bx lr
respectively.
gcc/ChangeLog:
* config/arm/iterators.md (rotsplit1, rotsplit2, conj_op, fcmac1,
VCMLA_OP, VCMUL_OP): New.
* config/arm/mve.md (mve_vcmlaq<mve_rot><mode>): Support vec_dup 0.
* config/arm/neon.md (cmul<conj_op><mode>3): New.
* config/arm/unspecs.md (UNSPEC_VCMLA_CONJ, UNSPEC_VCMLA180_CONJ,
UNSPEC_VCMUL_CONJ): New.
* config/arm/vec-common.md (cmul<conj_op><mode>3, arm_vcmla<rot><mode>,
cml<fcmac1><conj_op><mode>4): New.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables MVE vshr instructions for auto-vectorization. New
MVE patterns are introduced that take a vector of constants as second
operand, all constants being equal.
The existing mve_vshrq_n_<supf><mode> is kept, as it takes a single
immediate as second operand, and is used by arm_mve.h.
The vashr<mode>3 and vlshr<mode>3 expanders are moved fron neon.md to
vec-common.md, updated to rely on the normal expansion scheme to
generate shifts by immediate.
2020-12-03 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/mve.md (mve_vshrq_n_s<mode>_imm): New entry.
(mve_vshrq_n_u<mode>_imm): Likewise.
* config/arm/neon.md (vashr<mode>3, vlshr<mode>3): Move to ...
* config/arm/vec-common.md: ... here.
gcc/testsuite/
* gcc.target/arm/simd/mve-vshr.c: Add tests for vshr.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables MVE vshlq instructions for auto-vectorization.
The existing mve_vshlq_n_<supf><mode> is kept, as it takes a single
immediate as second operand, and is used by arm_mve.h.
We move the vashl<mode>3 insn from neon.md to an expander in
vec-common.md, and the mve_vshlq_<supf><mode> insn from mve.md to
vec-common.md, adding the second alternative fron neon.md.
mve_vshlq_<supf><mode> will be used by a later patch enabling
vectorization for vshr, as a unified version of
ashl3<mode3>_[signed|unsigned] from neon.md. Keeping the use of unspec
VSHLQ enables to generate both 's' and 'u' variants.
It is not clear whether the neon_shift_[reg|imm]<q> attribute is still
suitable, since this insn is also used for MVE.
I kept the mve_vshlq_<supf><mode> naming instead of renaming it to
ashl3_<supf>_<mode> as discussed because the reference in
arm_mve_builtins.def automatically inserts the "mve_" prefix and I
didn't want to make a special case for this.
I haven't yet found why the v16qi and v8hi tests are not vectorized.
With dest[i] = a[i] << b[i] and:
{
int i;
unsigned int i.24_1;
unsigned int _2;
int16_t * _3;
short int _4;
int _5;
int16_t * _6;
short int _7;
int _8;
int _9;
int16_t * _10;
short int _11;
unsigned int ivtmp_42;
unsigned int ivtmp_43;
<bb 2> [local count: 119292720]:
<bb 3> [local count: 954449105]:
i.24_1 = (unsigned int) i_23;
_2 = i.24_1 * 2;
_3 = a_15(D) + _2;
_4 = *_3;
_5 = (int) _4;
_6 = b_16(D) + _2;
_7 = *_6;
_8 = (int) _7;
_9 = _5 << _8;
_10 = dest_17(D) + _2;
_11 = (short int) _9;
*_10 = _11;
i_19 = i_23 + 1;
ivtmp_42 = ivtmp_43 - 1;
if (ivtmp_42 != 0)
goto <bb 5>; [87.50%]
else
goto <bb 4>; [12.50%]
<bb 5> [local count: 835156386]:
goto <bb 3>; [100.00%]
<bb 4> [local count: 119292720]:
return;
}
the vectorizer says:
mve-vshl.c:37:96: note: ==> examining statement: _5 = (int) _4;
mve-vshl.c:37:96: note: vect_is_simple_use: operand *_3, type of def: internal
mve-vshl.c:37:96: note: vect_is_simple_use: vectype vector(8) short int
mve-vshl.c:37:96: missed: conversion not supported by target.
mve-vshl.c:37:96: note: vect_is_simple_use: operand *_3, type of def: internal
mve-vshl.c:37:96: note: vect_is_simple_use: vectype vector(8) short int
mve-vshl.c:37:96: note: vect_is_simple_use: operand *_3, type of def: internal
mve-vshl.c:37:96: note: vect_is_simple_use: vectype vector(8) short int
mve-vshl.c:37:117: missed: not vectorized: relevant stmt not supported: _5 = (int) _4;
mve-vshl.c:37:96: missed: bad operation or unsupported loop bound.
mve-vshl.c:37:96: note: ***** Analysis failed with vector mode V8HI
2020-12-03 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/mve.md (mve_vshlq_<supf><mode>): Move to
vec-commond.md.
* config/arm/neon.md (vashl<mode>3): Delete.
* config/arm/vec-common.md (mve_vshlq_<supf><mode>): New.
(vasl<mode>3): New expander.
gcc/testsuite/
* gcc.target/arm/simd/mve-vshl.c: Add tests for vshl.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds new movmisalign<mode>_mve_load and store patterns for
MVE to help vectorization. They are very similar to their Neon
counterparts, but use different iterators and instructions.
Indeed MVE supports less vectors modes than Neon, so we use the
MVE_VLD_ST iterator where Neon uses VQX.
Since the supported modes are different from the ones valid for
arithmetic operators, we introduce two new sets of macros:
ARM_HAVE_NEON_<MODE>_LDST
true if Neon has vector load/store instructions for <MODE>
ARM_HAVE_<MODE>_LDST
true if any vector extension has vector load/store instructions for <MODE>
We move the movmisalign<mode> expander from neon.md to vec-commond.md, and
replace the TARGET_NEON enabler with ARM_HAVE_<MODE>_LDST.
The patch also updates the mve-vneg.c test to scan for the better code
generation when loading and storing the vectors involved: it checks
that no 'orr' instruction is generated to cope with misalignment at
runtime.
This test was chosen among the other mve tests, but any other should
be OK. Using a plain vector copy loop (dest[i] = a[i]) is not a good
test because the compiler chooses to use memcpy.
For instance we now generate:
test_vneg_s32x4:
vldrw.32 q3, [r1]
vneg.s32 q3, q3
vstrw.32 q3, [r0]
bx lr
instead of:
test_vneg_s32x4:
orr r3, r1, r0
lsls r3, r3, #28
bne .L15
vldrw.32 q3, [r1]
vneg.s32 q3, q3
vstrw.32 q3, [r0]
bx lr
.L15:
push {r4, r5}
ldrd r2, r3, [r1, #8]
ldrd r5, r4, [r1]
rsbs r2, r2, #0
rsbs r5, r5, #0
rsbs r4, r4, #0
rsbs r3, r3, #0
strd r5, r4, [r0]
pop {r4, r5}
strd r2, r3, [r0, #8]
bx lr
2021-01-12 Christophe Lyon <christophe.lyon@linaro.org>
PR target/97875
gcc/
* config/arm/arm.h (ARM_HAVE_NEON_V8QI_LDST): New macro.
(ARM_HAVE_NEON_V16QI_LDST, ARM_HAVE_NEON_V4HI_LDST): Likewise.
(ARM_HAVE_NEON_V8HI_LDST, ARM_HAVE_NEON_V2SI_LDST): Likewise.
(ARM_HAVE_NEON_V4SI_LDST, ARM_HAVE_NEON_V4HF_LDST): Likewise.
(ARM_HAVE_NEON_V8HF_LDST, ARM_HAVE_NEON_V4BF_LDST): Likewise.
(ARM_HAVE_NEON_V8BF_LDST, ARM_HAVE_NEON_V2SF_LDST): Likewise.
(ARM_HAVE_NEON_V4SF_LDST, ARM_HAVE_NEON_DI_LDST): Likewise.
(ARM_HAVE_NEON_V2DI_LDST): Likewise.
(ARM_HAVE_V8QI_LDST, ARM_HAVE_V16QI_LDST): Likewise.
(ARM_HAVE_V4HI_LDST, ARM_HAVE_V8HI_LDST): Likewise.
(ARM_HAVE_V2SI_LDST, ARM_HAVE_V4SI_LDST, ARM_HAVE_V4HF_LDST): Likewise.
(ARM_HAVE_V8HF_LDST, ARM_HAVE_V4BF_LDST, ARM_HAVE_V8BF_LDST): Likewise.
(ARM_HAVE_V2SF_LDST, ARM_HAVE_V4SF_LDST, ARM_HAVE_DI_LDST): Likewise.
(ARM_HAVE_V2DI_LDST): Likewise.
* config/arm/mve.md (*movmisalign<mode>_mve_store): New pattern.
(*movmisalign<mode>_mve_load): New pattern.
* config/arm/neon.md (movmisalign<mode>): Move to ...
* config/arm/vec-common.md: ... here.
PR target/97875
gcc/testsuite/
* gcc.target/arm/simd/mve-vneg.c: Update test.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This refactors the complex numbers bits of MVE to go through the same unspecs
as the NEON variant.
This is pre-work to allow code to be shared between NEON and MVE for the complex
vectorization patches.
gcc/ChangeLog:
* config/arm/arm_mve.h (__arm_vcmulq_rot90_f16):
(__arm_vcmulq_rot270_f16, _arm_vcmulq_rot180_f16, __arm_vcmulq_f16,
__arm_vcmulq_rot90_f32, __arm_vcmulq_rot270_f32,
__arm_vcmulq_rot180_f32, __arm_vcmulq_f32, __arm_vcmlaq_f16,
__arm_vcmlaq_rot180_f16, __arm_vcmlaq_rot270_f16,
__arm_vcmlaq_rot90_f16, __arm_vcmlaq_f32, __arm_vcmlaq_rot180_f32,
__arm_vcmlaq_rot270_f32, __arm_vcmlaq_rot90_f32): Update builtin calls.
* config/arm/arm_mve_builtins.def (vcmulq_f, vcmulq_rot90_f,
vcmulq_rot180_f, vcmulq_rot270_f, vcmlaq_f, vcmlaq_rot90_f,
vcmlaq_rot180_f, vcmlaq_rot270_f): Removed.
(vcmulq, vcmulq_rot90, vcmulq_rot180, vcmulq_rot270, vcmlaq,
vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270): New.
* config/arm/iterators.md (mve_rot): Add UNSPEC_VCMLA, UNSPEC_VCMLA90,
UNSPEC_VCMLA180, UNSPEC_VCMLA270, UNSPEC_VCMUL, UNSPEC_VCMUL90,
UNSPEC_VCMUL180, UNSPEC_VCMUL270.
(VCMUL): New.
* config/arm/mve.md (mve_vcmulq_f<mode, mve_vcmulq_rot180_f<mode>,
mve_vcmulq_rot270_f<mode>, mve_vcmulq_rot90_f<mode>, mve_vcmlaq_f<mode>,
mve_vcmlaq_rot180_f<mode>, mve_vcmlaq_rot270_f<mode>,
mve_vcmlaq_rot90_f<mode>): Removed.
(mve_vcmlaq<mve_rot><mode>, mve_vcmulq<mve_rot><mode>,
mve_vcaddq<mve_rot><mode>, cadd<rot><mode>3, mve_vcaddq<mve_rot><mode>):
New.
* config/arm/unspecs.md (UNSPEC_VCMUL90, UNSPEC_VCMUL270, UNSPEC_VCMUL,
UNSPEC_VCMUL180): New.
(VCMULQ_F, VCMULQ_ROT180_F, VCMULQ_ROT270_F, VCMULQ_ROT90_F,
VCMLAQ_F, VCMLAQ_ROT180_F, VCMLAQ_ROT90_F, VCMLAQ_ROT270_F): Removed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds implementation for the optabs for complex additions. With this the
following C code:
void f90 (float complex a[restrict N], float complex b[restrict N],
float complex c[restrict N])
{
for (int i=0; i < N; i++)
c[i] = a[i] + (b[i] * I);
}
generates
f90:
add r3, r2, #1600
.L2:
vld1.32 {q8}, [r0]!
vld1.32 {q9}, [r1]!
vcadd.f32 q8, q8, q9, #90
vst1.32 {q8}, [r2]!
cmp r3, r2
bne .L2
bx lr
instead of
f90:
add r3, r2, #1600
.L2:
vld2.32 {d24-d27}, [r0]!
vld2.32 {d20-d23}, [r1]!
vsub.f32 q8, q12, q11
vadd.f32 q9, q13, q10
vst2.32 {d16-d19}, [r2]!
cmp r3, r2
bne .L2
bx lr
gcc/ChangeLog:
* config/arm/arm_mve.h (__arm_vcaddq_rot90_u8, __arm_vcaddq_rot270_u8,
__arm_vcaddq_rot90_s8, __arm_vcaddq_rot270_s8,
__arm_vcaddq_rot90_u16, __arm_vcaddq_rot270_u16,
__arm_vcaddq_rot90_s16, __arm_vcaddq_rot270_s16,
__arm_vcaddq_rot90_u32, __arm_vcaddq_rot270_u32,
__arm_vcaddq_rot90_s32, __arm_vcaddq_rot270_s32,
__arm_vcaddq_rot90_f16, __arm_vcaddq_rot270_f16,
__arm_vcaddq_rot90_f32, __arm_vcaddq_rot270_f32): Update builtin calls.
* config/arm/arm_mve_builtins.def (vcaddq_rot90_u, vcaddq_rot270_u,
vcaddq_rot90_s, vcaddq_rot270_s, vcaddq_rot90_f, vcaddq_rot270_f):
Removed.
(vcaddq_rot90, vcaddq_rot270): New.
* config/arm/constraints.md (Dz): Include MVE.
* config/arm/iterators.md (mve_rot): New.
(supf): Remove VCADDQ_ROT270_S, VCADDQ_ROT270_U, VCADDQ_ROT90_S,
VCADDQ_ROT90_U.
(VCADDQ_ROT270, VCADDQ_ROT90): Removed.
* config/arm/mve.md (mve_vcaddq_rot270_<supf><mode,
mve_vcaddq_rot90_<supf><mode>, mve_vcaddq_rot270_f<mode>,
mve_vcaddq_rot90_f<mode>): Removed.
(mve_vcaddq<mve_rot><mode>, mve_vcaddq<mve_rot><mode>): New.
* config/arm/unspecs.md (VCADDQ_ROT270_S, VCADDQ_ROT90_S,
VCADDQ_ROT270_U, VCADDQ_ROT90_U, VCADDQ_ROT270_F,
VCADDQ_ROT90_F): Removed.
* config/arm/vec-common.md (cadd<rot><mode>3): New.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables MVE vneg instructions for auto-vectorization. MVE
vnegq insns in mve.md are modified to use 'neg' instead of unspec
expression. The neg<mode>2 expander is added to vec-common.md.
Existing patterns in neon.md are prefixed with neon_.
It's not clear why we have different patterns for VDQW
and VH in neon.md, when WDQWH handles both, and patterns
with VDQ have provision for attributes for FP modes.
Another question is why <absneg_str><mode>2 always sets
neon_abs<q> type when it also handles neon_neq<q> cases.
2020-12-11 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/mve.md (mve_vnegq_f): Use 'neg' instead of unspec.
(mve_vnegq_s): Likewise.
* config/arm/neon.md (neg<mode>2): Rename into neon_neg<mode>2.
(<absneg_str><mode>2): Rename into neon_<absneg_str><mode>2.
(neon_v<absneg_str><mode>): Call gen_neon_<absneg_str><mode>2.
(vashr<mode>3): Call gen_neon_neg<mode>2.
(vlshr<mode>3): Call gen_neon_neg<mode>2.
(neon_vneg<mode>): Call gen_neon_neg<mode>2.
* config/arm/unspecs.md (VNEGQ_F, VNEGQ_S): Remove.
* config/arm/vec-common.md (neg<mode>2): New expander.
gcc/testsuite/
* gcc.target/arm/simd/mve-vneg.c: Add tests for vneg.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables MVE vmvnq instructions for auto-vectorization. MVE
vmvnq insns in mve.md are modified to use 'not' instead of unspec
expression to support one_cmpl<mode>2. The one_cmpl<mode>2 expander
is added to vec-common.md.
2020-12-11 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/iterators.md (VDQNOTM2): New mode iterator.
(supf): Remove VMVNQ_S and VMVNQ_U.
(VMVNQ): Remove.
* config/arm/mve.md (mve_vmvnq_u<mode>): New entry for vmvn
instruction using expression not.
(mve_vmvnq_s<mode>): New expander.
* config/arm/neon.md (one_cmpl<mode>2): Renamed into
one_cmpl<mode>2_neon.
* config/arm/unspecs.md (VMVNQ_S, VMVNQ_U): Remove.
* config/arm/vec-common.md (one_cmpl<mode>2): New expander.
gcc/testsuite/
* gcc.target/arm/simd/mve-vmvn.c: Add tests for vmvn.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables MVE vbic instructions for auto-vectorization. MVE
vbicq insns in mve.md are modified to use 'and not' instead of unspec
expression.
2020-12-11 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/iterators.md (supf): Remove VBICQ_S and VBICQ_U.
(VBICQ): Remove.
* config/arm/mve.md (mve_vbicq_u<mode>): New entry for vbic
instruction using expression and not.
(mve_vbicq_s<mode>): New expander.
(mve_vbicq_f<mode>): Replace use of unspec by 'and not'.
* config/arm/unspecs.md (VBICQ_S, VBICQ_U, VBICQ_F): Remove.
gcc/testsuite/
* gcc.target/arm/simd/mve-vbic.c: Add tests for vbic.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables MVE veorq instructions for auto-vectorization. MVE
veorq insns in mve.md are modified to use xor instead of unspec
expression to support xor<mode>3. The xor<mode>3 expander is added to
vec-common.md
2020-12-11 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/iterators.md (supf): Remove VEORQ_S and VEORQ_U.
(VEORQ): Remove.
* config/arm/mve.md (mve_veorq_u<mode>): New entry for veor
instruction using expression xor.
(mve_veorq_s<mode>): New expander.
(mve_veorq_f<mode>): Use 'xor' code instead of unspec.
* config/arm/neon.md (xor<mode>3): Renamed into xor<mode>3_neon.
* config/arm/unspecs.md (VEORQ_S, VEORQ_U, VEORQ_F): Remove.
* config/arm/vec-common.md (xor<mode>3): New expander.
gcc/testsuite/
* gcc.target/arm/simd/mve-veor.c: Add tests for veor.
|
|
|
|
|
|
|
|
| |
and FMA."
This reverts commit 3b8a82f97dd48e153ce93b317c44254839e11461.
Has a dependency on the AArch64 patch which hasn't been approved yet.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds implementation for the optabs for complex additions. With this the
following C code:
void f90 (float complex a[restrict N], float complex b[restrict N],
float complex c[restrict N])
{
for (int i=0; i < N; i++)
c[i] = a[i] + (b[i] * I);
}
generates
f90:
add r3, r2, #1600
.L2:
vld1.32 {q8}, [r0]!
vld1.32 {q9}, [r1]!
vcadd.f32 q8, q8, q9, #90
vst1.32 {q8}, [r2]!
cmp r3, r2
bne .L2
bx lr
instead of
f90:
add r3, r2, #1600
.L2:
vld2.32 {d24-d27}, [r0]!
vld2.32 {d20-d23}, [r1]!
vsub.f32 q8, q12, q11
vadd.f32 q9, q13, q10
vst2.32 {d16-d19}, [r2]!
cmp r3, r2
bne .L2
bx lr
gcc/ChangeLog:
* config/arm/arm_mve.h (__arm_vcaddq_rot90_u8, __arm_vcaddq_rot270_u8,
, __arm_vcaddq_rot90_s8, __arm_vcaddq_rot270_s8,
__arm_vcaddq_rot90_u16, __arm_vcaddq_rot270_u16, __arm_vcaddq_rot90_s16,
__arm_vcaddq_rot270_s16, __arm_vcaddq_rot90_u32,
__arm_vcaddq_rot270_u32, __arm_vcaddq_rot90_s32,
__arm_vcaddq_rot270_s32, __arm_vcmulq_rot90_f16,
__arm_vcmulq_rot270_f16, __arm_vcmulq_rot180_f16,
__arm_vcmulq_f16, __arm_vcaddq_rot90_f16, __arm_vcaddq_rot270_f16,
__arm_vcmulq_rot90_f32, __arm_vcmulq_rot270_f32,
__arm_vcmulq_rot180_f32, __arm_vcmulq_f32, __arm_vcaddq_rot90_f32,
__arm_vcaddq_rot270_f32, __arm_vcmlaq_f16, __arm_vcmlaq_rot180_f16,
__arm_vcmlaq_rot270_f16, __arm_vcmlaq_rot90_f16, __arm_vcmlaq_f32,
__arm_vcmlaq_rot180_f32, __arm_vcmlaq_rot270_f32,
__arm_vcmlaq_rot90_f32): Update builtin calls.
* config/arm/arm_mve_builtins.def (vcaddq_rot90_u, vcaddq_rot270_u,
vcaddq_rot90_s, vcaddq_rot270_s, vcaddq_rot90_f, vcaddq_rot270_f,
vcmulq_f, vcmulq_rot90_f, vcmulq_rot180_f, vcmulq_rot270_f,
vcmlaq_f, vcmlaq_rot90_f, vcmlaq_rot180_f, vcmlaq_rot270_f): Removed.
(vcaddq_rot90, vcaddq_rot270, vcmulq, vcmulq_rot90, vcmulq_rot180,
vcmulq_rot270, vcmlaq, vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270):
New.
* config/arm/constraints.md (Dz): Include MVE.
* config/arm/iterators.md (mve_rotsplit1, mve_rotsplit2): New.
(rot): Add UNSPEC_VCMLS, UNSPEC_VCMUL and UNSPEC_VCMUL180.
(rot_op, rotsplit1, rotsplit2, fcmac1, VCMLA_OP, VCMUL_OP): New.
* config/arm/mve.md (VCADDQ_ROT270_S, VCADDQ_ROT90_S, VCADDQ_ROT270_U,
VCADDQ_ROT90_U, VCADDQ_ROT270_F, VCADDQ_ROT90_F, VCMULQ_F,
VCMULQ_ROT180_F, VCMULQ_ROT270_F, VCMULQ_ROT90_F, VCMLAQ_F,
VCMLAQ_ROT180_F, VCMLAQ_ROT90_F, VCMLAQ_ROT270_F, VCADDQ_ROT270_S,
VCADDQ_ROT270, VCADDQ_ROT90): Removed.
(mve_rot, VCMUL): New.
(mve_vcaddq_rot270_<supf><mode, mve_vcaddq_rot90_<supf><mode>,
mve_vcaddq_rot270_f<mode>, mve_vcaddq_rot90_f<mode>, mve_vcmulq_f<mode,
mve_vcmulq_rot180_f<mode>, mve_vcmulq_rot270_f<mode>,
mve_vcmulq_rot90_f<mode>, mve_vcmlaq_f<mode>, mve_vcmlaq_rot180_f<mode>,
mve_vcmlaq_rot270_f<mode>, mve_vcmlaq_rot90_f<mode>): Removed.
(mve_vcmlaq<mve_rot><mode>, mve_vcmulq<mve_rot><mode>,
mve_vcaddq<mve_rot><mode>, cadd<rot><mode>3, mve_vcaddq<mve_rot><mode>):
New.
(cmul<rot_op><mode>3): Exclude MVE types.
* config/arm/unspecs.md (UNSPEC_VCMUL90, UNSPEC_VCMUL270): New.
* config/arm/vec-common.md (cadd<rot><mode>3, cmul<rot_op><mode>3,
arm_vcmla<rot><mode>, cml<fcmac1><rot_op><mode>4): New.
* config/arm/unspecs.md (UNSPEC_VCMUL, UNSPEC_VCMUL180, UNSPEC_VCMLS,
UNSPEC_VCMLS180): New.
* config/arm/neon.md (cmul<rot_op><mode>3): New.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables MVE vorrq instructions for auto-vectorization. MVE
vorrq insns in mve.md are modified to use ior instead of unspec
expression to support ior<mode>3. The ior<mode>3 expander is added to
vec-common.md
2020-12-03 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/iterators.md (supf): Remove VORRQ_S and VORRQ_U.
(VORRQ): Remove.
* config/arm/mve.md (mve_vorrq_s<mode>): New entry for vorr
instruction using expression ior.
(mve_vorrq_u<mode>): New expander.
(mve_vorrq_f<mode>): Use ior code instead of unspec.
* config/arm/neon.md (ior<mode>3): Renamed into ior<mode>3_neon.
* config/arm/predicates.md (imm_for_neon_logic_operand): Enable
for MVE.
* config/arm/unspecs.md (VORRQ_S, VORRQ_U, VORRQ_F): Remove.
* config/arm/vec-common.md (ior<mode>3): New expander.
gcc/testsuite/
* gcc.target/arm/simd/mve-vorr.c: Add vorr tests.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables MVE vandq instructions for auto-vectorization. MVE
vandq insns in mve.md are modified to use 'and' instead of unspec
expression to support and<mode>3. The and<mode>3 expander is added to
vec-common.md
2020-12-03 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/iterators.md (supf): Remove VANDQ_S and VANDQ_U.
(VANQ): Remove.
(VDQ): Add TARGET_HAVE_MVE condition where relevant.
* config/arm/mve.md (mve_vandq_u<mode>): New entry for vand
instruction using expression 'and'.
(mve_vandq_s<mode>): New expander.
(mve_vaddq_n_f<mode>): Use 'and' code instead of unspec.
* config/arm/neon.md (and<mode>3): Rename into and<mode>3_neon.
* config/arm/predicates.md (imm_for_neon_inv_logic_operand):
Enable for MVE.
* config/arm/unspecs.md (VANDQ_S, VANDQ_U, VANDQ_F): Remove.
* config/arm/vec-common.md (and<mode>3): New expander.
gcc/testsuite/
* gcc.target/arm/simd/mve-vand.c: New test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables MVE vsub instructions for auto-vectorization.
The sub<mode>3 in vec-common.md is modified to use new mode macros
to include MVE extension for vectorization. MVE vsub insns in mve.md are
modified to use 'minus' instead of unspec expression to support
sub<mode>3. Use VDQ instead fo VALL to cover all supported modes. The
redundant sub<mode>3 insns in neon.md are then removed.
gcc/ChangeLog:
2020-10-23 Dennis Zhang <dennis.zhang@arm.com>
* config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
using expression 'minus'.
(mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
* config/arm/neon.md (sub<mode>3, sub<mode>3_fp16): Removed.
(neon_vsub<mode>): Use gen_sub<mode>3 instead of gen_sub<mode>3_fp16.
* config/arm/vec-common.md (sub<mode>3): Use the new mode macros
ARM_HAVE_<MODE>_ARITH. Use iterator VDQ instead of VALL.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/mve-vsub_1.c: New test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables MVE vmin/vmax instructions for auto-vectorization.
MVE target is included in expander smin<mode>3, umin<mode>3, smax<mode>3
and umax<mode>3 for vectorization. Related insns for vmin/vmax in mve.md
are modified to use smin, umin, smax and umax expressions instead of
unspec to support the expanders.
gcc/ChangeLog:
2020-10-22 Dennis Zhang <dennis.zhang@arm.com>
* config/arm/mve.md (mve_vmaxq_<supf><mode>): Replace with ...
(mve_vmaxq_s<mode>, mve_vmaxq_u<mode>): ... these new insns to
use smax/umax instead of VMAXQ.
(mve_vminq_<supf><mode>): Replace with ...
(mve_vminq_s<mode>, mve_vminq_u<mode>): ... these new insns to
use smin/umin instead of VMINQ.
(mve_vmaxnmq_f<mode>): Use smax instead of VMAXNMQ_F.
(mve_vminnmq_f<mode>): Use smin instead of VMINNMQ_F.
* config/arm/vec-common.md (smin<mode>3): Use the new mode macros
ARM_HAVE_<MODE>_ARITH.
(umin<mode>3, smax<mode>3, umax<mode>3): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/mve-vminmax_1.c: New test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables MVE vmul instructions for auto-vectorization.
It includes MVE in expander mul<mode>3 to enable vectorization for MVE.
Related MVE vmul insns are modified to support the expander by using
expression 'mult' instead of unspec.
The mul<mode>3 for vectorization in vec-common.md uses mode iterator
VDQWH instead of VALLW to cover all supported modes.
The macros ARM_HAVE_NEON_<MODE>_ARITH are used to select supported
modes for different targets.
The redundant mul<mode>3 in neon.md is removed.
gcc/ChangeLog:
2020-10-22 Dennis Zhang <dennis.zhang@arm.com>
* config/arm/mve.md (mve_vmulq<mode>): New entry for vmul instruction
using expression 'mult'.
(mve_vmulq_f<mode>): Use mult instead of VMULQ_F.
* config/arm/neon.md (mul<mode>3): Removed.
* config/arm/vec-common.md (mul<mode>3): Use the new mode macros
ARM_HAVE_<MODE>_ARITH. Use mode iterator VDQWH instead of VALLW.
gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/mve-vmul_1.c: New test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
intrinsics with -O2 (PR97271).
This patch fixes (PR97271) the wrong code-gen for mve scatter store with writeback intrinsics with -O2.
$cat bug.c
void
foo (uint32x4_t * addr, const int offset, int32x4_t value)
{
vstrwq_scatter_base_wb_s32 (addr, 8, value);
}
$ arm-none-eabi-gcc bug.c -S -O2 -march=armv8.1-m.main+mve -mfloat-abi=hard -o -
Without this patch:
...
foo:
vldrw.32 q3, [r0]
vstrw.u32 q0, [q3, #8]! ---> (A)
vldr.64 d4, .L3
vldr.64 d5, .L3+8
vldrw.32 q3, [r0]
vstrw.u32 q2, [q3, #8]! ---> (B)
bx lr
...
With this patch:
...
foo:
vldrw.32 q3, [r0]
vstrw.u32 q0, [q3, #8]! --> (C)
vstrw.32 q3, [r0]
bx lr
...
Without this patch 2 vstrw assembly instructions (A and B) are generated for vstrwq_scatter_base_wb_s32
intrinsic where as fix generates only one vstrw assembly instruction (C).
gcc/ChangeLog:
2020-10-06 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
PR target/97291
* config/arm/arm-builtins.c (arm_strsbwbs_qualifiers): Modify array.
(arm_strsbwbu_qualifiers): Likewise.
(arm_strsbwbs_p_qualifiers): Likewise.
(arm_strsbwbu_p_qualifiers): Likewise.
* config/arm/arm_mve.h (__arm_vstrdq_scatter_base_wb_s64): Modify
function definition.
(__arm_vstrdq_scatter_base_wb_u64): Likewise.
(__arm_vstrdq_scatter_base_wb_p_s64): Likewise.
(__arm_vstrdq_scatter_base_wb_p_u64): Likewise.
(__arm_vstrwq_scatter_base_wb_p_s32): Likewise.
(__arm_vstrwq_scatter_base_wb_p_u32): Likewise.
(__arm_vstrwq_scatter_base_wb_s32): Likewise.
(__arm_vstrwq_scatter_base_wb_u32): Likewise.
(__arm_vstrwq_scatter_base_wb_f32): Likewise.
(__arm_vstrwq_scatter_base_wb_p_f32): Likewise.
* config/arm/arm_mve_builtins.def (vstrwq_scatter_base_wb_add_u): Remove
expansion for the builtin.
(vstrwq_scatter_base_wb_add_s): Likewise.
(vstrwq_scatter_base_wb_add_f): Likewise.
(vstrdq_scatter_base_wb_add_u): Likewise.
(vstrdq_scatter_base_wb_add_s): Likewise.
(vstrwq_scatter_base_wb_p_add_u): Likewise.
(vstrwq_scatter_base_wb_p_add_s): Likewise.
(vstrwq_scatter_base_wb_p_add_f): Likewise.
(vstrdq_scatter_base_wb_p_add_u): Likewise.
(vstrdq_scatter_base_wb_p_add_s): Likewise.
* config/arm/mve.md (mve_vstrwq_scatter_base_wb_<supf>v4si): Remove
expand.
(mve_vstrwq_scatter_base_wb_add_<supf>v4si): Likewise.
(mve_vstrwq_scatter_base_wb_<supf>v4si_insn): Rename pattern to ...
(mve_vstrwq_scatter_base_wb_<supf>v4si): This.
(mve_vstrwq_scatter_base_wb_p_<supf>v4si): Remove expand.
(mve_vstrwq_scatter_base_wb_p_add_<supf>v4si): Likewise.
(mve_vstrwq_scatter_base_wb_p_<supf>v4si_insn): Rename pattern to ...
(mve_vstrwq_scatter_base_wb_p_<supf>v4si): This.
(mve_vstrwq_scatter_base_wb_fv4sf): Remove expand.
(mve_vstrwq_scatter_base_wb_add_fv4sf): Likewise.
(mve_vstrwq_scatter_base_wb_fv4sf_insn): Rename pattern to ...
(mve_vstrwq_scatter_base_wb_fv4sf): This.
(mve_vstrwq_scatter_base_wb_p_fv4sf): Remove expand.
(mve_vstrwq_scatter_base_wb_p_add_fv4sf): Likewise.
(mve_vstrwq_scatter_base_wb_p_fv4sf_insn): Rename pattern to ...
(mve_vstrwq_scatter_base_wb_p_fv4sf): This.
(mve_vstrdq_scatter_base_wb_<supf>v2di): Remove expand.
(mve_vstrdq_scatter_base_wb_add_<supf>v2di): Likewise.
(mve_vstrdq_scatter_base_wb_<supf>v2di_insn): Rename pattern to ...
(mve_vstrdq_scatter_base_wb_<supf>v2di): This.
(mve_vstrdq_scatter_base_wb_p_<supf>v2di): Remove expand.
(mve_vstrdq_scatter_base_wb_p_add_<supf>v2di): Likewise.
(mve_vstrdq_scatter_base_wb_p_<supf>v2di_insn): Rename pattern to ...
(mve_vstrdq_scatter_base_wb_p_<supf>v2di): This.
gcc/testsuite/ChangeLog:
PR target/97291
* gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_p_s64.c: Modify.
* gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_p_u64.c:
Likewise.
* gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrdq_scatter_base_wb_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_f32.c:
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_s32.c:
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_u32.c:
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_u32.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few MVE intrinsics had an unsigned variant implement while they are
supported by the hardware. This patch removes them:
__arm_vqrdmlashq_n_u8
__arm_vqrdmlahq_n_u8
__arm_vqdmlahq_n_u8
__arm_vqrdmlashq_n_u16
__arm_vqrdmlahq_n_u16
__arm_vqdmlahq_n_u16
__arm_vqrdmlashq_n_u32
__arm_vqrdmlahq_n_u32
__arm_vqdmlahq_n_u32
__arm_vmlaldavaxq_p_u32
__arm_vmlaldavaxq_p_u16
2020-10-08 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
PR target/96914
* config/arm/arm_mve.h (vqrdmlashq_n_u8, vqrdmlashq_n_u16)
(vqrdmlashq_n_u32, vqrdmlahq_n_u8, vqrdmlahq_n_u16)
(vqrdmlahq_n_u32, vqdmlahq_n_u8, vqdmlahq_n_u16, vqdmlahq_n_u32)
(vmlaldavaxq_p_u16, vmlaldavaxq_p_u32): Remove.
* config/arm/arm_mve_builtins.def (vqrdmlashq_n_u, vqrdmlahq_n_u)
(vqdmlahq_n_u, vmlaldavaxq_p_u): Remove.
* config/arm/unspecs.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U)
(VQRDMLASHQ_N_U)
(VMLALDAVAXQ_P_U): Remove unspecs.
* config/arm/iterators.md (VQDMLAHQ_N_U, VQRDMLAHQ_N_U)
(VQRDMLASHQ_N_U, VMLALDAVAXQ_P_U): Remove attributes.
(VQDMLAHQ_N, VQRDMLAHQ_N, VQRDMLASHQ_N, VMLALDAVAXQ_P): Remove
unsigned variants from iterators.
* config/arm/mve.md (mve_vqdmlahq_n_<supf><mode>)
(mve_vqrdmlahq_n_<supf><mode>)
(mve_vqrdmlashq_n_<supf><mode>, mve_vmlaldavaxq_p_<supf><mode>):
Update comment.
gcc/testsuite/
PR target/96914
* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_u8.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlahq_n_u8.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u16.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u32.c: Remove.
* gcc.target/arm/mve/intrinsics/vqrdmlashq_n_u8.c: Remove.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds:
vqdmlashq_m_n_s16
vqdmlashq_m_n_s32
vqdmlashq_m_n_s8
vqdmlashq_n_s16
vqdmlashq_n_s32
vqdmlashq_n_s8
2020-10-08 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
PR target/96914
* config/arm/arm_mve.h (vqdmlashq, vqdmlashq_m): Define.
* config/arm/arm_mve_builtins.def (vqdmlashq_n_s)
(vqdmlashq_m_n_s,): New.
* config/arm/unspecs.md (VQDMLASHQ_N_S, VQDMLASHQ_M_N_S): New
unspecs.
* config/arm/iterators.md (VQDMLASHQ_N_S, VQDMLASHQ_M_N_S): New
attributes.
(VQDMLASHQ_N): New iterator.
* config/arm/mve.md (mve_vqdmlashq_n_, mve_vqdmlashq_m_n_s): New
patterns.
gcc/testsuite/
PR target/96914
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c: New test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
consistency.
To maintain consistency with other Arm Architectures backend, iterators and iterator attributes are moved
from mve.md file to iterators.md. Also move enumerators for MVE unspecs from mve.md file to unspecs.md file.
gcc/ChangeLog:
2020-10-06 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
* config/arm/iterators.md (MVE_types): Move mode iterator from mve.md to
iterators.md.
(MVE_VLD_ST): Likewise.
(MVE_0): Likewise.
(MVE_1): Likewise.
(MVE_3): Likewise.
(MVE_2): Likewise.
(MVE_5): Likewise.
(MVE_6): Likewise.
(MVE_CNVT): Move mode attribute iterator from mve.md to iterators.md.
(MVE_LANES): Likewise.
(MVE_constraint): Likewise.
(MVE_constraint1): Likewise.
(MVE_constraint2): Likewise.
(MVE_constraint3): Likewise.
(MVE_pred): Likewise.
(MVE_pred1): Likewise.
(MVE_pred2): Likewise.
(MVE_pred3): Likewise.
(MVE_B_ELEM): Likewise.
(MVE_H_ELEM): Likewise.
(V_sz_elem1): Likewise.
(V_extr_elem): Likewise.
(earlyclobber_32): Likewise.
(supf): Move int attribute from mve.md to iterators.md.
(mode1): Likewise.
(VCVTQ_TO_F): Move int iterator from mve.md to iterators.md.
(VMVNQ_N): Likewise.
(VREV64Q): Likewise.
(VCVTQ_FROM_F): Likewise.
(VREV16Q): Likewise.
(VCVTAQ): Likewise.
(VMVNQ): Likewise.
(VDUPQ_N): Likewise.
(VCLZQ): Likewise.
(VADDVQ): Likewise.
(VREV32Q): Likewise.
(VMOVLBQ): Likewise.
(VMOVLTQ): Likewise.
(VCVTPQ): Likewise.
(VCVTNQ): Likewise.
(VCVTMQ): Likewise.
(VADDLVQ): Likewise.
(VCTPQ): Likewise.
(VCTPQ_M): Likewise.
(VCVTQ_N_TO_F): Likewise.
(VCREATEQ): Likewise.
(VSHRQ_N): Likewise.
(VCVTQ_N_FROM_F): Likewise.
(VADDLVQ_P): Likewise.
(VCMPNEQ): Likewise.
(VSHLQ): Likewise.
(VABDQ): Likewise.
(VADDQ_N): Likewise.
(VADDVAQ): Likewise.
(VADDVQ_P): Likewise.
(VANDQ): Likewise.
(VBICQ): Likewise.
(VBRSRQ_N): Likewise.
(VCADDQ_ROT270): Likewise.
(VCADDQ_ROT90): Likewise.
(VCMPEQQ): Likewise.
(VCMPEQQ_N): Likewise.
(VCMPNEQ_N): Likewise.
(VEORQ): Likewise.
(VHADDQ): Likewise.
(VHADDQ_N): Likewise.
(VHSUBQ): Likewise.
(VHSUBQ_N): Likewise.
(VMAXQ): Likewise.
(VMAXVQ): Likewise.
(VMINQ): Likewise.
(VMINVQ): Likewise.
(VMLADAVQ): Likewise.
(VMULHQ): Likewise.
(VMULLBQ_INT): Likewise.
(VMULLTQ_INT): Likewise.
(VMULQ): Likewise.
(VMULQ_N): Likewise.
(VORNQ): Likewise.
(VORRQ): Likewise.
(VQADDQ): Likewise.
(VQADDQ_N): Likewise.
(VQRSHLQ): Likewise.
(VQRSHLQ_N): Likewise.
(VQSHLQ): Likewise.
(VQSHLQ_N): Likewise.
(VQSHLQ_R): Likewise.
(VQSUBQ): Likewise.
(VQSUBQ_N): Likewise.
(VRHADDQ): Likewise.
(VRMULHQ): Likewise.
(VRSHLQ): Likewise.
(VRSHLQ_N): Likewise.
(VRSHRQ_N): Likewise.
(VSHLQ_N): Likewise.
(VSHLQ_R): Likewise.
(VSUBQ): Likewise.
(VSUBQ_N): Likewise.
(VADDLVAQ): Likewise.
(VBICQ_N): Likewise.
(VMLALDAVQ): Likewise.
(VMLALDAVXQ): Likewise.
(VMOVNBQ): Likewise.
(VMOVNTQ): Likewise.
(VORRQ_N): Likewise.
(VQMOVNBQ): Likewise.
(VQMOVNTQ): Likewise.
(VSHLLBQ_N): Likewise.
(VSHLLTQ_N): Likewise.
(VRMLALDAVHQ): Likewise.
(VBICQ_M_N): Likewise.
(VCVTAQ_M): Likewise.
(VCVTQ_M_TO_F): Likewise.
(VQRSHRNBQ_N): Likewise.
(VABAVQ): Likewise.
(VSHLCQ): Likewise.
(VRMLALDAVHAQ): Likewise.
(VADDVAQ_P): Likewise.
(VCLZQ_M): Likewise.
(VCMPEQQ_M_N): Likewise.
(VCMPEQQ_M): Likewise.
(VCMPNEQ_M_N): Likewise.
(VCMPNEQ_M): Likewise.
(VDUPQ_M_N): Likewise.
(VMAXVQ_P): Likewise.
(VMINVQ_P): Likewise.
(VMLADAVAQ): Likewise.
(VMLADAVQ_P): Likewise.
(VMLAQ_N): Likewise.
(VMLASQ_N): Likewise.
(VMVNQ_M): Likewise.
(VPSELQ): Likewise.
(VQDMLAHQ_N): Likewise.
(VQRDMLAHQ_N): Likewise.
(VQRDMLASHQ_N): Likewise.
(VQRSHLQ_M_N): Likewise.
(VQSHLQ_M_R): Likewise.
(VREV64Q_M): Likewise.
(VRSHLQ_M_N): Likewise.
(VSHLQ_M_R): Likewise.
(VSLIQ_N): Likewise.
(VSRIQ_N): Likewise.
(VMLALDAVQ_P): Likewise.
(VQMOVNBQ_M): Likewise.
(VMOVLTQ_M): Likewise.
(VMOVNBQ_M): Likewise.
(VRSHRNTQ_N): Likewise.
(VORRQ_M_N): Likewise.
(VREV32Q_M): Likewise.
(VREV16Q_M): Likewise.
(VQRSHRNTQ_N): Likewise.
(VMOVNTQ_M): Likewise.
(VMOVLBQ_M): Likewise.
(VMLALDAVAQ): Likewise.
(VQSHRNBQ_N): Likewise.
(VSHRNBQ_N): Likewise.
(VRSHRNBQ_N): Likewise.
(VMLALDAVXQ_P): Likewise.
(VQMOVNTQ_M): Likewise.
(VMVNQ_M_N): Likewise.
(VQSHRNTQ_N): Likewise.
(VMLALDAVAXQ): Likewise.
(VSHRNTQ_N): Likewise.
(VCVTMQ_M): Likewise.
(VCVTNQ_M): Likewise.
(VCVTPQ_M): Likewise.
(VCVTQ_M_N_FROM_F): Likewise.
(VCVTQ_M_FROM_F): Likewise.
(VRMLALDAVHQ_P): Likewise.
(VADDLVAQ_P): Likewise.
(VABAVQ_P): Likewise.
(VSHLQ_M): Likewise.
(VSRIQ_M_N): Likewise.
(VSUBQ_M): Likewise.
(VCVTQ_M_N_TO_F): Likewise.
(VHSUBQ_M): Likewise.
(VSLIQ_M_N): Likewise.
(VRSHLQ_M): Likewise.
(VMINQ_M): Likewise.
(VMULLBQ_INT_M): Likewise.
(VMULHQ_M): Likewise.
(VMULQ_M): Likewise.
(VHSUBQ_M_N): Likewise.
(VHADDQ_M_N): Likewise.
(VORRQ_M): Likewise.
(VRMULHQ_M): Likewise.
(VQADDQ_M): Likewise.
(VRSHRQ_M_N): Likewise.
(VQSUBQ_M_N): Likewise.
(VADDQ_M): Likewise.
(VORNQ_M): Likewise.
(VRHADDQ_M): Likewise.
(VQSHLQ_M): Likewise.
(VANDQ_M): Likewise.
(VBICQ_M): Likewise.
(VSHLQ_M_N): Likewise.
(VCADDQ_ROT270_M): Likewise.
(VQRSHLQ_M): Likewise.
(VQADDQ_M_N): Likewise.
(VADDQ_M_N): Likewise.
(VMAXQ_M): Likewise.
(VQSUBQ_M): Likewise.
(VMLASQ_M_N): Likewise.
(VMLADAVAQ_P): Likewise.
(VBRSRQ_M_N): Likewise.
(VMULQ_M_N): Likewise.
(VCADDQ_ROT90_M): Likewise.
(VMULLTQ_INT_M): Likewise.
(VEORQ_M): Likewise.
(VSHRQ_M_N): Likewise.
(VSUBQ_M_N): Likewise.
(VHADDQ_M): Likewise.
(VABDQ_M): Likewise.
(VMLAQ_M_N): Likewise.
(VQSHLQ_M_N): Likewise.
(VMLALDAVAQ_P): Likewise.
(VMLALDAVAXQ_P): Likewise.
(VQRSHRNBQ_M_N): Likewise.
(VQRSHRNTQ_M_N): Likewise.
(VQSHRNBQ_M_N): Likewise.
(VQSHRNTQ_M_N): Likewise.
(VRSHRNBQ_M_N): Likewise.
(VRSHRNTQ_M_N): Likewise.
(VSHLLBQ_M_N): Likewise.
(VSHLLTQ_M_N): Likewise.
(VSHRNBQ_M_N): Likewise.
(VSHRNTQ_M_N): Likewise.
(VSTRWSBQ): Likewise.
(VSTRBSOQ): Likewise.
(VSTRBQ): Likewise.
(VLDRBGOQ): Likewise.
(VLDRBQ): Likewise.
(VLDRWGBQ): Likewise.
(VLD1Q): Likewise.
(VLDRHGOQ): Likewise.
(VLDRHGSOQ): Likewise.
(VLDRHQ): Likewise.
(VLDRWQ): Likewise.
(VLDRDGBQ): Likewise.
(VLDRDGOQ): Likewise.
(VLDRDGSOQ): Likewise.
(VLDRWGOQ): Likewise.
(VLDRWGSOQ): Likewise.
(VST1Q): Likewise.
(VSTRHSOQ): Likewise.
(VSTRHSSOQ): Likewise.
(VSTRHQ): Likewise.
(VSTRWQ): Likewise.
(VSTRDSBQ): Likewise.
(VSTRDSOQ): Likewise.
(VSTRDSSOQ): Likewise.
(VSTRWSOQ): Likewise.
(VSTRWSSOQ): Likewise.
(VSTRWSBWBQ): Likewise.
(VLDRWGBWBQ): Likewise.
(VSTRDSBWBQ): Likewise.
(VLDRDGBWBQ): Likewise.
(VADCIQ): Likewise.
(VADCIQ_M): Likewise.
(VSBCQ): Likewise.
(VSBCQ_M): Likewise.
(VSBCIQ): Likewise.
(VSBCIQ_M): Likewise.
(VADCQ): Likewise.
(VADCQ_M): Likewise.
(UQRSHLLQ): Likewise.
(SQRSHRLQ): Likewise.
(VSHLCQ_M): Likewise.
* config/arm/mve.md (MVE_types): Move mode iterator to iterators.md from mve.md.
(MVE_VLD_ST): Likewise.
(MVE_0): Likewise.
(MVE_1): Likewise.
(MVE_3): Likewise.
(MVE_2): Likewise.
(MVE_5): Likewise.
(MVE_6): Likewise.
(MVE_CNVT): Move mode attribute iterator to iterators.md from mve.md.
(MVE_LANES): Likewise.
(MVE_constraint): Likewise.
(MVE_constraint1): Likewise.
(MVE_constraint2): Likewise.
(MVE_constraint3): Likewise.
(MVE_pred): Likewise.
(MVE_pred1): Likewise.
(MVE_pred2): Likewise.
(MVE_pred3): Likewise.
(MVE_B_ELEM): Likewise.
(MVE_H_ELEM): Likewise.
(V_sz_elem1): Likewise.
(V_extr_elem): Likewise.
(earlyclobber_32): Likewise.
(supf): Move int attribute to iterators.md from mve.md.
(mode1): Likewise.
(VCVTQ_TO_F): Move int iterator to iterators.md from mve.md.
(VMVNQ_N): Likewise.
(VREV64Q): Likewise.
(VCVTQ_FROM_F): Likewise.
(VREV16Q): Likewise.
(VCVTAQ): Likewise.
(VMVNQ): Likewise.
(VDUPQ_N): Likewise.
(VCLZQ): Likewise.
(VADDVQ): Likewise.
(VREV32Q): Likewise.
(VMOVLBQ): Likewise.
(VMOVLTQ): Likewise.
(VCVTPQ): Likewise.
(VCVTNQ): Likewise.
(VCVTMQ): Likewise.
(VADDLVQ): Likewise.
(VCTPQ): Likewise.
(VCTPQ_M): Likewise.
(VCVTQ_N_TO_F): Likewise.
(VCREATEQ): Likewise.
(VSHRQ_N): Likewise.
(VCVTQ_N_FROM_F): Likewise.
(VADDLVQ_P): Likewise.
(VCMPNEQ): Likewise.
(VSHLQ): Likewise.
(VABDQ): Likewise.
(VADDQ_N): Likewise.
(VADDVAQ): Likewise.
(VADDVQ_P): Likewise.
(VANDQ): Likewise.
(VBICQ): Likewise.
(VBRSRQ_N): Likewise.
(VCADDQ_ROT270): Likewise.
(VCADDQ_ROT90): Likewise.
(VCMPEQQ): Likewise.
(VCMPEQQ_N): Likewise.
(VCMPNEQ_N): Likewise.
(VEORQ): Likewise.
(VHADDQ): Likewise.
(VHADDQ_N): Likewise.
(VHSUBQ): Likewise.
(VHSUBQ_N): Likewise.
(VMAXQ): Likewise.
(VMAXVQ): Likewise.
(VMINQ): Likewise.
(VMINVQ): Likewise.
(VMLADAVQ): Likewise.
(VMULHQ): Likewise.
(VMULLBQ_INT): Likewise.
(VMULLTQ_INT): Likewise.
(VMULQ): Likewise.
(VMULQ_N): Likewise.
(VORNQ): Likewise.
(VORRQ): Likewise.
(VQADDQ): Likewise.
(VQADDQ_N): Likewise.
(VQRSHLQ): Likewise.
(VQRSHLQ_N): Likewise.
(VQSHLQ): Likewise.
(VQSHLQ_N): Likewise.
(VQSHLQ_R): Likewise.
(VQSUBQ): Likewise.
(VQSUBQ_N): Likewise.
(VRHADDQ): Likewise.
(VRMULHQ): Likewise.
(VRSHLQ): Likewise.
(VRSHLQ_N): Likewise.
(VRSHRQ_N): Likewise.
(VSHLQ_N): Likewise.
(VSHLQ_R): Likewise.
(VSUBQ): Likewise.
(VSUBQ_N): Likewise.
(VADDLVAQ): Likewise.
(VBICQ_N): Likewise.
(VMLALDAVQ): Likewise.
(VMLALDAVXQ): Likewise.
(VMOVNBQ): Likewise.
(VMOVNTQ): Likewise.
(VORRQ_N): Likewise.
(VQMOVNBQ): Likewise.
(VQMOVNTQ): Likewise.
(VSHLLBQ_N): Likewise.
(VSHLLTQ_N): Likewise.
(VRMLALDAVHQ): Likewise.
(VBICQ_M_N): Likewise.
(VCVTAQ_M): Likewise.
(VCVTQ_M_TO_F): Likewise.
(VQRSHRNBQ_N): Likewise.
(VABAVQ): Likewise.
(VSHLCQ): Likewise.
(VRMLALDAVHAQ): Likewise.
(VADDVAQ_P): Likewise.
(VCLZQ_M): Likewise.
(VCMPEQQ_M_N): Likewise.
(VCMPEQQ_M): Likewise.
(VCMPNEQ_M_N): Likewise.
(VCMPNEQ_M): Likewise.
(VDUPQ_M_N): Likewise.
(VMAXVQ_P): Likewise.
(VMINVQ_P): Likewise.
(VMLADAVAQ): Likewise.
(VMLADAVQ_P): Likewise.
(VMLAQ_N): Likewise.
(VMLASQ_N): Likewise.
(VMVNQ_M): Likewise.
(VPSELQ): Likewise.
(VQDMLAHQ_N): Likewise.
(VQRDMLAHQ_N): Likewise.
(VQRDMLASHQ_N): Likewise.
(VQRSHLQ_M_N): Likewise.
(VQSHLQ_M_R): Likewise.
(VREV64Q_M): Likewise.
(VRSHLQ_M_N): Likewise.
(VSHLQ_M_R): Likewise.
(VSLIQ_N): Likewise.
(VSRIQ_N): Likewise.
(VMLALDAVQ_P): Likewise.
(VQMOVNBQ_M): Likewise.
(VMOVLTQ_M): Likewise.
(VMOVNBQ_M): Likewise.
(VRSHRNTQ_N): Likewise.
(VORRQ_M_N): Likewise.
(VREV32Q_M): Likewise.
(VREV16Q_M): Likewise.
(VQRSHRNTQ_N): Likewise.
(VMOVNTQ_M): Likewise.
(VMOVLBQ_M): Likewise.
(VMLALDAVAQ): Likewise.
(VQSHRNBQ_N): Likewise.
(VSHRNBQ_N): Likewise.
(VRSHRNBQ_N): Likewise.
(VMLALDAVXQ_P): Likewise.
(VQMOVNTQ_M): Likewise.
(VMVNQ_M_N): Likewise.
(VQSHRNTQ_N): Likewise.
(VMLALDAVAXQ): Likewise.
(VSHRNTQ_N): Likewise.
(VCVTMQ_M): Likewise.
(VCVTNQ_M): Likewise.
(VCVTPQ_M): Likewise.
(VCVTQ_M_N_FROM_F): Likewise.
(VCVTQ_M_FROM_F): Likewise.
(VRMLALDAVHQ_P): Likewise.
(VADDLVAQ_P): Likewise.
(VABAVQ_P): Likewise.
(VSHLQ_M): Likewise.
(VSRIQ_M_N): Likewise.
(VSUBQ_M): Likewise.
(VCVTQ_M_N_TO_F): Likewise.
(VHSUBQ_M): Likewise.
(VSLIQ_M_N): Likewise.
(VRSHLQ_M): Likewise.
(VMINQ_M): Likewise.
(VMULLBQ_INT_M): Likewise.
(VMULHQ_M): Likewise.
(VMULQ_M): Likewise.
(VHSUBQ_M_N): Likewise.
(VHADDQ_M_N): Likewise.
(VORRQ_M): Likewise.
(VRMULHQ_M): Likewise.
(VQADDQ_M): Likewise.
(VRSHRQ_M_N): Likewise.
(VQSUBQ_M_N): Likewise.
(VADDQ_M): Likewise.
(VORNQ_M): Likewise.
(VRHADDQ_M): Likewise.
(VQSHLQ_M): Likewise.
(VANDQ_M): Likewise.
(VBICQ_M): Likewise.
(VSHLQ_M_N): Likewise.
(VCADDQ_ROT270_M): Likewise.
(VQRSHLQ_M): Likewise.
(VQADDQ_M_N): Likewise.
(VADDQ_M_N): Likewise.
(VMAXQ_M): Likewise.
(VQSUBQ_M): Likewise.
(VMLASQ_M_N): Likewise.
(VMLADAVAQ_P): Likewise.
(VBRSRQ_M_N): Likewise.
(VMULQ_M_N): Likewise.
(VCADDQ_ROT90_M): Likewise.
(VMULLTQ_INT_M): Likewise.
(VEORQ_M): Likewise.
(VSHRQ_M_N): Likewise.
(VSUBQ_M_N): Likewise.
(VHADDQ_M): Likewise.
(VABDQ_M): Likewise.
(VMLAQ_M_N): Likewise.
(VQSHLQ_M_N): Likewise.
(VMLALDAVAQ_P): Likewise.
(VMLALDAVAXQ_P): Likewise.
(VQRSHRNBQ_M_N): Likewise.
(VQRSHRNTQ_M_N): Likewise.
(VQSHRNBQ_M_N): Likewise.
(VQSHRNTQ_M_N): Likewise.
(VRSHRNBQ_M_N): Likewise.
(VRSHRNTQ_M_N): Likewise.
(VSHLLBQ_M_N): Likewise.
(VSHLLTQ_M_N): Likewise.
(VSHRNBQ_M_N): Likewise.
(VSHRNTQ_M_N): Likewise.
(VSTRWSBQ): Likewise.
(VSTRBSOQ): Likewise.
(VSTRBQ): Likewise.
(VLDRBGOQ): Likewise.
(VLDRBQ): Likewise.
(VLDRWGBQ): Likewise.
(VLD1Q): Likewise.
(VLDRHGOQ): Likewise.
(VLDRHGSOQ): Likewise.
(VLDRHQ): Likewise.
(VLDRWQ): Likewise.
(VLDRDGBQ): Likewise.
(VLDRDGOQ): Likewise.
(VLDRDGSOQ): Likewise.
(VLDRWGOQ): Likewise.
(VLDRWGSOQ): Likewise.
(VST1Q): Likewise.
(VSTRHSOQ): Likewise.
(VSTRHSSOQ): Likewise.
(VSTRHQ): Likewise.
(VSTRWQ): Likewise.
(VSTRDSBQ): Likewise.
(VSTRDSOQ): Likewise.
(VSTRDSSOQ): Likewise.
(VSTRWSOQ): Likewise.
(VSTRWSSOQ): Likewise.
(VSTRWSBWBQ): Likewise.
(VLDRWGBWBQ): Likewise.
(VSTRDSBWBQ): Likewise.
(VLDRDGBWBQ): Likewise.
(VADCIQ): Likewise.
(VADCIQ_M): Likewise.
(VSBCQ): Likewise.
(VSBCQ_M): Likewise.
(VSBCIQ): Likewise.
(VSBCIQ_M): Likewise.
(VADCQ): Likewise.
(VADCQ_M): Likewise.
(UQRSHLLQ): Likewise.
(SQRSHRLQ): Likewise.
(VSHLCQ_M): Likewise.
(define_c_enum "unspec"): Move MVE enumerator to unspecs.md from mve.md.
* config/arm/unspecs.md (define_c_enum "unspec"): Move MVE enumerator from
mve.md to unspecs.md.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the machine description patterns for vst1q accepted a generic memory
operand for the destination, which could lead to an unrecognised builtin when
expanding vst1q* intrinsics. This change fixes the pattern to only accept MVE
memory operands.
gcc/ChangeLog:
PR target/96683
* config/arm/mve.md (mve_vst1q_f<mode>): Require MVE memory operand for
destination.
(mve_vst1q_<supf><mode>): Likewise.
gcc/testsuite/ChangeLog:
PR target/96683
* gcc.target/arm/mve/intrinsics/vst1q_f16.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_s8.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_u16.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_u8.c: New test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch modifies the MVE scalar shift RTL patterns. The current patterns
have wrong constraints and predicates due to which the values returned from
MVE scalar shift instructions are overwritten in the code-gen.
example:
$ cat x.c
int32_t foo(int64_t acc, int shift)
{
return sqrshrl_sat48 (acc, shift);
}
Code-gen before applying this patch:
$ arm-none-eabi-gcc -march=armv8.1-m.main+mve -mfloat-abi=hard -O2 -S
$ cat x.s
foo:
push {r4, r5}
sqrshrl r0, r1, #48, r2 ----> (a)
mov r0, r4 ----> (b)
pop {r4, r5}
bx lr
Code-gen after applying this patch:
foo:
sqrshrl r0, r1, #48, r2
bx lr
In the current compiler the return value (r0) from sqrshrl (a) is getting
overwritten by the mov statement (b).
This patch fixes above issue.
2020-06-12 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
gcc/
* config/arm/mve.md (mve_uqrshll_sat<supf>_di): Correct the predicate
and constraint of all the operands.
(mve_sqrshrl_sat<supf>_di): Likewise.
(mve_uqrshl_si): Likewise.
(mve_sqrshr_si): Likewise.
(mve_uqshll_di): Likewise.
(mve_urshrl_di): Likewise.
(mve_uqshl_si): Likewise.
(mve_urshr_si): Likewise.
(mve_sqshl_si): Likewise.
(mve_srshr_si): Likewise.
(mve_srshrl_di): Likewise.
(mve_sqshll_di): Likewise.
* config/arm/predicates.md (arm_low_register_operand): Define.
gcc/testsuite/
* gcc.target/arm/mve/intrinsics/mve_scalar_shifts1.c: New test.
* gcc.target/arm/mve/intrinsics/mve_scalar_shifts2.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_scalar_shifts3.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_scalar_shifts4.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
intrinsics (PR94735).
The operands in RTL patterns of MVE vector scatter store intrinsics are wrongly grouped,
because of which few vector loads and stores instructions are wrongly getting optimized
out with -O2.
A new predicate "mve_scatter_memory" is defined in this patch, this predicate returns TRUE on
matching: (mem(reg)) for MVE scatter store intrinsics.
This patch fixes the issue by adding define_expand pattern with "mve_scatter_memory" predicate
and calls the corresponding define_insn by passing register_operand as first argument.
This register_operand is extracted from the operand with "mve_scatter_memory" predicate in
define_expand pattern.
gcc/ChangeLog:
2020-06-01 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
PR target/94735
* config/arm/predicates.md (mve_scatter_memory): Define to
match (mem (reg)) for scatter store memory.
* config/arm/mve.md (mve_vstrbq_scatter_offset_<supf><mode>): Modify
define_insn to define_expand.
(mve_vstrbq_scatter_offset_p_<supf><mode>): Likewise.
(mve_vstrhq_scatter_offset_<supf><mode>): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_<supf><mode>): Likewise.
(mve_vstrhq_scatter_shifted_offset_<supf><mode>): Likewise.
(mve_vstrdq_scatter_offset_p_<supf>v2di): Likewise.
(mve_vstrdq_scatter_offset_<supf>v2di): Likewise.
(mve_vstrdq_scatter_shifted_offset_p_<supf>v2di): Likewise.
(mve_vstrdq_scatter_shifted_offset_<supf>v2di): Likewise.
(mve_vstrhq_scatter_offset_fv8hf): Likewise.
(mve_vstrhq_scatter_offset_p_fv8hf): Likewise.
(mve_vstrhq_scatter_shifted_offset_fv8hf): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_fv8hf): Likewise.
(mve_vstrwq_scatter_offset_fv4sf): Likewise.
(mve_vstrwq_scatter_offset_p_fv4sf): Likewise.
(mve_vstrwq_scatter_offset_p_<supf>v4si): Likewise.
(mve_vstrwq_scatter_offset_<supf>v4si): Likewise.
(mve_vstrwq_scatter_shifted_offset_fv4sf): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_fv4sf): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_<supf>v4si): Likewise.
(mve_vstrwq_scatter_shifted_offset_<supf>v4si): Likewise.
(mve_vstrbq_scatter_offset_<supf><mode>_insn): Define insn for scatter
stores.
(mve_vstrbq_scatter_offset_p_<supf><mode>_insn): Likewise.
(mve_vstrhq_scatter_offset_<supf><mode>_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_<supf><mode>_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_<supf><mode>_insn): Likewise.
(mve_vstrdq_scatter_offset_p_<supf>v2di_insn): Likewise.
(mve_vstrdq_scatter_offset_<supf>v2di_insn): Likewise.
(mve_vstrdq_scatter_shifted_offset_p_<supf>v2di_insn): Likewise.
(mve_vstrdq_scatter_shifted_offset_<supf>v2di_insn): Likewise.
(mve_vstrhq_scatter_offset_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_offset_p_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_fv8hf_insn): Likewise.
(mve_vstrwq_scatter_offset_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_offset_p_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_offset_p_<supf>v4si_insn): Likewise.
(mve_vstrwq_scatter_offset_<supf>v4si_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_<supf>v4si_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_<supf>v4si_insn): Likewise.
gcc/testsuite/ChangeLog:
2020-06-01 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
PR target/94735
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_base.c: New test.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_base_p.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_offset.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_offset_p.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_shifted_offset.c:
Likewise.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_shifted_offset_p.c:
Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR94959).
Few MVE intrinsics like vldrbq_s32, vldrhq_s32 etc., the assembler instructions
generated by current compiler are wrong.
eg: vldrbq_s32 generates an assembly instructions `vldrb.s32 q0,[ip]`.
But as per Arm-arm second argument in above instructions must also be a low
register (<= r7). This patch fixes this issue by creating a new predicate
"mve_memory_operand" and constraint "Ux" which allows low registers as arguments
to the generated instructions depending on the mode of the argument. A new constraint
"Ul" is created to handle loading to PC-relative addressing modes for vector
store/load intrinsiscs.
All the corresponding MVE intrinsic generating wrong code-gen as vldrbq_s32
are modified in this patch.
gcc/ChangeLog:
2020-05-20 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
Andre Vieira <andre.simoesdiasvieira@arm.com>
PR target/94959
* config/arm/arm-protos.h (arm_mode_base_reg_class): Function
declaration.
(mve_vector_mem_operand): Likewise.
* config/arm/arm.c (thumb2_legitimate_address_p): For MVE target check
the load from memory to a core register is legitimate for give mode.
(mve_vector_mem_operand): Define function.
(arm_print_operand): Modify comment.
(arm_mode_base_reg_class): Define.
* config/arm/arm.h (MODE_BASE_REG_CLASS): Modify to add check for
TARGET_HAVE_MVE and expand to arm_mode_base_reg_class on TRUE.
* config/arm/constraints.md (Ux): Likewise.
(Ul): Likewise.
* config/arm/mve.md (mve_mov): Replace constraint Us with Ux and also
add support for missing Vector Store Register and Vector Load Register.
Add a new alternative to support load from memory to PC (or label) in
vector store/load.
(mve_vstrbq_<supf><mode>): Modify constraint Us to Ux.
(mve_vldrbq_<supf><mode>): Modify constriant Us to Ux, predicate to
mve_memory_operand and also modify the MVE instructions to emit.
(mve_vldrbq_z_<supf><mode>): Modify constraint Us to Ux.
(mve_vldrhq_fv8hf): Modify constriant Us to Ux, predicate to
mve_memory_operand and also modify the MVE instructions to emit.
(mve_vldrhq_<supf><mode>): Modify constriant Us to Ux, predicate to
mve_memory_operand and also modify the MVE instructions to emit.
(mve_vldrhq_z_fv8hf): Likewise.
(mve_vldrhq_z_<supf><mode>): Likewise.
(mve_vldrwq_fv4sf): Likewise.
(mve_vldrwq_<supf>v4si): Likewise.
(mve_vldrwq_z_fv4sf): Likewise.
(mve_vldrwq_z_<supf>v4si): Likewise.
(mve_vld1q_f<mode>): Modify constriant Us to Ux.
(mve_vld1q_<supf><mode>): Likewise.
(mve_vstrhq_fv8hf): Modify constriant Us to Ux, predicate to
mve_memory_operand.
(mve_vstrhq_p_fv8hf): Modify constriant Us to Ux, predicate to
mve_memory_operand and also modify the MVE instructions to emit.
(mve_vstrhq_p_<supf><mode>): Likewise.
(mve_vstrhq_<supf><mode>): Modify constriant Us to Ux, predicate to
mve_memory_operand.
(mve_vstrwq_fv4sf): Modify constriant Us to Ux.
(mve_vstrwq_p_fv4sf): Modify constriant Us to Ux and also modify the MVE
instructions to emit.
(mve_vstrwq_p_<supf>v4si): Likewise.
(mve_vstrwq_<supf>v4si): Likewise.Modify constriant Us to Ux.
* config/arm/predicates.md (mve_memory_operand): Define.
gcc/testsuite/ChangeLog:
2020-05-20 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
PR target/94959
* gcc.target/arm/mve/intrinsics/mve_vector_float2.c: Modify.
* gcc.target/arm/mve/intrinsics/mve_vldr.c: New test.
* gcc.target/arm/mve/intrinsics/mve_vldr_z.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vstr.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vstr_p.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_f16.c: Modify.
* gcc.target/arm/mve/intrinsics/vld1q_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_z_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_z_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_z_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_z_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_z_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_z_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_z_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_z_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_z_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrbq_z_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrhq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrhq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrhq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrhq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrhq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrhq_z_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrhq_z_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrhq_z_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrhq_z_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrhq_z_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_z_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_z_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_z_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vuninitializedq_float.c: Likewise.
* gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise.
* gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches changes the constraint "e" to "Te".
gcc/ChangeLog:
2020-04-24 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
* config/arm/constraints.md (e): Remove constraint.
(Te): Define constraint.
* config/arm/mve.md (vaddvq_<supf><mode>): Modify constraint in
operand 0 from "e" to "Te".
(vaddvaq_<supf><mode>): Likewise.
(vaddvq_p_<supf><mode>): Likewise.
(vmladavq_<supf><mode>): Likewise.
(vmladavxq_s<mode>): Likewise.
(vmlsdavq_s<mode>): Likewise.
(vmlsdavxq_s<mode>): Likewise.
(vaddvaq_p_<supf><mode>): Likewise.
(vmladavaq_<supf><mode>): Likewise.
(vmladavq_p_<supf><mode>): Likewise.
(vmladavxq_p_s<mode>): Likewise.
(vmlsdavq_p_s<mode>): Likewise.
(vmlsdavxq_p_s<mode>): Likewise.
(vmlsdavaxq_s<mode>): Likewise.
(vmlsdavaq_s<mode>): Likewise.
(vmladavaxq_s<mode>): Likewise.
(vmladavaq_p_<supf><mode>): Likewise.
(vmladavaxq_p_s<mode>): Likewise.
(vmlsdavaq_p_s<mode>): Likewise.
(vmlsdavaxq_p_s<mode>): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes an ICE we were seeing due to a missing vec_duplicate pattern.
gcc/ChangeLog:
2020-04-15 Andre Vieira <andre.simoesdiasvieira@arm.com>
* config/arm/mve.md (mve_vec_duplicate<mode>): New pattern.
(V_sz_elem2): Remove unused mode attribute.
gcc/testsuite/ChangeLog:
2020-04-15 Andre Vieira <andre.simoesdiasvieira@arm.com>
Srinath Parvathaneni <srinath.parvathaneni@arm.com>
* gcc.target/arm/mve/intrinsics/mve_vec_duplicate.c: New test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These intrinsics are the predicated version of the intrinsics inroduced
in https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542725.html.
These are not yet public on developer.arm.com but we have reached
internal consensus on them.
The approach follows the same method as for the CDE intrinsics for MVE
registers, most notably using the same arm_resolve_overloaded_builtin
function with minor modifications.
The resolver hook has been moved from arm-builtins.c to arm-c.c so it
can access the c-common function build_function_call_vec. This function
is needed to perform the same checks on arguments as a normal C or C++
function would perform.
It is fine to put this resolver in arm-c.c since it's only use is for
the ACLE functions, and these are only available in C/C++.
So that the resolver function has access to information it needs from
the builtins, we put two query functions into arm-builtins.c and use
them from arm-c.c.
We rely on the order that the builtins are defined in
gcc/config/arm/arm_cde_builtins.def, knowing that the predicated
versions come after the non-predicated versions.
The machine description patterns for these builtins are simpler than
those for the non-predicated versions, since the accumulator versions
*and* non-accumulator versions both need an input vector now.
The input vector is needed for the non-accumulator version to describe
the original values for those lanes that are not updated during the
merge operation.
We additionally need to introduce qualifiers for these new builtins,
which follow the same pattern as the non-predicated versions but with an
extra argument to describe the predicate.
Error message changes:
- We directly mention the builtin argument when complaining that an
argument is not in the correct range.
This more closely matches the C error messages.
- We ensure the resolver complains about *all* invalid arguments to a
function instead of just the first one.
- The resolver error messages index arguments from 1 instead of 0 to
match the arguments coming from the C/C++ frontend.
In order to allow the user to give an argument for the merging predicate
when they don't care what data is stored in the 'false' lanes, we also
move the __arm_vuninitializedq* intrinsics from arm_mve.h to
arm_mve_types.h which is shared with arm_cde.h.
We only move the fully type-specified `__arm_vuninitializedq*`
intrinsics and not the polymorphic versions, since moving the
polymorphic versions requires moving the _Generic framework as well as
just the intrinsics we're interested in. This matches the approach taken
for the `__arm_vreinterpret*` functions in this include file.
This patch also contains a slight change in spacing of an existing
assembly instruction to be emitted.
This is just to help writing tests -- vmsr usually has a tab and a space
between the mnemonic and the first argument, but in one case it just has
a tab -- making all the same helps make test regexps simpler.
Testing Done:
Bootstrap and full regtest on arm-none-linux-gnueabihf
Full regtest on arm-none-eabi
All testing done with a local fix for the bugzilla PR below.
That bugzilla currently causes multiple ICE's on the tests added in
this patch.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94341
gcc/ChangeLog:
2020-04-02 Matthew Malcomson <matthew.malcomson@arm.com>
* config/arm/arm-builtins.c (CX_UNARY_UNONE_QUALIFIERS): New.
(CX_BINARY_UNONE_QUALIFIERS): New.
(CX_TERNARY_UNONE_QUALIFIERS): New.
(arm_resolve_overloaded_builtin): Move to arm-c.c.
(arm_expand_builtin_args): Update error message.
(enum resolver_ident): New.
(arm_describe_resolver): New.
(arm_cde_end_args): New.
* config/arm/arm-builtins.h: New file.
* config/arm/arm-c.c (arm_resolve_overloaded_builtin): New.
(arm_resolve_cde_builtin): Moved from arm-builtins.c.
* config/arm/arm_cde.h (__arm_vcx1q_m, __arm_vcx1qa_m,
__arm_vcx2q_m, __arm_vcx2qa_m, __arm_vcx3q_m, __arm_vcx3qa_m):
New.
* config/arm/arm_cde_builtins.def (vcx1q_p_, vcx1qa_p_,
vcx2q_p_, vcx2qa_p_, vcx3q_p_, vcx3qa_p_): New builtin defs.
* config/arm/iterators.md (CDE_VCX): New int iterator.
(a) New int attribute.
* config/arm/mve.md (arm_vcx1q<a>_p_v16qi, arm_vcx2q<a>_p_v16qi,
arm_vcx3q<a>_p_v16qi): New patterns.
* config/arm/vfp.md (thumb2_movhi_fp16): Extra space in assembly.
gcc/testsuite/ChangeLog:
2020-04-02 Matthew Malcomson <matthew.malcomson@arm.com>
* gcc.target/arm/acle/cde-errors.c: Add predicated forms.
* gcc.target/arm/acle/cde-mve-error-1.c: Add predicated forms.
* gcc.target/arm/acle/cde-mve-error-2.c: Add predicated forms.
* gcc.target/arm/acle/cde-mve-error-3.c: Add predicated forms.
* gcc.target/arm/acle/cde-mve-full-assembly.c: Add predicated
forms.
* gcc.target/arm/acle/cde-mve-tests.c: Add predicated forms.
* gcc.target/arm/acle/cde_v_1_err.c (test_imm_range): Update for
error message format change.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c:
Update scan-assembler regexp.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement CDE intrinsics on MVE registers.
Other than the basics required for adding intrinsics this patch consists
of three changes.
** We separate out the MVE types and casts from the arm_mve.h header.
This is so that the types can be used in arm_cde.h without the need to include
the entire arm_mve.h header.
The only type that arm_cde.h needs is `uint8x16_t`, so this separation could be
avoided by using a `typedef` in this file.
Since the introduced intrinsics are all defined to act on the full range of MVE
types, declaring all such types seems intuitive since it will provide their
declaration to the user too.
This arm_mve_types.h header not only includes the MVE types, but also
the conversion intrinsics between them.
Some of the conversion intrinsics are needed for arm_cde.h, but most are
not. We include all conversion intrinsics to keep the definition of
such conversion functions all in one place, on the understanding that
extra conversion functions being defined when including `arm_cde.h` is
not a problem.
** We define the TARGET_RESOLVE_OVERLOADED_BUILTIN hook for the Arm backend.
This is needed to implement the polymorphism for the required intrinsics.
The intrinsics have no specialised version, and the resulting assembly
instruction for all different types should be exactly the same.
Due to this we have implemented these intrinsics via one builtin on one type.
All other calls to the intrinsic with different types are implicitly cast to
the one type that is defined, and hence are all expanded to the same RTL
pattern that is only defined for one machine mode.
** We seperate the initialisation of the CDE intrinsics from others.
This allows us to ensure that the CDE intrinsics acting on MVE registers
are only created when both CDE and MVE are available.
Only initialising these builtins when both features are available is
especially important since they require a type that is only initialised
when the target supports hard float. Hence trying to initialise these
builtins on a soft float target would cause an ICE.
Testing done:
Full bootstrap and regtest on arm-none-linux-gnueabihf
Regression test on arm-none-eabi
Ok for trunk?
gcc/ChangeLog:
2020-03-10 Matthew Malcomson <matthew.malcomson@arm.com>
* config.gcc (arm_mve_types.h): New extra_header for arm.
* config/arm/arm-builtins.c (arm_resolve_overloaded_builtin): New.
(arm_init_cde_builtins): New.
(arm_init_acle_builtins): Remove initialisation of CDE builtins.
(arm_init_builtins): Call arm_init_cde_builtins when target
supports CDE.
* config/arm/arm-c.c (arm_resolve_overloaded_builtin): New declaration.
(arm_register_target_pragmas): Initialise resolve_overloaded_builtin
hook to the implementation for the arm backend.
* config/arm/arm.h (ARM_MVE_CDE_CONST_1): New.
(ARM_MVE_CDE_CONST_2): New.
(ARM_MVE_CDE_CONST_3): New.
* config/arm/arm_cde.h (__arm_vcx1q_u8): New.
(__arm_vcx1qa): New.
(__arm_vcx2q): New.
(__arm_vcx2q_u8): New.
(__arm_vcx2qa): New.
(__arm_vcx3q): New.
(__arm_vcx3q_u8): New.
(__arm_vcx3qa): New.
* config/arm/arm_cde_builtins.def (vcx1q, vcx1qa, vcx2q, vcx2qa, vcx3q,
vcx3qa): New builtins defined.
* config/arm/arm_mve.h: Move typedefs and conversion intrinsics
to arm_mve_types.h header.
* config/arm/arm_mve_types.h: New file.
* config/arm/mve.md (arm_vcx1qv16qi, arm_vcx1qav16qi, arm_vcx2qv16qi,
arm_vcx2qav16qi, arm_vcx3qv16qi, arm_vcx3qav16qi): New patterns.
* config/arm/predicates.md (const_int_mve_cde1_operand,
const_int_mve_cde2_operand, const_int_mve_cde3_operand): New.
gcc/testsuite/ChangeLog:
2020-03-23 Matthew Malcomson <matthew.malcomson@arm.com>
Dennis Zhang <dennis.zhang@arm.com>
* gcc.target/arm/acle/cde-mve-error-1.c: New test.
* gcc.target/arm/acle/cde-mve-error-2.c: New test.
* gcc.target/arm/acle/cde-mve-error-3.c: New test.
* gcc.target/arm/acle/cde-mve-full-assembly.c: New test.
* gcc.target/arm/acle/cde-mve-tests.c: New test.
* lib/target-supports.exp (arm_v8_1m_main_cde_mve_fp): New check
effective.
(arm_v8_1m_main_cde_mve, arm_v8m_main_cde_fp): Use -mfpu=auto
so we only check configurations that make sense.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes vec extracts to memory that can arise from code as seen in the
testcase added. The patch fixes this by allowing mem operands in the set of
mve_vec_extract patterns, which given the only '=r' constraint will lead to the
scalar value being written to a register and then stored in memory using scalar
store pattern.
gcc/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>
* config/arm/mve.md (mve_vec_extract*): Allow memory operands in set.
gcc/testsuite/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>
* gcc.target/arm/mve/intrinsics/mve_vec_extracts_from_memory.c: New
test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Hi,
This patch fixes the immediate checks on vcvt and vqshr(u)n[bt] instructions.
It also removes the 'arm_mve_immediate_check' as the check was wrong and the
error message is not much better than the constraint one, which albeit isn't
great either.
gcc/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>
* config/arm/arm.c (arm_mve_immediate_check): Removed.
* config/arm/mve.md (MVE_pred2, MVE_constraint2): Added FP types.
(mve_vcvtq_n_to_f_*, mve_vcvtq_n_from_f_*, mve_vqshrnbq_n_*,
mve_vqshrntq_n_*, mve_vqshrunbq_n_s*, mve_vqshruntq_n_s*,
mve_vcvtq_m_n_from_f_*, mve_vcvtq_m_n_to_f_*, mve_vqshrnbq_m_n_*,
mve_vqrshruntq_m_n_s*, mve_vqshrunbq_m_n_s*,
mve_vqshruntq_m_n_s*): Fixed immediate constraints.
gcc/testsuite/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>
* gcc.target/arm/mve/intrinsics/mve_immediates_1_n.c: New test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes v[id]wdup intrinsics. They had two issues:
1) the predicated versions did not link the incoming inactive vector parameter
to the output
2) The backend didn't enforce the wrap limit operand be in an odd register.
1) was fixed like we did for all other predicated intrinsics
2) requires a temporary hack where we pass the value in the top end of DImode
operand. The proper fix would be to add a register CLASS but this interacted
badly with other existing targets codegen. We will look to fix this properly in GCC 11.
gcc/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>
* config/arm/arm_mve.h: Fix v[id]wdup intrinsics.
* config/arm/mve/md: Fix v[id]wdup patterns.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes the constant load pattern for MVE, this was not accounting
correctly for label + offset cases.
Added test that ICE'd before and removed the scan assemblers for the mve_vector*
tests as they were too fragile.
gcc/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>
* config/arm/arm.c (output_move_neon): Deal with label + offset cases.
* config/arm/mve.md (*mve_mov<mode>): Handle const vectors.
gcc/testsuite/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>
* gcc.target/arm/mve/intrinsics/mve_load_from_array.c: New test.
* gcc.target/arm/mve/intrinsics/mve_vector_float.c: Remove
scan-assembler.
* gcc.target/arm/mve/intrinsics/mve_vector_float1.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vector_int1.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vector_int2.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Following MVE ACLE intrinsics have an issue with writeback to the base address.
vldrdq_gather_base_wb_s64, vldrdq_gather_base_wb_u64, vldrdq_gather_base_wb_z_s64, vldrdq_gather_base_wb_z_u64, vldrwq_gather_base_wb_s32, vldrwq_gather_base_wb_u32, vldrwq_gather_base_wb_z_s32, vldrwq_gather_base_wb_z_u32, vldrwq_gather_base_wb_f32, vldrwq_gather_base_wb_z_f32.
This patch fixes the bug reported in PR94317 by adding separate builtin calls to update the result and writeback to base address for the above intrinsics.
2020-04-02 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
PR target/94317
* config/arm/arm-builtins.c (LDRGBWBXU_QUALIFIERS): Define.
(LDRGBWBXU_Z_QUALIFIERS): Likewise.
* config/arm/arm_mve.h (__arm_vldrdq_gather_base_wb_s64): Modify
intrinsic defintion by adding a new builtin call to writeback into base
address.
(__arm_vldrdq_gather_base_wb_u64): Likewise.
(__arm_vldrdq_gather_base_wb_z_s64): Likewise.
(__arm_vldrdq_gather_base_wb_z_u64): Likewise.
(__arm_vldrwq_gather_base_wb_s32): Likewise.
(__arm_vldrwq_gather_base_wb_u32): Likewise.
(__arm_vldrwq_gather_base_wb_z_s32): Likewise.
(__arm_vldrwq_gather_base_wb_z_u32): Likewise.
(__arm_vldrwq_gather_base_wb_f32): Likewise.
(__arm_vldrwq_gather_base_wb_z_f32): Likewise.
* config/arm/arm_mve_builtins.def (vldrwq_gather_base_wb_z_u): Modify
builtin's qualifier.
(vldrdq_gather_base_wb_z_u): Likewise.
(vldrwq_gather_base_wb_u): Likewise.
(vldrdq_gather_base_wb_u): Likewise.
(vldrwq_gather_base_wb_z_s): Likewise.
(vldrwq_gather_base_wb_z_f): Likewise.
(vldrdq_gather_base_wb_z_s): Likewise.
(vldrwq_gather_base_wb_s): Likewise.
(vldrwq_gather_base_wb_f): Likewise.
(vldrdq_gather_base_wb_s): Likewise.
(vldrwq_gather_base_nowb_z_u): Define builtin.
(vldrdq_gather_base_nowb_z_u): Likewise.
(vldrwq_gather_base_nowb_u): Likewise.
(vldrdq_gather_base_nowb_u): Likewise.
(vldrwq_gather_base_nowb_z_s): Likewise.
(vldrwq_gather_base_nowb_z_f): Likewise.
(vldrdq_gather_base_nowb_z_s): Likewise.
(vldrwq_gather_base_nowb_s): Likewise.
(vldrwq_gather_base_nowb_f): Likewise.
(vldrdq_gather_base_nowb_s): Likewise.
* config/arm/mve.md (mve_vldrwq_gather_base_nowb_<supf>v4si): Define RTL
pattern.
(mve_vldrwq_gather_base_wb_<supf>v4si): Modify RTL pattern.
(mve_vldrwq_gather_base_nowb_z_<supf>v4si): Define RTL pattern.
(mve_vldrwq_gather_base_wb_z_<supf>v4si): Modify RTL pattern.
(mve_vldrwq_gather_base_wb_fv4sf): Modify RTL pattern.
(mve_vldrwq_gather_base_nowb_fv4sf): Define RTL pattern.
(mve_vldrwq_gather_base_wb_z_fv4sf): Modify RTL pattern.
(mve_vldrwq_gather_base_nowb_z_fv4sf): Define RTL pattern.
(mve_vldrdq_gather_base_nowb_<supf>v4di): Define RTL pattern.
(mve_vldrdq_gather_base_wb_<supf>v4di): Modify RTL pattern.
(mve_vldrdq_gather_base_nowb_z_<supf>v4di): Define RTL pattern.
(mve_vldrdq_gather_base_wb_z_<supf>v4di): Modify RTL pattern.
gcc/testsuite/ChangeLog:
2020-04-02 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
PR target/94317
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c: Modify.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch supports following MVE ACLE whole vector left shift with carry intrinsics.
vshlcq_m_s8, vshlcq_m_s16, vshlcq_m_s32, vshlcq_m_u8, vshlcq_m_u16, vshlcq_m_u32.
Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
2020-03-23 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
Andre Vieira <andre.simoesdiasvieira@arm.com>
Mihail Ionescu <mihail.ionescu@arm.com>
* config/arm/arm_mve.h (vshlcq_m_s8): Define macro.
(vshlcq_m_u8): Likewise.
(vshlcq_m_s16): Likewise.
(vshlcq_m_u16): Likewise.
(vshlcq_m_s32): Likewise.
(vshlcq_m_u32): Likewise.
(__arm_vshlcq_m_s8): Define intrinsic.
(__arm_vshlcq_m_u8): Likewise.
(__arm_vshlcq_m_s16): Likewise.
(__arm_vshlcq_m_u16): Likewise.
(__arm_vshlcq_m_s32): Likewise.
(__arm_vshlcq_m_u32): Likewise.
(vshlcq_m): Define polymorphic variant.
* config/arm/arm_mve_builtins.def (QUADOP_NONE_NONE_UNONE_IMM_UNONE):
Use builtin qualifier.
(QUADOP_UNONE_UNONE_UNONE_IMM_UNONE): Likewise.
* config/arm/mve.md (mve_vshlcq_m_vec_<supf><mode>): Define RTL pattern.
(mve_vshlcq_m_carry_<supf><mode>): Likewise.
(mve_vshlcq_m_<supf><mode>): Likewise.
gcc/testsuite/ChangeLog:
2020-03-23 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
Andre Vieira <andre.simoesdiasvieira@arm.com>
Mihail Ionescu <mihail.ionescu@arm.com>
* gcc.target/arm/mve/intrinsics/vshlcq_m_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vshlcq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vshlcq_m_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vshlcq_m_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vshlcq_m_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vshlcq_m_u8.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch supports following MVE ACLE scalar shift intrinsics.
sqrshr, sqrshrl, sqrshrl_sat48, sqshl, sqshll, srshr, srshrl, uqrshl, uqrshll, uqrshll_sat48, uqshl, uqshll, urshr, urshrl, lsll, asrl.
Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
2020-03-23 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
* config/arm/arm-builtins.c (LSLL_QUALIFIERS): Define builtin qualifier.
(UQSHL_QUALIFIERS): Likewise.
(ASRL_QUALIFIERS): Likewise.
(SQSHL_QUALIFIERS): Likewise.
* config/arm/arm_mve.h (__ARM_BIG_ENDIAN): Check to not support MVE in
Big-Endian Mode.
(sqrshr): Define macro.
(sqrshrl): Likewise.
(sqrshrl_sat48): Likewise.
(sqshl): Likewise.
(sqshll): Likewise.
(srshr): Likewise.
(srshrl): Likewise.
(uqrshl): Likewise.
(uqrshll): Likewise.
(uqrshll_sat48): Likewise.
(uqshl): Likewise.
(uqshll): Likewise.
(urshr): Likewise.
(urshrl): Likewise.
(lsll): Likewise.
(asrl): Likewise.
(__arm_lsll): Define intrinsic.
(__arm_asrl): Likewise.
(__arm_uqrshll): Likewise.
(__arm_uqrshll_sat48): Likewise.
(__arm_sqrshrl): Likewise.
(__arm_sqrshrl_sat48): Likewise.
(__arm_uqshll): Likewise.
(__arm_urshrl): Likewise.
(__arm_srshrl): Likewise.
(__arm_sqshll): Likewise.
(__arm_uqrshl): Likewise.
(__arm_sqrshr): Likewise.
(__arm_uqshl): Likewise.
(__arm_urshr): Likewise.
(__arm_sqshl): Likewise.
(__arm_srshr): Likewise.
* config/arm/arm_mve_builtins.def (LSLL_QUALIFIERS): Use builtin
qualifier.
(UQSHL_QUALIFIERS): Likewise.
(ASRL_QUALIFIERS): Likewise.
(SQSHL_QUALIFIERS): Likewise.
* config/arm/mve.md (mve_uqrshll_sat<supf>_di): Define RTL pattern.
(mve_sqrshrl_sat<supf>_di): Likewise.
(mve_uqrshl_si): Likewise.
(mve_sqrshr_si): Likewise.
(mve_uqshll_di): Likewise.
(mve_urshrl_di): Likewise.
(mve_uqshl_si): Likewise.
(mve_urshr_si): Likewise.
(mve_sqshl_si): Likewise.
(mve_srshr_si): Likewise.
(mve_srshrl_di): Likewise.
(mve_sqshll_di): Likewise.
gcc/testsuite/ChangeLog:
2020-03-23 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
* gcc.target/arm/mve/intrinsics/asrl.c: New test.
* gcc.target/arm/mve/intrinsics/lsll.c: Likewise.
* gcc.target/arm/mve/intrinsics/sqrshr.c: Likewise.
* gcc.target/arm/mve/intrinsics/sqrshrl_sat48.c: Likewise.
* gcc.target/arm/mve/intrinsics/sqrshrl_sat64.c: Likewise.
* gcc.target/arm/mve/intrinsics/sqshl.c: Likewise.
* gcc.target/arm/mve/intrinsics/sqshll.c: Likewise.
* gcc.target/arm/mve/intrinsics/srshr.c: Likewise.
* gcc.target/arm/mve/intrinsics/srshrl.c: Likewise.
* gcc.target/arm/mve/intrinsics/uqrshl.c: Likewise.
* gcc.target/arm/mve/intrinsics/uqrshll_sat48.c: Likewise.
* gcc.target/arm/mve/intrinsics/uqrshll_sat64.c: Likewise.
* gcc.target/arm/mve/intrinsics/uqshl.c: Likewise.
* gcc.target/arm/mve/intrinsics/uqshll.c: Likewise.
* gcc.target/arm/mve/intrinsics/urshr.c: Likewise.
* gcc.target/arm/mve/intrinsics/urshrl.c: Likewise.
* lib/target-supports.exp:
(check_effective_target_arm_v8_1m_mve_fp_ok_nocache): Modify to not
support MVE floating point in Big Endian mode.
(check_effective_target_arm_v8_1m_mve_ok_nocache): Modify to not
support MVE integer in Big Endian mode.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch supports following MVE ACLE intrinsics to get and set vector lane.
vsetq_lane_f16, vsetq_lane_f32, vsetq_lane_s16, vsetq_lane_s32, vsetq_lane_s8, vsetq_lane_s64, vsetq_lane_u8, vsetq_lane_u16, vsetq_lane_u32, vsetq_lane_u64, vgetq_lane_f16, vgetq_lane_f32, vgetq_lane_s16, vgetq_lane_s32, vgetq_lane_s8, vgetq_lane_s64, vgetq_lane_u8, vgetq_lane_u16, vgetq_lane_u32, vgetq_lane_u64.
Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
2020-03-23 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
Andre Vieira <andre.simoesdiasvieira@arm.com>
Mihail Ionescu <mihail.ionescu@arm.com>
* config/arm/arm_mve.h (vsetq_lane_f16): Define macro.
(vsetq_lane_f32): Likewise.
(vsetq_lane_s16): Likewise.
(vsetq_lane_s32): Likewise.
(vsetq_lane_s8): Likewise.
(vsetq_lane_s64): Likewise.
(vsetq_lane_u8): Likewise.
(vsetq_lane_u16): Likewise.
(vsetq_lane_u32): Likewise.
(vsetq_lane_u64): Likewise.
(vgetq_lane_f16): Likewise.
(vgetq_lane_f32): Likewise.
(vgetq_lane_s16): Likewise.
(vgetq_lane_s32): Likewise.
(vgetq_lane_s8): Likewise.
(vgetq_lane_s64): Likewise.
(vgetq_lane_u8): Likewise.
(vgetq_lane_u16): Likewise.
(vgetq_lane_u32): Likewise.
(vgetq_lane_u64): Likewise.
(__ARM_NUM_LANES): Likewise.
(__ARM_LANEQ): Likewise.
(__ARM_CHECK_LANEQ): Likewise.
(__arm_vsetq_lane_s16): Define intrinsic.
(__arm_vsetq_lane_s32): Likewise.
(__arm_vsetq_lane_s8): Likewise.
(__arm_vsetq_lane_s64): Likewise.
(__arm_vsetq_lane_u8): Likewise.
(__arm_vsetq_lane_u16): Likewise.
(__arm_vsetq_lane_u32): Likewise.
(__arm_vsetq_lane_u64): Likewise.
(__arm_vgetq_lane_s16): Likewise.
(__arm_vgetq_lane_s32): Likewise.
(__arm_vgetq_lane_s8): Likewise.
(__arm_vgetq_lane_s64): Likewise.
(__arm_vgetq_lane_u8): Likewise.
(__arm_vgetq_lane_u16): Likewise.
(__arm_vgetq_lane_u32): Likewise.
(__arm_vgetq_lane_u64): Likewise.
(__arm_vsetq_lane_f16): Likewise.
(__arm_vsetq_lane_f32): Likewise.
(__arm_vgetq_lane_f16): Likewise.
(__arm_vgetq_lane_f32): Likewise.
(vgetq_lane): Define polymorphic variant.
(vsetq_lane): Likewise.
* config/arm/mve.md (mve_vec_extract<mode><V_elem_l>): Define RTL
pattern.
(mve_vec_extractv2didi): Likewise.
(mve_vec_extract_sext_internal<mode>): Likewise.
(mve_vec_extract_zext_internal<mode>): Likewise.
(mve_vec_set<mode>_internal): Likewise.
(mve_vec_setv2di_internal): Likewise.
* config/arm/neon.md (vec_set<mode>): Move RTL pattern to vec-common.md
file.
(vec_extract<mode><V_elem_l>): Rename to
"neon_vec_extract<mode><V_elem_l>".
(vec_extractv2didi): Rename to "neon_vec_extractv2didi".
* config/arm/vec-common.md (vec_extract<mode><V_elem_l>): Define RTL
pattern common for MVE and NEON.
(vec_set<mode>): Move RTL pattern from neon.md and modify to accept both
MVE and NEON.
gcc/testsuite/ChangeLog:
2020-03-23 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
Andre Vieira <andre.simoesdiasvieira@arm.com>
Mihail Ionescu <mihail.ionescu@arm.com>
* gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: New test.
* gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds an earlyclobber to the MVE instructions that require it and were
missing it. These are vrev64 and 32-bit element variants of vcadd, vhcadd vcmul,
vmull[bt] and vqdmull[bt].
gcc/ChangeLog:
2020-03-23 Andre Vieira <andre.simoesdiasvieira@arm.com>
* config/arm/mve.md (earlyclobber_32): New mode attribute.
(mve_vrev64q_*, mve_vcaddq*, mve_vhcaddq_*, mve_vcmulq_*,
mve_vmull[bt]q_*, mve_vqdmull[bt]q_*): Add appropriate early clobbers.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
intrinsics and also aliases to vstr and vldr intrinsics.
This patch supports following MVE ACLE intrinsics which are aliases of vstr and
vldr intrinsics.
vst1q_p_u8, vst1q_p_s8, vld1q_z_u8, vld1q_z_s8, vst1q_p_u16, vst1q_p_s16,
vld1q_z_u16, vld1q_z_s16, vst1q_p_u32, vst1q_p_s32, vld1q_z_u32, vld1q_z_s32,
vld1q_z_f16, vst1q_p_f16, vld1q_z_f32, vst1q_p_f32.
This patch also supports following MVE ACLE vector deinterleaving loads and vector
interleaving stores.
vst2q_s8, vst2q_u8, vld2q_s8, vld2q_u8, vld4q_s8, vld4q_u8, vst2q_s16, vst2q_u16,
vld2q_s16, vld2q_u16, vld4q_s16, vld4q_u16, vst2q_s32, vst2q_u32, vld2q_s32,
vld2q_u32, vld4q_s32, vld4q_u32, vld4q_f16, vld2q_f16, vst2q_f16, vld4q_f32,
vld2q_f32, vst2q_f32.
Please refer to M-profile Vector Extension (MVE) intrinsics [1] for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
2020-03-20 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
Andre Vieira <andre.simoesdiasvieira@arm.com>
Mihail Ionescu <mihail.ionescu@arm.com>
* config/arm/arm_mve.h (vst1q_p_u8): Define macro.
(vst1q_p_s8): Likewise.
(vst2q_s8): Likewise.
(vst2q_u8): Likewise.
(vld1q_z_u8): Likewise.
(vld1q_z_s8): Likewise.
(vld2q_s8): Likewise.
(vld2q_u8): Likewise.
(vld4q_s8): Likewise.
(vld4q_u8): Likewise.
(vst1q_p_u16): Likewise.
(vst1q_p_s16): Likewise.
(vst2q_s16): Likewise.
(vst2q_u16): Likewise.
(vld1q_z_u16): Likewise.
(vld1q_z_s16): Likewise.
(vld2q_s16): Likewise.
(vld2q_u16): Likewise.
(vld4q_s16): Likewise.
(vld4q_u16): Likewise.
(vst1q_p_u32): Likewise.
(vst1q_p_s32): Likewise.
(vst2q_s32): Likewise.
(vst2q_u32): Likewise.
(vld1q_z_u32): Likewise.
(vld1q_z_s32): Likewise.
(vld2q_s32): Likewise.
(vld2q_u32): Likewise.
(vld4q_s32): Likewise.
(vld4q_u32): Likewise.
(vld4q_f16): Likewise.
(vld2q_f16): Likewise.
(vld1q_z_f16): Likewise.
(vst2q_f16): Likewise.
(vst1q_p_f16): Likewise.
(vld4q_f32): Likewise.
(vld2q_f32): Likewise.
(vld1q_z_f32): Likewise.
(vst2q_f32): Likewise.
(vst1q_p_f32): Likewise.
(__arm_vst1q_p_u8): Define intrinsic.
(__arm_vst1q_p_s8): Likewise.
(__arm_vst2q_s8): Likewise.
(__arm_vst2q_u8): Likewise.
(__arm_vld1q_z_u8): Likewise.
(__arm_vld1q_z_s8): Likewise.
(__arm_vld2q_s8): Likewise.
(__arm_vld2q_u8): Likewise.
(__arm_vld4q_s8): Likewise.
(__arm_vld4q_u8): Likewise.
(__arm_vst1q_p_u16): Likewise.
(__arm_vst1q_p_s16): Likewise.
(__arm_vst2q_s16): Likewise.
(__arm_vst2q_u16): Likewise.
(__arm_vld1q_z_u16): Likewise.
(__arm_vld1q_z_s16): Likewise.
(__arm_vld2q_s16): Likewise.
(__arm_vld2q_u16): Likewise.
(__arm_vld4q_s16): Likewise.
(__arm_vld4q_u16): Likewise.
(__arm_vst1q_p_u32): Likewise.
(__arm_vst1q_p_s32): Likewise.
(__arm_vst2q_s32): Likewise.
(__arm_vst2q_u32): Likewise.
(__arm_vld1q_z_u32): Likewise.
(__arm_vld1q_z_s32): Likewise.
(__arm_vld2q_s32): Likewise.
(__arm_vld2q_u32): Likewise.
(__arm_vld4q_s32): Likewise.
(__arm_vld4q_u32): Likewise.
(__arm_vld4q_f16): Likewise.
(__arm_vld2q_f16): Likewise.
(__arm_vld1q_z_f16): Likewise.
(__arm_vst2q_f16): Likewise.
(__arm_vst1q_p_f16): Likewise.
(__arm_vld4q_f32): Likewise.
(__arm_vld2q_f32): Likewise.
(__arm_vld1q_z_f32): Likewise.
(__arm_vst2q_f32): Likewise.
(__arm_vst1q_p_f32): Likewise.
(vld1q_z): Define polymorphic variant.
(vld2q): Likewise.
(vld4q): Likewise.
(vst1q_p): Likewise.
(vst2q): Likewise.
* config/arm/arm_mve_builtins.def (STORE1): Use builtin qualifier.
(LOAD1): Likewise.
* config/arm/mve.md (mve_vst2q<mode>): Define RTL pattern.
(mve_vld2q<mode>): Likewise.
(mve_vld4q<mode>): Likewise.
gcc/testsuite/ChangeLog:
2020-03-20 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
Andre Vieira <andre.simoesdiasvieira@arm.com>
Mihail Ionescu <mihail.ionescu@arm.com>
* gcc.target/arm/mve/intrinsics/vld1q_z_f16.c: New test.
* gcc.target/arm/mve/intrinsics/vld1q_z_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_z_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_z_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_z_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_z_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_z_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld1q_z_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld2q_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld2q_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld2q_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld2q_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld2q_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld2q_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld2q_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld2q_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld4q_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld4q_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld4q_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld4q_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld4q_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld4q_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld4q_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vld4q_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst1q_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst2q_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst2q_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst2q_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst2q_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst2q_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst2q_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst2q_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vst2q_u8.c: Likewise.
|