author | irar <irar@138bc75d-0d04-0410-961f-82ee72b054a4> | 2006-11-22 08:46:03 +0000 |
---|---|---|
committer | irar <irar@138bc75d-0d04-0410-961f-82ee72b054a4> | 2006-11-22 08:46:03 +0000 |
commit | 6b8dbb533d1711e0ca80c4526a2ef57344d8067c (patch) | |
tree | 4b2ea3085ae0c7d9acece7d444f408bf4d827a97 | |
parent | f88c47bea82f0ece38a76a3fb236d3e49080c9dc (diff) | |
* doc/c-tree.texi: Document new tree codes.
* doc/md.texi: Document new optabs.
* tree-pretty-print.c (dump_generic_node): Handle print of new tree
codes.
* optabs.c (optab_for_tree_code, init_optabs): Handle new optabs.
* optabs.h (optab_index): Add new.
(vec_extract_even_optab, vec_extract_odd_optab,
vec_interleave_high_optab, vec_interleave_low_optab): New optabs.
* genopinit.c (vec_extract_even_optab, vec_extract_odd_optab,
vec_interleave_high_optab, vec_interleave_low_optab): Initialize
new optabs.
* expr.c (expand_expr_real_1): Add implementation for new tree codes.
* tree-vectorizer.c (new_stmt_vec_info): Initialize new fields.
* tree-vectorizer.h (stmt_vec_info): Add new fields for interleaving
along with macros for their access.
* tree-data-ref.h (first_location_in_loop, data_reference): Update
comment.
* tree-vect-analyze.c (toplev.h): Include.
(vect_determine_vectorization_factor): Fix indentation.
(vect_insert_into_interleaving_chain,
vect_update_interleaving_chain, vect_equal_offsets): New functions.
(vect_analyze_data_ref_dependence): Add argument for interleaving
check. Check for interleaving if it's true.
(vect_check_dependences): New function.
(vect_analyze_data_ref_dependences): Call vect_check_dependences for
every ddr. Call vect_analyze_data_ref_dependence with new argument.
(vect_update_misalignment_for_peel): Update for interleaving.
(vect_verify_datarefs_alignment): Check only first data-ref for
interleaving.
(vect_enhance_data_refs_alignment): Update for interleaving. Check
only first data-ref for interleaving.
(vect_analyze_data_ref_access): Check interleaving, update
interleaving data.
(vect_analyze_data_refs): Call compute_data_dependences_for_loop
with different parameters.
* tree.def (VEC_EXTRACT_EVEN_EXPR, VEC_EXTRACT_ODD_EXPR,
VEC_INTERLEAVE_HIGH_EXPR, VEC_INTERLEAVE_LOW_EXPR): New tree codes.
* tree-inline.c (estimate_num_insns_1): Add cases for new codes.
* tree-vect-transform.c (vect_create_addr_base_for_vector_ref):
Update step in case of interleaving.
(vect_strided_store_supported, vect_permute_store_chain): New
functions.
(vectorizable_store): Handle strided stores.
(vect_strided_load_supported, vect_permute_load_chain,
vect_transform_strided_load): New functions.
(vectorizable_load): Handle strided loads.
(vect_transform_stmt): Add argument. Handle strided stores. Check
that vectorized stmt exists for patterns.
(vect_gen_niters_for_prolog_loop): Update calculation for
interleaving.
(vect_transform_loop): Remove stmt_vec_info for strided stores after
whole chain vectorization.
* config/rs6000/altivec.md (UNSPEC_EXTEVEN, UNSPEC_EXTODD,
UNSPEC_INTERHI, UNSPEC_INTERLO): New constants.
(vpkuhum_nomode, vpkuwum_nomode, vec_extract_even<mode>,
vec_extract_odd<mode>, altivec_vmrghsf, altivec_vmrglsf,
vec_interleave_high<mode>, vec_interleave_low<mode>): Implement.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@119088 138bc75d-0d04-0410-961f-82ee72b054a4
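
The four new tree codes and optabs listed above can be modelled on plain arrays. The following is a minimal scalar sketch in C, not code from this commit: the `model_*` function names are hypothetical, and "high" is assumed to mean the first N/2 elements, which matches the AltiVec `vmrghw` mapping used in `altivec.md` but may differ on other targets.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar models of the new vector operations.  All names here are
   hypothetical and exist only for illustration.  n is the number of
   elements per vector and must be even; out must not alias a or b.  */

/* Even-indexed elements of a, followed by even-indexed elements of b.  */
void model_extract_even(const int *a, const int *b, int *out, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        out[i] = a[2 * i];
        out[n / 2 + i] = b[2 * i];
    }
}

/* Odd-indexed elements of a, followed by odd-indexed elements of b.  */
void model_extract_odd(const int *a, const int *b, int *out, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        out[i] = a[2 * i + 1];
        out[n / 2 + i] = b[2 * i + 1];
    }
}

/* Assumption: "high" is the first n/2 elements of each input.
   Result is a[0], b[0], a[1], b[1], ...  */
void model_interleave_high(const int *a, const int *b, int *out, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* Assumption: "low" is the last n/2 elements of each input.
   Result is a[n/2], b[n/2], a[n/2+1], b[n/2+1], ...  */
void model_interleave_low(const int *a, const int *b, int *out, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        out[2 * i] = a[n / 2 + i];
        out[2 * i + 1] = b[n / 2 + i];
    }
}
```

With `a = {0, 1, 2, 3}` and `b = {4, 5, 6, 7}`, `model_extract_even` yields `{0, 2, 4, 6}` and `model_interleave_high` yields `{0, 4, 1, 5}`. Under this model, interleaving two vectors with the high and low variants and then applying extract-even and extract-odd to the two results recovers the original vectors, which is consistent with using these operations for stride-2 stores and loads.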
43 files changed, 3766 insertions, 198 deletions
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1a72573b2db..665fe81fe28 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,63 @@
+2006-11-22 Ira Rosen <irar@il.ibm.com>
+
+	* doc/c-tree.texi: Document new tree codes.
+	* doc/md.texi: Document new optabs.
+	* tree-pretty-print.c (dump_generic_node): Handle print of new tree
+	codes.
+	* optabs.c (optab_for_tree_code, init_optabs): Handle new optabs.
+	* optabs.h (optab_index): Add new.
+	(vec_extract_even_optab, vec_extract_odd_optab,
+	vec_interleave_high_optab, vec_interleave_low_optab): New optabs.
+	* genopinit.c (vec_extract_even_optab, vec_extract_odd_optab,
+	vec_interleave_high_optab, vec_interleave_low_optab): Initialize
+	new optabs.
+	* expr.c (expand_expr_real_1): Add implementation for new tree codes.
+	* tree-vectorizer.c (new_stmt_vec_info): Initialize new fields.
+	* tree-vectorizer.h (stmt_vec_info): Add new fields for interleaving
+	along with macros for their access.
+	* tree-data-ref.h (first_location_in_loop, data_reference): Update
+	comment.
+	* tree-vect-analyze.c (toplev.h): Include.
+	(vect_determine_vectorization_factor): Fix indentation.
+	(vect_insert_into_interleaving_chain,
+	vect_update_interleaving_chain, vect_equal_offsets): New functions.
+	(vect_analyze_data_ref_dependence): Add argument for interleaving
+	check. Check for interleaving if it's true.
+	(vect_check_dependences): New function.
+	(vect_analyze_data_ref_dependences): Call vect_check_dependences for
+	every ddr. Call vect_analyze_data_ref_dependence with new argument.
+	(vect_update_misalignment_for_peel): Update for interleaving.
+	(vect_verify_datarefs_alignment): Check only first data-ref for
+	interleaving.
+	(vect_enhance_data_refs_alignment): Update for interleaving. Check
+	only first data-ref for interleaving.
+	(vect_analyze_data_ref_access): Check interleaving, update
+	interleaving data.
+	(vect_analyze_data_refs): Call compute_data_dependences_for_loop
+	with different parameters.
+	* tree.def (VEC_EXTRACT_EVEN_EXPR, VEC_EXTRACT_ODD_EXPR,
+	VEC_INTERLEAVE_HIGH_EXPR, VEC_INTERLEAVE_LOW_EXPR): New tree codes.
+	* tree-inline.c (estimate_num_insns_1): Add cases for new codes.
+	* tree-vect-transform.c (vect_create_addr_base_for_vector_ref):
+	Update step in case of interleaving.
+	(vect_strided_store_supported, vect_permute_store_chain): New
+	functions.
+	(vectorizable_store): Handle strided stores.
+	(vect_strided_load_supported, vect_permute_load_chain,
+	vect_transform_strided_load): New functions.
+	(vectorizable_load): Handle strided loads.
+	(vect_transform_stmt): Add argument. Handle strided stores. Check
+	that vectorized stmt exists for patterns.
+	(vect_gen_niters_for_prolog_loop): Update calculation for
+	interleaving.
+	(vect_transform_loop): Remove stmt_vec_info for strided stores after
+	whole chain vectorization.
+	* config/rs6000/altivec.md (UNSPEC_EXTEVEN, UNSPEC_EXTODD,
+	UNSPEC_INTERHI, UNSPEC_INTERLO): New constants.
+	(vpkuhum_nomode, vpkuwum_nomode, vec_extract_even<mode>,
+	vec_extract_odd<mode>, altivec_vmrghsf, altivec_vmrglsf,
+	vec_interleave_high<mode>, vec_interleave_low<mode>): Implement.
+
 2006-11-22 Steven Bosscher <steven@gcc.gnu.org>
 
 	* cse.c (enum taken): Remove PATH_AROUND.
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 7a78a9405f3..c10615e7315 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -115,6 +115,22 @@ (UNSPEC_REALIGN_LOAD 215) (UNSPEC_REDUC_PLUS 217) (UNSPEC_VECSH 219) + (UNSPEC_EXTEVEN_V4SI 220) + (UNSPEC_EXTEVEN_V8HI 221) + (UNSPEC_EXTEVEN_V16QI 222) + (UNSPEC_EXTEVEN_V4SF 223) + (UNSPEC_EXTODD_V4SI 224) + (UNSPEC_EXTODD_V8HI 225) + (UNSPEC_EXTODD_V16QI 226) + (UNSPEC_EXTODD_V4SF 227) + (UNSPEC_INTERHI_V4SI 228) + (UNSPEC_INTERHI_V8HI 229) + (UNSPEC_INTERHI_V16QI 230) + (UNSPEC_INTERHI_V4SF 231) + (UNSPEC_INTERLO_V4SI 232) + (UNSPEC_INTERLO_V8HI 233) + (UNSPEC_INTERLO_V16QI 234) + (UNSPEC_INTERLO_V4SF 235) (UNSPEC_VCOND_V4SI 301) (UNSPEC_VCOND_V4SF 302) (UNSPEC_VCOND_V8HI 303) @@ -136,7 +152,9 @@ (UNSPEC_VUPKLUH 319) (UNSPEC_VPERMSI 320) (UNSPEC_VPERMHI 321) - ]) + (UNSPEC_INTERHI 322) + (UNSPEC_INTERLO 323) +]) (define_constants [(UNSPECV_SET_VRSAVE 30) @@ -855,6 +873,23 @@ "vmrghw %0,%1,%2" [(set_attr "type" "vecperm")]) +(define_insn "altivec_vmrghsf" + [(set (match_operand:V4SF 0 "register_operand" "=v") + (vec_merge:V4SF (vec_select:V4SF (match_operand:V4SF 1 "register_operand" "v") + (parallel [(const_int 0) + (const_int 2) + (const_int 1) + (const_int 3)])) + (vec_select:V4SF (match_operand:V4SF 2 "register_operand" "v") + (parallel [(const_int 2) + (const_int 0) + (const_int 3) + (const_int 1)])) + (const_int 5)))] + "TARGET_ALTIVEC" + "vmrghw %0,%1,%2" + [(set_attr "type" "vecperm")]) + (define_insn "altivec_vmrglb" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_merge:V16QI (vec_select:V16QI (match_operand:V16QI 1 "register_operand" "v") @@ -938,6 +973,23 @@ "vmrglw %0,%1,%2" [(set_attr "type" "vecperm")]) +(define_insn "altivec_vmrglsf" + [(set (match_operand:V4SF 0 "register_operand" "=v") + (vec_merge:V4SF (vec_select:V4SF (match_operand:V4SF 1 "register_operand" "v") + (parallel [(const_int 2) + (const_int 0) + (const_int 3) + (const_int 
1)])) + (vec_select:V4SF (match_operand:V4SF 2 "register_operand" "v") + (parallel [(const_int 0) + (const_int 2) + (const_int 1) + (const_int 3)])) + (const_int 5)))] + "TARGET_ALTIVEC" + "vmrglw %0,%1,%2" + [(set_attr "type" "vecperm")]) + (define_insn "altivec_vmuleub" [(set (match_operand:V8HI 0 "register_operand" "=v") (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v") @@ -2601,3 +2653,290 @@ DONE; }") + +(define_expand "vec_extract_evenv4si" + [(set (match_operand:V4SI 0 "register_operand" "") + (unspec:V8HI [(match_operand:V4SI 1 "register_operand" "") + (match_operand:V4SI 2 "register_operand" "")] + UNSPEC_EXTEVEN_V4SI))] + "TARGET_ALTIVEC" + " +{ + rtx mask = gen_reg_rtx (V16QImode); + rtvec v = rtvec_alloc (16); + + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 0); + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 1); + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 2); + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 3); + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 8); + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 9); + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 10); + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11); + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16); + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17); + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 18); + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 19); + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 24); + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 25); + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 26); + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 27); + emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); + emit_insn (gen_altivec_vperm_v4si (operands[0], operands[1], operands[2], mask)); + + DONE; +}") + +(define_expand "vec_extract_evenv4sf" + [(set (match_operand:V4SF 0 "register_operand" "") + (unspec:V8HI [(match_operand:V4SF 1 "register_operand" "") + (match_operand:V4SF 2 "register_operand" "")] + UNSPEC_EXTEVEN_V4SF))] + 
"TARGET_ALTIVEC" + " +{ + rtx mask = gen_reg_rtx (V16QImode); + rtvec v = rtvec_alloc (16); + + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 0); + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 1); + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 2); + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 3); + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 8); + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 9); + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 10); + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11); + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16); + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17); + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 18); + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 19); + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 24); + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 25); + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 26); + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 27); + emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); + emit_insn (gen_altivec_vperm_v4sf (operands[0], operands[1], operands[2], mask)); + + DONE; +}") + +(define_expand "vec_extract_evenv8hi" + [(set (match_operand:V4SI 0 "register_operand" "") + (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "") + (match_operand:V8HI 2 "register_operand" "")] + UNSPEC_EXTEVEN_V8HI))] + "TARGET_ALTIVEC" + " +{ + rtx mask = gen_reg_rtx (V16QImode); + rtvec v = rtvec_alloc (16); + + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 0); + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 1); + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 4); + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 5); + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 8); + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 9); + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 12); + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 13); + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16); + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17); + RTVEC_ELT (v, 10) = 
gen_rtx_CONST_INT (QImode, 20); + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 21); + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 24); + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 25); + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 28); + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 29); + emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); + emit_insn (gen_altivec_vperm_v8hi (operands[0], operands[1], operands[2], mask)); + + DONE; +}") + +(define_expand "vec_extract_evenv16qi" + [(set (match_operand:V4SI 0 "register_operand" "") + (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "") + (match_operand:V16QI 2 "register_operand" "")] + UNSPEC_EXTEVEN_V16QI))] + "TARGET_ALTIVEC" + " +{ + rtx mask = gen_reg_rtx (V16QImode); + rtvec v = rtvec_alloc (16); + + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 0); + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 2); + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 4); + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 6); + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 8); + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 10); + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 12); + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 14); + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16); + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 18); + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 20); + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 22); + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 24); + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 26); + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 28); + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 30); + emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); + emit_insn (gen_altivec_vperm_v16qi (operands[0], operands[1], operands[2], mask)); + + DONE; +}") + +(define_expand "vec_extract_oddv4si" + [(set (match_operand:V4SI 0 "register_operand" "") + (unspec:V8HI [(match_operand:V4SI 1 "register_operand" "") + (match_operand:V4SI 
2 "register_operand" "")] + UNSPEC_EXTODD_V4SI))] + "TARGET_ALTIVEC" + " +{ + rtx mask = gen_reg_rtx (V16QImode); + rtvec v = rtvec_alloc (16); + + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 4); + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 5); + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 6); + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 7); + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 12); + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 13); + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 14); + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 15); + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 20); + RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 21); + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 22); + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 23); + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 28); + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 29); + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 30); + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 31); + emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); + emit_insn (gen_altivec_vperm_v4si (operands[0], operands[1], operands[2], mask)); + + DONE; +}") + +(define_expand "vec_extract_oddv4sf" + [(set (match_operand:V4SF 0 "register_operand" "") + (unspec:V8HI [(match_operand:V4SF 1 "register_operand" "") + (match_operand:V4SF 2 "register_operand" "")] + UNSPEC_EXTODD_V4SF))] + "TARGET_ALTIVEC" + " +{ + rtx mask = gen_reg_rtx (V16QImode); + rtvec v = rtvec_alloc (16); + + RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 4); + RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 5); + RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 6); + RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 7); + RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 12); + RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 13); + RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 14); + RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 15); + RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 20); + RTVEC_ELT (v, 9) = 
gen_rtx_CONST_INT (QImode, 21); + RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 22); + RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 23); + RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 28); + RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 29); + RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 30); + RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 31); + emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v))); + emit_insn (gen_altivec_vperm_v4sf (operands[0], operands[1], operands[2], mask)); + + DONE; +}") + +(define_insn "vpkuhum_nomode" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (unspec:V16QI [(match_operand 1 "register_operand" "v") + (match_operand 2 "register_operand" "v")] + UNSPEC_VPKUHUM))] + "TARGET_ALTIVEC" + "vpkuhum %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "vpkuwum_nomode" + [(set (match_operand:V8HI 0 "register_operand" "=v") + (unspec:V8HI [(match_operand 1 "register_operand" "v") + (match_operand 2 "register_operand" "v")] + UNSPEC_VPKUWUM))] + "TARGET_ALTIVEC" + "vpkuwum %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_expand "vec_extract_oddv8hi" + [(set (match_operand:V8HI 0 "register_operand" "") + (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "") + (match_operand:V8HI 2 "register_operand" "")] + UNSPEC_EXTODD_V8HI))] + "TARGET_ALTIVEC" + " +{ + emit_insn (gen_vpkuwum_nomode (operands[0], operands[1], operands[2])); + DONE; +}") + +(define_expand "vec_extract_oddv16qi" + [(set (match_operand:V16QI 0 "register_operand" "") + (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "") + (match_operand:V16QI 2 "register_operand" "")] + UNSPEC_EXTODD_V16QI))] + "TARGET_ALTIVEC" + " +{ + emit_insn (gen_vpkuhum_nomode (operands[0], operands[1], operands[2])); + DONE; +}") +(define_expand "vec_interleave_highv4sf" + [(set (match_operand:V4SF 0 "register_operand" "") + (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "") + (match_operand:V4SF 2 "register_operand" "")] + 
UNSPEC_INTERHI_V4SF))] + "TARGET_ALTIVEC" + " +{ + emit_insn (gen_altivec_vmrghsf (operands[0], operands[1], operands[2])); + DONE; +}") + +(define_expand "vec_interleave_lowv4sf" + [(set (match_operand:V4SF 0 "register_operand" "") + (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "") + (match_operand:V4SF 2 "register_operand" "")] + UNSPEC_INTERLO_V4SF))] + "TARGET_ALTIVEC" + " +{ + emit_insn (gen_altivec_vmrglsf (operands[0], operands[1], operands[2])); + DONE; +}") + +(define_expand "vec_interleave_high<mode>" + [(set (match_operand:VI 0 "register_operand" "") + (unspec:VI [(match_operand:VI 1 "register_operand" "") + (match_operand:VI 2 "register_operand" "")] + UNSPEC_INTERHI))] + "TARGET_ALTIVEC" + " +{ + emit_insn (gen_altivec_vmrgh<VI_char> (operands[0], operands[1], operands[2])); + DONE; +}") + +(define_expand "vec_interleave_low<mode>" + [(set (match_operand:VI 0 "register_operand" "") + (unspec:VI [(match_operand:VI 1 "register_operand" "") + (match_operand:VI 2 "register_operand" "")] + UNSPEC_INTERLO))] + "TARGET_ALTIVEC" + " +{ + emit_insn (gen_altivec_vmrgl<VI_char> (operands[0], operands[1], operands[2])); + DONE; +}") diff --git a/gcc/doc/c-tree.texi b/gcc/doc/c-tree.texi index 486e71d7779..b8a4faf0548 100644 --- a/gcc/doc/c-tree.texi +++ b/gcc/doc/c-tree.texi @@ -1936,6 +1936,10 @@ This macro returns the attributes on the type @var{type}. @tindex VEC_UNPACK_LO_EXPR @tindex VEC_PACK_MOD_EXPR @tindex VEC_PACK_SAT_EXPR +@tindex VEC_EXTRACT_EVEN_EXPR +@tindex VEC_EXTRACT_ODD_EXPR +@tindex VEC_INTERLEAVE_HIGH_EXPR +@tindex VEC_INTERLEAVE_LOW_EXPR The internal representation for expressions is for the most part quite straightforward. However, there are a few facts that one must bear in @@ -2783,4 +2787,21 @@ elements, of an integral type whose size is half as wide. In both cases the elements of the two vectors are demoted and merged (concatenated) to form the output vector. 
+@item VEC_EXTRACT_EVEN_EXPR +@item VEC_EXTRACT_ODD_EXPR +These nodes represent extracting of the even/odd elements of the two input +vectors, respectively. Their operands and result are vectors that contain the +same number of elements of the same type. + +@item VEC_INTERLEAVE_HIGH_EXPR +@item VEC_INTERLEAVE_LOW_EXPR +These nodes represent merging and interleaving of the high/low elements of the +two input vectors, respectively. The operands and the result are vectors that +contain the same number of elements (@code{N}) of the same type. +In the case of @code{VEC_INTERLEAVE_HIGH_EXPR}, the high @code{N/2} elements of +the first input vector are interleaved with the high @code{N/2} elements of the +second input vector. In the case of @code{VEC_INTERLEAVE_LOW_EXPR}, the low +@code{N/2} elements of the first input vector are interleaved with the low +@code{N/2} elements of the second input vector. + @end table diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 6e7a55a34ec..9f840b00d1d 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -3465,6 +3465,34 @@ operand 1 is new value of field and operand 2 specify the field index. Extract given field from the vector value. Operand 1 is the vector, operand 2 specify field index and operand 0 place to store value into. +@cindex @code{vec_extract_even@var{m}} instruction pattern +@item @samp{vec_extract_even@var{m}} +Extract even elements from the input vectors (operand 1 and operand 2). +The even elements of operand 2 are concatenated to the even elements of operand +1 in their original order. The result is stored in operand 0. +The output and input vectors should have the same modes. + +@cindex @code{vec_extract_odd@var{m}} instruction pattern +@item @samp{vec_extract_odd@var{m}} +Extract odd elements from the input vectors (operand 1 and operand 2). +The odd elements of operand 2 are concatenated to the odd elements of operand +1 in their original order. The result is stored in operand 0. 
+The output and input vectors should have the same modes. + +@cindex @code{vec_interleave_high@var{m}} instruction pattern +@item @samp{vec_interleave_high@var{m}} +Merge high elements of the two input vectors into the output vector. The output +and input vectors should have the same modes (@code{N} elements). The high +@code{N/2} elements of the first input vector are interleaved with the high +@code{N/2} elements of the second input vector. + +@cindex @code{vec_interleave_low@var{m}} instruction pattern +@item @samp{vec_interleave_low@var{m}} +Merge low elements of the two input vectors into the output vector. The output +and input vectors should have the same modes (@code{N} elements). The low +@code{N/2} elements of the first input vector are interleaved with the low +@code{N/2} elements of the second input vector. + @cindex @code{vec_init@var{m}} instruction pattern @item @samp{vec_init@var{m}} Initialize the vector to given values. Operand 0 is the vector to initialize diff --git a/gcc/expr.c b/gcc/expr.c index 74627882a91..f3372333cf0 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -8745,6 +8745,30 @@ expand_expr_real_1 (tree exp, rtx target, enum machine_mode tmode, return temp; } + case VEC_EXTRACT_EVEN_EXPR: + case VEC_EXTRACT_ODD_EXPR: + { + expand_operands (TREE_OPERAND (exp, 0), TREE_OPERAND (exp, 1), + NULL_RTX, &op0, &op1, 0); + this_optab = optab_for_tree_code (code, type); + temp = expand_binop (mode, this_optab, op0, op1, target, unsignedp, + OPTAB_WIDEN); + gcc_assert (temp); + return temp; + } + + case VEC_INTERLEAVE_HIGH_EXPR: + case VEC_INTERLEAVE_LOW_EXPR: + { + expand_operands (TREE_OPERAND (exp, 0), TREE_OPERAND (exp, 1), + NULL_RTX, &op0, &op1, 0); + this_optab = optab_for_tree_code (code, type); + temp = expand_binop (mode, this_optab, op0, op1, target, unsignedp, + OPTAB_WIDEN); + gcc_assert (temp); + return temp; + } + case VEC_LSHIFT_EXPR: case VEC_RSHIFT_EXPR: { diff --git a/gcc/genopinit.c b/gcc/genopinit.c index ceac9fe9507..ad72637b902 
100644 --- a/gcc/genopinit.c +++ b/gcc/genopinit.c @@ -199,6 +199,10 @@ static const char * const optabs[] = "sync_lock_release[$A] = CODE_FOR_$(sync_lock_release$I$a$)", "vec_set_optab->handlers[$A].insn_code = CODE_FOR_$(vec_set$a$)", "vec_extract_optab->handlers[$A].insn_code = CODE_FOR_$(vec_extract$a$)", + "vec_extract_even_optab->handlers[$A].insn_code = CODE_FOR_$(vec_extract_even$a$)", + "vec_extract_odd_optab->handlers[$A].insn_code = CODE_FOR_$(vec_extract_odd$a$)", + "vec_interleave_high_optab->handlers[$A].insn_code = CODE_FOR_$(vec_interleave_high$a$)", + "vec_interleave_low_optab->handlers[$A].insn_code = CODE_FOR_$(vec_interleave_low$a$)", "vec_init_optab->handlers[$A].insn_code = CODE_FOR_$(vec_init$a$)", "vec_shl_optab->handlers[$A].insn_code = CODE_FOR_$(vec_shl_$a$)", "vec_shr_optab->handlers[$A].insn_code = CODE_FOR_$(vec_shr_$a$)", diff --git a/gcc/optabs.c b/gcc/optabs.c index a638056d0b1..9a7731d2214 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -359,6 +359,18 @@ optab_for_tree_code (enum tree_code code, tree type) case ABS_EXPR: return trapv ? 
absv_optab : abs_optab; + case VEC_EXTRACT_EVEN_EXPR: + return vec_extract_even_optab; + + case VEC_EXTRACT_ODD_EXPR: + return vec_extract_odd_optab; + + case VEC_INTERLEAVE_HIGH_EXPR: + return vec_interleave_high_optab; + + case VEC_INTERLEAVE_LOW_EXPR: + return vec_interleave_low_optab; + default: return NULL; } @@ -5384,6 +5396,10 @@ init_optabs (void) udot_prod_optab = init_optab (UNKNOWN); vec_extract_optab = init_optab (UNKNOWN); + vec_extract_even_optab = init_optab (UNKNOWN); + vec_extract_odd_optab = init_optab (UNKNOWN); + vec_interleave_high_optab = init_optab (UNKNOWN); + vec_interleave_low_optab = init_optab (UNKNOWN); vec_set_optab = init_optab (UNKNOWN); vec_init_optab = init_optab (UNKNOWN); vec_shl_optab = init_optab (UNKNOWN); diff --git a/gcc/optabs.h b/gcc/optabs.h index d197766c772..85d9ca7f32f 100644 --- a/gcc/optabs.h +++ b/gcc/optabs.h @@ -27,6 +27,7 @@ Boston, MA 02110-1301, USA. */ /* Optabs are tables saying how to generate insn bodies for various machine modes and numbers of operands. Each optab applies to one operation. + For example, add_optab applies to addition. The insn_code slot is the enum insn_code that says how to @@ -253,6 +254,12 @@ enum optab_index OTI_vec_set, /* Extract specified field of vector operand. */ OTI_vec_extract, + /* Extract even/odd fields of vector operands. */ + OTI_vec_extract_even, + OTI_vec_extract_odd, + /* Interleave fields of vector operands. */ + OTI_vec_interleave_high, + OTI_vec_interleave_low, /* Initialize vector operand. */ OTI_vec_init, /* Whole vector shift. The shift amount is in bits. 
*/ @@ -397,6 +404,10 @@ extern GTY(()) optab optab_table[OTI_MAX]; #define vec_set_optab (optab_table[OTI_vec_set]) #define vec_extract_optab (optab_table[OTI_vec_extract]) +#define vec_extract_even_optab (optab_table[OTI_vec_extract_even]) +#define vec_extract_odd_optab (optab_table[OTI_vec_extract_odd]) +#define vec_interleave_high_optab (optab_table[OTI_vec_interleave_high]) +#define vec_interleave_low_optab (optab_table[OTI_vec_interleave_low]) #define vec_init_optab (optab_table[OTI_vec_init]) #define vec_shl_optab (optab_table[OTI_vec_shl]) #define vec_shr_optab (optab_table[OTI_vec_shr]) diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index ae046ed0c3e..2e65881667c 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,29 @@ +2006-11-22 Ira Rosen <irar@il.ibm.com> + + * gcc.dg/vect/vect-1.c: Additional loop is now vectorizable on + platforms that have interleaving support. + * gcc.dg/vect/vect-107.c, gcc.dg/vect/vect-98.c: Likewise. + * gcc.dg/vect/vect-strided-a-u16-i2.c, + gcc.dg/vect/vect-strided-a-u16-i4.c, gcc.dg/vect/vect-strided-u16-i2.c, + gcc.dg/vect/vect-strided-u16-i4.c, gcc.dg/vect/vect-strided-u32-i4.c, + gcc.dg/vect/vect-strided-u32-i8.c, gcc.dg/vect/vect-strided-u8-i2.c, + gcc.dg/vect/vect-strided-u8-i2-gap.c, + gcc.dg/vect/vect-strided-u8-i8.c, + gcc.dg/vect/vect-strided-u8-i8-gap2.c, + gcc.dg/vect/vect-strided-u8-i8-gap4.c, + gcc.dg/vect/vect-strided-u8-i8-gap7.c, + gcc.dg/vect/vect-strided-float.c, + gcc.dg/vect/vect-strided-a-mult.c, + gcc.dg/vect/vect-strided-mult-char-ls.c, + gcc.dg/vect/vect-strided-a-u16-mult.c, + gcc.dg/vect/vect-strided-a-u32-mult.c, + gcc.dg/vect/vect-strided-a-u8-i2-gap.c, + gcc.dg/vect/vect-strided-a-u8-i8-gap2.c, + gcc.dg/vect/vect-strided-a-u8-i8-gap7.c, + gcc.dg/vect/vect-strided-mult.c, + gcc.dg/vect/vect-strided-u32-mult.c: New testcases. + * lib/target-supports.exp (vect_extract_even_odd, vect_interleave): New. 
+ 2006-11-22 Paul Thomas <pault@gcc.gnu.org> PR fortran/25087 diff --git a/gcc/testsuite/gcc.dg/vect/vect-1.c b/gcc/testsuite/gcc.dg/vect/vect-1.c index 6df6af078f5..1ec195c5352 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-1.c +++ b/gcc/testsuite/gcc.dg/vect/vect-1.c @@ -59,7 +59,8 @@ foo (int n) fbar (a); - /* Not vectorizable yet (access pattern). */ + /* Strided access. Vectorizable on platforms that support load of strided + accesses (extract of even/odd vector elements). */ for (i = 0; i < N/2; i++){ a[i] = b[2*i+1] * c[2*i+1] - b[2*i] * c[2*i]; d[i] = b[2*i] * c[2*i+1] + b[2*i+1] * c[2*i]; @@ -85,6 +86,6 @@ foo (int n) fbar (a); } -/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */ -/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" { target vect_extract_even_odd } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail vect_extract_even_odd } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-107.c b/gcc/testsuite/gcc.dg/vect/vect-107.c index f8031afad11..e4f823f310d 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-107.c +++ b/gcc/testsuite/gcc.dg/vect/vect-107.c @@ -14,7 +14,8 @@ main1 (void) float c[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; float d[N] = {0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30}; - /* Strided access pattern. */ + /* Strided access. Vectorizable on platforms that support load of strided + accesses (extract of even/odd vector elements). 
*/ for (i = 0; i < N/2; i++) { a[i] = b[2*i+1] * c[2*i+1] - b[2*i] * c[2*i]; @@ -38,5 +39,6 @@ int main (void) return main1 (); } -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_extract_even_odd } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_extract_even_odd } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-98.c b/gcc/testsuite/gcc.dg/vect/vect-98.c index e1548bbc73b..56437e26d82 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-98.c +++ b/gcc/testsuite/gcc.dg/vect/vect-98.c @@ -36,6 +36,7 @@ int main (void) return main1 (ia); } -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */ -/* { dg-final { scan-tree-dump-times "not vectorized: complicated access pattern" 1 "vect" } } */ +/* Needs interleaving support. */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { xfail { vect_interleave && vect_extract_even_odd } } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c new file mode 100644 index 00000000000..f269c9d8075 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c @@ -0,0 +1,76 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 128 + +typedef struct { + unsigned short a; + unsigned short b; +} s; + +typedef struct { + unsigned int a; + unsigned int b; +} ii; + +int +main1 () +{ + s arr[N]; + s *ptr = arr; + ii iarr[N]; + ii *iptr = iarr; + s res[N]; + ii ires[N]; + int i; + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + iarr[i].a = i; + iarr[i].b = i * 3; + if (arr[i].a == 178) 
+ abort(); + } + + for (i = 0; i < N; i++) + { + ires[i].a = iptr->b - iptr->a; + ires[i].b = iptr->b + iptr->a; + res[i].b = ptr->b - ptr->a; + res[i].a = ptr->b + ptr->a; + iptr++; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].b != arr[i].b - arr[i].a + || ires[i].a != iarr[i].b - iarr[i].a + || res[i].a != arr[i].b + arr[i].a + || ires[i].b != iarr[i].b + iarr[i].a +) + abort (); + } + + return 0; +} + +int main (void) +{ + int i; + + check_vect (); + + main1 (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c new file mode 100644 index 00000000000..6cc62b47f34 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c @@ -0,0 +1,60 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 128 + +typedef struct { + unsigned short a; + unsigned short b; +} s; + +int +main1 () +{ + s arr[N]; + s *ptr = arr; + s res[N]; + int i; + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + if (arr[i].a == 178) + abort(); + } + + for (i = 0; i < N; i++) + { + res[i].a = ptr->b - ptr->a; + res[i].b = ptr->b + ptr->a; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].a != arr[i].b - arr[i].a + || res[i].b != arr[i].a + arr[i].b) + abort (); + } + + return 0; +} + +int main (void) +{ + int i; + + check_vect (); + + main1 (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c new file mode 100644 index 
00000000000..140f963e2ab --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c @@ -0,0 +1,73 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 128 + +typedef struct { + unsigned short a; + unsigned short b; + unsigned short c; + unsigned short d; +} s; + +int +main1 () +{ + s arr[N]; + s *ptr = arr; + s res[N]; + int i; + unsigned short x, y, z, w; + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + arr[i].c = 17; + arr[i].d = i+34; + if (arr[i].a == 178) + abort(); + } + + for (i = 0; i < N; i++) + { + x = ptr->b - ptr->a; + y = ptr->d - ptr->c; + res[i].c = x + y; + z = ptr->a + ptr->c; + w = ptr->b + ptr->d; + res[i].a = z + w; + res[i].d = x + y; + res[i].b = x + y; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c + || res[i].a != arr[i].a + arr[i].c + arr[i].b + arr[i].d + || res[i].d != arr[i].b - arr[i].a + arr[i].d - arr[i].c + || res[i].b != arr[i].b - arr[i].a + arr[i].d - arr[i].c) + abort (); + } + + return 0; +} + +int main (void) +{ + int i; + + check_vect (); + + main1 (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c new file mode 100644 index 00000000000..5d45bf84c9d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c @@ -0,0 +1,67 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 128 + +typedef struct { + unsigned short a; + unsigned short b; +} s; + +int +main1 () +{ + s arr[N]; + s *ptr = arr; + unsigned int iarr[N]; + unsigned int *iptr = iarr; + s res[N]; + unsigned int ires[N]; + int i; + + for (i = 0; i < N; i++) + { + 
arr[i].a = i; + arr[i].b = i * 2; + iarr[i] = i * 3; + if (arr[i].a == 178) + abort(); + } + + for (i = 0; i < N; i++) + { + ires[i] = *iptr; + res[i].b = ptr->b - ptr->a; + res[i].a = ptr->b + ptr->a; + iptr++; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].b != arr[i].b - arr[i].a + || ires[i] != iarr[i] + || res[i].a != arr[i].b + arr[i].a) + abort (); + } + + return 0; +} + +int main (void) +{ + int i; + + check_vect (); + + main1 (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c new file mode 100644 index 00000000000..0fd0fdbd5bc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c @@ -0,0 +1,67 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 128 + +typedef struct { + unsigned int a; + unsigned int b; +} ii; + +int +main1 () +{ + unsigned short arr[N]; + unsigned short *ptr = arr; + ii iarr[N]; + ii *iptr = iarr; + unsigned short res[N]; + ii ires[N]; + int i; + + for (i = 0; i < N; i++) + { + arr[i] = i; + iarr[i].a = i; + iarr[i].b = i * 3; + if (arr[i] == 178) + abort(); + } + + for (i = 0; i < N; i++) + { + ires[i].a = iptr->b - iptr->a; + ires[i].b = iptr->b + iptr->a; + res[i] = *ptr; + iptr++; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i] != arr[i] + || ires[i].a != iarr[i].b - iarr[i].a + || ires[i].b != iarr[i].b + iarr[i].a) + abort (); + } + + return 0; +} + +int main (void) +{ + int i; + + check_vect (); + + main1 (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git 
a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c new file mode 100644 index 00000000000..671b7d2b6db --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c @@ -0,0 +1,74 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 64 + +typedef struct { + unsigned char a; + unsigned char b; +} s; + +int +main1 () +{ + s arr[N]; + s *ptr = arr; + s res[N]; + int i; + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + if (arr[i].a == 178) + abort(); + } + + for (i = 0; i < N; i++) + { + res[i].a = ptr->a; + res[i].b = ptr->a; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].a != arr[i].a + || res[i].b != arr[i].a) + abort (); + } + + ptr = arr; + /* Not vectorizable: gap in store. */ + for (i = 0; i < N; i++) + { + res[i].a = ptr->b; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].a != arr[i].b) + abort (); + } + + + return 0; +} + +int main (void) +{ + check_vect (); + + main1 (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c new file mode 100644 index 00000000000..ce567955ae5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c @@ -0,0 +1,82 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include <stdio.h> +#include "tree-vect.h" + +#define N 16 + +typedef struct { + unsigned char a; + unsigned char b; + unsigned char c; + unsigned char d; + unsigned char e; + unsigned char f; + unsigned char g; + unsigned char h; +} s; + +int +main1 () +{ + int i; + s arr[N]; + s *ptr = arr; + s res[N]; + + for (i = 0; i < N; i++) + { + arr[i].a = i; + 
arr[i].b = i * 2; + arr[i].c = 17; + arr[i].d = i+34; + arr[i].e = i + 5; + arr[i].f = i * 2 + 2; + arr[i].g = i - 3; + arr[i].h = 56; + if (arr[i].a == 178) + abort(); + } + + for (i = 0; i < N; i++) + { + res[i].c = ptr->a; + res[i].a = ptr->f + ptr->a; + res[i].d = ptr->f - ptr->a; + res[i].b = ptr->f; + res[i].f = ptr->a; + res[i].e = ptr->f - ptr->a; + res[i].h = ptr->f; + res[i].g = ptr->f - ptr->a; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].c != arr[i].a + || res[i].a != arr[i].f + arr[i].a + || res[i].d != arr[i].f - arr[i].a + || res[i].b != arr[i].f + || res[i].f != arr[i].a + || res[i].e != arr[i].f - arr[i].a + || res[i].h != arr[i].f + || res[i].g != arr[i].f - arr[i].a) + abort(); + } +} + + +int main (void) +{ + check_vect (); + + main1 (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c new file mode 100644 index 00000000000..740d0568dce --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c @@ -0,0 +1,86 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 16 + +typedef struct { + unsigned char a; + unsigned char b; + unsigned char c; + unsigned char d; + unsigned char e; + unsigned char f; + unsigned char g; + unsigned char h; +} s; + +int +main1 () +{ + int i; + s arr[N]; + s *ptr = arr; + s res[N]; + unsigned char u, t, s, x, y, z, w; + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + arr[i].c = 17; + arr[i].d = i+34; + arr[i].e = i * 3 + 5; + arr[i].f = i * 5; + arr[i].g = i - 3; + arr[i].h = 67; + if (arr[i].a == 178) + abort(); + } + + for (i = 0; i < N; i++) + { + u = ptr->b - ptr->a; + t = ptr->d - ptr->c; + res[i].c = u + t; + x = ptr->b 
+ ptr->d; + res[i].a = ptr->a + x; + res[i].d = u + t; + s = ptr->h - ptr->a; + res[i].b = s + t; + res[i].f = ptr->f + ptr->h; + res[i].e = ptr->b + ptr->e; + res[i].h = ptr->d; + res[i].g = u + t; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c + || res[i].a != arr[i].a + arr[i].b + arr[i].d + || res[i].d != arr[i].b - arr[i].a + arr[i].d - arr[i].c + || res[i].b != arr[i].h - arr[i].a + arr[i].d - arr[i].c + || res[i].f != arr[i].f + arr[i].h + || res[i].e != arr[i].b + arr[i].e + || res[i].h != arr[i].d + || res[i].g != arr[i].b - arr[i].a + arr[i].d - arr[i].c) + abort(); + } +} + + +int main (void) +{ + check_vect (); + + main1 (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-float.c b/gcc/testsuite/gcc.dg/vect/vect-strided-float.c new file mode 100644 index 00000000000..f2e4484563b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-float.c @@ -0,0 +1,44 @@ +/* { dg-require-effective-target vect_float } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 16 + +int +main1 (void) +{ + int i; + float a[N*2]; + float b[N*2] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,48,51,54,57,60,63,66,69,72,75,78,81,84,87,90,93}; + float c[N*2] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}; + + /* Strided access pattern. */ + for (i = 0; i < N/2; i++) + { + a[i*2] = b[2*i+1] * c[2*i+1] - b[2*i] * c[2*i]; + a[i*2+1] = b[2*i+8] * c[2*i+9] + b[2*i+9] * c[2*i+8]; + } + + /* Check results. 
*/ + for (i = 0; i < N/2; i++) + { + if (a[i*2] != b[2*i+1] * c[2*i+1] - b[2*i] * c[2*i] + || a[i*2+1] != b[2*i+8] * c[2*i+9] + b[2*i+9] * c[2*i+8]) + abort(); + } + + return 0; +} + +int main (void) +{ + check_vect (); + return main1 (); +} + +/* Needs interleaving support. */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c b/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c new file mode 100644 index 00000000000..29d752d3c55 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c @@ -0,0 +1,76 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 32 + +typedef struct { + unsigned char a; + unsigned char b; +} s; + +typedef struct { + unsigned int a; + unsigned int b; +} ii; + +int +main1 (s *arr, ii *iarr) +{ + s *ptr = arr; + ii *iptr = iarr; + s res[N]; + ii ires[N]; + int i; + + for (i = 0; i < N; i++) + { + ires[i].a = iptr->b; + ires[i].b = iptr->a; + res[i].b = ptr->b - ptr->a; + res[i].a = ptr->b + ptr->a; + iptr++; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].b != arr[i].b - arr[i].a + || ires[i].a != iarr[i].b + || res[i].a != arr[i].b + arr[i].a + || ires[i].b != iarr[i].a +) + abort (); + } + + return 0; +} + +int main (void) +{ + int i; + s arr[N]; + ii iarr[N]; + + check_vect (); + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + iarr[i].a = i; + iarr[i].b = i * 3; + if (arr[i].a == 178) + abort(); + } + + main1 (arr, iarr); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { 
cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c new file mode 100644 index 00000000000..823444ebeb4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c @@ -0,0 +1,76 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 128 + +typedef struct { + unsigned short a; + unsigned short b; +} s; + +typedef struct { + unsigned int a; + unsigned int b; +} ii; + +int +main1 (s *arr, ii *iarr) +{ + s *ptr = arr; + ii *iptr = iarr; + s res[N]; + ii ires[N]; + int i; + + for (i = 0; i < N; i++) + { + ires[i].a = iptr->b - iptr->a; + ires[i].b = iptr->b + iptr->a; + res[i].b = ptr->b - ptr->a; + res[i].a = ptr->b + ptr->a; + iptr++; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].b != arr[i].b - arr[i].a + || ires[i].a != iarr[i].b - iarr[i].a + || res[i].a != arr[i].b + arr[i].a + || ires[i].b != iarr[i].b + iarr[i].a +) + abort (); + } + + return 0; +} + +int main (void) +{ + int i; + s arr[N]; + ii iarr[N]; + + check_vect (); + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + iarr[i].a = i; + iarr[i].b = i * 3; + if (arr[i].a == 178) + abort(); + } + + main1 (arr, iarr); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c new file mode 100644 index 00000000000..3c76410f3e0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c @@ -0,0 +1,60 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 128 + +typedef struct { + unsigned short a; + unsigned short b; +} s; + +int +main1 (s *arr) +{ + s *ptr = arr; + s res[N]; + int i; + + for (i = 0; 
i < N; i++) + { + res[i].a = ptr->b - ptr->a; + res[i].b = ptr->b + ptr->a; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].a != arr[i].b - arr[i].a + || res[i].b != arr[i].a + arr[i].b) + abort (); + } + + return 0; +} + +int main (void) +{ + int i; + s arr[N]; + + check_vect (); + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + if (arr[i].a == 178) + abort(); + } + + main1 (arr); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c new file mode 100644 index 00000000000..199e3633c77 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c @@ -0,0 +1,73 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 128 + +typedef struct { + unsigned short a; + unsigned short b; + unsigned short c; + unsigned short d; +} s; + +int +main1 (s *arr) +{ + int i; + s *ptr = arr; + s res[N]; + unsigned short x, y, z, w; + + for (i = 0; i < N; i++) + { + x = ptr->b - ptr->a; + y = ptr->d - ptr->c; + res[i].c = x + y; + z = ptr->a + ptr->c; + w = ptr->b + ptr->d; + res[i].a = z + w; + res[i].d = x + y; + res[i].b = x + y; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c + || res[i].a != arr[i].a + arr[i].c + arr[i].b + arr[i].d + || res[i].d != arr[i].b - arr[i].a + arr[i].d - arr[i].c + || res[i].b != arr[i].b - arr[i].a + arr[i].d - arr[i].c) + abort (); + } + + return 0; +} + +int main (void) +{ + int i; + s arr[N]; + + check_vect (); + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + arr[i].c = 17; + arr[i].d = i+34; + if (arr[i].a == 178) + abort(); + } + + main1 (arr); + + return 0; +} + +/* { dg-final { 
scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c new file mode 100644 index 00000000000..e872b97571a --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c @@ -0,0 +1,68 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 128 + +typedef struct { + int a; + int b; + int c; + int d; +} s; + +int +main1 (s *arr) +{ + int i; + s *ptr = arr; + s res[N]; + + for (i = 0; i < N; i++) + { + res[i].c = ptr->b - ptr->a + ptr->d - ptr->c; + res[i].a = ptr->a + ptr->c + ptr->b + ptr->d; + res[i].d = ptr->b - ptr->a + ptr->d - ptr->c; + res[i].b = ptr->b - ptr->a + ptr->d - ptr->c; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c + || res[i].a != arr[i].a + arr[i].c + arr[i].b + arr[i].d + || res[i].d != arr[i].b - arr[i].a + arr[i].d - arr[i].c + || res[i].b != arr[i].b - arr[i].a + arr[i].d - arr[i].c) + abort (); + } + + return 0; +} + +int main (void) +{ + int i; + s arr[N]; + + check_vect (); + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + arr[i].c = 17; + arr[i].d = i+34; + if (arr[i].a == 178) + abort(); + } + + main1 (arr); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c new file mode 100644 index 00000000000..7e8888f128c --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c @@ -0,0 +1,82 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 128 + 
+typedef struct { + int a; + int b; + int c; + int d; + int e; + int f; + int g; + int h; +} s; + +int +main1 (s *arr) +{ + int i; + s *ptr = arr; + s res[N]; + + for (i = 0; i < N; i++) + { + res[i].c = ptr->b - ptr->a + ptr->d - ptr->c; + res[i].a = ptr->a + ptr->g + ptr->b + ptr->d; + res[i].d = ptr->b - ptr->a + ptr->d - ptr->c; + res[i].b = ptr->h - ptr->a + ptr->d - ptr->c; + res[i].f = ptr->f + ptr->h; + res[i].e = ptr->b - ptr->e; + res[i].h = ptr->d - ptr->g; + res[i].g = ptr->b - ptr->a + ptr->d - ptr->c; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c + || res[i].a != arr[i].a + arr[i].g + arr[i].b + arr[i].d + || res[i].d != arr[i].b - arr[i].a + arr[i].d - arr[i].c + || res[i].b != arr[i].h - arr[i].a + arr[i].d - arr[i].c + || res[i].f != arr[i].f + arr[i].h + || res[i].e != arr[i].b - arr[i].e + || res[i].h != arr[i].d - arr[i].g + || res[i].g != arr[i].b - arr[i].a + arr[i].d - arr[i].c) + abort(); + } +} + +int main (void) +{ + int i; + s arr[N]; + + check_vect (); + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + arr[i].c = 17; + arr[i].d = i+34; + arr[i].e = i * 3 + 5; + arr[i].f = i * 5; + arr[i].g = i - 3; + arr[i].h = 56; + if (arr[i].a == 178) + abort(); + } + + main1 (arr); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c new file mode 100644 index 00000000000..188bef86f98 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c @@ -0,0 +1,66 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 128 + +typedef struct { + unsigned int a; + unsigned int b; +} ii; + +int +main1 (unsigned short *arr, ii *iarr) +{ + 
unsigned short *ptr = arr; + ii *iptr = iarr; + unsigned short res[N]; + ii ires[N]; + int i; + + for (i = 0; i < N; i++) + { + ires[i].a = iptr->b - iptr->a; + ires[i].b = iptr->b + iptr->a; + res[i] = *ptr; + iptr++; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i] != arr[i] + || ires[i].a != iarr[i].b - iarr[i].a + || ires[i].b != iarr[i].b + iarr[i].a) + abort (); + } + + return 0; +} + +int main (void) +{ + int i; + unsigned short arr[N]; + ii iarr[N]; + + check_vect (); + + for (i = 0; i < N; i++) + { + arr[i] = i; + iarr[i].a = i; + iarr[i].b = i * 3; + if (arr[i] == 178) + abort(); + } + main1 (arr, iarr); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c new file mode 100644 index 00000000000..86e86158e0c --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c @@ -0,0 +1,76 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 64 + +typedef struct { + unsigned char a; + unsigned char b; +} s; + +int +main1 (s *arr) +{ + s *ptr = arr; + s res[N]; + int i; + + for (i = 0; i < N; i++) + { + res[i].a = ptr->b; + res[i].b = ptr->b; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].a != arr[i].b + || res[i].b != arr[i].b) + abort (); + } + + ptr = arr; + /* Not vectorizable: gap in store. 
*/ + for (i = 0; i < N; i++) + { + res[i].a = ptr->b; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].a != arr[i].b) + abort (); + } + + + return 0; +} + +int main (void) +{ + int i; + s arr[N]; + + check_vect (); + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + if (arr[i].a == 178) + abort(); + } + + main1 (arr); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c new file mode 100644 index 00000000000..b9dcbba6b6b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c @@ -0,0 +1,59 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 64 + +typedef struct { + unsigned char a; + unsigned char b; +} s; + +int +main1 (s *arr) +{ + s *ptr = arr; + s res[N]; + int i; + + for (i = 0; i < N; i++) + { + res[i].a = ptr->b - ptr->a; + res[i].b = ptr->b + ptr->a; + ptr++; + } + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].a != arr[i].b - arr[i].a + || res[i].b != arr[i].a + arr[i].b) + abort (); + } + + return 0; +} + +int main (void) +{ + int i; + s arr[N]; + + check_vect (); + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + if (arr[i].a == 178) + abort(); + } + + main1 (arr); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c new file mode 100644 index 00000000000..8827ee10178 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c @@ -0,0 +1,84 @@ +/* { 
dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include <stdio.h> +#include "tree-vect.h" + +#define N 16 + +typedef struct { + unsigned char a; + unsigned char b; + unsigned char c; + unsigned char d; + unsigned char e; + unsigned char f; + unsigned char g; + unsigned char h; +} s; + +int +main1 (s *arr) +{ + int i; + s *ptr = arr; + s res[N]; + + for (i = 0; i < N; i++) + { + res[i].c = ptr->b; + res[i].a = ptr->f + ptr->b; + res[i].d = ptr->f - ptr->b; + res[i].b = ptr->f; + res[i].f = ptr->b; + res[i].e = ptr->f - ptr->b; + res[i].h = ptr->f; + res[i].g = ptr->f - ptr->b; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].c != arr[i].b + || res[i].a != arr[i].f + arr[i].b + || res[i].d != arr[i].f - arr[i].b + || res[i].b != arr[i].f + || res[i].f != arr[i].b + || res[i].e != arr[i].f - arr[i].b + || res[i].h != arr[i].f + || res[i].g != arr[i].f - arr[i].b) + abort(); + } +} + + +int main (void) +{ + int i; + s arr[N]; + + check_vect (); + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + arr[i].c = 17; + arr[i].d = i+34; + arr[i].e = i + 5; + arr[i].f = i * 2 + 2; + arr[i].g = i - 3; + arr[i].h = 56; + if (arr[i].a == 178) + abort(); + } + + main1 (arr); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c new file mode 100644 index 00000000000..c176b3264f4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c @@ -0,0 +1,85 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 16 + +typedef struct { + unsigned char a; + unsigned char b; + unsigned char c; + unsigned char d; + unsigned char e; + unsigned char f; + unsigned char g; + unsigned char h; +} s; + 
+int +main1 (s *arr) +{ + int i; + s *ptr = arr; + s res[N]; + unsigned char x; + + for (i = 0; i < N; i++) + { + res[i].c = ptr->b + ptr->c; + x = ptr->c + ptr->f; + res[i].a = x + ptr->b; + res[i].d = ptr->b + ptr->c; + res[i].b = ptr->c; + res[i].f = ptr->f + ptr->e; + res[i].e = ptr->b + ptr->e; + res[i].h = ptr->c; + res[i].g = ptr->b + ptr->c; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].c != arr[i].b + arr[i].c + || res[i].a != arr[i].c + arr[i].f + arr[i].b + || res[i].d != arr[i].b + arr[i].c + || res[i].b != arr[i].c + || res[i].f != arr[i].f + arr[i].e + || res[i].e != arr[i].b + arr[i].e + || res[i].h != arr[i].c + || res[i].g != arr[i].b + arr[i].c) + abort(); + } +} + + +int main (void) +{ + int i; + s arr[N]; + + check_vect (); + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + arr[i].c = 17; + arr[i].d = i+34; + arr[i].e = i * 3 + 5; + arr[i].f = i * 5; + arr[i].g = i - 3; + arr[i].h = 56; + if (arr[i].a == 178) + abort(); + } + + main1 (arr); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c new file mode 100644 index 00000000000..317fe039f6c --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c @@ -0,0 +1,88 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 16 + +typedef struct { + unsigned char a; + unsigned char b; + unsigned char c; + unsigned char d; + unsigned char e; + unsigned char f; + unsigned char g; + unsigned char h; +} s; + +int +main1 (s *arr) +{ + int i; + s *ptr = arr; + s res[N]; + unsigned char u, t, s, x, y, z, w; + + for (i = 0; i < N; i++) + { + u = ptr->b - ptr->a; + t = ptr->d - ptr->c; + res[i].c = u + t; + x = ptr->b + ptr->d; 
+ res[i].a = ptr->a + x; + res[i].d = u + t; + s = ptr->h - ptr->a; + res[i].b = s + t; + res[i].f = ptr->f + ptr->h; + res[i].e = ptr->b + ptr->e; + res[i].h = ptr->d; + res[i].g = u + t; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c + || res[i].a != arr[i].a + arr[i].b + arr[i].d + || res[i].d != arr[i].b - arr[i].a + arr[i].d - arr[i].c + || res[i].b != arr[i].h - arr[i].a + arr[i].d - arr[i].c + || res[i].f != arr[i].f + arr[i].h + || res[i].e != arr[i].b + arr[i].e + || res[i].h != arr[i].d + || res[i].g != arr[i].b - arr[i].a + arr[i].d - arr[i].c) + abort(); + } +} + + +int main (void) +{ + int i; + s arr[N]; + + check_vect (); + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + arr[i].c = 17; + arr[i].d = i+34; + arr[i].e = i * 3 + 5; + arr[i].f = i * 5; + arr[i].g = i - 3; + arr[i].h = 67; + if (arr[i].a == 178) + abort(); + } + + main1 (arr); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c new file mode 100644 index 00000000000..77a67e0dcfc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c @@ -0,0 +1,91 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include <stdio.h> +#include "tree-vect.h" + +#define N 32 + +typedef struct { + unsigned char a; + unsigned char b; + unsigned char c; + unsigned char d; + unsigned char e; + unsigned char f; + unsigned char g; + unsigned char h; +} s; + +int +main1 (s *arr) +{ + int i; + s *ptr = arr; + s res[N]; + unsigned char u, t, s, x, y, z, w; + + for (i = 0; i < N; i++) + { + u = ptr->b - ptr->a; + t = ptr->d - ptr->c; + res[i].c = u + t; + s = ptr->a + ptr->g; + x = ptr->b + ptr->d; + res[i].a = s + x; + res[i].d = u 
+ t; + s = ptr->h - ptr->a; + x = ptr->d - ptr->c; + res[i].b = s + x; + res[i].f = ptr->f + ptr->h; + res[i].e = ptr->b + ptr->e; + res[i].h = ptr->d - ptr->g; + res[i].g = u + t; + ptr++; + } + + /* check results: */ + for (i = 0; i < N; i++) + { + if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c + || res[i].a != arr[i].a + arr[i].g + arr[i].b + arr[i].d + || res[i].d != arr[i].b - arr[i].a + arr[i].d - arr[i].c + || res[i].b != arr[i].h - arr[i].a + arr[i].d - arr[i].c + || res[i].f != arr[i].f + arr[i].h + || res[i].e != arr[i].b + arr[i].e + || res[i].h != arr[i].d - arr[i].g + || res[i].g != arr[i].b - arr[i].a + arr[i].d - arr[i].c + ) + abort(); + } +} + +int main (void) +{ + int i; + s arr[N]; + + check_vect (); + + for (i = 0; i < N; i++) + { + arr[i].a = i; + arr[i].b = i * 2; + arr[i].c = 17; + arr[i].d = i+34; + arr[i].e = i; + arr[i].f = i + 5; + arr[i].g = i + 3; + arr[i].h = 67; + if (arr[i].a == 178) + abort(); + } + + main1 (arr); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 2947c082719..e52139db9f9 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -1841,6 +1841,44 @@ proc check_effective_target_vect_int_mult { } { return $et_vect_int_mult_saved } +# Return 1 if the target supports vector even/odd elements extraction, 0 otherwise. 
+ +proc check_effective_target_vect_extract_even_odd { } { + global et_vect_extract_even_odd_saved + + if [info exists et_vect_extract_even_odd_saved] { + verbose "check_effective_target_vect_extract_even_odd: using cached result" 2 + } else { + set et_vect_extract_even_odd_saved 0 + if { [istarget powerpc*-*-*] } { + set et_vect_extract_even_odd_saved 1 + } + } + + verbose "check_effective_target_vect_extract_even_odd: returning $et_vect_extract_even_odd_saved" 2 + return $et_vect_extract_even_odd_saved +} + +# Return 1 if the target supports vector interleaving, 0 otherwise. + +proc check_effective_target_vect_interleave { } { + global et_vect_interleave_saved + + if [info exists et_vect_interleave_saved] { + verbose "check_effective_target_vect_interleave: using cached result" 2 + } else { + set et_vect_interleave_saved 0 + if { [istarget powerpc*-*-*] + || [istarget i?86-*-*] + || [istarget x86_64-*-*] } { + set et_vect_interleave_saved 1 + } + } + + verbose "check_effective_target_vect_interleave: returning $et_vect_interleave_saved" 2 + return $et_vect_interleave_saved +} + # Return 1 if the target supports section-anchors proc check_effective_target_section_anchors { } { diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h index 7d6f9f94cb4..d80be31e3d0 100644 --- a/gcc/tree-data-ref.h +++ b/gcc/tree-data-ref.h @@ -24,25 +24,27 @@ Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA #include "lambda.h" -/** {base_address + offset + init} is the first location accessed by data-ref - in the loop, and step is the stride of data-ref in the loop in bytes; - e.g.: - +/* + The first location accessed by data-ref in the loop is the address of data-ref's + base (BASE_ADDRESS) plus the initial offset from the base. We divide the initial offset + into two parts: loop invariant offset (OFFSET) and constant offset (INIT). + STEP is the stride of data-ref in the loop in bytes. 
+ Example 1 Example 2 data-ref a[j].b[i][j] a + x + 16B (a is int*) -First location info: + First location info: base_address &a a - offset j_0*D_j + i_0*D_i + C_a x - init C_b 16 + offset j_0*D_j + i_0*D_i x + init C_b + C_a 16 step D_j 4 access_fn NULL {16, +, 1} -Base object info: + Base object info: base_object a NULL access_fn <access_fns of indexes of b> NULL - **/ + */ struct first_location_in_loop { tree base_address; @@ -51,7 +53,6 @@ struct first_location_in_loop tree step; /* Access function related to first location in the loop. */ VEC(tree,heap) *access_fns; - }; struct base_object_info @@ -97,10 +98,39 @@ struct data_reference struct ptr_info_def *ptr_info; subvar_t subvars; - /* Alignment information. */ - /* The offset of the data-reference from its base in bytes. */ + /* Alignment information. + MISALIGNMENT is the offset of the data-reference from its base in bytes. + ALIGNED_TO is the maximum data-ref's alignment. + + Example 1, + for i + for (j = 3; j < N; j++) + a[j].b[i][j] = 0; + + For a[j].b[i][j], the offset from base (calculated in get_inner_reference()) + will be 'i * C_i + j * C_j + C'. + We try to substitute the variables of the offset expression + with initial_condition of the corresponding access_fn in the loop. + 'i' cannot be substituted, since its access_fn in the inner loop is i. 'j' + will be substituted with 3. + + Example 2 + for (j = 3; j < N; j++) + a[j].b[5][j] = 0; + + Here the offset expression (j * C_j + C) will not contain variables after + substitution of j=3 (3*C_j + C). + + Misalignment can be calculated only if all the variables can be + substituted with constants, otherwise, we record maximum possible alignment + in ALIGNED_TO. In Example 1, since 'i' cannot be substituted, + MISALIGNMENT will be NULL_TREE, and the biggest divisor of C_i (a power of + 2) will be recorded in ALIGNED_TO. + + In Example 2, MISALIGNMENT will be the value of 3*C_j + C in bytes, and + ALIGNED_TO will be NULL_TREE. 
+ */ tree misalignment; - /* The maximum data-ref's alignment. */ tree aligned_to; /* The type of the data-ref. */ diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c index 8ce32db366b..ad8a8bc6301 100644 --- a/gcc/tree-inline.c +++ b/gcc/tree-inline.c @@ -1771,6 +1771,11 @@ estimate_num_insns_1 (tree *tp, int *walk_subtrees, void *data) case WIDEN_MULT_EXPR: + case VEC_EXTRACT_EVEN_EXPR: + case VEC_EXTRACT_ODD_EXPR: + case VEC_INTERLEAVE_HIGH_EXPR: + case VEC_INTERLEAVE_LOW_EXPR: + case RESX_EXPR: *count += 1; break; diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c index 77d1fb91b71..d33d8315ada 100644 --- a/gcc/tree-pretty-print.c +++ b/gcc/tree-pretty-print.c @@ -1957,6 +1957,38 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags, } break; + case VEC_EXTRACT_EVEN_EXPR: + pp_string (buffer, " VEC_EXTRACT_EVEN_EXPR < "); + dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false); + pp_string (buffer, ", "); + dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false); + pp_string (buffer, " > "); + break; + + case VEC_EXTRACT_ODD_EXPR: + pp_string (buffer, " VEC_EXTRACT_ODD_EXPR < "); + dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false); + pp_string (buffer, ", "); + dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false); + pp_string (buffer, " > "); + break; + + case VEC_INTERLEAVE_HIGH_EXPR: + pp_string (buffer, " VEC_INTERLEAVE_HIGH_EXPR < "); + dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false); + pp_string (buffer, ", "); + dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false); + pp_string (buffer, " > "); + break; + + case VEC_INTERLEAVE_LOW_EXPR: + pp_string (buffer, " VEC_INTERLEAVE_LOW_EXPR < "); + dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false); + pp_string (buffer, ", "); + dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false); + pp_string (buffer, " > "); + break; + default: NIY; } diff 
--git a/gcc/tree-vect-analyze.c b/gcc/tree-vect-analyze.c index 190e7dc8bad..4ea7b15dde5 100644 --- a/gcc/tree-vect-analyze.c +++ b/gcc/tree-vect-analyze.c @@ -38,6 +38,7 @@ Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA #include "tree-data-ref.h" #include "tree-scalar-evolution.h" #include "tree-vectorizer.h" +#include "toplev.h" /* Main analysis functions. */ static loop_vec_info vect_analyze_loop_form (struct loop *); @@ -56,13 +57,12 @@ static bool vect_determine_vectorization_factor (loop_vec_info); static bool exist_non_indexing_operands_for_use_p (tree, tree); static tree vect_get_loop_niters (struct loop *, tree *); static bool vect_analyze_data_ref_dependence - (struct data_dependence_relation *, loop_vec_info); + (struct data_dependence_relation *, loop_vec_info, bool); static bool vect_compute_data_ref_alignment (struct data_reference *); static bool vect_analyze_data_ref_access (struct data_reference *); static bool vect_can_advance_ivs_p (loop_vec_info); static void vect_update_misalignment_for_peel (struct data_reference *, struct data_reference *, int npeel); - /* Function vect_determine_vectorization_factor @@ -185,9 +185,10 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo) if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "nunits = %d", nunits); - if (!vectorization_factor - || (nunits > vectorization_factor)) - vectorization_factor = nunits; + if (!vectorization_factor + || (nunits > vectorization_factor)) + vectorization_factor = nunits; + } } @@ -559,6 +560,295 @@ vect_analyze_scalar_cycles (loop_vec_info loop_vinfo) } +/* Function vect_insert_into_interleaving_chain. + + Insert DRA into the interleaving chain of DRB according to DRA's INIT. 
*/ + +static void +vect_insert_into_interleaving_chain (struct data_reference *dra, + struct data_reference *drb) +{ + tree prev, next, next_init; + stmt_vec_info stmtinfo_a = vinfo_for_stmt (DR_STMT (dra)); + stmt_vec_info stmtinfo_b = vinfo_for_stmt (DR_STMT (drb)); + + prev = DR_GROUP_FIRST_DR (stmtinfo_b); + next = DR_GROUP_NEXT_DR (vinfo_for_stmt (prev)); + while (next) + { + next_init = DR_INIT (STMT_VINFO_DATA_REF (vinfo_for_stmt (next))); + if (tree_int_cst_compare (next_init, DR_INIT (dra)) > 0) + { + /* Insert here. */ + DR_GROUP_NEXT_DR (vinfo_for_stmt (prev)) = DR_STMT (dra); + DR_GROUP_NEXT_DR (stmtinfo_a) = next; + return; + } + prev = next; + next = DR_GROUP_NEXT_DR (vinfo_for_stmt (prev)); + } + + /* We got to the end of the list. Insert here. */ + DR_GROUP_NEXT_DR (vinfo_for_stmt (prev)) = DR_STMT (dra); + DR_GROUP_NEXT_DR (stmtinfo_a) = NULL_TREE; +} + + +/* Function vect_update_interleaving_chain. + + For two data-refs DRA and DRB that are a part of a chain of interleaved data + accesses, update the interleaving chain. DRB's INIT is smaller than DRA's. + + There are four possible cases: + 1. New stmts - both DRA and DRB are not a part of any chain: + FIRST_DR = DRB + NEXT_DR (DRB) = DRA + 2. DRB is a part of a chain and DRA is not: + no need to update FIRST_DR + no need to insert DRB + insert DRA according to init + 3. DRA is a part of a chain and DRB is not: + if (init of FIRST_DR > init of DRB) + FIRST_DR = DRB + NEXT(FIRST_DR) = previous FIRST_DR + else + insert DRB according to its init + 4. both DRA and DRB are in some interleaving chains: + choose the chain with the smallest init of FIRST_DR + insert the nodes of the second chain into the first one. 
*/ + +static void +vect_update_interleaving_chain (struct data_reference *drb, + struct data_reference *dra) +{ + stmt_vec_info stmtinfo_a = vinfo_for_stmt (DR_STMT (dra)); + stmt_vec_info stmtinfo_b = vinfo_for_stmt (DR_STMT (drb)); + tree next_init, init_dra_chain, init_drb_chain, first_a, first_b; + tree node, prev, next, node_init, first_stmt; + + /* 1. New stmts - both DRA and DRB are not a part of any chain. */ + if (!DR_GROUP_FIRST_DR (stmtinfo_a) && !DR_GROUP_FIRST_DR (stmtinfo_b)) + { + DR_GROUP_FIRST_DR (stmtinfo_a) = DR_STMT (drb); + DR_GROUP_FIRST_DR (stmtinfo_b) = DR_STMT (drb); + DR_GROUP_NEXT_DR (stmtinfo_b) = DR_STMT (dra); + return; + } + + /* 2. DRB is a part of a chain and DRA is not. */ + if (!DR_GROUP_FIRST_DR (stmtinfo_a) && DR_GROUP_FIRST_DR (stmtinfo_b)) + { + DR_GROUP_FIRST_DR (stmtinfo_a) = DR_GROUP_FIRST_DR (stmtinfo_b); + /* Insert DRA into the chain of DRB. */ + vect_insert_into_interleaving_chain (dra, drb); + return; + } + + /* 3. DRA is a part of a chain and DRB is not. */ + if (DR_GROUP_FIRST_DR (stmtinfo_a) && !DR_GROUP_FIRST_DR (stmtinfo_b)) + { + tree old_first_stmt = DR_GROUP_FIRST_DR (stmtinfo_a); + tree init_old = DR_INIT (STMT_VINFO_DATA_REF (vinfo_for_stmt ( + old_first_stmt))); + tree tmp; + + if (tree_int_cst_compare (init_old, DR_INIT (drb)) > 0) + { + /* DRB's init is smaller than the init of the stmt previously marked + as the first stmt of the interleaving chain of DRA. Therefore, we + update FIRST_STMT and put DRB in the head of the list. */ + DR_GROUP_FIRST_DR (stmtinfo_b) = DR_STMT (drb); + DR_GROUP_NEXT_DR (stmtinfo_b) = old_first_stmt; + + /* Update all the stmts in the list to point to the new FIRST_STMT. */ + tmp = old_first_stmt; + while (tmp) + { + DR_GROUP_FIRST_DR (vinfo_for_stmt (tmp)) = DR_STMT (drb); + tmp = DR_GROUP_NEXT_DR (vinfo_for_stmt (tmp)); + } + } + else + { + /* Insert DRB in the list of DRA. 
*/ + vect_insert_into_interleaving_chain (drb, dra); + DR_GROUP_FIRST_DR (stmtinfo_b) = DR_GROUP_FIRST_DR (stmtinfo_a); + } + return; + } + + /* 4. both DRA and DRB are in some interleaving chains. */ + first_a = DR_GROUP_FIRST_DR (stmtinfo_a); + first_b = DR_GROUP_FIRST_DR (stmtinfo_b); + if (first_a == first_b) + return; + init_dra_chain = DR_INIT (STMT_VINFO_DATA_REF (vinfo_for_stmt (first_a))); + init_drb_chain = DR_INIT (STMT_VINFO_DATA_REF (vinfo_for_stmt (first_b))); + + if (tree_int_cst_compare (init_dra_chain, init_drb_chain) > 0) + { + /* Insert the nodes of DRA chain into the DRB chain. + After inserting a node, continue from this node of the DRB chain (don't + start from the beginning). */ + node = DR_GROUP_FIRST_DR (stmtinfo_a); + prev = DR_GROUP_FIRST_DR (stmtinfo_b); + first_stmt = first_b; + } + else + { + /* Insert the nodes of DRB chain into the DRA chain. + After inserting a node, continue from this node of the DRA chain (don't + start from the beginning). */ + node = DR_GROUP_FIRST_DR (stmtinfo_b); + prev = DR_GROUP_FIRST_DR (stmtinfo_a); + first_stmt = first_a; + } + + while (node) + { + node_init = DR_INIT (STMT_VINFO_DATA_REF (vinfo_for_stmt (node))); + next = DR_GROUP_NEXT_DR (vinfo_for_stmt (prev)); + while (next) + { + next_init = DR_INIT (STMT_VINFO_DATA_REF (vinfo_for_stmt (next))); + if (tree_int_cst_compare (next_init, node_init) > 0) + { + /* Insert here. */ + DR_GROUP_NEXT_DR (vinfo_for_stmt (prev)) = node; + DR_GROUP_NEXT_DR (vinfo_for_stmt (node)) = next; + prev = node; + break; + } + prev = next; + next = DR_GROUP_NEXT_DR (vinfo_for_stmt (prev)); + } + if (!next) + { + /* We got to the end of the list. Insert here. */ + DR_GROUP_NEXT_DR (vinfo_for_stmt (prev)) = node; + DR_GROUP_NEXT_DR (vinfo_for_stmt (node)) = NULL_TREE; + prev = node; + } + DR_GROUP_FIRST_DR (vinfo_for_stmt (node)) = first_stmt; + node = DR_GROUP_NEXT_DR (vinfo_for_stmt (node)); + } +} + + +/* Function vect_equal_offsets. 
+ + Check if OFFSET1 and OFFSET2 are identical expressions. */ + +static bool +vect_equal_offsets (tree offset1, tree offset2) +{ + bool res0, res1; + + STRIP_NOPS (offset1); + STRIP_NOPS (offset2); + + if (offset1 == offset2) + return true; + + if (TREE_CODE (offset1) != TREE_CODE (offset2) + || !BINARY_CLASS_P (offset1) + || !BINARY_CLASS_P (offset2)) + return false; + + res0 = vect_equal_offsets (TREE_OPERAND (offset1, 0), + TREE_OPERAND (offset2, 0)); + res1 = vect_equal_offsets (TREE_OPERAND (offset1, 1), + TREE_OPERAND (offset2, 1)); + + return (res0 && res1); +} + + +/* Function vect_check_interleaving. + + Check if DRA and DRB are a part of interleaving. In case they are, insert + DRA and DRB in an interleaving chain. */ + +static void +vect_check_interleaving (struct data_reference *dra, + struct data_reference *drb) +{ + HOST_WIDE_INT type_size_a, type_size_b, diff_mod_size, step, init_a, init_b; + + /* Check that the data-refs have same first location (except init) and they + are both either store or load (not load and store). */ + if ((DR_BASE_ADDRESS (dra) != DR_BASE_ADDRESS (drb) + && (TREE_CODE (DR_BASE_ADDRESS (dra)) != ADDR_EXPR + || TREE_CODE (DR_BASE_ADDRESS (drb)) != ADDR_EXPR + || TREE_OPERAND (DR_BASE_ADDRESS (dra), 0) + != TREE_OPERAND (DR_BASE_ADDRESS (drb),0))) + || !vect_equal_offsets (DR_OFFSET (dra), DR_OFFSET (drb)) + || !tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb)) + || DR_IS_READ (dra) != DR_IS_READ (drb)) + return; + + /* Check: + 1. data-refs are of the same type + 2. their steps are equal + 3. 
the step is greater than the difference between data-refs' inits */ + type_size_a = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dra)))); + type_size_b = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (drb)))); + + if (type_size_a != type_size_b + || tree_int_cst_compare (DR_STEP (dra), DR_STEP (drb))) + return; + + init_a = TREE_INT_CST_LOW (DR_INIT (dra)); + init_b = TREE_INT_CST_LOW (DR_INIT (drb)); + step = TREE_INT_CST_LOW (DR_STEP (dra)); + + if (init_a > init_b) + { + /* If init_a == init_b + the size of the type * k, we have an interleaving, + and DRB is accessed before DRA. */ + diff_mod_size = (init_a - init_b) % type_size_a; + + if ((init_a - init_b) > step) + return; + + if (diff_mod_size == 0) + { + vect_update_interleaving_chain (drb, dra); + if (vect_print_dump_info (REPORT_DR_DETAILS)) + { + fprintf (vect_dump, "Detected interleaving "); + print_generic_expr (vect_dump, DR_REF (dra), TDF_SLIM); + fprintf (vect_dump, " and "); + print_generic_expr (vect_dump, DR_REF (drb), TDF_SLIM); + } + return; + } + } + else + { + /* If init_b == init_a + the size of the type * k, we have an + interleaving, and DRA is accessed before DRB. */ + diff_mod_size = (init_b - init_a) % type_size_a; + + if ((init_b - init_a) > step) + return; + + if (diff_mod_size == 0) + { + vect_update_interleaving_chain (dra, drb); + if (vect_print_dump_info (REPORT_DR_DETAILS)) + { + fprintf (vect_dump, "Detected interleaving "); + print_generic_expr (vect_dump, DR_REF (dra), TDF_SLIM); + fprintf (vect_dump, " and "); + print_generic_expr (vect_dump, DR_REF (drb), TDF_SLIM); + } + return; + } + } +} + + /* Function vect_analyze_data_ref_dependence. 
Return TRUE if there (might) exist a dependence between a memory-reference @@ -566,7 +856,8 @@ vect_analyze_scalar_cycles (loop_vec_info loop_vinfo) static bool vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr, - loop_vec_info loop_vinfo) + loop_vec_info loop_vinfo, + bool check_interleaving) { unsigned int i; struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); @@ -581,6 +872,14 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr, unsigned int loop_depth; if (DDR_ARE_DEPENDENT (ddr) == chrec_known) + { + /* Independent data accesses. */ + if (check_interleaving) + vect_check_interleaving (dra, drb); + return false; + } + + if ((DR_IS_READ (dra) && DR_IS_READ (drb)) || dra == drb) return false; if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know) @@ -659,6 +958,36 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr, } +/* Function vect_check_dependences. + + Return TRUE if there is a store-store or load-store dependence between + data-refs in DDR, otherwise return FALSE. */ + +static bool +vect_check_dependences (struct data_dependence_relation *ddr) +{ + struct data_reference *dra = DDR_A (ddr); + struct data_reference *drb = DDR_B (ddr); + + if (DDR_ARE_DEPENDENT (ddr) == chrec_known || dra == drb) + /* Independent or same data accesses. */ + return false; + + if (DR_IS_READ (dra) == DR_IS_READ (drb) && DR_IS_READ (dra)) + /* Two loads. */ + return false; + + if (vect_print_dump_info (REPORT_DR_DETAILS)) + { + fprintf (vect_dump, "possible store or store/load dependence between "); + print_generic_expr (vect_dump, DR_REF (dra), TDF_SLIM); + fprintf (vect_dump, " and "); + print_generic_expr (vect_dump, DR_REF (drb), TDF_SLIM); + } + return true; +} + + /* Function vect_analyze_data_ref_dependences. 
Examine all the data references in the loop, and make sure there do not @@ -670,12 +999,24 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo) unsigned int i; VEC (ddr_p, heap) *ddrs = LOOP_VINFO_DDRS (loop_vinfo); struct data_dependence_relation *ddr; + bool check_interleaving = true; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "=== vect_analyze_dependences ==="); + /* We allow interleaving only if there are no store-store and load-store + dependencies in the loop. */ for (i = 0; VEC_iterate (ddr_p, ddrs, i, ddr); i++) - if (vect_analyze_data_ref_dependence (ddr, loop_vinfo)) + { + if (vect_check_dependences (ddr)) + { + check_interleaving = false; + break; + } + } + + for (i = 0; VEC_iterate (ddr_p, ddrs, i, ddr); i++) + if (vect_analyze_data_ref_dependence (ddr, loop_vinfo, check_interleaving)) return false; return true; @@ -830,11 +1171,20 @@ vect_update_misalignment_for_peel (struct data_reference *dr, struct data_reference *current_dr; int dr_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr)))); int dr_peel_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr_peel)))); + stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr)); + stmt_vec_info peel_stmt_info = vinfo_for_stmt (DR_STMT (dr_peel)); + + /* For interleaved data accesses the step in the loop must be multiplied by + the size of the interleaving group. 
*/ + if (DR_GROUP_FIRST_DR (stmt_info)) + dr_size *= DR_GROUP_SIZE (vinfo_for_stmt (DR_GROUP_FIRST_DR (stmt_info))); + if (DR_GROUP_FIRST_DR (peel_stmt_info)) + dr_peel_size *= DR_GROUP_SIZE (peel_stmt_info); if (known_alignment_for_access_p (dr) && known_alignment_for_access_p (dr_peel) - && (DR_MISALIGNMENT (dr)/dr_size == - DR_MISALIGNMENT (dr_peel)/dr_peel_size)) + && (DR_MISALIGNMENT (dr) / dr_size == + DR_MISALIGNMENT (dr_peel) / dr_peel_size)) { DR_MISALIGNMENT (dr) = 0; return; @@ -848,15 +1198,15 @@ vect_update_misalignment_for_peel (struct data_reference *dr, { if (current_dr != dr) continue; - gcc_assert (DR_MISALIGNMENT (dr)/dr_size == - DR_MISALIGNMENT (dr_peel)/dr_peel_size); + gcc_assert (DR_MISALIGNMENT (dr) / dr_size == + DR_MISALIGNMENT (dr_peel) / dr_peel_size); DR_MISALIGNMENT (dr) = 0; return; } if (known_alignment_for_access_p (dr) && known_alignment_for_access_p (dr_peel)) - { + { DR_MISALIGNMENT (dr) += npeel * dr_size; DR_MISALIGNMENT (dr) %= UNITS_PER_SIMD_WORD; return; @@ -883,6 +1233,14 @@ vect_verify_datarefs_alignment (loop_vec_info loop_vinfo) for (i = 0; VEC_iterate (data_reference_p, datarefs, i, dr); i++) { + tree stmt = DR_STMT (dr); + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + + /* For interleaving, only the alignment of the first access matters. */ + if (DR_GROUP_FIRST_DR (stmt_info) + && DR_GROUP_FIRST_DR (stmt_info) != stmt) + continue; + supportable_dr_alignment = vect_supportable_dr_alignment (dr); if (!supportable_dr_alignment) { @@ -1007,6 +1365,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) bool do_peeling = false; bool do_versioning = false; bool stat; + tree stmt; + stmt_vec_info stmt_info; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "=== vect_enhance_data_refs_alignment ==="); @@ -1051,12 +1411,47 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) TODO: Use a cost model. 
*/ for (i = 0; VEC_iterate (data_reference_p, datarefs, i, dr); i++) - if (!DR_IS_READ (dr) && !aligned_access_p (dr)) - { - dr0 = dr; - do_peeling = true; - break; - } + { + stmt = DR_STMT (dr); + stmt_info = vinfo_for_stmt (stmt); + + /* For interleaving, only the alignment of the first access + matters. */ + if (DR_GROUP_FIRST_DR (stmt_info) + && DR_GROUP_FIRST_DR (stmt_info) != stmt) + continue; + + if (!DR_IS_READ (dr) && !aligned_access_p (dr)) + { + if (DR_GROUP_FIRST_DR (stmt_info)) + { + /* For interleaved access we peel only if the number of iterations in + the prolog loop ({VF - misalignment}) is a multiple of the + number of the interleaved accesses. */ + int elem_size, mis_in_elements; + int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); + + /* FORNOW: handle only known alignment. */ + if (!known_alignment_for_access_p (dr)) + { + do_peeling = false; + break; + } + + elem_size = UNITS_PER_SIMD_WORD / vf; + mis_in_elements = DR_MISALIGNMENT (dr) / elem_size; + + if ((vf - mis_in_elements) % DR_GROUP_SIZE (stmt_info)) + { + do_peeling = false; + break; + } + } + dr0 = dr; + do_peeling = true; + break; + } + } /* Often peeling for alignment will require peeling for loop-bound, which in turn requires that we know how to adjust the loop ivs after the loop. */ @@ -1077,8 +1472,16 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) mis = DR_MISALIGNMENT (dr0); mis /= GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr0)))); npeel = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - mis; + + /* For interleaved data access every iteration accesses all the + members of the group, therefore we divide the number of iterations + by the group size. */ + stmt_info = vinfo_for_stmt (DR_STMT (dr0)); + if (DR_GROUP_FIRST_DR (stmt_info)) + npeel /= DR_GROUP_SIZE (stmt_info); + if (vect_print_dump_info (REPORT_DETAILS)) - fprintf (vect_dump, "Try peeling by %d",npeel); + fprintf (vect_dump, "Try peeling by %d", npeel); } /* Ensure that all data refs can be vectorized after the peel. 
*/ @@ -1089,6 +1492,14 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) if (dr == dr0) continue; + stmt = DR_STMT (dr); + stmt_info = vinfo_for_stmt (stmt); + /* For interleaving, only the alignment of the first access + matters. */ + if (DR_GROUP_FIRST_DR (stmt_info) + && DR_GROUP_FIRST_DR (stmt_info) != stmt) + continue; + save_misalignment = DR_MISALIGNMENT (dr); vect_update_misalignment_for_peel (dr, dr0, npeel); supportable_dr_alignment = vect_supportable_dr_alignment (dr); @@ -1146,10 +1557,17 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) { for (i = 0; VEC_iterate (data_reference_p, datarefs, i, dr); i++) { - if (aligned_access_p (dr)) - continue; + stmt = DR_STMT (dr); + stmt_info = vinfo_for_stmt (stmt); + + /* For interleaving, only the alignment of the first access + matters. */ + if (aligned_access_p (dr) + || (DR_GROUP_FIRST_DR (stmt_info) + && DR_GROUP_FIRST_DR (stmt_info) != stmt)) + continue; - supportable_dr_alignment = vect_supportable_dr_alignment (dr); + supportable_dr_alignment = vect_supportable_dr_alignment (dr); if (!supportable_dr_alignment) { @@ -1266,14 +1684,157 @@ static bool vect_analyze_data_ref_access (struct data_reference *dr) { tree step = DR_STEP (dr); + HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step); tree scalar_type = TREE_TYPE (DR_REF (dr)); + HOST_WIDE_INT type_size = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (scalar_type)); + tree stmt = DR_STMT (dr); + /* For interleaving, STRIDE is STEP counted in elements, i.e., the size of the + interleaving group (including gaps). */ + HOST_WIDE_INT stride = dr_step / type_size; + + if (!step) + { + if (vect_print_dump_info (REPORT_DETAILS)) + fprintf (vect_dump, "bad data-ref access"); + return false; + } - if (!step || tree_int_cst_compare (step, TYPE_SIZE_UNIT (scalar_type))) + /* Consecutive? */ + if (!tree_int_cst_compare (step, TYPE_SIZE_UNIT (scalar_type))) { + /* Mark that it is not interleaving. 
*/ + DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt)) = NULL_TREE; + return true; + } + + /* Non-consecutive access is possible only if it is a part of interleaving. */ + if (!DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt))) + { + /* Check if this DR is a part of interleaving, and is a single + element of the group that is accessed in the loop. */ + + /* Gaps are supported only for loads. STEP must be a multiple of the type + size. The size of the group must be a power of 2. */ + if (DR_IS_READ (dr) + && (dr_step % type_size) == 0 + && stride > 0 + && exact_log2 (stride) != -1) + { + DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt)) = stmt; + DR_GROUP_SIZE (vinfo_for_stmt (stmt)) = stride; + if (vect_print_dump_info (REPORT_DR_DETAILS)) + { + fprintf (vect_dump, "Detected single element interleaving %d ", + DR_GROUP_SIZE (vinfo_for_stmt (stmt))); + print_generic_expr (vect_dump, DR_REF (dr), TDF_SLIM); + fprintf (vect_dump, " step "); + print_generic_expr (vect_dump, step, TDF_SLIM); + } + return true; + } if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "not consecutive access"); return false; } + + if (DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt)) == stmt) + { + /* First stmt in the interleaving chain. Check the chain. */ + tree next = DR_GROUP_NEXT_DR (vinfo_for_stmt (stmt)); + struct data_reference *data_ref = dr; + unsigned int count = 1; + tree next_step; + tree prev_init = DR_INIT (data_ref); + tree prev = stmt; + HOST_WIDE_INT diff, count_in_bytes; + + while (next) + { + /* Skip same data-refs. In case that two or more stmts share data-ref + (supported only for loads), we vectorize only the first stmt, and + the rest get their vectorized loads from the first one. */ + if (!tree_int_cst_compare (DR_INIT (data_ref), + DR_INIT (STMT_VINFO_DATA_REF ( + vinfo_for_stmt (next))))) + { + /* For load use the same data-ref load. (We check in + vect_check_dependences() that there are no two stores to the + same location). 
*/ + DR_GROUP_SAME_DR_STMT (vinfo_for_stmt (next)) = prev; + + prev = next; + next = DR_GROUP_NEXT_DR (vinfo_for_stmt (next)); + continue; + } + prev = next; + + /* Check that all the accesses have the same STEP. */ + next_step = DR_STEP (STMT_VINFO_DATA_REF (vinfo_for_stmt (next))); + if (tree_int_cst_compare (step, next_step)) + { + if (vect_print_dump_info (REPORT_DETAILS)) + fprintf (vect_dump, "not consecutive access in interleaving"); + return false; + } + + data_ref = STMT_VINFO_DATA_REF (vinfo_for_stmt (next)); + /* Check that the distance between two accesses is equal to the type + size. Otherwise, we have gaps. */ + diff = (TREE_INT_CST_LOW (DR_INIT (data_ref)) + - TREE_INT_CST_LOW (prev_init)) / type_size; + if (!DR_IS_READ (data_ref) && diff != 1) + { + if (vect_print_dump_info (REPORT_DETAILS)) + fprintf (vect_dump, "interleaved store with gaps"); + return false; + } + /* Store the gap from the previous member of the group. If there is no + gap in the access, DR_GROUP_GAP is always 1. */ + DR_GROUP_GAP (vinfo_for_stmt (next)) = diff; + + prev_init = DR_INIT (data_ref); + next = DR_GROUP_NEXT_DR (vinfo_for_stmt (next)); + /* Count the number of data-refs in the chain. */ + count++; + } + + /* COUNT is the number of accesses found, we multiply it by the size of + the type to get COUNT_IN_BYTES. */ + count_in_bytes = type_size * count; + /* Check the size of the interleaving is not greater than STEP. */ + if (dr_step < count_in_bytes) + { + if (vect_print_dump_info (REPORT_DETAILS)) + { + fprintf (vect_dump, "interleaving size is greater than step for "); + print_generic_expr (vect_dump, DR_REF (dr), TDF_SLIM); + } + return false; + } + + /* Check that STEP is a multiple of type size. 
*/ + if ((dr_step % type_size) != 0) + { + if (vect_print_dump_info (REPORT_DETAILS)) + { + fprintf (vect_dump, "step is not a multiple of type size: step "); + print_generic_expr (vect_dump, step, TDF_SLIM); + fprintf (vect_dump, " size "); + print_generic_expr (vect_dump, TYPE_SIZE_UNIT (scalar_type), + TDF_SLIM); + } + return false; + } + + /* FORNOW: we handle only interleaving that is a power of 2. */ + if (exact_log2 (stride) == -1) + { + if (vect_print_dump_info (REPORT_DETAILS)) + fprintf (vect_dump, "interleaving is not a power of 2"); + return false; + } + DR_GROUP_SIZE (vinfo_for_stmt (stmt)) = stride; + } return true; } @@ -1335,7 +1896,7 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo) if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "=== vect_analyze_data_refs ==="); - compute_data_dependences_for_loop (loop, false, + compute_data_dependences_for_loop (loop, true, &LOOP_VINFO_DATAREFS (loop_vinfo), &LOOP_VINFO_DDRS (loop_vinfo)); diff --git a/gcc/tree-vect-transform.c b/gcc/tree-vect-transform.c index b57208e2182..43a55f98ecf 100644 --- a/gcc/tree-vect-transform.c +++ b/gcc/tree-vect-transform.c @@ -46,7 +46,7 @@ Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA #include "real.h" /* Utility functions for the code transformation. */ -static bool vect_transform_stmt (tree, block_stmt_iterator *); +static bool vect_transform_stmt (tree, block_stmt_iterator *, bool *); static tree vect_create_destination_var (tree, tree); static tree vect_create_data_ref_ptr (tree, block_stmt_iterator *, tree, tree *, tree *, bool); @@ -160,9 +160,19 @@ vect_create_addr_base_for_vector_ref (tree stmt, if (offset) { tree tmp = create_tmp_var (TREE_TYPE (base_offset), "offset"); + tree step; + + /* For interleaved access step we divide STEP by the size of the + interleaving group. 
*/ + if (DR_GROUP_SIZE (stmt_info)) + step = fold_build2 (TRUNC_DIV_EXPR, TREE_TYPE (offset), DR_STEP (dr), + build_int_cst (TREE_TYPE (offset), + DR_GROUP_SIZE (stmt_info))); + else + step = DR_STEP (dr); + add_referenced_var (tmp); - offset = fold_build2 (MULT_EXPR, TREE_TYPE (offset), offset, - DR_STEP (dr)); + offset = fold_build2 (MULT_EXPR, TREE_TYPE (offset), offset, step); base_offset = fold_build2 (PLUS_EXPR, TREE_TYPE (base_offset), base_offset, offset); base_offset = force_gimple_operand (base_offset, &new_stmt, false, tmp); @@ -2294,6 +2304,164 @@ vectorizable_type_promotion (tree stmt, block_stmt_iterator *bsi, } +/* Function vect_strided_store_supported. + + Returns TRUE if INTERLEAVE_HIGH and INTERLEAVE_LOW operations are supported, + and FALSE otherwise. */ + +static bool +vect_strided_store_supported (tree vectype) +{ + optab interleave_high_optab, interleave_low_optab; + int mode; + + mode = (int) TYPE_MODE (vectype); + + /* Check that the operation is supported. */ + interleave_high_optab = optab_for_tree_code (VEC_INTERLEAVE_HIGH_EXPR, + vectype); + interleave_low_optab = optab_for_tree_code (VEC_INTERLEAVE_LOW_EXPR, + vectype); + if (!interleave_high_optab || !interleave_low_optab) + { + if (vect_print_dump_info (REPORT_DETAILS)) + fprintf (vect_dump, "no optab for interleave."); + return false; + } + + if (interleave_high_optab->handlers[(int) mode].insn_code + == CODE_FOR_nothing + || interleave_low_optab->handlers[(int) mode].insn_code + == CODE_FOR_nothing) + { + if (vect_print_dump_info (REPORT_DETAILS)) + fprintf (vect_dump, "interleave op not supported by target."); + return false; + } + return true; +} + + +/* Function vect_permute_store_chain. + + Given a chain of interleaved stores in DR_CHAIN of LENGTH that must be + a power of 2, generate interleave_high/low stmts to reorder the data + correctly for the stores. Return the final references for stores in + RESULT_CHAIN. 
+
+   E.g., LENGTH is 4 and the scalar type is short, i.e., VF is 8.
+   The input is 4 vectors each containing 8 elements.  We assign a number to each
+   element, the input sequence is:
+
+   1st vec:   0  1  2  3  4  5  6  7
+   2nd vec:   8  9 10 11 12 13 14 15
+   3rd vec:  16 17 18 19 20 21 22 23
+   4th vec:  24 25 26 27 28 29 30 31
+
+   The output sequence should be:
+
+   1st vec:  0  8 16 24  1  9 17 25
+   2nd vec:  2 10 18 26  3 11 19 27
+   3rd vec:  4 12 20 28  5 13 21 29
+   4th vec:  6 14 22 30  7 15 23 31
+
+   i.e., we interleave the contents of the four vectors in their order.
+
+   We use interleave_high/low instructions to create such output.  The input of
+   each interleave_high/low operation is two vectors:
+   1st vec    2nd vec
+   0 1 2 3    4 5 6 7
+   the even elements of the result vector are obtained left-to-right from the
+   high/low elements of the first vector.  The odd elements of the result are
+   obtained left-to-right from the high/low elements of the second vector.
+   The output of interleave_high will be:   0 4 1 5
+   and of interleave_low:                   2 6 3 7
+
+
+   The permutation is done in log LENGTH stages.  In each stage interleave_high
+   and interleave_low stmts are created for each pair of vectors in DR_CHAIN,
+   where the first argument is taken from the first half of DR_CHAIN and the
+   second argument from its second half.
+   In our example,
+
+   I1: interleave_high (1st vec, 3rd vec)
+   I2: interleave_low (1st vec, 3rd vec)
+   I3: interleave_high (2nd vec, 4th vec)
+   I4: interleave_low (2nd vec, 4th vec)
+
+   The output for the first stage is:
+
+   I1:  0 16  1 17  2 18  3 19
+   I2:  4 20  5 21  6 22  7 23
+   I3:  8 24  9 25 10 26 11 27
+   I4: 12 28 13 29 14 30 15 31
+
+   The output of the second stage, i.e. the final result is:
+
+   I1:  0  8 16 24  1  9 17 25
+   I2:  2 10 18 26  3 11 19 27
+   I3:  4 12 20 28  5 13 21 29
+   I4:  6 14 22 30  7 15 23 31.
*/ + +static bool +vect_permute_store_chain (VEC(tree,heap) *dr_chain, + unsigned int length, + tree stmt, + block_stmt_iterator *bsi, + VEC(tree,heap) **result_chain) +{ + tree perm_dest, perm_stmt, vect1, vect2, high, low; + tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)); + tree scalar_dest; + int i; + unsigned int j; + VEC(tree,heap) *first, *second; + + scalar_dest = TREE_OPERAND (stmt, 0); + first = VEC_alloc (tree, heap, length/2); + second = VEC_alloc (tree, heap, length/2); + + /* Check that the operation is supported. */ + if (!vect_strided_store_supported (vectype)) + return false; + + *result_chain = VEC_copy (tree, heap, dr_chain); + + for (i = 0; i < exact_log2 (length); i++) + { + for (j = 0; j < length/2; j++) + { + vect1 = VEC_index (tree, dr_chain, j); + vect2 = VEC_index (tree, dr_chain, j+length/2); + + /* high = interleave_high (vect1, vect2); */ + perm_dest = create_tmp_var (vectype, "vect_inter_high"); + add_referenced_var (perm_dest); + perm_stmt = build2 (MODIFY_EXPR, vectype, perm_dest, + build2 (VEC_INTERLEAVE_HIGH_EXPR, vectype, vect1, + vect2)); + high = make_ssa_name (perm_dest, perm_stmt); + TREE_OPERAND (perm_stmt, 0) = high; + vect_finish_stmt_generation (stmt, perm_stmt, bsi); + VEC_replace (tree, *result_chain, 2*j, high); + + /* low = interleave_low (vect1, vect2); */ + perm_dest = create_tmp_var (vectype, "vect_inter_low"); + add_referenced_var (perm_dest); + perm_stmt = build2 (MODIFY_EXPR, vectype, perm_dest, + build2 (VEC_INTERLEAVE_LOW_EXPR, vectype, vect1, + vect2)); + low = make_ssa_name (perm_dest, perm_stmt); + TREE_OPERAND (perm_stmt, 0) = low; + vect_finish_stmt_generation (stmt, perm_stmt, bsi); + VEC_replace (tree, *result_chain, 2*j+1, low); + } + dr_chain = VEC_copy (tree, heap, *result_chain); + } + return true; +} + + /* Function vectorizable_store. 
Check if STMT defines a non scalar data-ref (array/pointer/structure) that @@ -2310,7 +2478,7 @@ vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) tree op; tree vec_oprnd = NULL_TREE; stmt_vec_info stmt_info = vinfo_for_stmt (stmt); - struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr = NULL; tree vectype = STMT_VINFO_VECTYPE (stmt_info); loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); enum machine_mode vec_mode; @@ -2320,12 +2488,15 @@ vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) def_operand_p def_p; tree def, def_stmt; enum vect_def_type dt; - stmt_vec_info prev_stmt_info; + stmt_vec_info prev_stmt_info = NULL; tree dataref_ptr = NULL_TREE; int nunits = TYPE_VECTOR_SUBPARTS (vectype); int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; int j; - + tree next_stmt, first_stmt; + bool strided_store = false; + unsigned int group_size, i; + VEC(tree,heap) *dr_chain = NULL, *oprnds = NULL, *result_chain = NULL; gcc_assert (ncopies >= 1); /* Is vectorizable store? */ @@ -2335,7 +2506,8 @@ vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) scalar_dest = TREE_OPERAND (stmt, 0); if (TREE_CODE (scalar_dest) != ARRAY_REF - && TREE_CODE (scalar_dest) != INDIRECT_REF) + && TREE_CODE (scalar_dest) != INDIRECT_REF + && !DR_GROUP_FIRST_DR (stmt_info)) return false; op = TREE_OPERAND (stmt, 1); @@ -2355,6 +2527,12 @@ vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) if (!STMT_VINFO_DATA_REF (stmt_info)) return false; + if (DR_GROUP_FIRST_DR (stmt_info)) + { + strided_store = true; + if (!vect_strided_store_supported (vectype)) + return false; + } if (!vec_stmt) /* transformation not required. */ { @@ -2367,7 +2545,34 @@ vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "transform store. 
ncopies = %d",ncopies); - alignment_support_cheme = vect_supportable_dr_alignment (dr); + if (strided_store) + { + first_stmt = DR_GROUP_FIRST_DR (stmt_info); + first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt)); + group_size = DR_GROUP_SIZE (vinfo_for_stmt (first_stmt)); + + DR_GROUP_STORE_COUNT (vinfo_for_stmt (first_stmt))++; + + /* We vectorize all the stmts of the interleaving group when we + reach the last stmt in the group. */ + if (DR_GROUP_STORE_COUNT (vinfo_for_stmt (first_stmt)) + < DR_GROUP_SIZE (vinfo_for_stmt (first_stmt))) + { + *vec_stmt = NULL_TREE; + return true; + } + } + else + { + first_stmt = stmt; + first_dr = dr; + group_size = 1; + } + + dr_chain = VEC_alloc (tree, heap, group_size); + oprnds = VEC_alloc (tree, heap, group_size); + + alignment_support_cheme = vect_supportable_dr_alignment (first_dr); gcc_assert (alignment_support_cheme); gcc_assert (alignment_support_cheme == dr_aligned); /* FORNOW */ @@ -2377,6 +2582,39 @@ vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) vector stmt by a factor VF/nunits. For more details see documentation in vect_get_vec_def_for_copy_stmt. */ + /* In case of interleaving (non-unit strided access): + + S1: &base + 2 = x2 + S2: &base = x0 + S3: &base + 1 = x1 + S4: &base + 3 = x3 + + We create vectorized stores starting from base address (the access of the + first stmt in the chain (S2 in the above example)), when the last store stmt + of the chain (S4) is reached: + + VS1: &base = vx2 + VS2: &base + vec_size*1 = vx0 + VS3: &base + vec_size*2 = vx1 + VS4: &base + vec_size*3 = vx3 + + Then permutation statements are generated: + + VS5: vx5 = VEC_INTERLEAVE_HIGH_EXPR < vx0, vx3 > + VS6: vx6 = VEC_INTERLEAVE_LOW_EXPR < vx0, vx3 > + ... 
+ + And they are put in STMT_VINFO_VEC_STMT of the corresponding scalar stmts + (the order of the data-refs in the output of vect_permute_store_chain + corresponds to the order of scalar stmts in the interleaving chain - see + the documentation of vect_permute_store_chain()). + + In case of both multiple types and interleaving, the above vector stores and + permutation stmts are created for every copy. The result vector stmts are + put in STMT_VINFO_VEC_STMT for the first copy and in the corresponding + STMT_VINFO_RELATED_STMT for the next copies. + */ + prev_stmt_info = NULL; for (j = 0; j < ncopies; j++) { @@ -2385,52 +2623,111 @@ vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) if (j == 0) { - vec_oprnd = vect_get_vec_def_for_operand (op, stmt, NULL); - dataref_ptr = vect_create_data_ref_ptr (stmt, bsi, NULL_TREE, &dummy, - &ptr_incr, false); + /* For interleaved stores we collect vectorized defs for all the + stores in the group in DR_CHAIN and OPRNDS. DR_CHAIN is then used + as an input to vect_permute_store_chain(), and OPRNDS as an input + to vect_get_vec_def_for_stmt_copy() for the next copy. + If the store is not strided, GROUP_SIZE is 1, and DR_CHAIN and + OPRNDS are of size 1. + */ + next_stmt = first_stmt; + for (i = 0; i < group_size; i++) + { + /* Since gaps are not supported for interleaved stores, GROUP_SIZE + is the exact number of stmts in the chain. Therefore, NEXT_STMT + can't be NULL_TREE. In case that there is no interleaving, + GROUP_SIZE is 1, and only one iteration of the loop will be + executed. 
+ */ + gcc_assert (next_stmt); + op = TREE_OPERAND (next_stmt, 1); + vec_oprnd = vect_get_vec_def_for_operand (op, next_stmt, NULL); + VEC_quick_push(tree, dr_chain, vec_oprnd); + VEC_quick_push(tree, oprnds, vec_oprnd); + next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt)); + } + dataref_ptr = vect_create_data_ref_ptr (first_stmt, bsi, NULL_TREE, + &dummy, &ptr_incr, false); } else { - vec_oprnd = vect_get_vec_def_for_stmt_copy (dt, vec_oprnd); + /* For interleaved stores we created vectorized defs for all the + defs stored in OPRNDS in the previous iteration (previous copy). + DR_CHAIN is then used as an input to vect_permute_store_chain(), + and OPRNDS as an input to vect_get_vec_def_for_stmt_copy() for the + next copy. + If the store is not strided, GROUP_SIZE is 1, and DR_CHAIN and + OPRNDS are of size 1. + */ + for (i = 0; i < group_size; i++) + { + vec_oprnd = vect_get_vec_def_for_stmt_copy (dt, + VEC_index (tree, oprnds, i)); + VEC_replace(tree, dr_chain, i, vec_oprnd); + VEC_replace(tree, oprnds, i, vec_oprnd); + } dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt); } - /* Arguments are ready. create the new vector stmt. */ - data_ref = build_fold_indirect_ref (dataref_ptr); - new_stmt = build2 (MODIFY_EXPR, vectype, data_ref, vec_oprnd); - vect_finish_stmt_generation (stmt, new_stmt, bsi); + if (strided_store) + { + result_chain = VEC_alloc (tree, heap, group_size); + /* Permute. */ + if (!vect_permute_store_chain (dr_chain, group_size, stmt, bsi, + &result_chain)) + return false; + } - /* Set the V_MAY_DEFS for the vector pointer. If this virtual def has a - use outside the loop and a loop peel is performed then the def may be - renamed by the peel. Mark it for renaming so the later use will also - be renamed. */ - copy_virtual_operands (new_stmt, stmt); - if (j == 0) + next_stmt = first_stmt; + for (i = 0; i < group_size; i++) { - /* The original store is deleted so the same SSA_NAMEs can be used. 
- */ - FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_VMAYDEF) + /* For strided stores vectorized defs are interleaved in + vect_permute_store_chain(). */ + if (strided_store) + vec_oprnd = VEC_index(tree, result_chain, i); + + data_ref = build_fold_indirect_ref (dataref_ptr); + /* Arguments are ready. Create the new vector stmt. */ + new_stmt = build2 (MODIFY_EXPR, vectype, data_ref, vec_oprnd); + vect_finish_stmt_generation (stmt, new_stmt, bsi); + + /* Set the V_MAY_DEFS for the vector pointer. If this virtual def has a + use outside the loop and a loop peel is performed then the def may be + renamed by the peel. Mark it for renaming so the later use will also + be renamed. */ + copy_virtual_operands (new_stmt, next_stmt); + if (j == 0) { - SSA_NAME_DEF_STMT (def) = new_stmt; - mark_sym_for_renaming (SSA_NAME_VAR (def)); + /* The original store is deleted so the same SSA_NAMEs can be used. + */ + FOR_EACH_SSA_TREE_OPERAND (def, next_stmt, iter, SSA_OP_VMAYDEF) + { + SSA_NAME_DEF_STMT (def) = new_stmt; + mark_sym_for_renaming (SSA_NAME_VAR (def)); + } + + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; } - - STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; - } - else - { - /* Create new names for all the definitions created by COPY and - add replacement mappings for each new name. */ - FOR_EACH_SSA_DEF_OPERAND (def_p, new_stmt, iter, SSA_OP_VMAYDEF) + else { - create_new_def_for (DEF_FROM_PTR (def_p), new_stmt, def_p); - mark_sym_for_renaming (SSA_NAME_VAR (DEF_FROM_PTR (def_p))); + /* Create new names for all the definitions created by COPY and + add replacement mappings for each new name. 
*/ + FOR_EACH_SSA_DEF_OPERAND (def_p, new_stmt, iter, SSA_OP_VMAYDEF) + { + create_new_def_for (DEF_FROM_PTR (def_p), new_stmt, def_p); + mark_sym_for_renaming (SSA_NAME_VAR (DEF_FROM_PTR (def_p))); + } + + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; } - STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; + prev_stmt_info = vinfo_for_stmt (new_stmt); + next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt)); + if (!next_stmt) + break; + /* Bump the vector pointer. */ + dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt); } - - prev_stmt_info = vinfo_for_stmt (new_stmt); } return true; @@ -2473,8 +2770,7 @@ vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) Output: REALIGNMENT_TOKEN - the result of a call to the builtin_mask_for_load target hook, if defined. - Return value - the result of the loop-header phi node. -*/ + Return value - the result of the loop-header phi node. */ static tree vect_setup_realignment (tree stmt, block_stmt_iterator *bsi, @@ -2547,6 +2843,268 @@ vect_setup_realignment (tree stmt, block_stmt_iterator *bsi, } +/* Function vect_strided_load_supported. + + Returns TRUE is EXTRACT_EVEN and EXTRACT_ODD operations are supported, + and FALSE otherwise. 
*/
+
+static bool
+vect_strided_load_supported (tree vectype)
+{
+  optab perm_even_optab, perm_odd_optab;
+  int mode;
+
+  mode = (int) TYPE_MODE (vectype);
+
+  perm_even_optab = optab_for_tree_code (VEC_EXTRACT_EVEN_EXPR, vectype);
+  if (!perm_even_optab)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "no optab for perm_even.");
+      return false;
+    }
+
+  if (perm_even_optab->handlers[mode].insn_code == CODE_FOR_nothing)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "perm_even op not supported by target.");
+      return false;
+    }
+
+  perm_odd_optab = optab_for_tree_code (VEC_EXTRACT_ODD_EXPR, vectype);
+  if (!perm_odd_optab)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "no optab for perm_odd.");
+      return false;
+    }
+
+  if (perm_odd_optab->handlers[mode].insn_code == CODE_FOR_nothing)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "perm_odd op not supported by target.");
+      return false;
+    }
+  return true;
+}
+
+
+/* Function vect_permute_load_chain.
+
+   Given a chain of interleaved loads in DR_CHAIN of LENGTH that must be
+   a power of 2, generate extract_even/odd stmts to reorder the input data
+   correctly.  Return the final references for loads in RESULT_CHAIN.
+
+   E.g., LENGTH is 4 and the scalar type is short, i.e., VF is 8.
+   The input is 4 vectors each containing 8 elements.  We assign a number to each
+   element, the input sequence is:
+
+   1st vec:   0  1  2  3  4  5  6  7
+   2nd vec:   8  9 10 11 12 13 14 15
+   3rd vec:  16 17 18 19 20 21 22 23
+   4th vec:  24 25 26 27 28 29 30 31
+
+   The output sequence should be:
+
+   1st vec:  0 4  8 12 16 20 24 28
+   2nd vec:  1 5  9 13 17 21 25 29
+   3rd vec:  2 6 10 14 18 22 26 30
+   4th vec:  3 7 11 15 19 23 27 31
+
+   i.e., the first output vector should contain the first elements of each
+   interleaving group, etc.
+
+   We use extract_even/odd instructions to create such output.
The input of each
+   extract_even/odd operation is two vectors
+   1st vec    2nd vec
+   0 1 2 3    4 5 6 7
+
+   and the output is the vector of extracted even/odd elements.  The output of
+   extract_even will be:   0 2 4 6
+   and of extract_odd:     1 3 5 7
+
+
+   The permutation is done in log LENGTH stages.  In each stage extract_even and
+   extract_odd stmts are created for each pair of vectors in DR_CHAIN in their
+   order.  In our example,
+
+   E1: extract_even (1st vec, 2nd vec)
+   E2: extract_odd (1st vec, 2nd vec)
+   E3: extract_even (3rd vec, 4th vec)
+   E4: extract_odd (3rd vec, 4th vec)
+
+   The output for the first stage will be:
+
+   E1:  0  2  4  6  8 10 12 14
+   E2:  1  3  5  7  9 11 13 15
+   E3: 16 18 20 22 24 26 28 30
+   E4: 17 19 21 23 25 27 29 31
+
+   In order to proceed and create the correct sequence for the next stage (or
+   for the correct output, if the second stage is the last one, as in our
+   example), we first put the output of extract_even operation and then the
+   output of extract_odd in RESULT_CHAIN (which is then copied to DR_CHAIN).
+   The input for the second stage is:
+
+   1st vec (E1):  0  2  4  6  8 10 12 14
+   2nd vec (E3): 16 18 20 22 24 26 28 30
+   3rd vec (E2):  1  3  5  7  9 11 13 15
+   4th vec (E4): 17 19 21 23 25 27 29 31
+
+   The output of the second stage:
+
+   E1: 0 4  8 12 16 20 24 28
+   E2: 2 6 10 14 18 22 26 30
+   E3: 1 5  9 13 17 21 25 29
+   E4: 3 7 11 15 19 23 27 31
+
+   And RESULT_CHAIN after reordering:
+
+   1st vec (E1):  0 4  8 12 16 20 24 28
+   2nd vec (E3):  1 5  9 13 17 21 25 29
+   3rd vec (E2):  2 6 10 14 18 22 26 30
+   4th vec (E4):  3 7 11 15 19 23 27 31.  */
+
+static bool
+vect_permute_load_chain (VEC(tree,heap) *dr_chain,
+                         unsigned int length,
+                         tree stmt,
+                         block_stmt_iterator *bsi,
+                         VEC(tree,heap) **result_chain)
+{
+  tree perm_dest, perm_stmt, data_ref, first_vect, second_vect;
+  tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt));
+  int i;
+  unsigned int j;
+
+  /* Check that the operation is supported.
*/ + if (!vect_strided_load_supported (vectype)) + return false; + + *result_chain = VEC_copy (tree, heap, dr_chain); + for (i = 0; i < exact_log2 (length); i++) + { + for (j = 0; j < length; j +=2) + { + first_vect = VEC_index (tree, dr_chain, j); + second_vect = VEC_index (tree, dr_chain, j+1); + + /* data_ref = permute_even (first_data_ref, second_data_ref); */ + perm_dest = create_tmp_var (vectype, "vect_perm_even"); + add_referenced_var (perm_dest); + + perm_stmt = build2 (MODIFY_EXPR, vectype, perm_dest, + build2 (VEC_EXTRACT_EVEN_EXPR, vectype, + first_vect, second_vect)); + + data_ref = make_ssa_name (perm_dest, perm_stmt); + TREE_OPERAND (perm_stmt, 0) = data_ref; + vect_finish_stmt_generation (stmt, perm_stmt, bsi); + mark_new_vars_to_rename (perm_stmt); + + VEC_replace (tree, *result_chain, j/2, data_ref); + + /* data_ref = permute_odd (first_data_ref, second_data_ref); */ + perm_dest = create_tmp_var (vectype, "vect_perm_odd"); + add_referenced_var (perm_dest); + + perm_stmt = build2 (MODIFY_EXPR, vectype, perm_dest, + build2 (VEC_EXTRACT_ODD_EXPR, vectype, + first_vect, second_vect)); + data_ref = make_ssa_name (perm_dest, perm_stmt); + TREE_OPERAND (perm_stmt, 0) = data_ref; + vect_finish_stmt_generation (stmt, perm_stmt, bsi); + mark_new_vars_to_rename (perm_stmt); + + VEC_replace (tree, *result_chain, j/2+length/2, data_ref); + } + dr_chain = VEC_copy (tree, heap, *result_chain); + } + return true; +} + + +/* Function vect_transform_strided_load. + + Given a chain of input interleaved data-refs (in DR_CHAIN), build statements + to perform their permutation and ascribe the result vectorized statements to + the scalar statements. 
+*/ + +static bool +vect_transform_strided_load (tree stmt, VEC(tree,heap) *dr_chain, int size, + block_stmt_iterator *bsi) +{ + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + tree first_stmt = DR_GROUP_FIRST_DR (stmt_info); + tree next_stmt, new_stmt; + VEC(tree,heap) *result_chain = NULL; + unsigned int i, gap_count; + tree tmp_data_ref; + + /* DR_CHAIN contains input data-refs that are a part of the interleaving. + RESULT_CHAIN is the output of vect_permute_load_chain, it contains permuted + vectors, that are ready for vector computation. */ + result_chain = VEC_alloc (tree, heap, size); + /* Permute. */ + if (!vect_permute_load_chain (dr_chain, size, stmt, bsi, &result_chain)) + return false; + + /* Put a permuted data-ref in the VECTORIZED_STMT field. + Since we scan the chain starting from it's first node, their order + corresponds the order of data-refs in RESULT_CHAIN. */ + next_stmt = first_stmt; + gap_count = 1; + for (i = 0; VEC_iterate(tree, result_chain, i, tmp_data_ref); i++) + { + if (!next_stmt) + break; + + /* Skip the gaps. Loads created for the gaps will be removed by dead + code elimination pass later. + DR_GROUP_GAP is the number of steps in elements from the previous + access (if there is no gap DR_GROUP_GAP is 1). We skip loads that + correspond to the gaps. + */ + if (gap_count < DR_GROUP_GAP (vinfo_for_stmt (next_stmt))) + { + gap_count++; + continue; + } + + while (next_stmt) + { + new_stmt = SSA_NAME_DEF_STMT (tmp_data_ref); + /* We assume that if VEC_STMT is not NULL, this is a case of multiple + copies, and we put the new vector statement in the first available + RELATED_STMT. 
*/ + if (!STMT_VINFO_VEC_STMT (vinfo_for_stmt (next_stmt))) + STMT_VINFO_VEC_STMT (vinfo_for_stmt (next_stmt)) = new_stmt; + else + { + tree prev_stmt = STMT_VINFO_VEC_STMT (vinfo_for_stmt (next_stmt)); + tree rel_stmt = STMT_VINFO_RELATED_STMT ( + vinfo_for_stmt (prev_stmt)); + while (rel_stmt) + { + prev_stmt = rel_stmt; + rel_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (rel_stmt)); + } + STMT_VINFO_RELATED_STMT (vinfo_for_stmt (prev_stmt)) = new_stmt; + } + next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt)); + gap_count = 1; + /* If NEXT_STMT accesses the same DR as the previous statement, + put the same TMP_DATA_REF as its vectorized statement; otherwise + get the next data-ref from RESULT_CHAIN. */ + if (!next_stmt || !DR_GROUP_SAME_DR_STMT (vinfo_for_stmt (next_stmt))) + break; + } + } + return true; +} + + /* vectorizable_load. Check if STMT reads a non scalar data-ref (array/pointer/structure) that @@ -2566,22 +3124,25 @@ vectorizable_load (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) stmt_vec_info prev_stmt_info; loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); - struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr; tree vectype = STMT_VINFO_VECTYPE (stmt_info); tree new_temp; int mode; - tree new_stmt; + tree new_stmt = NULL_TREE; tree dummy; enum dr_alignment_support alignment_support_cheme; tree dataref_ptr = NULL_TREE; tree ptr_incr; int nunits = TYPE_VECTOR_SUBPARTS (vectype); int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; - int j; + int i, j, group_size; tree msq = NULL_TREE, lsq; tree offset = NULL_TREE; tree realignment_token = NULL_TREE; tree phi_stmt = NULL_TREE; + VEC(tree,heap) *dr_chain = NULL; + bool strided_load = false; + tree first_stmt; /* Is vectorizable load? 
*/ if (!STMT_VINFO_RELEVANT_P (stmt_info)) @@ -2605,7 +3166,9 @@ vectorizable_load (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) return false; op = TREE_OPERAND (stmt, 1); - if (TREE_CODE (op) != ARRAY_REF && TREE_CODE (op) != INDIRECT_REF) + if (TREE_CODE (op) != ARRAY_REF + && TREE_CODE (op) != INDIRECT_REF + && !DR_GROUP_FIRST_DR (stmt_info)) return false; if (!STMT_VINFO_DATA_REF (stmt_info)) @@ -2622,6 +3185,16 @@ vectorizable_load (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) return false; } + /* Check if the load is a part of an interleaving chain. */ + if (DR_GROUP_FIRST_DR (stmt_info)) + { + strided_load = true; + + /* Check if interleaving is supported. */ + if (!vect_strided_load_supported (vectype)) + return false; + } + if (!vec_stmt) /* transformation not required. */ { STMT_VINFO_TYPE (stmt_info) = load_vec_info_type; @@ -2633,9 +3206,30 @@ vectorizable_load (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "transform load."); - alignment_support_cheme = vect_supportable_dr_alignment (dr); + if (strided_load) + { + first_stmt = DR_GROUP_FIRST_DR (stmt_info); + /* Check if the chain of loads is already vectorized. 
*/ + if (STMT_VINFO_VEC_STMT (vinfo_for_stmt (first_stmt))) + { + *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info); + return true; + } + first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt)); + group_size = DR_GROUP_SIZE (vinfo_for_stmt (first_stmt)); + dr_chain = VEC_alloc (tree, heap, group_size); + } + else + { + first_stmt = stmt; + first_dr = dr; + group_size = 1; + } + + alignment_support_cheme = vect_supportable_dr_alignment (first_dr); gcc_assert (alignment_support_cheme); + /* In case the vectorization factor (VF) is bigger than the number of elements that we can fit in a vectype (nunits), we have to generate more than one vector stmt - i.e - we need to "unroll" the @@ -2671,6 +3265,39 @@ vectorizable_load (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) information we recorded in RELATED_STMT field is used to vectorize stmt S2. */ + /* In case of interleaving (non-unit strided access): + + S1: x2 = &base + 2 + S2: x0 = &base + S3: x1 = &base + 1 + S4: x3 = &base + 3 + + Vectorized loads are created in the order of memory accesses + starting from the access of the first stmt of the chain: + + VS1: vx0 = &base + VS2: vx1 = &base + vec_size*1 + VS3: vx3 = &base + vec_size*2 + VS4: vx4 = &base + vec_size*3 + + Then permutation statements are generated: + + VS5: vx5 = VEC_EXTRACT_EVEN_EXPR < vx0, vx1 > + VS6: vx6 = VEC_EXTRACT_ODD_EXPR < vx0, vx1 > + ... + + And they are put in STMT_VINFO_VEC_STMT of the corresponding scalar stmts + (the order of the data-refs in the output of vect_permute_load_chain + corresponds to the order of scalar stmts in the interleaving chain - see + the documentaion of vect_permute_load_chain()). + The generation of permutation stmts and recording them in + STMT_VINFO_VEC_STMT is done in vect_transform_strided_load(). + + In case of both multiple types and interleaving, the vector loads and + permutation stmts above are created for every copy. 
The result vector stmts + are put in STMT_VINFO_VEC_STMT for the first copy and in the corresponding + STMT_VINFO_RELATED_STMT for the next copies. */ + /* If the data reference is aligned (dr_aligned) or potentially unaligned on a target that supports unaligned accesses (dr_unaligned_supported) we generate the following code: @@ -2698,12 +3325,11 @@ vectorizable_load (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) vec_dest = realign_load (msq, lsq, realignment_token) indx = indx + 1; msq = lsq; - } - */ + } */ if (alignment_support_cheme == dr_unaligned_software_pipeline) { - msq = vect_setup_realignment (stmt, bsi, &realignment_token); + msq = vect_setup_realignment (first_stmt, bsi, &realignment_token); phi_stmt = SSA_NAME_DEF_STMT (msq); offset = size_int (TYPE_VECTOR_SUBPARTS (vectype) - 1); } @@ -2713,69 +3339,86 @@ vectorizable_load (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) { /* 1. Create the vector pointer update chain. */ if (j == 0) - dataref_ptr = vect_create_data_ref_ptr (stmt, bsi, offset, + dataref_ptr = vect_create_data_ref_ptr (first_stmt, bsi, offset, &dummy, &ptr_incr, false); else dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt); - /* 2. Create the vector-load in the loop. */ - switch (alignment_support_cheme) - { - case dr_aligned: - gcc_assert (aligned_access_p (dr)); - data_ref = build_fold_indirect_ref (dataref_ptr); - break; - case dr_unaligned_supported: - { - int mis = DR_MISALIGNMENT (dr); - tree tmis = (mis == -1 ? 
size_zero_node : size_int (mis)); - - gcc_assert (!aligned_access_p (dr)); - tmis = size_binop (MULT_EXPR, tmis, size_int(BITS_PER_UNIT)); - data_ref = - build2 (MISALIGNED_INDIRECT_REF, vectype, dataref_ptr, tmis); - break; - } - case dr_unaligned_software_pipeline: - gcc_assert (!aligned_access_p (dr)); - data_ref = build1 (ALIGN_INDIRECT_REF, vectype, dataref_ptr); - break; - default: - gcc_unreachable (); - } - vec_dest = vect_create_destination_var (scalar_dest, vectype); - new_stmt = build2 (MODIFY_EXPR, vectype, vec_dest, data_ref); - new_temp = make_ssa_name (vec_dest, new_stmt); - TREE_OPERAND (new_stmt, 0) = new_temp; - vect_finish_stmt_generation (stmt, new_stmt, bsi); - copy_virtual_operands (new_stmt, stmt); - mark_new_vars_to_rename (new_stmt); - - /* 3. Handle explicit realignment if necessary/supported. */ - if (alignment_support_cheme == dr_unaligned_software_pipeline) - { - /* Create in loop: - <vec_dest = realign_load (msq, lsq, realignment_token)> */ - lsq = TREE_OPERAND (new_stmt, 0); - if (!realignment_token) - realignment_token = dataref_ptr; - vec_dest = vect_create_destination_var (scalar_dest, vectype); - new_stmt = - build3 (REALIGN_LOAD_EXPR, vectype, msq, lsq, realignment_token); - new_stmt = build2 (MODIFY_EXPR, vectype, vec_dest, new_stmt); - new_temp = make_ssa_name (vec_dest, new_stmt); - TREE_OPERAND (new_stmt, 0) = new_temp; - vect_finish_stmt_generation (stmt, new_stmt, bsi); - if (j == ncopies - 1) - add_phi_arg (phi_stmt, lsq, loop_latch_edge (loop)); - msq = lsq; - } + for (i = 0; i < group_size; i++) + { + /* 2. Create the vector-load in the loop. */ + switch (alignment_support_cheme) + { + case dr_aligned: + gcc_assert (aligned_access_p (first_dr)); + data_ref = build_fold_indirect_ref (dataref_ptr); + break; + case dr_unaligned_supported: + { + int mis = DR_MISALIGNMENT (first_dr); + tree tmis = (mis == -1 ? 
size_zero_node : size_int (mis)); + + gcc_assert (!aligned_access_p (first_dr)); + tmis = size_binop (MULT_EXPR, tmis, size_int(BITS_PER_UNIT)); + data_ref = + build2 (MISALIGNED_INDIRECT_REF, vectype, dataref_ptr, tmis); + break; + } + case dr_unaligned_software_pipeline: + gcc_assert (!aligned_access_p (first_dr)); + data_ref = build1 (ALIGN_INDIRECT_REF, vectype, dataref_ptr); + break; + default: + gcc_unreachable (); + } + vec_dest = vect_create_destination_var (scalar_dest, vectype); + new_stmt = build2 (MODIFY_EXPR, vectype, vec_dest, data_ref); + new_temp = make_ssa_name (vec_dest, new_stmt); + TREE_OPERAND (new_stmt, 0) = new_temp; + vect_finish_stmt_generation (stmt, new_stmt, bsi); + copy_virtual_operands (new_stmt, stmt); + mark_new_vars_to_rename (new_stmt); + + /* 3. Handle explicit realignment if necessary/supported. */ + if (alignment_support_cheme == dr_unaligned_software_pipeline) + { + /* Create in loop: + <vec_dest = realign_load (msq, lsq, realignment_token)> */ + lsq = TREE_OPERAND (new_stmt, 0); + if (!realignment_token) + realignment_token = dataref_ptr; + vec_dest = vect_create_destination_var (scalar_dest, vectype); + new_stmt = + build3 (REALIGN_LOAD_EXPR, vectype, msq, lsq, realignment_token); + new_stmt = build2 (MODIFY_EXPR, vectype, vec_dest, new_stmt); + new_temp = make_ssa_name (vec_dest, new_stmt); + TREE_OPERAND (new_stmt, 0) = new_temp; + vect_finish_stmt_generation (stmt, new_stmt, bsi); + if (i == group_size - 1 && j == ncopies - 1) + add_phi_arg (phi_stmt, lsq, loop_latch_edge (loop)); + msq = lsq; + } + if (strided_load) + VEC_quick_push (tree, dr_chain, new_temp); + if (i < group_size - 1) + dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt); + } - if (j == 0) - STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; + if (strided_load) + { + if (!vect_transform_strided_load (stmt, dr_chain, group_size, bsi)) + return false; + *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info); + dr_chain = VEC_alloc (tree, heap, 
group_size); + } else - STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; - prev_stmt_info = vinfo_for_stmt (new_stmt); + { + if (j == 0) + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; + else + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; + prev_stmt_info = vinfo_for_stmt (new_stmt); + } } return true; @@ -3011,7 +3654,7 @@ vectorizable_condition (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt) Create a vectorized stmt to replace STMT, and insert it at BSI. */ bool -vect_transform_stmt (tree stmt, block_stmt_iterator *bsi) +vect_transform_stmt (tree stmt, block_stmt_iterator *bsi, bool *strided_store) { bool is_store = false; tree vec_stmt = NULL_TREE; @@ -3051,7 +3694,18 @@ vect_transform_stmt (tree stmt, block_stmt_iterator *bsi) case store_vec_info_type: done = vectorizable_store (stmt, bsi, &vec_stmt); gcc_assert (done); - is_store = true; + if (DR_GROUP_FIRST_DR (stmt_info)) + { + /* In case of interleaving, the whole chain is vectorized when the + last store in the chain is reached. Store stmts before the last + one are skipped, and their vec_stmt_info shouldn't be freed + meanwhile. */ + *strided_store = true; + if (STMT_VINFO_VEC_STMT (stmt_info)) + is_store = true; + } + else + is_store = true; break; case condition_vec_info_type: @@ -3065,25 +3719,29 @@ vect_transform_stmt (tree stmt, block_stmt_iterator *bsi) gcc_unreachable (); } - gcc_assert (vec_stmt); - STMT_VINFO_VEC_STMT (stmt_info) = vec_stmt; - orig_stmt_in_pattern = STMT_VINFO_RELATED_STMT (stmt_info); - if (orig_stmt_in_pattern) - { - stmt_vec_info stmt_vinfo = vinfo_for_stmt (orig_stmt_in_pattern); - if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo)) - { - gcc_assert (STMT_VINFO_RELATED_STMT (stmt_vinfo) == stmt); - - /* STMT was inserted by the vectorizer to replace a computation - idiom. ORIG_STMT_IN_PATTERN is a stmt in the original - sequence that computed this idiom. We need to record a pointer - to VEC_STMT in the stmt_info of ORIG_STMT_IN_PATTERN. 
See more - detail in the documentation of vect_pattern_recog. */ - - STMT_VINFO_VEC_STMT (stmt_vinfo) = vec_stmt; - } - } + gcc_assert (vec_stmt || *strided_store); + if (vec_stmt) + { + STMT_VINFO_VEC_STMT (stmt_info) = vec_stmt; + orig_stmt_in_pattern = STMT_VINFO_RELATED_STMT (stmt_info); + if (orig_stmt_in_pattern) + { + stmt_vec_info stmt_vinfo = vinfo_for_stmt (orig_stmt_in_pattern); + if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo)) + { + gcc_assert (STMT_VINFO_RELATED_STMT (stmt_vinfo) == stmt); + + /* STMT was inserted by the vectorizer to replace a + computation idiom. ORIG_STMT_IN_PATTERN is a stmt in the + original sequence that computed this idiom. We need to + record a pointer to VEC_STMT in the stmt_info of + ORIG_STMT_IN_PATTERN. See more details in the + documentation of vect_pattern_recog. */ + + STMT_VINFO_VEC_STMT (stmt_vinfo) = vec_stmt; + } + } + } } if (STMT_VINFO_LIVE_P (stmt_info)) @@ -3485,7 +4143,14 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio, prolog_niters = min ( LOOP_NITERS , (VF - addr_mis/elem_size)&(VF-1) ) (elem_size = element type size; an element is the scalar element - whose type is the inner type of the vectype) */ + whose type is the inner type of the vectype) + + For interleaving, + + prolog_niters = min ( LOOP_NITERS , + (VF/group_size - addr_mis/elem_size)&(VF/group_size-1) ) + where group_size is the size of the interleaved group. +*/ static tree vect_gen_niters_for_prolog_loop (loop_vec_info loop_vinfo, tree loop_niters) @@ -3502,18 +4167,29 @@ vect_gen_niters_for_prolog_loop (loop_vec_info loop_vinfo, tree loop_niters) tree vectype = STMT_VINFO_VECTYPE (stmt_info); int vectype_align = TYPE_ALIGN (vectype) / BITS_PER_UNIT; tree niters_type = TREE_TYPE (loop_niters); + int group_size = 1; + int element_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr)))); + + if (DR_GROUP_FIRST_DR (stmt_info)) + { + /* For interleaved access element size must be multipled by the size of + the interleaved group. 
*/ + group_size = DR_GROUP_SIZE (vinfo_for_stmt ( + DR_GROUP_FIRST_DR (stmt_info))); + element_size *= group_size; + } pe = loop_preheader_edge (loop); if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0) { int byte_misalign = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo); - int element_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr)))); int elem_misalign = byte_misalign / element_size; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "known alignment = %d.", byte_misalign); - iters = build_int_cst (niters_type, (vf - elem_misalign)&(vf-1)); + iters = build_int_cst (niters_type, + (vf - elem_misalign)&(vf/group_size-1)); } else { @@ -3806,6 +4482,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, int vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo); bitmap_iterator bi; unsigned int j; + bool strided_store; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "=== vec_transform_loop ==="); @@ -3936,17 +4613,53 @@ vect_transform_loop (loop_vec_info loop_vinfo, if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "transform statement."); - is_store = vect_transform_stmt (stmt, &si); - if (is_store) - { - /* Free the attached stmt_vec_info and remove the stmt. */ - stmt_ann_t ann = stmt_ann (stmt); - free (stmt_info); - set_stmt_info (ann, NULL); - bsi_remove (&si, true); - continue; + strided_store = false; + is_store = vect_transform_stmt (stmt, &si, &strided_store); + if (is_store) + { + stmt_ann_t ann; + if (DR_GROUP_FIRST_DR (stmt_info)) + { + /* Interleaving. If IS_STORE is TRUE, the vectorization of the + interleaving chain was completed - free all the stores in + the chain. */ + tree next = DR_GROUP_FIRST_DR (stmt_info); + tree tmp; + stmt_vec_info next_stmt_info; + + while (next) + { + next_stmt_info = vinfo_for_stmt (next); + /* Free the attached stmt_vec_info and remove the stmt. 
*/ + ann = stmt_ann (next); + tmp = DR_GROUP_NEXT_DR (next_stmt_info); + free (next_stmt_info); + set_stmt_info (ann, NULL); + next = tmp; + } + bsi_remove (&si, true); + continue; + } + else + { + /* Free the attached stmt_vec_info and remove the stmt. */ + ann = stmt_ann (stmt); + free (stmt_info); + set_stmt_info (ann, NULL); + bsi_remove (&si, true); + continue; + } } - + else + { + if (strided_store) + { + /* This is case of skipped interleaved store. We don't free + its stmt_vec_info. */ + bsi_remove (&si, true); + continue; + } + } bsi_next (&si); } /* stmts in BB */ } /* BBs in loop */ diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index 7fa5d668e24..b349c89cb7e 100644 --- a/gcc/tree-vectorizer.c +++ b/gcc/tree-vectorizer.c @@ -1370,6 +1370,12 @@ new_stmt_vec_info (tree stmt, loop_vec_info loop_vinfo) else STMT_VINFO_DEF_TYPE (res) = vect_loop_def; STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5); + DR_GROUP_FIRST_DR (res) = NULL_TREE; + DR_GROUP_NEXT_DR (res) = NULL_TREE; + DR_GROUP_SIZE (res) = 0; + DR_GROUP_STORE_COUNT (res) = 0; + DR_GROUP_GAP (res) = 0; + DR_GROUP_SAME_DR_STMT (res) = NULL_TREE; return res; } diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 8341ad00a07..9d75133443f 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -235,21 +235,50 @@ typedef struct _stmt_vec_info { /* Classify the def of this stmt. */ enum vect_def_type def_type; + /* Interleaving info. */ + /* First data-ref in the interleaving group. */ + tree first_dr; + /* Pointer to the next data-ref in the group. */ + tree next_dr; + /* The size of the interleaving group. */ + unsigned int size; + /* For stores, number of stores from this group seen. We vectorize the last + one. */ + unsigned int store_count; + /* For loads only, the gap from the previous load. For consecutive loads, GAP + is 1. 
*/ + unsigned int gap; + /* In case that two or more stmts share data-ref, this is the pointer to the + previously detected stmt with the same dr. */ + tree same_dr_stmt; } *stmt_vec_info; /* Access Functions. */ -#define STMT_VINFO_TYPE(S) (S)->type -#define STMT_VINFO_STMT(S) (S)->stmt -#define STMT_VINFO_LOOP_VINFO(S) (S)->loop_vinfo -#define STMT_VINFO_RELEVANT(S) (S)->relevant -#define STMT_VINFO_LIVE_P(S) (S)->live -#define STMT_VINFO_VECTYPE(S) (S)->vectype -#define STMT_VINFO_VEC_STMT(S) (S)->vectorized_stmt -#define STMT_VINFO_DATA_REF(S) (S)->data_ref_info -#define STMT_VINFO_IN_PATTERN_P(S) (S)->in_pattern_p -#define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt -#define STMT_VINFO_SAME_ALIGN_REFS(S) (S)->same_align_refs -#define STMT_VINFO_DEF_TYPE(S) (S)->def_type +#define STMT_VINFO_TYPE(S) (S)->type +#define STMT_VINFO_STMT(S) (S)->stmt +#define STMT_VINFO_LOOP_VINFO(S) (S)->loop_vinfo +#define STMT_VINFO_RELEVANT(S) (S)->relevant +#define STMT_VINFO_LIVE_P(S) (S)->live +#define STMT_VINFO_VECTYPE(S) (S)->vectype +#define STMT_VINFO_VEC_STMT(S) (S)->vectorized_stmt +#define STMT_VINFO_DATA_REF(S) (S)->data_ref_info +#define STMT_VINFO_IN_PATTERN_P(S) (S)->in_pattern_p +#define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt +#define STMT_VINFO_SAME_ALIGN_REFS(S) (S)->same_align_refs +#define STMT_VINFO_DEF_TYPE(S) (S)->def_type +#define STMT_VINFO_DR_GROUP_FIRST_DR(S) (S)->first_dr +#define STMT_VINFO_DR_GROUP_NEXT_DR(S) (S)->next_dr +#define STMT_VINFO_DR_GROUP_SIZE(S) (S)->size +#define STMT_VINFO_DR_GROUP_STORE_COUNT(S) (S)->store_count +#define STMT_VINFO_DR_GROUP_GAP(S) (S)->gap +#define STMT_VINFO_DR_GROUP_SAME_DR_STMT(S)(S)->same_dr_stmt + +#define DR_GROUP_FIRST_DR(S) (S)->first_dr +#define DR_GROUP_NEXT_DR(S) (S)->next_dr +#define DR_GROUP_SIZE(S) (S)->size +#define DR_GROUP_STORE_COUNT(S) (S)->store_count +#define DR_GROUP_GAP(S) (S)->gap +#define DR_GROUP_SAME_DR_STMT(S) (S)->same_dr_stmt #define STMT_VINFO_RELEVANT_P(S) ((S)->relevant != 
vect_unused_in_loop) diff --git a/gcc/tree.def b/gcc/tree.def index 3c4068849e5..d5df9f26125 100644 --- a/gcc/tree.def +++ b/gcc/tree.def @@ -1088,6 +1088,14 @@ DEFTREECODE (VEC_UNPACK_LO_EXPR, "vec_unpack_lo_expr", tcc_unary, 1) DEFTREECODE (VEC_PACK_MOD_EXPR, "vec_pack_mod_expr", tcc_binary, 2) DEFTREECODE (VEC_PACK_SAT_EXPR, "vec_pack_sat_expr", tcc_binary, 2) +/* Extract even/odd fields from vectors. */ +DEFTREECODE (VEC_EXTRACT_EVEN_EXPR, "vec_extracteven_expr", tcc_binary, 2) +DEFTREECODE (VEC_EXTRACT_ODD_EXPR, "vec_extractodd_expr", tcc_binary, 2) + +/* Merge input vectors interleaving their fields. */ +DEFTREECODE (VEC_INTERLEAVE_HIGH_EXPR, "vec_interleavehigh_expr", tcc_binary, 2) +DEFTREECODE (VEC_INTERLEAVE_LOW_EXPR, "vec_interleavelow_expr", tcc_binary, 2) + /* Local variables: mode:c |
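The tree.def hunk above describes the four new tree codes only briefly ("Extract even/odd fields from vectors", "Merge input vectors interleaving their fields"). The following is a minimal scalar sketch of what those operations compute, not GCC code. The `model_*` helpers and `concat_elt` are hypothetical names introduced for illustration, elements are modelled as plain `int`, and the sketch assumes that "low" means the lower-numbered half of each input vector; on a real target the high/low ordering may depend on endianness.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative scalar models only.  These helpers are hypothetical and are
   not GCC functions.  N is the number of elements in each input vector.  */

/* Return element K of the concatenation of A and B (2*N elements).  */
static int
concat_elt (const int *a, const int *b, size_t n, size_t k)
{
  return k < n ? a[k] : b[k - n];
}

/* Model of VEC_EXTRACT_EVEN_EXPR: even-indexed elements of A ++ B.  */
void
model_extract_even (const int *a, const int *b, size_t n, int *out)
{
  size_t i;
  for (i = 0; i < n; i++)
    out[i] = concat_elt (a, b, n, 2 * i);
}

/* Model of VEC_EXTRACT_ODD_EXPR: odd-indexed elements of A ++ B.  */
void
model_extract_odd (const int *a, const int *b, size_t n, int *out)
{
  size_t i;
  for (i = 0; i < n; i++)
    out[i] = concat_elt (a, b, n, 2 * i + 1);
}

/* Model of VEC_INTERLEAVE_LOW_EXPR.  Assumption: "low" is the
   lower-numbered half of each input.  */
void
model_interleave_low (const int *a, const int *b, size_t n, int *out)
{
  size_t i;
  assert (n % 2 == 0);
  for (i = 0; i < n / 2; i++)
    {
      out[2 * i] = a[i];
      out[2 * i + 1] = b[i];
    }
}

/* Model of VEC_INTERLEAVE_HIGH_EXPR.  Assumption: "high" is the
   higher-numbered half of each input.  */
void
model_interleave_high (const int *a, const int *b, size_t n, int *out)
{
  size_t i;
  assert (n % 2 == 0);
  for (i = 0; i < n / 2; i++)
    {
      out[2 * i] = a[n / 2 + i];
      out[2 * i + 1] = b[n / 2 + i];
    }
}
```

Under these assumptions, with `a = {0, 1, 2, 3}` and `b = {4, 5, 6, 7}`, the even/odd extractions separate two interleaved streams (as a strided load of group size 2 would need), and the low/high interleaves merge two streams back together (as a strided store would need).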
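The comment added above vect_gen_niters_for_prolog_loop gives the prolog iteration count for interleaved accesses as min (LOOP_NITERS, (VF/group_size - addr_mis/elem_size) & (VF/group_size - 1)), where the patch multiplies the element size by the group size. A small sketch of that arithmetic with plain integers may make the formula easier to check by hand. The function `prolog_niters` and its parameter names are hypothetical, and the sketch assumes that VF/group_size is a power of two, which is what makes the bit mask act as a modulo.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC.  */
static int
min_int (int a, int b)
{
  return a < b ? a : b;
}

/* Sketch of the formula from the comment, for a misalignment known at
   compile time.  SCALAR_ELEM_SIZE is the size in bytes of one scalar
   element; for interleaving it is scaled by GROUP_SIZE, as in the patch.
   GROUP_SIZE is 1 for a non-interleaved access.  */
int
prolog_niters (int loop_niters, int vf, int group_size,
               int byte_misalign, int scalar_elem_size)
{
  int iters_per_vector, elem_misalign;

  assert (group_size > 0 && vf % group_size == 0);
  iters_per_vector = vf / group_size;

  /* Assumption: VF/group_size is a power of two, so that
     x & (iters_per_vector - 1) equals x modulo iters_per_vector.  */
  assert (iters_per_vector > 0
          && (iters_per_vector & (iters_per_vector - 1)) == 0);

  elem_misalign = byte_misalign / (scalar_elem_size * group_size);

  return min_int (loop_niters,
                  (iters_per_vector - elem_misalign)
                  & (iters_per_vector - 1));
}
```

For example, with VF = 4, a group of two 4-byte elements and a byte misalignment of 8, each scalar iteration advances the address by 8 bytes, so the sketch yields one peeled iteration; with group_size = 1 it reduces to the original (VF - addr_mis/elem_size) & (VF - 1) form.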